Project

General

Profile

Actions

Bug #17373

open

Ruby 3.0 is slower at Discourse bench than Ruby 2.7

Added by sam.saffron (Sam Saffron) 5 months ago. Updated 4 months ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:101276]

Description

We have a continuous effort to keep https://rubybench.org/ up to date. Unfortunately we have lapsed a bit on updates so I decided to run Discourse bench by hand.

On my local system I am noticing the timing are much slower on Discourse bench than they were on Ruby 2.7:

98c9120cc00ba691b4abcc13a49a30fa54638535

---
categories:
  50: 34
  75: 35
  90: 36
  99: 40
home:
  50: 47
  75: 50
  90: 56
  99: 92
topic:
  50: 53
  75: 55
  90: 58
  99: 92
categories_admin:
  50: 33
  75: 34
  90: 38
  99: 68
home_admin:
  50: 45
  75: 47
  90: 51
  99: 95
topic_admin:
  50: 51
  75: 52
  90: 54
  99: 59
timings:
  load_rails: 1449
ruby-version: 3.0.0-p-1
rss_kb: 293808
pss_kb: 284244

2.7.2


---
categories:
  50: 29
  75: 30
  90: 33
  99: 43
home:
  50: 42
  75: 43
  90: 47
  99: 73
topic:
  50: 47
  75: 47
  90: 52
  99: 54
categories_admin:
  50: 30
  75: 31
  90: 36
  99: 56
home_admin:
  50: 41
  75: 43
  90: 48
  99: 71
topic_admin:
  50: 47
  75: 48
  90: 52
  99: 55
timings:
  load_rails: 1339
ruby-version: 2.7.2-p137
rss_kb: 296836
pss_kb: 287310


2.6.6


---
categories:
  50: 30
  75: 30
  90: 31
  99: 51
home:
  50: 40
  75: 42
  90: 60
  99: 79
topic:
  50: 47
  75: 48
  90: 53
  99: 71
categories_admin:
  50: 30
  75: 31
  90: 35
  99: 50
home_admin:
  50: 41
  75: 43
  90: 62
  99: 100
topic_admin:
  50: 48
  75: 49
  90: 65
  99: 93
timings:
  load_rails: 1464
ruby-version: 2.6.6-p146
rss_kb: 332268
pss_kb: 322865

Concretely this means a typical "homepage" view is taking 47 milliseconds now, when in the past it was taking 42 or even 40 ms.

There may be some issues with the bench, you can see it here:

https://github.com/discourse/discourse/tree/ruby-3

The script to run the bench is ruby script/bench.rb

I re-tested against a cleanly built Ruby 2.7 direct from the Git branch to ensure compilation was not at fault.

This is pretty concerning, Discourse bench usually tracks Rails performance pretty accurately.


Files


Related issues

Related to Ruby master - Feature #16614: New method cache mechanism for GuildClosedko1 (Koichi Sasada)Actions
Related to Ruby master - Bug #17383: 3.0 recursion memory|speed issuesOpenActions

Updated by k0kubun (Takashi Kokubun) 5 months ago

Could you, or anybody reading this ticket, use the benchmark to find revisions which seem to have contributed to the performance decrease? Detecting which change was bad has been the most time-consuming part even while we have a good benchmark. If we know what revision should be fixed, at the very least we'd have an option to revert it.

I'm aware of the performance trend and I've actually spent some time for looking at graphs like https://rubybench.org/ruby/ruby/commits?result_type=Optcarrot%20Lan_Master.nes&display_count=2000, but when we benchmark a suspicious revision and its previous one found from the graph locally at the same time, results are often flipped and we've failed to find problematic commits that way.

Updated by tenderlovemaking (Aaron Patterson) 5 months ago

Here is what I've found so far:

I've been testing with RailsBench. I ran this command:

RAILS_ENV=production INTERVAL=100 WARMUP=1 BENCHMARK=10000 perf record -F 99 --call-graph dwarf (which ruby) bin/bench

Then I used this command to find where time is spent:

perf report -g graph --no-children

Here is the report on Ruby 2.7:

Ruby 2.7 report

Here is the report on Ruby 3.0 (master):

Ruby 3.0 report

rb_id_table_lookup is a new line in the Ruby 3.0 report. Looking at the callers, it looks like these are coming from the slow path method lookup:

Slow path lookup

b9007b6c54 touched these methods, so I ran the benchmarks on that commit:

aaron@whiteclaw ~/g/railsbench (ruby-3.0)> RAILS_ENV=production INTERVAL=100 WARMUP=1 BENCHMARK=10000 ruby -v bin/bench
ruby 2.8.0dev (2020-02-22T00:58:59Z pz-mvh-updated-mas.. b9007b6c54) [x86_64-linux]
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32454: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32615: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32651: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32740: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32756: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32822: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32863: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32888: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:31984: warning: assigned but unused variable - testEof
/home/aaron/.gem/ruby/2.7.3/gems/sassc-rails-2.1.2/lib/sassc/rails/functions.rb:7: warning: method redefined; discarding old asset_data_url
/home/aaron/.gem/ruby/2.7.3/gems/sprockets-4.0.1/lib/sprockets/sass_processor.rb:274: warning: previous definition of asset_data_url was here
/home/aaron/.gem/ruby/2.7.3/gems/sassc-rails-2.1.2/lib/sassc/rails/compressor.rb:7: warning: method redefined; discarding old initialize
/home/aaron/.gem/ruby/2.7.3/gems/sprockets-4.0.1/lib/sprockets/sass_compressor.rb:39: warning: previous definition of initialize was here
/home/aaron/.gem/ruby/2.7.3/gems/sassc-rails-2.1.2/lib/sassc/rails/compressor.rb:17: warning: method redefined; discarding old call
/home/aaron/.gem/ruby/2.7.3/gems/sprockets-4.0.1/lib/sprockets/sass_compressor.rb:49: warning: previous definition of call was here
Warming up...
Warmup: 1 requests
Benchmark: 10000 requests

Request per second: 993.5 [#/s] (mean)

Percentage of the requests served within a certain time (ms)
  50%    0.98
  66%    0.99
  75%    0.99
  80%    1.00
  90%    1.01
  95%    1.03
  98%    1.60
  99%    2.14
 100%   10.14

Mean request per second is 993.5.

If I run with the previous commit f2286925f0 here are the results:

aaron@whiteclaw ~/g/railsbench (ruby-3.0)> RAILS_ENV=production INTERVAL=100 WARMUP=1 BENCHMARK=10000 ruby -v bin/bench
ruby 2.8.0dev (2020-02-22T00:58:59Z pz-mvh-updated-mas.. f2286925f0) [x86_64-linux]
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32454: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32615: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32651: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32740: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32756: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32822: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32863: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32888: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:31984: warning: assigned but unused variable - testEof
/home/aaron/.gem/ruby/2.7.3/gems/sassc-rails-2.1.2/lib/sassc/rails/functions.rb:7: warning: method redefined; discarding old asset_data_url
/home/aaron/.gem/ruby/2.7.3/gems/sprockets-4.0.1/lib/sprockets/sass_processor.rb:274: warning: previous definition of asset_data_url was here
/home/aaron/.gem/ruby/2.7.3/gems/sassc-rails-2.1.2/lib/sassc/rails/compressor.rb:7: warning: method redefined; discarding old initialize
/home/aaron/.gem/ruby/2.7.3/gems/sprockets-4.0.1/lib/sprockets/sass_compressor.rb:39: warning: previous definition of initialize was here
/home/aaron/.gem/ruby/2.7.3/gems/sassc-rails-2.1.2/lib/sassc/rails/compressor.rb:17: warning: method redefined; discarding old call
/home/aaron/.gem/ruby/2.7.3/gems/sprockets-4.0.1/lib/sprockets/sass_compressor.rb:49: warning: previous definition of call was here
Warming up...
Warmup: 1 requests
Benchmark: 10000 requests

Request per second: 1164.9 [#/s] (mean)

Percentage of the requests served within a certain time (ms)
  50%    0.85
  66%    0.85
  75%    0.85
  80%    0.86
  90%    0.86
  95%    0.87
  98%    0.89
  99%    1.76
 100%    3.44

Mean RPS is 1164.

I think b9007b6c54 introduced the performance problem, but I'm not sure why. The caller lookup_method_table didn't change, it just got moved from here to here, and rb_id_table_lookup is exactly the same too.

My current guess is that we're getting cache misses more frequently for some reason. I will look in to that now, but I wanted to share what I've found so far.

Updated by tenderlovemaking (Aaron Patterson) 5 months ago

I backported the mc_search debug counter. On Ruby master, search_method is called 2,480,052 times, but on f2286925f08406bc857f7b03ad6779a5d61443ae it's only called 501,903 times. I think this is the source of our performance degradation, and it seems related to cache misses.

Actions #4

Updated by Eregon (Benoit Daloze) 5 months ago

  • Related to Feature #16614: New method cache mechanism for Guild added

Updated by tenderlovemaking (Aaron Patterson) 5 months ago

It looks like the global method cache was removed.

On f2286925f0 the debug counters looked like this:

[RUBY_DEBUG_COUNTER]    mc_inline_hit                       54,445,333
[RUBY_DEBUG_COUNTER]    mc_inline_miss                       1,800,036
[RUBY_DEBUG_COUNTER]    mc_global_hit                       14,489,677
[RUBY_DEBUG_COUNTER]    mc_global_miss                         790,656

mc_global_hit is the number of times the global method cache was hit. I think for Rails, the problem is with respond_to. It looks like Object#respond_to? used to use the global method cache:

callers

Here is a benchmark that demonstrates the impact of removing the global method cache:

require "benchmark/ips"

class Base
  def bar; end
end

class Foo
  30.times do |i|
    class_eval "module M#{i}; end; include M#{i}"
  end
end

foo = Foo.new

Benchmark.ips do |x|
  x.report("respond to") { foo.respond_to?(:bar) }
end

On master:

aaron@whiteclaw ~/git> ruby/ruby -v test.rb
ruby 3.0.0dev (2020-12-07T17:10:03Z master 146b3695c0) [x86_64-linux]
Warming up --------------------------------------
          respond to   585.997k i/100ms
Calculating -------------------------------------
          respond to      5.869M (± 0.9%) i/s -     29.886M in   5.092708s

On f2286925f0:

aaron@whiteclaw ~/git> ruby/ruby -v test.rb
ruby 2.8.0dev (2020-02-22T00:58:59Z before-imc f2286925f0) [x86_64-linux]
Warming up --------------------------------------
          respond to     1.686M i/100ms
Calculating -------------------------------------
          respond to     16.888M (± 0.6%) i/s -     85.971M in   5.090695s

f2286925f0 is about 3x faster than master.

It seems that the global method cache was important for Rails benchmarks. I guess we can't have a global method cache because of Ractors? Would it be possible to make a per-Ractor "global" cache?

Updated by k0kubun (Takashi Kokubun) 5 months ago

https://github.com/ruby/ruby/pull/2583 was an attempt to reduce method cache misses on Discourse, and IIRC https://github.com/ruby/ruby/pull/2888 removed it. It might be related at least for some extent.

Updated by sam.saffron (Sam Saffron) 5 months ago

I made this tiny script, which you can use to test (in the branch in the first post)

#!/usr/bin/env ruby
# frozen_string_literal: true

ENV['RAILS_ENV'] = 'profile'

require File.expand_path("../../config/environment", __FILE__)

def make_request
  req = {
    "REQUEST_METHOD" => "GET",
    "SCRIPT_NAME" => "",
    "PATH_INFO" => "/",
    "QUERY_STRING" => "",
    "SERVER_NAME" => "127.0.0.1",
    "SERVER_PORT" => "80",
    "REQUEST_PATH" => "/",
    "REMOTE_ADDR" => "127.0.0.1",
    "HTTP_VERSION" => "HTTP/1.0",
    "rack.input" => StringIO.new(""),
    "rack.version" => [1, 2],
    "rack.url_scheme" => "http"
  }

  Rails.application.call(req)
end

10.times do
  make_request
end

3.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
RubyVM.reset_debug_counters
make_request
RubyVM.show_debug_counters
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

puts "Duration was #{elapsed}"
gets

Results for 2.7 are:

[RUBY_DEBUG_COUNTER]    417420 method call
[RUBY_DEBUG_COUNTER]    mc_inline_hit                         407,308
[RUBY_DEBUG_COUNTER]    mc_inline_miss                         17,502
[RUBY_DEBUG_COUNTER]    mc_global_hit                          86,554
[RUBY_DEBUG_COUNTER]    mc_global_miss                            207
[RUBY_DEBUG_COUNTER]    mc_global_state_miss                    2,762
[RUBY_DEBUG_COUNTER]    mc_class_serial_miss                   14,740
[RUBY_DEBUG_COUNTER]    mc_cme_complement                      10,191
[RUBY_DEBUG_COUNTER]    mc_cme_complement_hit                  10,191
[RUBY_DEBUG_COUNTER]    mc_search_super                         2,770
[RUBY_DEBUG_COUNTER]    mc_miss_by_nome                             0
[RUBY_DEBUG_COUNTER]    mc_miss_by_distinct                    10,348
[RUBY_DEBUG_COUNTER]    mc_miss_by_refine                           0
[RUBY_DEBUG_COUNTER]    mc_miss_by_visi                           218
[RUBY_DEBUG_COUNTER]    mc_miss_spurious                        6,936
[RUBY_DEBUG_COUNTER]    mc_miss_reuse_call                      2,245
[RUBY_DEBUG_COUNTER]    ccf_general                            16,704
[RUBY_DEBUG_COUNTER]    ccf_iseq_setup                         43,129
[RUBY_DEBUG_COUNTER]    ccf_iseq_setup_0start                   1,055
[RUBY_DEBUG_COUNTER]    ccf_iseq_setup_tailcall_0start              0
[RUBY_DEBUG_COUNTER]    ccf_iseq_fix                          134,778
[RUBY_DEBUG_COUNTER]    ccf_iseq_opt                           14,039
[RUBY_DEBUG_COUNTER]    ccf_iseq_kw1                              124
[RUBY_DEBUG_COUNTER]    ccf_iseq_kw2                              195
[RUBY_DEBUG_COUNTER]    ccf_cfunc                             139,250
[RUBY_DEBUG_COUNTER]    ccf_ivar                               51,208
[RUBY_DEBUG_COUNTER]    ccf_attrset                               698
[RUBY_DEBUG_COUNTER]    ccf_method_missing                        524
[RUBY_DEBUG_COUNTER]    ccf_zsuper                                  0
[RUBY_DEBUG_COUNTER]    ccf_bmethod                            18,238
[RUBY_DEBUG_COUNTER]    ccf_opt_send                            8,025
[RUBY_DEBUG_COUNTER]    ccf_opt_call                              857
[RUBY_DEBUG_COUNTER]    ccf_opt_block_call                          0
[RUBY_DEBUG_COUNTER]    ccf_super_method                       13,155
[RUBY_DEBUG_COUNTER]    frame_push                            392,112
[RUBY_DEBUG_COUNTER]    frame_push_method                     193,320
[RUBY_DEBUG_COUNTER]    frame_push_block                       35,436
[RUBY_DEBUG_COUNTER]    frame_push_class                            0
[RUBY_DEBUG_COUNTER]    frame_push_top                              0
[RUBY_DEBUG_COUNTER]    frame_push_cfunc                      159,590
[RUBY_DEBUG_COUNTER]    frame_push_ifunc                        3,764
[RUBY_DEBUG_COUNTER]    frame_push_eval                             0
[RUBY_DEBUG_COUNTER]    frame_push_rescue                           2
[RUBY_DEBUG_COUNTER]    frame_push_dummy                            0
[RUBY_DEBUG_COUNTER]    frame_R2R                             204,735
[RUBY_DEBUG_COUNTER]    frame_R2C                             148,739
[RUBY_DEBUG_COUNTER]    frame_C2C                              14,615
[RUBY_DEBUG_COUNTER]    frame_C2R                              24,023
[RUBY_DEBUG_COUNTER]    ivar_get_ic_hit                       105,173
[RUBY_DEBUG_COUNTER]    ivar_get_ic_miss                       20,240
[RUBY_DEBUG_COUNTER]    ivar_get_ic_miss_serial                28,221
[RUBY_DEBUG_COUNTER]    ivar_get_ic_miss_unset                 11,963
[RUBY_DEBUG_COUNTER]    ivar_get_ic_miss_noobject                   0
[RUBY_DEBUG_COUNTER]    ivar_set_ic_hit                        10,802
[RUBY_DEBUG_COUNTER]    ivar_set_ic_miss                       13,473
[RUBY_DEBUG_COUNTER]    ivar_set_ic_miss_serial                 8,262
[RUBY_DEBUG_COUNTER]    ivar_set_ic_miss_unset                      4
[RUBY_DEBUG_COUNTER]    ivar_set_ic_miss_oorange                3,924
[RUBY_DEBUG_COUNTER]    ivar_set_ic_miss_noobject               1,283
[RUBY_DEBUG_COUNTER]    ivar_get_base                          21,572
[RUBY_DEBUG_COUNTER]    ivar_set_base                          13,494
[RUBY_DEBUG_COUNTER]    lvar_get                              355,050
[RUBY_DEBUG_COUNTER]    lvar_get_dynamic                       40,160
[RUBY_DEBUG_COUNTER]    lvar_set                               78,325
[RUBY_DEBUG_COUNTER]    lvar_set_dynamic                        1,252
[RUBY_DEBUG_COUNTER]    lvar_set_slowpath                         101
[RUBY_DEBUG_COUNTER]    gc_count                                    0
[RUBY_DEBUG_COUNTER]    gc_minor_newobj                             0
[RUBY_DEBUG_COUNTER]    gc_minor_malloc                             0
[RUBY_DEBUG_COUNTER]    gc_minor_method                             0
[RUBY_DEBUG_COUNTER]    gc_minor_capi                               0
[RUBY_DEBUG_COUNTER]    gc_minor_stress                             0
[RUBY_DEBUG_COUNTER]    gc_major_nofree                             0
[RUBY_DEBUG_COUNTER]    gc_major_oldgen                             0
[RUBY_DEBUG_COUNTER]    gc_major_shady                              0
[RUBY_DEBUG_COUNTER]    gc_major_force                              0
[RUBY_DEBUG_COUNTER]    gc_major_oldmalloc                          0
[RUBY_DEBUG_COUNTER]    gc_isptr_trial                              0
[RUBY_DEBUG_COUNTER]    gc_isptr_range                              0
[RUBY_DEBUG_COUNTER]    gc_isptr_align                              0
[RUBY_DEBUG_COUNTER]    gc_isptr_maybe                              0
[RUBY_DEBUG_COUNTER]    obj_newobj                             58,178
[RUBY_DEBUG_COUNTER]    obj_newobj_slowpath                     1,517
[RUBY_DEBUG_COUNTER]    obj_newobj_wb_unprotected                 877
[RUBY_DEBUG_COUNTER]    obj_free                                    0
[RUBY_DEBUG_COUNTER]    obj_promote                                 0
[RUBY_DEBUG_COUNTER]    obj_wb_unprotect                            0
[RUBY_DEBUG_COUNTER]    obj_obj_embed                               0
[RUBY_DEBUG_COUNTER]    obj_obj_transient                           0
[RUBY_DEBUG_COUNTER]    obj_obj_ptr                                 0
[RUBY_DEBUG_COUNTER]    obj_str_ptr                                 0
[RUBY_DEBUG_COUNTER]    obj_str_embed                               0
[RUBY_DEBUG_COUNTER]    obj_str_shared                              0
[RUBY_DEBUG_COUNTER]    obj_str_nofree                              0
[RUBY_DEBUG_COUNTER]    obj_str_fstr                                0
[RUBY_DEBUG_COUNTER]    obj_ary_embed                               0
[RUBY_DEBUG_COUNTER]    obj_ary_transient                           0
[RUBY_DEBUG_COUNTER]    obj_ary_ptr                                 1
[RUBY_DEBUG_COUNTER]    obj_ary_extracapa                           0
[RUBY_DEBUG_COUNTER]    obj_ary_shared_create                      81
[RUBY_DEBUG_COUNTER]    obj_ary_shared                              0
[RUBY_DEBUG_COUNTER]    obj_ary_shared_root_occupied                0
[RUBY_DEBUG_COUNTER]    obj_hash_empty                              0
[RUBY_DEBUG_COUNTER]    obj_hash_1                                  0
[RUBY_DEBUG_COUNTER]    obj_hash_2                                  0
[RUBY_DEBUG_COUNTER]    obj_hash_3                                  0
[RUBY_DEBUG_COUNTER]    obj_hash_4                                  0
[RUBY_DEBUG_COUNTER]    obj_hash_5_8                                0
[RUBY_DEBUG_COUNTER]    obj_hash_g8                                 0
[RUBY_DEBUG_COUNTER]    obj_hash_null                               0
[RUBY_DEBUG_COUNTER]    obj_hash_ar                                 0
[RUBY_DEBUG_COUNTER]    obj_hash_st                                 0
[RUBY_DEBUG_COUNTER]    obj_hash_transient                          0
[RUBY_DEBUG_COUNTER]    obj_hash_force_convert                      0
[RUBY_DEBUG_COUNTER]    obj_struct_embed                            0
[RUBY_DEBUG_COUNTER]    obj_struct_transient                        0
[RUBY_DEBUG_COUNTER]    obj_struct_ptr                              0
[RUBY_DEBUG_COUNTER]    obj_data_empty                              0
[RUBY_DEBUG_COUNTER]    obj_data_xfree                              0
[RUBY_DEBUG_COUNTER]    obj_data_imm_free                           0
[RUBY_DEBUG_COUNTER]    obj_data_zombie                             0
[RUBY_DEBUG_COUNTER]    obj_match_under4                            0
[RUBY_DEBUG_COUNTER]    obj_match_ge4                               0
[RUBY_DEBUG_COUNTER]    obj_match_ge8                               0
[RUBY_DEBUG_COUNTER]    obj_match_ptr                               0
[RUBY_DEBUG_COUNTER]    obj_iclass_ptr                              0
[RUBY_DEBUG_COUNTER]    obj_class_ptr                               0
[RUBY_DEBUG_COUNTER]    obj_module_ptr                              0
[RUBY_DEBUG_COUNTER]    obj_bignum_ptr                              0
[RUBY_DEBUG_COUNTER]    obj_bignum_embed                            0
[RUBY_DEBUG_COUNTER]    obj_float                                   0
[RUBY_DEBUG_COUNTER]    obj_complex                                 0
[RUBY_DEBUG_COUNTER]    obj_rational                                0
[RUBY_DEBUG_COUNTER]    obj_regexp_ptr                              0
[RUBY_DEBUG_COUNTER]    obj_file_ptr                                0
[RUBY_DEBUG_COUNTER]    obj_symbol                                  0
[RUBY_DEBUG_COUNTER]    obj_imemo_ment                              0
[RUBY_DEBUG_COUNTER]    obj_imemo_iseq                              0
[RUBY_DEBUG_COUNTER]    obj_imemo_env                               0
[RUBY_DEBUG_COUNTER]    obj_imemo_tmpbuf                            0
[RUBY_DEBUG_COUNTER]    obj_imemo_ast                               0
[RUBY_DEBUG_COUNTER]    obj_imemo_cref                              0
[RUBY_DEBUG_COUNTER]    obj_imemo_svar                              0
[RUBY_DEBUG_COUNTER]    obj_imemo_throw_data                        0
[RUBY_DEBUG_COUNTER]    obj_imemo_ifunc                             0
[RUBY_DEBUG_COUNTER]    obj_imemo_memo                              0
[RUBY_DEBUG_COUNTER]    obj_imemo_parser_strterm                    0
[RUBY_DEBUG_COUNTER]    artable_hint_hit                       23,024
[RUBY_DEBUG_COUNTER]    artable_hint_miss                         174
[RUBY_DEBUG_COUNTER]    artable_hint_notfound                  27,875
[RUBY_DEBUG_COUNTER]    heap_xmalloc                            7,642
[RUBY_DEBUG_COUNTER]    heap_xrealloc                             153
[RUBY_DEBUG_COUNTER]    heap_xfree                                539
[RUBY_DEBUG_COUNTER]    theap_alloc                             9,295
[RUBY_DEBUG_COUNTER]    theap_alloc_fail                            0
[RUBY_DEBUG_COUNTER]    theap_evacuate                              0
[RUBY_DEBUG_COUNTER]    mjit_exec                                   0
[RUBY_DEBUG_COUNTER]    mjit_exec_not_added                         0
[RUBY_DEBUG_COUNTER]    mjit_exec_not_added_add_iseq                0
[RUBY_DEBUG_COUNTER]    mjit_exec_not_ready                         0
[RUBY_DEBUG_COUNTER]    mjit_exec_not_compiled                      0
[RUBY_DEBUG_COUNTER]    mjit_exec_call_func                         0
[RUBY_DEBUG_COUNTER]    mjit_frame_VM2VM                            0
[RUBY_DEBUG_COUNTER]    mjit_frame_VM2JT                            0
[RUBY_DEBUG_COUNTER]    mjit_frame_JT2JT                            0
[RUBY_DEBUG_COUNTER]    mjit_frame_JT2VM                            0
[RUBY_DEBUG_COUNTER]    mjit_cancel                                 0
[RUBY_DEBUG_COUNTER]    mjit_cancel_ivar_inline                     0
[RUBY_DEBUG_COUNTER]    mjit_cancel_send_inline                     0
[RUBY_DEBUG_COUNTER]    mjit_cancel_opt_insn                        0
[RUBY_DEBUG_COUNTER]    mjit_cancel_invalidate_all                  0
[RUBY_DEBUG_COUNTER]    mjit_length_unit_queue                      0
[RUBY_DEBUG_COUNTER]    mjit_length_active_units                    0
[RUBY_DEBUG_COUNTER]    mjit_length_compact_units                   0
[RUBY_DEBUG_COUNTER]    mjit_length_stale_units                     0
[RUBY_DEBUG_COUNTER]    mjit_compile_failures                       0

Results for 3.0 are:

[RUBY_DEBUG_COUNTER]    417140 show_debug_counters
[RUBY_DEBUG_COUNTER]    mc_inline_hit                         383,937
[RUBY_DEBUG_COUNTER]    mc_inline_miss_klass                   37,470
[RUBY_DEBUG_COUNTER]    mc_inline_miss_invalidated                  0
[RUBY_DEBUG_COUNTER]    mc_cme_complement                         166
[RUBY_DEBUG_COUNTER]    mc_cme_complement_hit                     166
[RUBY_DEBUG_COUNTER]    mc_search                               7,248
[RUBY_DEBUG_COUNTER]    mc_search_notfound                      4,254
[RUBY_DEBUG_COUNTER]    mc_search_super                       107,104
[RUBY_DEBUG_COUNTER]    obj_respond_to                            613
[RUBY_DEBUG_COUNTER]    ci_packed                              13,043
[RUBY_DEBUG_COUNTER]    ci_kw                                       0
[RUBY_DEBUG_COUNTER]    ci_nokw                                     0
[RUBY_DEBUG_COUNTER]    ci_runtime                             13,043
[RUBY_DEBUG_COUNTER]    cc_new                                    228
[RUBY_DEBUG_COUNTER]    cc_temp                                     0
[RUBY_DEBUG_COUNTER]    cc_found_ccs                           36,718
[RUBY_DEBUG_COUNTER]    cc_ent_invalidate                           0
[RUBY_DEBUG_COUNTER]    cc_cme_invalidate                           0
[RUBY_DEBUG_COUNTER]    cc_invalidate_leaf                          0
[RUBY_DEBUG_COUNTER]    cc_invalidate_leaf_ccs                      0
[RUBY_DEBUG_COUNTER]    cc_invalidate_leaf_callable                 0
[RUBY_DEBUG_COUNTER]    cc_invalidate_tree                          0
[RUBY_DEBUG_COUNTER]    cc_invalidate_tree_cme                      0
[RUBY_DEBUG_COUNTER]    cc_invalidate_tree_callable                 0
[RUBY_DEBUG_COUNTER]    ccs_free                                    0
[RUBY_DEBUG_COUNTER]    ccs_maxlen                                  2
[RUBY_DEBUG_COUNTER]    ccs_found                              57,960
[RUBY_DEBUG_COUNTER]    iseq_num                                    0
[RUBY_DEBUG_COUNTER]    iseq_cd_num                                 0
[RUBY_DEBUG_COUNTER]    ccf_general                            11,291
[RUBY_DEBUG_COUNTER]    ccf_iseq_setup                         36,053
[RUBY_DEBUG_COUNTER]    ccf_iseq_setup_0start                   1,068
[RUBY_DEBUG_COUNTER]    ccf_iseq_setup_tailcall_0start              0
[RUBY_DEBUG_COUNTER]    ccf_iseq_fix                          154,419
[RUBY_DEBUG_COUNTER]    ccf_iseq_opt                           16,076
[RUBY_DEBUG_COUNTER]    ccf_iseq_kw1                              124
[RUBY_DEBUG_COUNTER]    ccf_iseq_kw2                              231
[RUBY_DEBUG_COUNTER]    ccf_cfunc                               1,878
[RUBY_DEBUG_COUNTER]    ccf_cfunc_with_frame                  123,495
[RUBY_DEBUG_COUNTER]    ccf_ivar                               51,189
[RUBY_DEBUG_COUNTER]    ccf_attrset                               698
[RUBY_DEBUG_COUNTER]    ccf_method_missing                        524
[RUBY_DEBUG_COUNTER]    ccf_zsuper                                  0
[RUBY_DEBUG_COUNTER]    ccf_bmethod                            18,238
[RUBY_DEBUG_COUNTER]    ccf_opt_send                            8,025
[RUBY_DEBUG_COUNTER]    ccf_opt_call                              857
[RUBY_DEBUG_COUNTER]    ccf_opt_block_call                          0
[RUBY_DEBUG_COUNTER]    ccf_super_method                           32
[RUBY_DEBUG_COUNTER]    frame_push                            390,819
[RUBY_DEBUG_COUNTER]    frame_push_method                     207,971
[RUBY_DEBUG_COUNTER]    frame_push_block                       35,437
[RUBY_DEBUG_COUNTER]    frame_push_class                            0
[RUBY_DEBUG_COUNTER]    frame_push_top                              0
[RUBY_DEBUG_COUNTER]    frame_push_cfunc                      143,645
[RUBY_DEBUG_COUNTER]    frame_push_ifunc                        3,764
[RUBY_DEBUG_COUNTER]    frame_push_eval                             0
[RUBY_DEBUG_COUNTER]    frame_push_rescue                           2
[RUBY_DEBUG_COUNTER]    frame_push_dummy                            0
[RUBY_DEBUG_COUNTER]    frame_R2R                             219,554
[RUBY_DEBUG_COUNTER]    frame_R2C                             132,840
[RUBY_DEBUG_COUNTER]    frame_C2C                              14,569
[RUBY_DEBUG_COUNTER]    frame_C2R                              23,856
[RUBY_DEBUG_COUNTER]    ivar_get_ic_hit                       106,326
[RUBY_DEBUG_COUNTER]    ivar_get_ic_miss                       20,239
[RUBY_DEBUG_COUNTER]    ivar_get_ic_miss_serial                27,800
[RUBY_DEBUG_COUNTER]    ivar_get_ic_miss_unset                 10,739
[RUBY_DEBUG_COUNTER]    ivar_get_ic_miss_noobject                   0
[RUBY_DEBUG_COUNTER]    ivar_set_ic_hit                        17,837
[RUBY_DEBUG_COUNTER]    ivar_set_ic_miss                        1,283
[RUBY_DEBUG_COUNTER]    ivar_set_ic_miss_serial                 5,148
[RUBY_DEBUG_COUNTER]    ivar_set_ic_miss_unset                      7
[RUBY_DEBUG_COUNTER]    ivar_set_ic_miss_iv_hit                 5,155
[RUBY_DEBUG_COUNTER]    ivar_set_ic_miss_noobject               1,283
[RUBY_DEBUG_COUNTER]    ivar_get_base                          21,570
[RUBY_DEBUG_COUNTER]    ivar_set_base                           1,304
[RUBY_DEBUG_COUNTER]    lvar_get                              354,236
[RUBY_DEBUG_COUNTER]    lvar_get_dynamic                       40,161
[RUBY_DEBUG_COUNTER]    lvar_set                               78,173
[RUBY_DEBUG_COUNTER]    lvar_set_dynamic                        1,252
[RUBY_DEBUG_COUNTER]    lvar_set_slowpath                         101
[RUBY_DEBUG_COUNTER]    gc_count                                    0
[RUBY_DEBUG_COUNTER]    gc_minor_newobj                             0
[RUBY_DEBUG_COUNTER]    gc_minor_malloc                             0
[RUBY_DEBUG_COUNTER]    gc_minor_method                             0
[RUBY_DEBUG_COUNTER]    gc_minor_capi                               0
[RUBY_DEBUG_COUNTER]    gc_minor_stress                             0
[RUBY_DEBUG_COUNTER]    gc_major_nofree                             0
[RUBY_DEBUG_COUNTER]    gc_major_oldgen                             0
[RUBY_DEBUG_COUNTER]    gc_major_shady                              0
[RUBY_DEBUG_COUNTER]    gc_major_force                              0
[RUBY_DEBUG_COUNTER]    gc_major_oldmalloc                          0
[RUBY_DEBUG_COUNTER]    gc_isptr_trial                              0
[RUBY_DEBUG_COUNTER]    gc_isptr_range                              0
[RUBY_DEBUG_COUNTER]    gc_isptr_align                              0
[RUBY_DEBUG_COUNTER]    gc_isptr_maybe                              0
[RUBY_DEBUG_COUNTER]    obj_newobj                             57,746
[RUBY_DEBUG_COUNTER]    obj_newobj_slowpath                       873
[RUBY_DEBUG_COUNTER]    obj_newobj_wb_unprotected                 733
[RUBY_DEBUG_COUNTER]    obj_free                                    0
[RUBY_DEBUG_COUNTER]    obj_promote                                 0
[RUBY_DEBUG_COUNTER]    obj_wb_unprotect                            0
[RUBY_DEBUG_COUNTER]    obj_obj_embed                               0
[RUBY_DEBUG_COUNTER]    obj_obj_transient                           0
[RUBY_DEBUG_COUNTER]    obj_obj_ptr                                 0
[RUBY_DEBUG_COUNTER]    obj_str_ptr                                 0
[RUBY_DEBUG_COUNTER]    obj_str_embed                               0
[RUBY_DEBUG_COUNTER]    obj_str_shared                              0
[RUBY_DEBUG_COUNTER]    obj_str_nofree                              0
[RUBY_DEBUG_COUNTER]    obj_str_fstr                                0
[RUBY_DEBUG_COUNTER]    obj_ary_embed                               0
[RUBY_DEBUG_COUNTER]    obj_ary_transient                           0
[RUBY_DEBUG_COUNTER]    obj_ary_ptr                                 1
[RUBY_DEBUG_COUNTER]    obj_ary_extracapa                           0
[RUBY_DEBUG_COUNTER]    obj_ary_shared_create                      81
[RUBY_DEBUG_COUNTER]    obj_ary_shared                              0
[RUBY_DEBUG_COUNTER]    obj_ary_shared_root_occupied                0
[RUBY_DEBUG_COUNTER]    obj_hash_empty                              0
[RUBY_DEBUG_COUNTER]    obj_hash_1                                  0
[RUBY_DEBUG_COUNTER]    obj_hash_2                                  0
[RUBY_DEBUG_COUNTER]    obj_hash_3                                  0
[RUBY_DEBUG_COUNTER]    obj_hash_4                                  0
[RUBY_DEBUG_COUNTER]    obj_hash_5_8                                0
[RUBY_DEBUG_COUNTER]    obj_hash_g8                                 0
[RUBY_DEBUG_COUNTER]    obj_hash_null                               0
[RUBY_DEBUG_COUNTER]    obj_hash_ar                                 0
[RUBY_DEBUG_COUNTER]    obj_hash_st                                 0
[RUBY_DEBUG_COUNTER]    obj_hash_transient                          0
[RUBY_DEBUG_COUNTER]    obj_hash_force_convert                      0
[RUBY_DEBUG_COUNTER]    obj_struct_embed                            0
[RUBY_DEBUG_COUNTER]    obj_struct_transient                        0
[RUBY_DEBUG_COUNTER]    obj_struct_ptr                              0
[RUBY_DEBUG_COUNTER]    obj_data_empty                              0
[RUBY_DEBUG_COUNTER]    obj_data_xfree                              0
[RUBY_DEBUG_COUNTER]    obj_data_imm_free                           0
[RUBY_DEBUG_COUNTER]    obj_data_zombie                             0
[RUBY_DEBUG_COUNTER]    obj_match_under4                            0
[RUBY_DEBUG_COUNTER]    obj_match_ge4                               0
[RUBY_DEBUG_COUNTER]    obj_match_ge8                               0
[RUBY_DEBUG_COUNTER]    obj_match_ptr                               0
[RUBY_DEBUG_COUNTER]    obj_iclass_ptr                              0
[RUBY_DEBUG_COUNTER]    obj_class_ptr                               0
[RUBY_DEBUG_COUNTER]    obj_module_ptr                              0
[RUBY_DEBUG_COUNTER]    obj_bignum_ptr                              0
[RUBY_DEBUG_COUNTER]    obj_bignum_embed                            0
[RUBY_DEBUG_COUNTER]    obj_float                                   0
[RUBY_DEBUG_COUNTER]    obj_complex                                 0
[RUBY_DEBUG_COUNTER]    obj_rational                                0
[RUBY_DEBUG_COUNTER]    obj_regexp_ptr                              0
[RUBY_DEBUG_COUNTER]    obj_file_ptr                                0
[RUBY_DEBUG_COUNTER]    obj_symbol                                  0
[RUBY_DEBUG_COUNTER]    obj_imemo_ment                              0
[RUBY_DEBUG_COUNTER]    obj_imemo_iseq                              0
[RUBY_DEBUG_COUNTER]    obj_imemo_env                               0
[RUBY_DEBUG_COUNTER]    obj_imemo_tmpbuf                            0
[RUBY_DEBUG_COUNTER]    obj_imemo_ast                               0
[RUBY_DEBUG_COUNTER]    obj_imemo_cref                              0
[RUBY_DEBUG_COUNTER]    obj_imemo_svar                              0
[RUBY_DEBUG_COUNTER]    obj_imemo_throw_data                        0
[RUBY_DEBUG_COUNTER]    obj_imemo_ifunc                             0
[RUBY_DEBUG_COUNTER]    obj_imemo_memo                              0
[RUBY_DEBUG_COUNTER]    obj_imemo_parser_strterm                    0
[RUBY_DEBUG_COUNTER]    obj_imemo_callinfo                          0
[RUBY_DEBUG_COUNTER]    obj_imemo_callcache                         0
[RUBY_DEBUG_COUNTER]    artable_hint_hit                       22,575
[RUBY_DEBUG_COUNTER]    artable_hint_miss                          70
[RUBY_DEBUG_COUNTER]    artable_hint_notfound                  27,631
[RUBY_DEBUG_COUNTER]    heap_xmalloc                            6,942
[RUBY_DEBUG_COUNTER]    heap_xrealloc                             987
[RUBY_DEBUG_COUNTER]    heap_xfree                                423
[RUBY_DEBUG_COUNTER]    theap_alloc                             7,715
[RUBY_DEBUG_COUNTER]    theap_alloc_fail                            0
[RUBY_DEBUG_COUNTER]    theap_evacuate                              0
[RUBY_DEBUG_COUNTER]    mjit_exec                                   0
[RUBY_DEBUG_COUNTER]    mjit_exec_not_added                         0
[RUBY_DEBUG_COUNTER]    mjit_exec_not_ready                         0
[RUBY_DEBUG_COUNTER]    mjit_exec_not_compiled                      0
[RUBY_DEBUG_COUNTER]    mjit_exec_call_func                         0
[RUBY_DEBUG_COUNTER]    mjit_add_iseq_to_process                    0
[RUBY_DEBUG_COUNTER]    mjit_unload_units                           0
[RUBY_DEBUG_COUNTER]    mjit_frame_VM2VM                            0
[RUBY_DEBUG_COUNTER]    mjit_frame_VM2JT                            0
[RUBY_DEBUG_COUNTER]    mjit_frame_JT2JT                            0
[RUBY_DEBUG_COUNTER]    mjit_frame_JT2VM                            0
[RUBY_DEBUG_COUNTER]    mjit_cancel                                 0
[RUBY_DEBUG_COUNTER]    mjit_cancel_ivar_inline                     0
[RUBY_DEBUG_COUNTER]    mjit_cancel_exivar_inline                   0
[RUBY_DEBUG_COUNTER]    mjit_cancel_send_inline                     0
[RUBY_DEBUG_COUNTER]    mjit_cancel_opt_insn                        0
[RUBY_DEBUG_COUNTER]    mjit_cancel_invalidate_all                  0
[RUBY_DEBUG_COUNTER]    mjit_cancel_leave                           0
[RUBY_DEBUG_COUNTER]    mjit_length_unit_queue                      0
[RUBY_DEBUG_COUNTER]    mjit_length_active_units                    0
[RUBY_DEBUG_COUNTER]    mjit_length_compact_units                   0
[RUBY_DEBUG_COUNTER]    mjit_length_stale_units                     0
[RUBY_DEBUG_COUNTER]    mjit_compile_failures                       0

I patched Ruby to also count obj_respond_to calls.

This is our "profile mode" home page, it takes about 40ms or so and is very typical of Rails performance.

A single page will issue 613 obj_respond_to calls, I guess this is not properly cached.

You can run it by running Discourse bench on 2.7 (which will prep the DB) then execute the script.

Actions #8

Updated by marcandre (Marc-Andre Lafortune) 5 months ago

  • Related to Bug #17383: 3.0 recursion memory|speed issues added

Updated by tenderlovemaking (Aaron Patterson) 5 months ago

@xrxr and I made a patch that implements a negative lookup cache. We also changed respond_to? to use the per class method cache.

This PR has the "respond_to" patch: https://github.com/ruby/ruby/pull/3873

This PR has the negative cache in addition to the "respond_to" patch: https://github.com/ruby/ruby/pull/3872

Neither of these patches seemed to have a large impact on RailsBench. Maybe they will help Discourse, but I'm not sure.

I do want to talk about the problems with the negative cache patch though. First, I think some of the invalidation is broken (and that's why the CI is red), but that's not really the biggest issue.

The pCMC is effectively a weak reference cache. In our strategy, we just insert a NULL in the cache for that method name. The fact that there is a cache entry, but the entry is NULL allows us to avoid lookups for the negative case.

The problem though: If we cache negative values (in this case it caches a NULL), how long should those negative values live? Right now, they get GC'd extremely quickly. For example, here is a perf report before adding a negative cache:

perf 1

You can see rb_id_table_lookup is pretty high in the report. Now compare to a perf report after adding the negative cache:

perf 2

Now the number 2 entry is cc_table_free_i. Now we're spending time freeing entries in the pCMC because the negative cache entries aren't real objects. I guess the question is: how long should these negative entries live? They can't live forever.

Numbers for RailsBench were approximately the same with this patch presumably because of the extra GC time.

Anyway, I don't have good answers right now, but I wanted to share progress. Also, I'm sorry these screenshots look so big in the webui, I don't know how to make them smaller 😅

Updated by ioquatix (Samuel Williams) 4 months ago

In my early benchmarks:

Async

ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin19]

Wrapper#wait_readable
                        114.385k (± 1.0%) i/s -    573.250k in   5.012051s
    Reactor#register    137.206k (± 1.0%) i/s -    693.906k in   5.057946s

ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin20]

Wrapper#wait_readable
                        110.108k (± 1.4%) i/s -    552.034k in   5.014519s
    Reactor#register    131.360k (± 1.7%) i/s -    665.193k in   5.065527s

Async::HTTP

ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin19]

Running 2s test @ http://127.0.0.1:9294/
  8 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   228.46us   86.09us   3.82ms   83.40%
    Req/Sec     4.32k     1.07k    5.80k    50.60%
  72205 requests in 2.10s, 3.44MB read
Requests/sec:  34380.26
Transfer/sec:      1.64MB

ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin20]

Running 2s test @ http://127.0.0.1:9294/
  8 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   252.91us  107.48us   1.15ms   69.00%
    Req/Sec     3.90k     1.46k    5.87k    39.88%
  65106 requests in 2.10s, 3.10MB read
Requests/sec:  31001.91
Transfer/sec:      1.48MB

Updated by sam.saffron (Sam Saffron) 4 months ago

Good news, ko1 (Koichi Sasada) and @tenderlove managed to recover a fair bit of performance leading to release.

3.0 remains slower in our bench than 2.7 but we did improve over the first post.

That said, the picture as of release is:

Ruby 2.6 -> Ruby 2.7 -> Ruby 3.0

categories page / median: 26 -> 27 -> 28 (slow down 1ms from 2.7 ... 2ms from 2.6)
home page: 50 -> 51 -> 53
topic page: 44 -> 46 -> 49
cat admin: 26 -> 27 -> 28
home admin: 50 -> 53 -> 53
topic admin: 44 -> 45 -> 48

Interestingly when I test this all from the same process some of this slowdown is no longer observed, so some of the slowdown could be due to IO changes or GC.

Actions

Also available in: Atom PDF