Bug #21201
openPerformance regression when defining methods inside `refine` blocks
Description
After the change introduced in PR #10037, a significant performance regression has been observed when defining methods within refine blocks.
The PR correctly fixes a bug where callcache
invalidation was missing when methods are defined inside a refinement. To fix this, it now invokes rb_clear_all_refinement_method_cache()
at the time of method definition within refine
. However, this function performs a full object space traversal via rb_objspace_each_objects
, making it extremely expensive.
In our team's Rails application, refinements are used heavily. During application code reload (triggered in development mode), rb_clear_all_refinement_method_cache()
is called 1425 times, 1142 of which originate via rb_clear_method_cache()
. As a result, the code reload time has increased from 5 seconds to 15 seconds after the patch. Since reloading occurs every time the code changes, this degrades development experience severely.
Reproduction script:¶
require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
gem "benchmark-ips"
end
mod = Module.new
klass = Class.new
Benchmark.ips do |x|
x.report(RUBY_VERSION) do
mod.send(:refine, klass) do
def call_1 = nil
def call_2 = nil
def call_3 = nil
end
end
x.save! "/tmp/performance_regression_refine.bench"
x.compare!
end
Benchmark results:¶
Comparison:
3.2.7: 1500316.7 i/s
3.3.7: 158.0 i/s - 9496.46x slower
3.5.0: 124.6 i/s - 12041.43x slower
3.4.2: 119.5 i/s - 12553.50x slower
Related Slack discussion: https://ruby-jp.slack.com/archives/CLWSHA76V/p1741932018811539
I totally understand the necessity and value of the past PR that fixed the callcache bugs. That said, I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently.
Updated by byroot (Jean Boussier) 25 days ago
I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently.
The solution is likely to keep the list (set) of all existing call caches that belong to a refinement, as to save the object space traversal.
But yes, this performance issue is why at work we disallowed usage of refinement using rubocop.
Updated by tenderlovemaking (Aaron Patterson) 25 days ago
byroot (Jean Boussier) wrote in #note-1:
I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently.
The solution is likely to keep the list (set) of all existing call caches that belong to a refinement, as to save the object space traversal.
But yes, this performance issue is why at work we disallowed usage of refinement using rubocop.
I guess if refined methods are rare enough, maybe it's not worthwhile to allocate "refinement inline cache"? In other words, refined methods are always an IC miss?
Updated by byroot (Jean Boussier) 25 days ago
refined methods are always an IC miss?
I like that idea.
Updated by byroot (Jean Boussier) 25 days ago
The downside however is that I think we'd need to lock the VM to lookup methods? So it doesn't help with making Ractor lock less.
Updated by jhawthorn (John Hawthorn) 25 days ago
· Edited
tenderlovemaking (Aaron Patterson) wrote in #note-2:
I guess if refined methods are rare enough, maybe it's not worthwhile to allocate "refinement inline cache"? In other words, refined methods are always an IC miss?
I believe that was the case before Ruby 3.3 https://github.com/ruby/ruby/commit/cfd7729ce7a31c8b6ec5dd0e99c67b2932de4732
Updated by alpaca-tc (Hiroyuki Ishii) 14 days ago
byroot (Jean Boussier) wrote in #note-1:
The solution is likely to keep the list (set) of all existing call caches that belong to a refinement, as to save the object space traversal.
I created a pull request based on this approach. https://github.com/ruby/ruby/pull/13077
Indeed, the performance improvement is remarkable — reloading our Rails locally now completes in under 3 seconds.
Many thanks to everyone in this thread for your helpful insights and suggestions.
Updated by byroot (Jean Boussier) 14 days ago
Your patch look really good.
I wonder if it would be possible to do like the vm->constant_cache
table, have the key be the method name, so that you wouldn't need to clear absolutely everything.
But your patch as-is is already a nice improvement I think, the only downside being the one extra lock point for ractors.