Bug #17031

`Kernel#caller_locations(m, n)` should be optimized

Added by marcandre (Marc-Andre Lafortune) 24 days ago. Updated 15 days ago.

Kernel#caller_locations(1, 1) currently appears to needlessly allocate memory for the whole backtrace.

It allocates ~20 kB for an 800-deep backtrace, vs 1.6 kB for a shallow backtrace.
It is also much slower for long backtraces: about 7x slower for an 800-deep backtrace than for a shallow one.

Test used:

def do_something
  location = caller_locations(1, 1).first
end

# Recurses `depth` levels in every case; do_something runs only at depth == trigger
# (800 => shallow backtrace, 0 => 800 frames deeper, -1 => never).
def test(depth, trigger)
  do_something if depth == trigger

  test(depth - 1, trigger) unless depth == 0
end

require 'benchmark/ips'

Benchmark.ips do |x|
  x.report(:short_backtrace)     { test(800, 800) }
  x.report(:long_backtrace)      { test(800,   0) }
  x.report(:no_caller_locations) { test(800,  -1) }
end

require 'memory_profiler'

MemoryProfiler.report { test(800, 800) }.pretty_print(scale_bytes: true, detailed_report: false)
MemoryProfiler.report { test(800,   0) }.pretty_print(scale_bytes: true, detailed_report: false)

Found when checking memory usage on RuboCop.

Updated by Eregon (Benoit Daloze) 23 days ago

Could you post the results of running that on your computer?
Then it's easier to see your point without needing to reproduce.

Updated by marcandre (Marc-Andre Lafortune) 23 days ago


Calculating -------------------------------------
     short_backtrace     28.315k (± 7.1%) i/s -    141.984k in   5.044733s
      long_backtrace     24.168k (± 8.7%) i/s -    120.900k in   5.050243s
 no_caller_locations     29.288k (± 2.5%) i/s -    148.359k in   5.068723s

Total allocated: 1.58 kB (3 objects)
Total retained:  0 B (0 objects)

Total allocated: 19.58 kB (3 objects)
Total retained:  0 B (0 objects)

I got a factor of 6.2 this time, comparing the time per iteration in excess of the no_caller_locations baseline: (1/24.168-1/29.288)/(1/28.315-1/29.288)
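For anyone checking the arithmetic, here is a small sketch of that calculation. The variable names are mine, and the rates are copied from the output above (thousands of iterations per second).

```ruby
# Iterations per second (in thousands), copied from the benchmark output above.
short_ips    = 28.315
long_ips     = 24.168
baseline_ips = 29.288

# Time per iteration is the reciprocal of the rate. Subtracting the baseline
# leaves the time attributable to the caller_locations call itself.
short_overhead = 1.0 / short_ips - 1.0 / baseline_ips
long_overhead  = 1.0 / long_ips  - 1.0 / baseline_ips

factor = long_overhead / short_overhead
puts format("%.1f", factor) # prints 6.2
```

Note that the error margins reported above (± 7.1% and ± 8.7%) are large relative to the differences being subtracted, so the factor should be read as approximate.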

Updated by jeremyevans0 (Jeremy Evans) 21 days ago

Reviewing the related code in vm_backtrace.c, you are correct. This occurs both for caller and caller_locations. The entire internal backtrace object is generated by rb_ec_backtrace_object, and then passed to a function that looks at specific parts of it to generate the strings or Thread::Backtrace::Location objects. Fixing this would require changing the logic so that rb_ec_backtrace_object is passed the starting level and number of frames.
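As a Ruby-level illustration only (this is not the C code in vm_backtrace.c, and the method names are hypothetical), the sketch below shows that asking for a single frame gives the same location as taking the first element of the full list. That suggests limiting the frames collected internally should not change what the caller sees.

```ruby
# Hypothetical helper names, for illustration only.
def location_via_range
  # Requests exactly one frame, starting one level above this method.
  caller_locations(1, 1).first
end

def location_via_full_backtrace
  # Requests every frame above this method, then keeps only the first.
  caller_locations(1).first
end

def call_site
  # Both helpers are called from the same line, so their caller frame matches.
  [location_via_range, location_via_full_backtrace]
end

ranged, full = call_site
puts ranged.path == full.path
puts ranged.lineno == full.lineno
puts ranged.label == full.label
```

All three comparisons should print true. The only difference between the two helpers is how much backtrace is requested.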

Updated by jeremyevans0 (Jeremy Evans) 15 days ago

I've added a pull request that addresses this issue:
