Bug #18053


Crashes and infinite loops when generating partial backtraces in Ruby 3.0+

Added by jhawthorn (John Hawthorn) 6 months ago. Updated 6 months ago.

ruby -v:
ruby 3.1.0dev (2021-07-29T22:19:12Z master 9931e2f509) [x86_64-linux]


In Ruby 3.0+ there are some cases where generating a partial backtrace will hang (due to an infinite loop) or segfault. This is also an issue in Ruby master.

def foo
  caller_locations(2, 1).inspect # this will segv
  # caller_locations(2, 1)[0].path # this will infinite loop
end

1.times.map { 1.times.map { foo } }

$ ./ruby test_caller_locations.rb
test_caller_locations.rb:4: [BUG] Segmentation fault at 0x00000000000027e1
ruby 3.1.0dev (2021-07-29T22:19:12Z master 9931e2f509) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0015 p:---- s:0049 e:000048 CFUNC  :inspect
c:0014 p:---- s:0046 e:000045 CFUNC  :inspect
c:0013 p:0008 s:0042 e:000041 METHOD test_caller_locations.rb:4
c:0012 p:0003 s:0038 e:000037 BLOCK  test_caller_locations.rb:8 [FINISH]
c:0011 p:---- s:0035 e:000034 IFUNC
c:0010 p:---- s:0032 e:000031 CFUNC  :times
c:0009 p:---- s:0029 e:000028 CFUNC  :each
c:0008 p:---- s:0026 e:000025 CFUNC  :map
c:0007 p:0006 s:0022 e:000021 BLOCK  test_caller_locations.rb:8 [FINISH]
c:0006 p:---- s:0019 e:000018 IFUNC
c:0005 p:---- s:0016 e:000015 CFUNC  :times
c:0004 p:---- s:0013 e:000012 CFUNC  :each
c:0003 p:---- s:0010 e:000009 CFUNC  :map
c:0002 p:0009 s:0006 e:000005 EVAL   test_caller_locations.rb:8 [FINISH]
c:0001 p:0000 s:0003 E:0002b0 (none) [FINISH]

At least part of what seems to be happening is that the backtrace locations are generated with prev_loc pointing to itself. I added some assertions to catch this, but had a hard time finding what the intended behaviour should have been from there.
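
To make this failure mode concrete, here is a minimal Ruby model of the pre-fix layout. It assumes, as this report and the later commit message suggest, that a cfunc backtrace location gets its line number by following prev_loc to an iseq location. OldLoc and resolve_lineno are hypothetical names, not CRuby code (the real implementation is C). The visited check plays a role similar to an assertion: a self-referencing prev_loc becomes an error instead of an endless loop.

```ruby
# Hypothetical model of a pre-fix backtrace location (not CRuby's C struct):
# a :cfunc location has no line of its own and points at another location
# through prev_loc; an :iseq location carries its own line number.
OldLoc = Struct.new(:type, :label, :lineno, :prev_loc)

# Follow prev_loc until an :iseq location is reached and return its line.
# Without the visited check, a prev_loc that points back at the same
# location would make this loop run forever.
def resolve_lineno(loc)
  visited = {}.compare_by_identity
  while loc.type == :cfunc
    raise ArgumentError, "prev_loc cycle at #{loc.label}" if visited.key?(loc)
    visited[loc] = true
    loc = loc.prev_loc
    raise ArgumentError, "cfunc location without prev_loc" if loc.nil?
  end
  loc.lineno
end

block_loc = OldLoc.new(:iseq, "block in <main>", 8, nil)
each_loc  = OldLoc.new(:cfunc, "each", nil, block_loc)
puts resolve_lineno(each_loc) # prints: 8

bad = OldLoc.new(:cfunc, "map", nil, nil)
bad.prev_loc = bad # a location whose prev_loc points to itself
begin
  resolve_lineno(bad)
rescue ArgumentError => e
  puts e.message # prints: prev_loc cycle at map
end
```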

cc jeremyevans0 (Jeremy Evans)


output.txt (21.2 KB): Full output of crash. jhawthorn (John Hawthorn), 07/29/2021 11:00 PM
0001-Add-assertions-for-bad-backtrace-locations.patch (2.51 KB): Patch with additional assertions. jhawthorn (John Hawthorn), 07/29/2021 11:08 PM

Updated by jeremyevans0 (Jeremy Evans) 6 months ago

Thanks for the report. I can confirm both of the issues. I checked with my rewrite of the backtrace generation algorithm and confirmed that it fixes both of these issues, as well as a separate issue I identified. The new algorithm works from the most recent frame toward the oldest frame, instead of from the oldest frame toward the most recent one; this makes more sense for partial backtraces and is about twice as fast.

As to what to do about 3.0, I can try to fix this, or we can revert the optimization. After the previous bug fixes, the optimization is very complex, and it would probably need to become even more complex to fix this and the other issue I identified. The new algorithm, in addition to being faster, is much simpler.

At this point, I think reverting the optimization in 3.0 is the best way forward. We can switch to the new algorithm in 3.1. However, if people would like to keep the optimization in 3.0, I can work on fixing it.

Updated by jhawthorn (John Hawthorn) 6 months ago

jeremyevans0 (Jeremy Evans) wrote in #note-1:

> At this point, I think reverting the optimization in 3.0 is the best way forward. We can switch to the new algorithm in 3.1. However, if people would like to keep the optimization in 3.0, I can work on fixing it.

Thanks for taking a look! Reverting would be fine for us.


Updated by jeremyevans (Jeremy Evans) 6 months ago

  • Status changed from Open to Closed

Applied in changeset git|1a05dc03f953830564c272665c47a61e53550f3e.

Make backtrace generation work outward from current frame

This fixes multiple bugs found in the partial backtrace
optimization added in 3b24b7914c16930bfadc89d6aff6326a51c54295.
These bugs occur when passing a start argument to caller where
the start argument lands on an iseq frame without a pc.

Before this commit, the following code results in the same
line being printed twice, both for the #each method.

def a; [1].group_by { b } end
def b; puts(caller(2, 1).first, caller(3, 1).first) end
a

After this commit and in Ruby 2.7, the lines are different,
with the first line being for each and the second for group_by.

Before this commit, the following code can either segfault or
result in an infinite loop:

def foo
  caller_locations(2, 1).inspect # segfault
  caller_locations(2, 1)[0].path # infinite loop
end

1.times.map { 1.times.map { foo } }

After this commit, this code works correctly.

This commit completely refactors the backtrace handling.
Instead of processing the backtrace from the outermost
frame working in, process it from the innermost frame
working out. This is much faster for partial backtraces,
since you only access the control frames you need to in
order to construct the backtrace.

To handle cfunc frames in the new design, they start
out with no location information. We increment a counter
for each cfunc frame added. When an iseq frame with a pc
is accessed, after adding its iseq backtrace location,
we copy that location to all of the directly preceding
cfunc backtrace locations.

If the last backtrace line is a cfunc frame, we continue
scanning for iseq frames until the end control frame, and
use the location information from the first one for the
trailing cfunc frames in the backtrace.

As only rb_ec_partial_backtrace_object uses the new
backtrace implementation, remove all of the function
pointers and inline the functions. This makes the
process easier to understand.

Restore the Ruby 2.7 implementation of backtrace_each and
use it for all the other functions that called
backtrace_each other than rb_ec_partial_backtrace_object.
All other cases requested the entire backtrace, so there
is no advantage to using the new algorithm for those.
Additionally, there are implicit assumptions in the other
code that the backtrace processing works inward instead
of outward.

Remove the cfunc/iseq union in rb_backtrace_location_t,
and remove the prev_loc member for cfunc. Both cfunc and
iseq types can now have iseq and pc entries, so the
location information can be accessed the same way for each.
This avoids the need for an extra backtrace location entry
to store an iseq backtrace location if the final entry in
the backtrace is a cfunc. This is also what fixes the
segfault and infinite loop issues in the above bugs.

Here's Ruby pseudocode for the new algorithm, where start
and length are the arguments to caller or caller_locations:

end_cf =
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt << cf.iseq_backtrace_entry
        # The cfunc entries directly before this one take its location.
        cfunc_counter.times do |i|
          bt[-2 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt << cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end

  cf = cf.prev
end

# Trailing cfunc entries take the location of the first
# older iseq frame that has a pc.
if cfunc_counter > 0
  while cf != end_cf
    if cf.iseq? && cf.instruction_pointer?
      cfunc_counter.times do |i|
        bt[-1 - i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end
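
The pseudocode above can be exercised with a small, self-contained Ruby model. This is a sketch under stated assumptions, not CRuby's implementation: VM.*, cf.prev and the other frame methods in the pseudocode are not real Ruby APIs, so the control-frame stack is modeled here as an array of hypothetical Frame structs (innermost frame first), a location is reduced to a label plus a line number, and partial_backtrace is an illustrative name. Both frame types use one Location shape, in the spirit of removing the cfunc/iseq union. The example stack mirrors the a/group_by/b example, with invented line numbers and one hypothetical iseq frame without a pc.

```ruby
# Hypothetical frame model; frames[0] is the innermost (current) frame.
# type is :iseq or :cfunc; pc is false for an iseq frame without a pc.
Frame = Struct.new(:type, :label, :lineno, :pc)
# One entry shape for both frame types; cfunc entries borrow a line number.
Location = Struct.new(:label, :lineno)

def partial_backtrace(frames, start, length = nil)
  length = frames.size if length.nil? || length > frames.size
  bt = []
  cfunc_counter = 0
  i = 0

  # Walk outward from the innermost frame until enough entries are collected.
  while i < frames.size && bt.size != length
    cf = frames[i]
    if cf.type == :iseq && cf.pc
      if start > 0
        start -= 1
      else
        bt << Location.new(cf.label, cf.lineno)
        # The cfunc entries just before this one take this frame's line.
        cfunc_counter.times { |j| bt[-2 - j].lineno = cf.lineno }
        cfunc_counter = 0
      end
    elsif cf.type == :cfunc
      if start > 0
        start -= 1
      else
        bt << Location.new(cf.label, nil)
        cfunc_counter += 1
      end
    end
    i += 1
  end

  # Trailing cfunc entries: keep scanning for the first older iseq frame with a pc.
  while cfunc_counter > 0 && i < frames.size
    cf = frames[i]
    if cf.type == :iseq && cf.pc
      cfunc_counter.times { |j| bt[-1 - j].lineno = cf.lineno }
      cfunc_counter = 0
    end
    i += 1
  end
  bt
end

frames = [
  Frame.new(:iseq,  "b",          9,   true),
  Frame.new(:iseq,  "block in a", 3,   true),
  Frame.new(:iseq,  "(no pc)",    nil, false),
  Frame.new(:cfunc, "each",       nil, nil),
  Frame.new(:cfunc, "group_by",   nil, nil),
  Frame.new(:iseq,  "a",          2,   true),
  Frame.new(:iseq,  "<main>",     12,  true)
]
p partial_backtrace(frames, 2, 1).map(&:to_a) # prints [["each", 2]]
p partial_backtrace(frames, 3, 1).map(&:to_a) # prints [["group_by", 2]]
```

In this model the two calls yield different entries (each, then group_by), and both cfunc entries borrow the line of the frame for a, the nearest older iseq frame with a pc.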

With the following benchmark, which uses a call depth of
around 100 (common in many Ruby applications):

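require 'benchmark/ips'

class T
  def test(depth, &block)
    if depth == 0
      yield self
    else
      test(depth - 1, &block)
    end
  end
  def array; Array.new; end
  def first; caller_locations(1, 1); end
  def full; caller_locations; end
end

t = T.new
t.test((ARGV.first || 100).to_i) do
  Benchmark.ips do |x|
    x.report('caller_loc(1, 1)') { t.first }
    x.report('caller_loc') { t.full }
    x.report('Array.new') { t.array }
    x.compare!
  end
end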

Results before commit:

Calculating -------------------------------------
    caller_loc(1, 1)    281.159k (± 0.7%) i/s -      1.426M in   5.073055s
          caller_loc     15.836k (± 2.1%) i/s -     79.450k in   5.019426s
           Array.new      1.852M (± 2.5%) i/s -      9.296M in   5.022511s

           Array.new:  1852297.5 i/s
    caller_loc(1, 1):   281159.1 i/s - 6.59x  (± 0.00) slower
          caller_loc:    15835.9 i/s - 116.97x  (± 0.00) slower

Results after commit:

Calculating -------------------------------------
    caller_loc(1, 1)    562.286k (± 0.8%) i/s -      2.858M in   5.083249s
          caller_loc     16.402k (± 1.0%) i/s -     83.200k in   5.072963s
           Array.new      1.853M (± 0.1%) i/s -      9.278M in   5.007523s

           Array.new:  1852776.5 i/s
    caller_loc(1, 1):   562285.6 i/s - 3.30x  (± 0.00) slower
          caller_loc:    16402.3 i/s - 112.96x  (± 0.00) slower

This shows that the speed of caller_locations(1, 1) has roughly
doubled (from about 281k to 562k i/s), and the speed of
caller_locations with no arguments has improved slightly (about
3.6%, from 15.8k to 16.4k i/s). So this new algorithm is significantly
faster, much simpler, and fixes bugs in the previous algorithm.

Fixes [Bug #18053]

Updated by jeremyevans0 (Jeremy Evans) 6 months ago

  • Backport changed from 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED

I committed my rewrite of backtrace generation (1a05dc03f953830564c272665c47a61e53550f3e), which fixes these issues in master/3.1. I took the examples from this issue and added them as tests in that commit.

For Ruby 3.0, there are a few options:

  1. Back out the partial backtrace optimization
  2. Merge the rewrite of backtrace generation
  3. Fix the partial backtrace optimization

The issue with option 1 is that cffdacb15a363321e1c1879aa7d94924acafd1cf was merged after the optimization was committed, and it is nontrivial to back out the partial backtrace optimization while keeping that feature. I tried, but my attempt could not pass CI. Additionally, even if the issues with the backout could be fixed, the backout approach would make partial backtraces slower.

The issue with option 2 is that 1a05dc03f953830564c272665c47a61e53550f3e is fairly invasive. I'm fairly sure it would apply cleanly to 3.0, but I'm not sure if nagachika (Tomoyuki Chikanaga) would be comfortable backporting such a large change.

So I worked on implementing option 3. It took many hours to make the new test cases pass without breaking existing test cases, but I eventually got it to work. I used a strategy similar to the one in 1a05dc03f953830564c272665c47a61e53550f3e: scan the backtrace from the current frame toward the earliest frame, instead of from the earliest frame toward the current frame. When we reach the frame that will complete the partial backtrace, we use it as the starting frame for backtrace generation, and backtrace generation still works from earlier frames toward the current frame. I submitted a pull request for this option:
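
The two-phase strategy described for option 3 might be modeled as follows. This is only an illustrative sketch, not the code in the pull request: it uses a hypothetical Frame struct (innermost frame first), and the helper names countable? and two_phase_backtrace are invented. Phase one scans outward to find the oldest frame the partial backtrace needs; phase two walks from that frame back toward the current frame, the direction the 3.0 generation code keeps, so each cfunc entry can reuse the line of the most recent iseq frame seen. Seeding leading cfunc entries from the next older iseq frame is an assumption of this model.

```ruby
# Hypothetical frame model; frames[0] is the innermost (current) frame.
Frame = Struct.new(:type, :label, :lineno, :pc)

# cfunc frames and iseq frames with a pc produce entries; others are skipped.
def countable?(frame)
  frame.type == :cfunc || (frame.type == :iseq && frame.pc)
end

def two_phase_backtrace(frames, start, length)
  # Phase 1: scan outward and remember the oldest frame the result needs.
  seen = 0
  last = nil
  frames.each_with_index do |frame, i|
    next unless countable?(frame)
    seen += 1
    next if seen <= start
    last = i
    break if seen == start + length
  end
  return [] if last.nil? # start is past the end of the stack

  # Leading cfunc entries need the line of the next older iseq frame with a pc.
  seed = frames[(last + 1)..].find { |f| f.type == :iseq && f.pc }
  caller_line = seed&.lineno

  # Phase 2: walk from that frame back toward the current frame.
  entries = []
  last.downto(0) do |i|
    frame = frames[i]
    next unless countable?(frame)
    caller_line = frame.lineno if frame.type == :iseq
    entries << [frame.label, caller_line]
  end

  # Innermost first, then drop the skipped frames and keep the window.
  entries.reverse.drop(start).first(length)
end

frames = [
  Frame.new(:iseq,  "b",          9,   true),
  Frame.new(:iseq,  "block in a", 3,   true),
  Frame.new(:iseq,  "(no pc)",    nil, false),
  Frame.new(:cfunc, "each",       nil, nil),
  Frame.new(:cfunc, "group_by",   nil, nil),
  Frame.new(:iseq,  "a",          2,   true),
  Frame.new(:iseq,  "<main>",     12,  true)
]
p two_phase_backtrace(frames, 3, 1) # prints [["group_by", 2]]
```

In this model, both walks stop at the oldest frame the result needs, apart from the short scan for a seeding iseq frame.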

Updated by nagachika (Tomoyuki Chikanaga) 6 months ago

Hello Jeremy,
Thank you for your great work on the partial backtrace optimization and for your deep understanding of stable branch maintenance. I'd like to adopt your third option. I will merge your PR after confirming that CI passes with it.


Updated by nagachika (Tomoyuki Chikanaga) 6 months ago

  • Backport changed from 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: DONE

ruby_3_0 c07545bbf82068f4fd92a5ccfa2b09bb96b39adb.

Updated by jhawthorn (John Hawthorn) 6 months ago

Thanks Jeremy! Just confirmed that the fix in ruby_3_0 solves all the crashes and hangs we were seeing.

