Bug #18053 (Closed)
Crashes and infinite loops when generating partial backtraces in Ruby 3.0+
Description
In Ruby 3.0+ there are some cases where generating a partial backtrace will hang (due to an infinite loop) or segfault. This is also an issue in Ruby master.
def foo
  caller_locations(2, 1).inspect # this will segv
  # caller_locations(2, 1)[0].path # this will infinite loop
end
1.times.map { 1.times.map { foo } }
$ ./ruby test_caller_locations.rb
test_caller_locations.rb:4: [BUG] Segmentation fault at 0x00000000000027e1
ruby 3.1.0dev (2021-07-29T22:19:12Z master 9931e2f509) [x86_64-linux]
-- Control frame information -----------------------------------------------
c:0015 p:---- s:0049 e:000048 CFUNC :inspect
c:0014 p:---- s:0046 e:000045 CFUNC :inspect
c:0013 p:0008 s:0042 e:000041 METHOD test_caller_locations.rb:4
c:0012 p:0003 s:0038 e:000037 BLOCK test_caller_locations.rb:8 [FINISH]
c:0011 p:---- s:0035 e:000034 IFUNC
c:0010 p:---- s:0032 e:000031 CFUNC :times
c:0009 p:---- s:0029 e:000028 CFUNC :each
c:0008 p:---- s:0026 e:000025 CFUNC :map
c:0007 p:0006 s:0022 e:000021 BLOCK test_caller_locations.rb:8 [FINISH]
c:0006 p:---- s:0019 e:000018 IFUNC
c:0005 p:---- s:0016 e:000015 CFUNC :times
c:0004 p:---- s:0013 e:000012 CFUNC :each
c:0003 p:---- s:0010 e:000009 CFUNC :map
c:0002 p:0009 s:0006 e:000005 EVAL test_caller_locations.rb:8 [FINISH]
c:0001 p:0000 s:0003 E:0002b0 (none) [FINISH]
At least part of what seems to be happening is that backtrace locations are generated with prev_loc
pointing back to themselves. I added some assertions to catch this, but had a hard time determining what the intended behaviour should have been from there.
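As a hedged, Ruby-level illustration (not the actual C code; Location and prev_loc here are stand-ins for the internal location struct and its prev_loc field), a self-referential prev_loc turns any "follow prev_loc until nil" walk into an infinite loop:

Location = Struct.new(:label, :prev_loc)

loc = Location.new("foo", nil)
loc.prev_loc = loc               # the self-reference described above

cur = loc
steps = 0
while cur                        # never reaches nil without a cycle check
  cur = cur.prev_loc
  steps += 1
  break if steps > 10            # guard so this sketch itself terminates
end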
Updated by jeremyevans0 (Jeremy Evans) over 3 years ago
Thanks for the report. I can confirm both of the issues. I checked with my rewrite of the backtrace generation algorithm (https://github.com/ruby/ruby/pull/4671), and confirmed that it fixes both of these issues, in addition to a separate issue I identified. The new algorithm works from the most recent frame towards the oldest frame instead of from the oldest frame to the most recent frame, which makes more sense for partial backtraces, and is about twice as fast.
As to what to do about 3.0, I can try to fix this, or we can revert the optimization. After the previous bugfixes (see https://github.com/ruby/ruby/pull/4120 and https://github.com/ruby/ruby/pull/4237), the optimization is very complex, and would probably need to be made more complex to fix this and the other issue I identified. The new algorithm, in addition to being faster, is much simpler.
At this point, I think reverting the optimization in 3.0 is the best way forward. We can switch to the new algorithm in 3.1. However, if people would like to keep the optimization in 3.0, I can work on fixing it.
Updated by jhawthorn (John Hawthorn) over 3 years ago
jeremyevans0 (Jeremy Evans) wrote in #note-1:
At this point, I think reverting the optimization in 3.0 is the best way forward. We can switch to the new algorithm in 3.1. However, if people would like to keep the optimization in 3.0, I can work on fixing it.
Thanks for taking a look! Reverting would be fine for us.
Updated by jeremyevans (Jeremy Evans) over 3 years ago
- Status changed from Open to Closed
Applied in changeset git|1a05dc03f953830564c272665c47a61e53550f3e.
Make backtrace generation work outward from current frame
This fixes multiple bugs found in the partial backtrace
optimization added in 3b24b7914c16930bfadc89d6aff6326a51c54295.
These bugs occur when passing a start argument to caller where
the start argument lands on an iseq frame without a pc.
Before this commit, the following code results in the same
line being printed twice, both for the #each method.
def a; [1].group_by { b } end
def b; puts(caller(2, 1).first, caller(3, 1).first) end
a
After this commit and in Ruby 2.7, the lines are different,
with the first line being for each and the second for group_by.
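For illustration, assuming the three-line snippet above is saved as t.rb, the output is roughly as follows (exact labels vary by Ruby version):

# Before this commit (both entries report #each):
#   t.rb:1:in `each'
#   t.rb:1:in `each'
#
# After this commit, and on Ruby 2.7:
#   t.rb:1:in `each'
#   t.rb:1:in `group_by'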
Before this commit, the following code can either segfault or
result in an infinite loop:
def foo
  caller_locations(2, 1).inspect # segfault
  caller_locations(2, 1)[0].path # infinite loop
end
1.times.map { 1.times.map { foo } }
After this commit, this code works correctly.
This commit completely refactors the backtrace handling.
Instead of processing the backtrace from the outermost
frame working in, process it from the innermost frame
working out. This is much faster for partial backtraces,
since you only access the control frames you need to in
order to construct the backtrace.
To handle cfunc frames in the new design, they start
out with no location information. We increment a counter
for each cfunc frame added. When an iseq frame with pc
is accessed, after adding the iseq backtrace location,
we use the location for the iseq backtrace location for
all of the directly preceding cfunc backtrace locations.
If the last backtrace line is a cfunc frame, we continue
scanning for iseq frames until the end control frame, and
use the location information from the first one for the
trailing cfunc frames in the backtrace.
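As a concrete (hedged) illustration of this rule, the snippet below shows a C-implemented frame such as #map reporting the file and line of the Ruby frame that called it, since a cfunc frame carries no location of its own:

def show
  caller_locations(1, 3).each do |loc|
    puts "#{loc.label}: #{loc.path}:#{loc.lineno}"
  end
end

[1].map { show }
# The entry for the cfunc frame #map reports the same file and line as the
# Ruby code that invoked it (the line containing `[1].map { show }`).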
As only rb_ec_partial_backtrace_object uses the new
backtrace implementation, remove all of the function
pointers and inline the functions. This makes the
process easier to understand.
Restore the Ruby 2.7 implementation of backtrace_each and
use it for all the other functions that called
backtrace_each other than rb_ec_partial_backtrace_object.
All other cases requested the entire backtrace, so there
is no advantage of using the new algorithm for those.
Additionally, there are implicit assumptions in the other
code that the backtrace processing works inward instead
of outward.
Remove the cfunc/iseq union in rb_backtrace_location_t,
and remove the prev_loc member for cfunc. Both cfunc and
iseq types can now have iseq and pc entries, so the
location information can be accessed the same way for each.
This avoids the need for an extra backtrace location entry
to store an iseq backtrace location if the final entry in
the backtrace is a cfunc. This is also what fixes the
segfault and infinite loop issues in the above bugs.
Here's Ruby pseudocode for the new algorithm, where start
and length are the arguments to caller or caller_locations:
end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt << cf.iseq_backtrace_entry
        # Backfill the directly preceding cfunc entries with this iseq
        # frame's location.
        cfunc_counter.times do |i|
          bt[-2 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt << cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  cf = cf.prev
end

# If the backtrace ends with cfunc entries, keep scanning for the next iseq
# frame with a pc and use its location for those trailing entries.
if cfunc_counter > 0
  while cf != end_cf
    if cf.iseq? && cf.instruction_pointer?
      cfunc_counter.times do |i|
        bt[-1 - i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end
With the following benchmark, which uses a call depth of
around 100 (common in many Ruby applications):
require 'benchmark/ips'

class T
  def test(depth, &block)
    if depth == 0
      yield self
    else
      test(depth - 1, &block)
    end
  end

  def array
    Array.new
  end

  def first
    caller_locations(1, 1)
  end

  def full
    caller_locations
  end
end

t = T.new
t.test((ARGV.first || 100).to_i) do
  Benchmark.ips do |x|
    x.report('caller_loc(1, 1)') { t.first }
    x.report('caller_loc') { t.full }
    x.report('Array.new') { t.array }
    x.compare!
  end
end
Results before commit:
Calculating -------------------------------------
caller_loc(1, 1) 281.159k (± 0.7%) i/s - 1.426M in 5.073055s
caller_loc 15.836k (± 2.1%) i/s - 79.450k in 5.019426s
Array.new 1.852M (± 2.5%) i/s - 9.296M in 5.022511s
Comparison:
Array.new: 1852297.5 i/s
caller_loc(1, 1): 281159.1 i/s - 6.59x (± 0.00) slower
caller_loc: 15835.9 i/s - 116.97x (± 0.00) slower
Results after commit:
Calculating -------------------------------------
caller_loc(1, 1) 562.286k (± 0.8%) i/s - 2.858M in 5.083249s
caller_loc 16.402k (± 1.0%) i/s - 83.200k in 5.072963s
Array.new 1.853M (± 0.1%) i/s - 9.278M in 5.007523s
Comparison:
Array.new: 1852776.5 i/s
caller_loc(1, 1): 562285.6 i/s - 3.30x (± 0.00) slower
caller_loc: 16402.3 i/s - 112.96x (± 0.00) slower
This shows that the speed of caller_locations(1, 1) has roughly
doubled, and the speed of caller_locations with no arguments
has improved slightly. So this new algorithm is significantly faster,
much simpler, and fixes bugs in the previous algorithm.
Fixes [Bug #18053]
Updated by jeremyevans0 (Jeremy Evans) over 3 years ago
- Backport changed from 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED
I committed my rewrite of backtrace generation (1a05dc03f953830564c272665c47a61e53550f3e), which fixes these issues in master/3.1. I took the examples from this issue and added them as tests in that commit.
For Ruby 3.0, there are a few options:
- Back out the partial backtrace optimization
- Merge the rewrite of backtrace generation
- Fix the partial backtrace optimization
The issue with option 1 is that cffdacb15a363321e1c1879aa7d94924acafd1cf was merged after the optimization was committed, and it is nontrivial to back out the partial backtrace optimization while keeping that feature. I tried, but my attempt could not pass CI. Additionally, even if the issues with the backout could be fixed, backing out the optimization would make partial backtraces slower.
The issue with option 2 is that 1a05dc03f953830564c272665c47a61e53550f3e is fairly invasive. I'm fairly sure it would apply cleanly to 3.0, but I'm not sure if @nagachika (Tomoyuki Chikanaga) would be comfortable backporting such a large change.
So I worked on implementing option 3. It took many hours to make the new test cases pass without breaking existing test cases, but I eventually got it to work. The way I got it to work was to use a strategy similar to the one used for 1a05dc03f953830564c272665c47a61e53550f3e: scan the control frames from the current frame toward the earliest frame, instead of from the earliest frame toward the current frame. When we reach the frame that will complete the partial backtrace, we use that as the starting frame for backtrace generation, and backtrace generation itself still works from earlier frames towards the current frame. I submitted a pull request for this option: https://github.com/ruby/ruby/pull/4714
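To make the approach concrete, here is a rough Ruby-flavored sketch of the two-phase idea (not the actual C code in the pull request; counts_toward_backtrace?, to_location, and the frames array are hypothetical stand-ins):

# frames[0] is the current (innermost) control frame; higher indexes are older.
def partial_backtrace(frames, start, length)
  # Phase 1: scan from the current frame toward the earliest frame, counting
  # frames that will appear in the backtrace, to find the frame that
  # completes the requested slice.
  remaining = start + length
  last = frames.index do |frame|
    remaining -= 1 if frame.counts_toward_backtrace?  # hypothetical predicate
    remaining.zero?
  end || frames.size - 1

  # Phase 2: generate locations only for frames in the requested window; in
  # the real code this reuses the existing generation logic, which still works
  # from earlier frames towards the current frame.
  frames[start..last].map { |frame| frame.to_location }  # hypothetical helper
end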
Updated by nagachika (Tomoyuki Chikanaga) over 3 years ago
Hello Jeremy,
Thank you for your great work on the partial backtrace optimization and your deep understanding of stable branch maintenance. I'd like to adopt your third option. I will merge your PR after confirming that CI passes with it.
Updated by nagachika (Tomoyuki Chikanaga) over 3 years ago
- Backport changed from 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: DONE
Backported to ruby_3_0 at commit c07545bbf82068f4fd92a5ccfa2b09bb96b39adb.
Updated by jhawthorn (John Hawthorn) over 3 years ago
Thanks Jeremy! Just confirmed that the fix in ruby_3_0 solves all the crashes and hangs we were seeing.