Bug #19112
openRactor garbage collection breaks/disables all active tracepoints
Description
Howdy 👋! I work for Datadog on the ddtrace
gem and I found an issue with Ractors + TracePoint.
Background¶
For tracking time spent in Garbage Collection on our Ruby profiler, we use a tracepoint that listens to the internal GC events. While doing other experiments around Ractors and profiling, I noticed that our tracepoint would suddenly stop working after Ractors were used in the application. I was able to reduce this down to the testcase below.
How to reproduce (Ruby version & script)¶
I was able to reproduce this on any Ruby 3 version, including master (ruby 3.2.0dev (2022-11-08T08:08:48Z master cdb3ec3af8) [x86_64-linux]
).
puts RUBY_DESCRIPTION
if ENV['RUN_RACTOR'] == 'true'
Ractor.new { 10 }.take
puts "Ractor ran!"
else
puts "Ractor DID NOT ran!"
end
puts "Enabling tracepoint"
tracepoint = TracePoint.new(:line) { |tp| p [tp.path, tp.lineno, tp.event] }.tap(&:enable)
5.times { puts "Triggering gc!"; GC.start }
puts "Tracepoint state is enabled? #{tracepoint.enabled?}"
puts "Re-enabling tracepoint"
tracepoint.enable
puts "Finished!"
Expectation and result¶
Without the Ractor, the tracepoint works as intended:
# RUN_RACTOR=false ./ruby ../ruby-3.1.2/triggergc.rb
ruby 3.2.0dev (2022-11-08T08:08:48Z master cdb3ec3af8) [x86_64-linux]
Ractor DID NOT ran!
Enabling tracepoint
["<internal:kernel>", 91, :line]
["../ruby-3.1.2/triggergc.rb", 14, :line]
["../ruby-3.1.2/triggergc.rb", 14, :line]
Triggering gc!
["<internal:gc>", 34, :line]
["../ruby-3.1.2/triggergc.rb", 14, :line]
Triggering gc!
["<internal:gc>", 34, :line]
["../ruby-3.1.2/triggergc.rb", 14, :line]
Triggering gc!
["<internal:gc>", 34, :line]
["../ruby-3.1.2/triggergc.rb", 14, :line]
Triggering gc!
["<internal:gc>", 34, :line]
["../ruby-3.1.2/triggergc.rb", 14, :line]
Triggering gc!
["<internal:gc>", 34, :line]
["../ruby-3.1.2/triggergc.rb", 16, :line]
["<internal:trace_point>", 257, :line]
Tracepoint state is enabled? true
["../ruby-3.1.2/triggergc.rb", 17, :line]
Re-enabling tracepoint
["../ruby-3.1.2/triggergc.rb", 18, :line]
["<internal:trace_point>", 213, :line]
["../ruby-3.1.2/triggergc.rb", 20, :line]
["../ruby-3.1.2/triggergc.rb", 20, :line]
Finished!
...but once the Ractor runs and is garbage collected, we see that the tracepoint breaks:
# RUN_RACTOR=true ./ruby ../ruby-3.1.2/triggergc.rb
ruby 3.2.0dev (2022-11-08T08:08:48Z master cdb3ec3af8) [x86_64-linux]
../ruby-3.1.2/triggergc.rb:5: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
Ractor ran!
Enabling tracepoint
["<internal:kernel>", 91, :line]
["../ruby-3.1.2/triggergc.rb", 14, :line]
["../ruby-3.1.2/triggergc.rb", 14, :line]
Triggering gc!
["<internal:gc>", 34, :line]
Triggering gc!
Triggering gc!
Triggering gc!
Triggering gc!
Tracepoint state is enabled? true
Re-enabling tracepoint
["../ruby-3.1.2/triggergc.rb", 20, :line]
["../ruby-3.1.2/triggergc.rb", 20, :line]
Finished
Additional notes¶
This is a somewhat annoying bug, because it can break all gems that use tracepoints, including ddtrace
. In particular for ddtrace
, our plan is to disable the feature that uses tracepoints for Ruby 3 users, to avoid any issues from our tracepoints getting disabled.
Updated by luke-gru (Luke Gruber) almost 2 years ago
I was just looking into this bug and it's really strange, I initially thought it had to do with single ractor mode vs multi, but if we sleep 100
inside the ractor everything behaves as it should, so it has to do with the ractor finishing. Once it finishes the tracepoints behave weirdly... I'll look into it some more when I get the chance.
Weirdly enough when the ractor/thread is garbage collected, the issue goes away:
puts RUBY_DESCRIPTION
if ENV['RUN_RACTOR'] == 'true'
Ractor.new { 10 }.take
puts "Ractor ran!"
else
puts "Ractor DID NOT ran!"
end
# For some reason this seems to be the magic number of GCs, and then it works again (most of the time).
GC.start
GC.start
GC.start
puts "Enabling tracepoint"
tracepoint = TracePoint.new(:line) { |tp| p [tp.path, tp.lineno, tp.event] }.tap(&:enable)
5.times { puts "Triggering gc!"; GC.start }
puts "Tracepoint state is enabled? #{tracepoint.enabled?}"
puts "Re-enabling tracepoint"
tracepoint.enable
puts "Finished!"
Updated by luke-gru (Luke Gruber) almost 2 years ago
To make sure it's collected:
puts RUBY_DESCRIPTION
ractor_id = nil
if ENV['RUN_RACTOR'] == '1'
r = Ractor.new { 10 }
ractor_id = r.object_id
r.take
puts "Ractor ran!"
else
puts "Ractor DID NOT ran!"
end
gc_times = 0
r = nil # allow gc for object
if ENV['RUN_RACTOR'] == '1'
until (ObjectSpace._id2ref(ractor_id) rescue nil).nil?
gc_times += 1
puts "GC x #{gc_times}"
GC.start
end
end
puts "Enabling tracepoint"
tracepoint = TracePoint.new(:line) { |tp| p [tp.path, tp.lineno, tp.event] }.tap(&:enable)
5.times { puts "Triggering gc!"; GC.start }
or this works too, strangely:
if ENV['RUN_RACTOR'] == '1'
r = Ractor.new { 10 } # simply having a reference to the object fixes the issue.
r.take
puts "Ractor ran!"
else
puts "Ractor DID NOT ran!"
end
Updated by luke-gru (Luke Gruber) almost 2 years ago
Pull request is here if you're interested: https://github.com/ruby/ruby/pull/7184