Project

General

Profile

Actions

Bug #19112

open

Ractor garbage collection breaks/disables all active tracepoints

Added by ivoanjo (Ivo Anjo) about 2 years ago. Updated almost 2 years ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 3.2.0dev (2022-11-08T08:08:48Z master cdb3ec3af8) [x86_64-linux]
[ruby-core:110656]

Description

Howdy 👋! I work for Datadog on the ddtrace gem and I found an issue with Ractors + TracePoint.

Background

For tracking time spent in Garbage Collection on our Ruby profiler, we use a tracepoint that listens to the internal GC events. While doing other experiments around Ractors and profiling, I noticed that our tracepoint would suddenly stop working after Ractors were used in the application. I was able to reduce this down to the testcase below.

How to reproduce (Ruby version & script)

I was able to reproduce this on any Ruby 3 version, including master (ruby 3.2.0dev (2022-11-08T08:08:48Z master cdb3ec3af8) [x86_64-linux]).

puts RUBY_DESCRIPTION

if ENV['RUN_RACTOR'] == 'true'
  Ractor.new { 10 }.take
  puts "Ractor ran!"
else
  puts "Ractor DID NOT ran!"
end

puts "Enabling tracepoint"
tracepoint = TracePoint.new(:line) { |tp| p [tp.path, tp.lineno, tp.event] }.tap(&:enable)

5.times { puts "Triggering gc!"; GC.start }

puts "Tracepoint state is enabled? #{tracepoint.enabled?}"
puts "Re-enabling tracepoint"
tracepoint.enable

puts "Finished!"

Expectation and result

Without the Ractor, the tracepoint works as intended:

# RUN_RACTOR=false ./ruby ../ruby-3.1.2/triggergc.rb
ruby 3.2.0dev (2022-11-08T08:08:48Z master cdb3ec3af8) [x86_64-linux]
Ractor DID NOT ran!
Enabling tracepoint
["<internal:kernel>", 91, :line]
["../ruby-3.1.2/triggergc.rb", 14, :line]
["../ruby-3.1.2/triggergc.rb", 14, :line]
Triggering gc!
["<internal:gc>", 34, :line]
["../ruby-3.1.2/triggergc.rb", 14, :line]
Triggering gc!
["<internal:gc>", 34, :line]
["../ruby-3.1.2/triggergc.rb", 14, :line]
Triggering gc!
["<internal:gc>", 34, :line]
["../ruby-3.1.2/triggergc.rb", 14, :line]
Triggering gc!
["<internal:gc>", 34, :line]
["../ruby-3.1.2/triggergc.rb", 14, :line]
Triggering gc!
["<internal:gc>", 34, :line]
["../ruby-3.1.2/triggergc.rb", 16, :line]
["<internal:trace_point>", 257, :line]
Tracepoint state is enabled? true
["../ruby-3.1.2/triggergc.rb", 17, :line]
Re-enabling tracepoint
["../ruby-3.1.2/triggergc.rb", 18, :line]
["<internal:trace_point>", 213, :line]
["../ruby-3.1.2/triggergc.rb", 20, :line]
["../ruby-3.1.2/triggergc.rb", 20, :line]
Finished!

...but once the Ractor runs and is garbage collected, we see that the tracepoint breaks:

# RUN_RACTOR=true ./ruby ../ruby-3.1.2/triggergc.rb
ruby 3.2.0dev (2022-11-08T08:08:48Z master cdb3ec3af8) [x86_64-linux]
../ruby-3.1.2/triggergc.rb:5: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
Ractor ran!
Enabling tracepoint
["<internal:kernel>", 91, :line]
["../ruby-3.1.2/triggergc.rb", 14, :line]
["../ruby-3.1.2/triggergc.rb", 14, :line]
Triggering gc!
["<internal:gc>", 34, :line]
Triggering gc!
Triggering gc!
Triggering gc!
Triggering gc!
Tracepoint state is enabled? true
Re-enabling tracepoint
["../ruby-3.1.2/triggergc.rb", 20, :line]
["../ruby-3.1.2/triggergc.rb", 20, :line]
Finished

Additional notes

This is a somewhat annoying bug, because it can break all gems that use tracepoints, including ddtrace. In particular for ddtrace, our plan is to disable the feature that uses tracepoints for Ruby 3 users, to avoid any issues from our tracepoints getting disabled.

Updated by luke-gru (Luke Gruber) almost 2 years ago

I was just looking into this bug and it's really strange, I initially thought it had to do with single ractor mode vs multi, but if we sleep 100 inside the ractor everything behaves as it should, so it has to do with the ractor finishing. Once it finishes the tracepoints behave weirdly... I'll look into it some more when I get the chance.

Weirdly enough when the ractor/thread is garbage collected, the issue goes away:

puts RUBY_DESCRIPTION

if ENV['RUN_RACTOR'] == 'true'
  Ractor.new { 10 }.take
  puts "Ractor ran!"
else
  puts "Ractor DID NOT ran!"
end

# For some reason this seems to be the magic number of GCs, and then it works again (most of the time).
GC.start
GC.start
GC.start

puts "Enabling tracepoint"
tracepoint = TracePoint.new(:line) { |tp| p [tp.path, tp.lineno, tp.event] }.tap(&:enable)

5.times { puts "Triggering gc!"; GC.start }

puts "Tracepoint state is enabled? #{tracepoint.enabled?}"
puts "Re-enabling tracepoint"
tracepoint.enable

puts "Finished!"

Updated by luke-gru (Luke Gruber) almost 2 years ago

To make sure it's collected:

puts RUBY_DESCRIPTION

ractor_id = nil
if ENV['RUN_RACTOR'] == '1'
  r = Ractor.new { 10 }
  ractor_id = r.object_id
  r.take
  puts "Ractor ran!"
else
  puts "Ractor DID NOT ran!"
end
gc_times = 0
r = nil # allow gc for object
if ENV['RUN_RACTOR'] == '1'
  until (ObjectSpace._id2ref(ractor_id) rescue nil).nil?
    gc_times += 1
    puts "GC x #{gc_times}"
    GC.start
  end
end

puts "Enabling tracepoint"
tracepoint = TracePoint.new(:line) { |tp| p [tp.path, tp.lineno, tp.event] }.tap(&:enable)

5.times { puts "Triggering gc!"; GC.start }

or this works too, strangely:

if ENV['RUN_RACTOR'] == '1'
  r = Ractor.new { 10 } # simply having a reference to the object fixes the issue.
  r.take
  puts "Ractor ran!"
else
  puts "Ractor DID NOT ran!"
end

Updated by luke-gru (Luke Gruber) almost 2 years ago

Pull request is here if you're interested: https://github.com/ruby/ruby/pull/7184

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0