Bug #21926: Thread#value on popen3 wait thread hangs in finalizer - Ruby - Ruby Issue Tracking System

Actions

Copy link

Bug #21926

closed

Thread#value on popen3 wait thread hangs in finalizer

Bug #21926: Thread#value on popen3 wait thread hangs in finalizer

Added by stevecrozz (Stephen Crosby) 3 months ago. Updated 4 days ago.

Status:

Closed

Assignee:

luke-gru (Luke Gruber)

Target version:

ruby -v:

3.3.7

Backport:

3.2: UNKNOWN, 3.3: REQUIRED, 3.4: DONTNEED, 4.0: DONE

[ruby-core:124887]

Description

Calling Thread#value on an Open3.popen3 wait thread from a finalizer completes in Ruby 3.2 but hangs in Ruby 3.3+. See repro.rb below. When the Ruby process hangs in these conditions, it no longer responds to signals and it seems to be unable to run any other threads.

This affects the schmooze gem (and potentially other code using Open3.popen3 with finalizers), causing test suites to hang intermittently.

# repro.rb
require 'open3'

class ProcessWrapper
  def initialize
    @stdin, @stdout, @stderr, @wait_thread = Open3.popen3("cat")
    ObjectSpace.define_finalizer(self, self.class.make_finalizer(@stdin, @stdout, @stderr, @wait_thread))
  end
                                                       
  def self.make_finalizer(stdin, stdout, stderr, wait_thread)
    proc do
      stdin.close rescue nil
      stdout.close rescue nil
      stderr.close rescue nil
      wait_thread.value  # Hangs here in Ruby 3.3+
    end
  end
end
                  
100.times { ProcessWrapper.new } 
GC.stress = true
1000.times { Object.new } 
puts "done"

Environment¶

Linux x86_64
Tested on Ruby 3.2.7, 3.3.7, 3.4.8

Updated by luke-gru (Luke Gruber) 3 months ago · Edited Actions
Copy link
#1 [ruby-core:124936]

Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN to 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: REQUIRED, 4.0: REQUIRED

Thank you for the nice reproduction! We have a fix coming, and it will probably be backported to 3.3, 3.4 and 4.0.

The fix is at https://github.com/ruby/ruby/pull/16307.

Updated by luke-gru (Luke Gruber) 3 months ago Actions
Copy link
#2

Assignee set to luke-gru (Luke Gruber)

Updated by luke-gru (Luke Gruber) 3 months ago Actions
Copy link
#3

Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: REQUIRED, 4.0: REQUIRED to 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: REQUIRED, 4.0: REQUIRED

Updated by Anonymous 3 months ago Actions
Copy link
#4

Status changed from Open to Closed

Applied in changeset git|08372635f7ec09f7115bd254246ebd637499651c.

Fix race condition right after ubf registration

Registering a ubf was considered problematic in some cases because it could
result in lock ordering inversions with the ubf function itself. I believe
this is the reason that in patch be1bbd5b7d, the ubf was registered outside of
the thread_sched_lock.

For example, thread_sched_to_waiting_until_wakeup() (native_sleep)
should register the ubf with the thread_sched_lock (TSL) taken. The ordering
is (TSL -> UBF lock). During the ubf call itself, since the UBF lock is
always acquired during the call, the ordering is (UBF lock -> TSL). This
inversion was avoided by registering the ubf outside the TSL.

This inversion is benign in this case, though. Since there can be no ubf on the
thread before the call to ubf_set, and this ubf is the only place where the
(UBF lock -> TSL lock) ordering is done, it is okay to register it with the TSL.
In fact registering the ubf outside the TSL can result in problamatic race conditions,
so we now register the ubf with the TSL lock.

This fixes a race condition whereby a newly registered ubf is called before the thing it's
protecting (the sleep) is actually run. The sleeping thread can never be interrupted if this
happens.

Fixes [Bug #21926]

Updated by luke-gru (Luke Gruber) 3 months ago Actions
Copy link
#5 [ruby-core:124974]

A note to whoever backports this (which could be me, I'm just waiting ~ 1 week and checking various CIs after the merge):

https://github.com/ruby/ruby/pull/16362 should be backported as well, they are really 1 fix in 2 separate commits.

Updated by k0kubun (Takashi Kokubun) 2 months ago Actions
Copy link
#6 [ruby-core:125019]

Backport changed from 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: REQUIRED, 4.0: REQUIRED to 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: REQUIRED, 4.0: DONE

ruby_4_0 a601b899a35c796775309dca01a6d5e64be14c44 merged revision(s) 08372635f7ec09f7115bd254246ebd637499651c.

Updated by jhawthorn (John Hawthorn) 2 months ago Actions
Copy link
#7 [ruby-core:125139]

Just for the record, this code should not hang and the fix is good, but the pattern in the reproduction isn't safe.

The defined finalizer could potentially be run on any thread including the wait_thread, which would result in the error Target thread must not be current thread (ThreadError).

Updated by nagachika (Tomoyuki Chikanaga) 4 days ago Actions
Copy link
#8 [ruby-core:125566]

Backport changed from 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: REQUIRED, 4.0: DONE to 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: DONTNEED, 4.0: DONE

I suppose this issue isn't reproducible on ruby_3_4.

Actions

Copy link

Also available in: PDF Atom

Project

General

Profile

Ruby

Custom queries

Bug #21926

Thread#value on popen3 wait thread hangs in finalizer

Environment¶

Updated by luke-gru (Luke Gruber) 3 months ago · Edited Actions
Copy link
#1 [ruby-core:124936]

Updated by luke-gru (Luke Gruber) 3 months ago Actions
Copy link
#2

Updated by luke-gru (Luke Gruber) 3 months ago Actions
Copy link
#3

Updated by Anonymous 3 months ago Actions
Copy link
#4

Updated by luke-gru (Luke Gruber) 3 months ago Actions
Copy link
#5 [ruby-core:124974]

Updated by k0kubun (Takashi Kokubun) 2 months ago Actions
Copy link
#6 [ruby-core:125019]

Updated by jhawthorn (John Hawthorn) 2 months ago Actions
Copy link
#7 [ruby-core:125139]

Updated by nagachika (Tomoyuki Chikanaga) 4 days ago Actions
Copy link
#8 [ruby-core:125566]

Project

General

Profile

Ruby

Custom queries

Bug #21926

Thread#value on popen3 wait thread hangs in finalizer

Environment¶

Updated by luke-gru (Luke Gruber) 3 months ago · Edited ActionsCopy link #1 [ruby-core:124936]

Updated by luke-gru (Luke Gruber) 3 months ago ActionsCopy link #2

Updated by luke-gru (Luke Gruber) 3 months ago ActionsCopy link #3

Updated by Anonymous 3 months ago ActionsCopy link #4

Updated by luke-gru (Luke Gruber) 3 months ago ActionsCopy link #5 [ruby-core:124974]

Updated by k0kubun (Takashi Kokubun) 2 months ago ActionsCopy link #6 [ruby-core:125019]

Updated by jhawthorn (John Hawthorn) 2 months ago ActionsCopy link #7 [ruby-core:125139]

Updated by nagachika (Tomoyuki Chikanaga) 4 days ago ActionsCopy link #8 [ruby-core:125566]

Updated by luke-gru (Luke Gruber) 3 months ago · Edited Actions
Copy link
#1 [ruby-core:124936]

Updated by luke-gru (Luke Gruber) 3 months ago Actions
Copy link
#2

Updated by luke-gru (Luke Gruber) 3 months ago Actions
Copy link
#3

Updated by Anonymous 3 months ago Actions
Copy link
#4

Updated by luke-gru (Luke Gruber) 3 months ago Actions
Copy link
#5 [ruby-core:124974]

Updated by k0kubun (Takashi Kokubun) 2 months ago Actions
Copy link
#6 [ruby-core:125019]

Updated by jhawthorn (John Hawthorn) 2 months ago Actions
Copy link
#7 [ruby-core:125139]

Updated by nagachika (Tomoyuki Chikanaga) 4 days ago Actions
Copy link
#8 [ruby-core:125566]