Bug #20670

closed

fork deadlocks in child process due to timer thread

Added by jhawthorn (John Hawthorn) 8 months ago. Updated 5 days ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]
[ruby-core:118823]

Description

We've been seeing an occasional failure in the Rails CI related to a test which forks, and I managed to reduce it to the following reproduction:

Thread.new do
  loop { sleep 0.0001 }
end

1000.times do
  pid = fork{}
  Process.waitpid(pid)
rescue Exception
  Process.kill(:KILL, pid)
  raise
end

This hangs fairly reliably on Ruby 3.3 and HEAD, but always completes on Ruby 3.2.

In a debugger it seems like the timer thread acquires vm->ractor.sched.lock in the parent process just as the process is forking. The child process then ends up stuck inside of thread_sched_atfork when trying to acquire the same lock.
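To illustrate why a lock taken by another thread at fork time is a problem, here is a toy model in plain Ruby. It is only a sketch and not how the VM implements things: the `held` flag is a stand-in for vm->ractor.sched.lock, the helper thread is a stand-in for the timer thread, and the method name `lock_state_seen_by_child` is made up for this example. It shows that the child inherits the "locked" state but not the thread that would release it.

```ruby
# Toy model only: `held` stands in for a native lock word, and the helper
# thread stands in for the timer thread. None of this is VM code.
def lock_state_seen_by_child
  held = false
  acquired = Queue.new

  # The helper "takes the lock" and keeps holding it across the fork.
  helper = Thread.new do
    held = true
    acquired << :locked
    sleep # keep holding until the parent kills this thread
  end
  acquired.pop # wait until the lock is definitely held

  reader, writer = IO.pipe
  pid = fork do
    reader.close
    # The child has a copy of memory (held is still true) but only the
    # thread that called fork, so nothing here can ever release the lock.
    writer.puts "#{held} #{Thread.list.size}"
    writer.close
    exit!(0)
  end
  writer.close
  report = reader.read
  reader.close
  Process.waitpid(pid)

  helper.kill
  helper.join

  held_in_child, threads_in_child = report.split
  [held_in_child == "true", threads_in_child.to_i]
end

held, threads = lock_state_seen_by_child
puts "child saw lock held: #{held}, live threads in child: #{threads}"
```

If the child then tried to take such a lock unconditionally, as thread_sched_atfork appears to do with the real one, it would wait forever. The usual remedies for this pattern are to hold the lock around fork in the forking thread or to reinitialize it in the child; which one the linked PR uses is not stated here.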

I've opened https://github.com/ruby/ruby/pull/11356 with a fix.


Related issues 1 (0 open, 1 closed)

Related to Ruby - Bug #19395: Process forking within non-main Ractor hits rb_bug() (Closed)

Updated by jhawthorn (John Hawthorn) 8 months ago

  • Related to Bug #19395: Process forking within non-main Ractor hits rb_bug() added

Updated by ko1 (Koichi Sasada) 8 months ago

Thank you!

Updated by jhawthorn (John Hawthorn) 8 months ago

  • Status changed from Open to Closed
  • Backport changed from 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN to 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED

Updated by k0kubun (Takashi Kokubun) 7 months ago

  • Backport changed from 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED to 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONE

Updated by ioquatix (Samuel Williams) 5 days ago

IIUC, the tests introduced here are now failing again on master:

btest-ruby
   Fstderr output is not empty
     bootstraptest.test_fork.rb_78_287.rb:16:in 'block in <main>': failed (RuntimeError)
             from <internal:numeric>:257:in 'Integer#times'
             from bootstraptest.test_fork.rb_78_287.rb:10:in '<main>'
  #287 test_fork.rb:78: 
       def now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
     
       Thread.new do
         loop { sleep 0.0001 }
       end
     
       10.times do
         pid = fork{ exit!(0) }
         deadline = now + 10
         until Process.waitpid(pid, Process::WNOHANG)
           if now > deadline
             Process.kill(:KILL, pid)
             raise "failed"
           end
           sleep 0.001
         end
       rescue NotImplementedError
       end
       :ok
    #=> "" (expected "ok")  [Bug #20670]
  FAIL 1/1910 tests failed
  make: *** [uncommon.mk:894: yes-btest-ruby] Error 1
  /tmp/tmp.xWdROBpKaN /tmp/tmp.xWdROBpKaN /github/workspace/src /github/workspace

https://github.com/ruby/ruby/actions/runs/14010663767/job/39230065916#step:9:1815

Maybe there has been a regression?

I'm checking whether it's just a timeout issue in https://github.com/ruby/ruby/pull/12962, but I suspect it's a deadlock.
