Bug #20181
closed
Process.wait(-1) doesn't report exited child processes if WAITPID_USE_SIGCHLD is enabled
Description
From Ruby 2.6 to 3.2, Process.wait(-1) doesn't return when a child process exits if a spawned, detached process is still running. The following script exits immediately with Ruby 3.3, but hangs for 10 minutes (the length of the sleep) in Ruby 2.6 to 3.2:
#!/usr/bin/env ruby
Process.spawn({}, "sh -c 'sleep 600'").tap do |pid|
  puts "detaching PID #{pid}"
  Process.detach(pid)
end

forked_pid = fork do
  loop { sleep 1 }
end

child_waiter = Thread.new do
  puts "Waiting for child process to die..."
  # This works
  # puts Process.wait2(forked_pid)
  # The spawned process has to exit before this returns in Ruby 3.1 and 3.2
  pid, status = Process.wait2(-1)
  puts "Exited PID: #{pid}, status: #{status}"
end

process_killer = Thread.new do
  puts "Killing #{forked_pid}"
  system("kill #{forked_pid}")
end

child_waiter.join
process_killer.join
In Ruby 3.2, we see:
detaching PID 8
Waiting for child process to die...
Killing 11
<process hangs here>
In Ruby 3.3, this exits immediately:
detaching PID 9
Waiting for child process to die...
Killing 11
Exited PID: 11, status: pid 11 SIGTERM (signal 15)
However, if I switch the Process.wait2(-1) call in the script to Process.wait2(forked_pid), Ruby 3.2 works fine.
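Since waiting on a specific PID behaves correctly, one possible workaround on affected versions is to poll a known set of child PIDs instead of waiting on -1. The following is only a minimal sketch under that assumption: wait_any_of is a hypothetical helper (not part of Ruby or Puma), the poll interval is arbitrary, and it has not been verified against every affected release.

```ruby
# Hypothetical helper: return [pid, status] for the first child in `pids`
# that has exited, without ever calling Process.wait2(-1).
# Returns nil if `timeout` (in seconds) elapses first.
def wait_any_of(pids, poll_interval: 0.05, timeout: nil)
  now = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  deadline = timeout && now.call + timeout

  loop do
    pids.each do |pid|
      begin
        # With WNOHANG, wait2 returns nil right away if pid is still running.
        result = Process.wait2(pid, Process::WNOHANG)
        return result if result
      rescue Errno::ECHILD
        # pid is not (or no longer) a waitable child, e.g. it was already
        # reaped by a Process.detach thread. Skip it.
      end
    end

    return nil if deadline && now.call >= deadline
    sleep poll_interval
  end
end
```

In the script above, this would be called as wait_any_of([forked_pid]) in place of Process.wait2(-1). The trade-off is that the caller has to track its own child PIDs and accept a small polling delay.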
I've validated that this problem goes away if I disable WAITPID_USE_SIGCHLD:
diff --git a/vm_core.h b/vm_core.h
index 1cc0659700..0e7d1643fe 100644
--- a/vm_core.h
+++ b/vm_core.h
@@ -126,7 +126,7 @@
 #endif
 /* define to 0 to test old code path */
-#define WAITPID_USE_SIGCHLD (RUBY_SIGCHLD || SIGCHLD_LOSSY)
+#define WAITPID_USE_SIGCHLD 0
 #if defined(SIGSEGV) && defined(HAVE_SIGALTSTACK) && defined(SA_SIGINFO) && !defined(__NetBSD__)
 # define USE_SIGALTSTACK
This was first reported in the Puma issue tracker (https://github.com/puma/puma/issues/3313), and another contributor documented long-standing issues with Process.wait in the past: https://github.com/dentarg/gists/tree/master/gists/ruby-bug-15499#ruby--puma-bug
In Ruby 2.6, https://github.com/ruby/ruby/commit/054a412d540e7ed2de63d68da753f585ea6616c3 introduced a mechanism for rb_waitpid that uses SIGCHLD for blocking wait calls, which may be the source of this bug. Ruby 2.5 doesn't appear to have this problem.
In Ruby 3.3, this SIGCHLD implementation was dropped in https://github.com/ruby/ruby/pull/7476 and https://github.com/ruby/ruby/pull/7527, so Ruby 3.3 no longer appears affected.
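For code that has to run on several Ruby versions, a guard along these lines could decide whether to avoid Process.wait(-1). This is a sketch only: waitpid_sigchld_affected? is a hypothetical name, and the range it checks (2.6 up to, but not including, 3.3) is taken from the observations in this report rather than from an exhaustive test of every release.

```ruby
require "rubygems" # Gem::Version; normally loaded already

# Hypothetical helper: true if `version` falls in the range this report
# describes as affected, i.e. >= 2.6 and < 3.3.
def waitpid_sigchld_affected?(version = RUBY_VERSION)
  v = Gem::Version.new(version)
  v >= Gem::Version.new("2.6") && v < Gem::Version.new("3.3")
end
```

A caller could then keep Process.wait(-1) where it works and fall back to waiting on explicit PIDs only when waitpid_sigchld_affected? returns true.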