Bug #20181 (closed)
Process.wait(-1) doesn't report exited child processes if WAITPID_USE_SIGCHLD is enabled
Description
From Ruby 2.6 to 3.2, Process.wait(-1) doesn't return in a timely manner if a spawned, detached process is still running. The following script exits immediately with Ruby 3.3, but hangs for 10 minutes (the length of the sleep) in Ruby 2.6 to 3.2:
#!/bin/env ruby

Process.spawn({}, "sh -c 'sleep 600'").tap do |pid|
  puts "detaching PID #{pid}"
  Process.detach(pid)
end

forked_pid = fork do
  loop { sleep 1 }
end

child_waiter = Thread.new do
  puts "Waiting for child process to die..."
  # This works
  # puts Process.wait2(forked_pid)
  # The spawned process has to exit before this returns in Ruby 3.1 and 3.2
  pid, status = Process.wait2(-1)
  puts "Exited PID: #{pid}, status: #{status}"
end

process_killer = Thread.new do
  puts "Killing #{forked_pid}"
  system("kill #{forked_pid}")
end

child_waiter.join
process_killer.join
In Ruby 3.2, we see:
detaching PID 8
Waiting for child process to die...
Killing 11
<process hangs here>
In Ruby 3.3, this exits immediately:
detaching PID 9
Waiting for child process to die...
Killing 11
Exited PID: 11, status: pid 11 SIGTERM (signal 15)
However, if I switch the Process.wait(-1) to Process.wait(forked_pid), Ruby 3.2 works fine.
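The working variant can be sketched in isolation as follows. This is a minimal illustration rather than code from the report: `wait_for_child` is a hypothetical helper name, and the child is terminated with Process.kill instead of the shell `kill` used in the script.

```ruby
# Sketch: wait on an explicit PID instead of -1 ("any child").
# `wait_for_child` is a hypothetical helper, not part of Ruby or the report.
def wait_for_child(pid)
  # Process.wait2 returns [pid, Process::Status] for the reaped child.
  Process.wait2(pid)
end

# Child that sleeps until it is signalled.
forked_pid = fork { sleep }

# Terminate the child, then reap exactly that PID.
Process.kill("TERM", forked_pid)
reaped_pid, status = wait_for_child(forked_pid)

puts "Exited PID: #{reaped_pid}, signaled: #{status.signaled?}, termsig: #{status.termsig}"
```

A supervisor that already tracks its own child PIDs could wait on each one this way, at the cost of not noticing children it did not record.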
I've validated that this problem goes away if I disable WAITPID_USE_SIGCHLD:
diff --git a/vm_core.h b/vm_core.h
index 1cc0659700..0e7d1643fe 100644
--- a/vm_core.h
+++ b/vm_core.h
@@ -126,7 +126,7 @@
 #endif

 /* define to 0 to test old code path */
-#define WAITPID_USE_SIGCHLD (RUBY_SIGCHLD || SIGCHLD_LOSSY)
+#define WAITPID_USE_SIGCHLD 0

 #if defined(SIGSEGV) && defined(HAVE_SIGALTSTACK) && defined(SA_SIGINFO) && !defined(__NetBSD__)
 # define USE_SIGALTSTACK
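To compare interpreter builds (for example, one with and one without a change like the above), the reproduction can be shortened and timed. The following is a sketch under assumptions, not the reporter's script: `seconds_until_wait_any_returns` is a hypothetical helper, the detached process sleeps for 5 seconds instead of 600, and what counts as "too long" is left to the reader.

```ruby
# Hypothetical helper: measures how long Process.wait2(-1) takes to return
# after a forked child is killed while a detached process is still running.
def seconds_until_wait_any_returns(detached_sleep_seconds: 5)
  # Stand-in for the long-running detached process from the report.
  spawned_pid = Process.spawn("sleep #{detached_sleep_seconds}")
  Process.detach(spawned_pid)

  # Child that sleeps until it is signalled.
  forked_pid = fork { sleep }

  waiter = Thread.new do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    pid, status = Process.wait2(-1)
    [pid, status, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
  end

  Process.kill("TERM", forked_pid)
  _pid, _status, elapsed = waiter.value
  elapsed
ensure
  # Clean up the stand-in process so it does not outlive the check.
  begin
    Process.kill("TERM", spawned_pid) if spawned_pid
  rescue Errno::ESRCH
    # The process has already exited.
  end
end

elapsed = seconds_until_wait_any_returns(detached_sleep_seconds: 5)
puts format("Process.wait2(-1) returned after %.2f s", elapsed)
```

Based on the behaviour described in this report, a result close to the detached sleep duration would suggest the interpreter is affected, while a near-immediate return would suggest it is not.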
This was first reported in the Puma issue tracker (https://github.com/puma/puma/issues/3313), and another contributor documented long-standing issues with Process.wait in the past: https://github.com/dentarg/gists/tree/master/gists/ruby-bug-15499#ruby--puma-bug
In Ruby 2.6, https://github.com/ruby/ruby/commit/054a412d540e7ed2de63d68da753f585ea6616c3 introduced a mechanism for rb_waitpid that uses SIGCHLD for blocking wait calls, and this might have introduced this bug. Ruby 2.5 doesn't appear to have this problem.
In Ruby 3.3, this SIGCHLD implementation was dropped in https://github.com/ruby/ruby/pull/7476 and https://github.com/ruby/ruby/pull/7527, so Ruby 3.3 no longer appears affected.
Updated by stanhu (Stan Hu) 10 months ago
This might be a duplicate of https://bugs.ruby-lang.org/issues/19322.
Updated by kjtsanaktsidis (KJ Tsanaktsidis) 10 months ago
Actually I think this is a duplicate of https://bugs.ruby-lang.org/issues/19837. Does this describe your issue?
The fix for this was backported into the Ruby 3.2 and 3.1 branches, but I don't think a release of either 3.2 or 3.1 has been performed since then. Does the problem go away if you compile Ruby from the ruby_3_2 branch directly?
Updated by stanhu (Stan Hu) 10 months ago
Yes, thanks, this definitely looks like the same issue. Thanks for filing that issue and getting the patches merged.
I tested ruby_3_2, and it appears that the patch fixes the problem. I thought it wasn't working initially, but I may have been using the wrong Ruby interpreter.
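One way to rule out a wrong interpreter when testing a branch build is to print which Ruby binary is actually executing the script. This is a small sketch using constants that ship with Ruby; the `info` hash and its keys are illustrative only.

```ruby
require "rbconfig"

# Collect identifying details of the interpreter running this script.
info = {
  description: RUBY_DESCRIPTION, # full version banner, as printed by `ruby -v`
  binary: RbConfig.ruby,         # path to the ruby executable in use
  version: RUBY_VERSION,
  patchlevel: RUBY_PATCHLEVEL,
  revision: RUBY_REVISION,       # source revision the build was made from
}

info.each { |key, value| puts "#{key}: #{value}" }
```

Running this with the same command used for the reproduction script shows whether the branch build or a previously installed Ruby is being picked up.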
Updated by byroot (Jean Boussier) 10 months ago
- Related to Bug #19837: Concurrent calls to Process.waitpid2 misbehave on Ruby 3.1 & 3.2 added
Updated by byroot (Jean Boussier) 10 months ago
- Status changed from Open to Closed