Bug #14659
closedsegfault in ConditionVariable#broadcast and ConditionVariable#signal
Description
I'm encountering a consistent segfault within a Rails application running Phusion Passenger on Ruby 2.5.0 and Ruby 2.5.1 when invoking either the #broadcast or #signal method on a ConditionVariable.
Here's what the code that interacts with the ConditionVariable looks like:
def add(event)
unless @job.alive?
@lock.synchronize do
unless @job.alive?
@job = Thread.new { process }
end
end
end
@lock.synchronize do
@events << event
# this invocation causes the segfault
@ready.broadcast
end
end
I can swap out the call to #broadcast with #signal and it causes the same segfault. What I've narrowed it down to based on the C level backtrace is that when using #broadcast, it segfaults on this line of code in thread_sync.c inside the wakeup_all(struct list_head *head) function:
if (cur->th->status != THREAD_KILLED) {
When using #signal, it segfaults on similar line of code, but from the wakeup_one(struct list_head *head) function:
if (cur->th->status != THREAD_KILLED) {
Here are the links to those lines of code in GitHub (for reference) https://github.com/ruby/ruby/blob/trunk/thread_sync.c#L46, https://github.com/ruby/ruby/blob/trunk/thread_sync.c#L30.
In looking at the changes, it looks like the condition variable code was notably rewritten with this commit (https://github.com/ruby/ruby/commit/ea1ce47fd7f2bc9023e9a1391dbadcfaf9e892ce) and that only made it into the 2.5.x line of code. This segfault doesn't happen with anything prior to 2.5.0.
I'm wondering if there is some relation to how Phusion Passenger forks processes for the purposes of Smart Spawning (https://www.phusionpassenger.com/library/indepth/ruby/spawn_methods/). My code is restarting any dead threads, but I don't think there's anything else it can do.
I'm still working on trimming the code down to get a very small, reproducible example, but wanted to post all of this information, as I was hoping someone that's familiar with the internals of thread_sync.c might be able to point out some additional evidence.
Updated by normalperson (Eric Wong) over 6 years ago
nbeyer@gmail.com wrote:
Bug #14659: segfault in ConditionVariable#broadcast and ConditionVariable#signal
https://bugs.ruby-lang.org/issues/14659
I suspect this is identical to https://bugs.ruby-lang.org/issues/14634
Can you try backporting at least r62934?
r62934 a2d63ea2fb84c962abddae877e9493fc57cfce1a
thread_sync.c: avoid reaching across stacks of dead threads
Optional, but the following should be more fixes and improvements:
r63124 85e9f2879373aa496c5c3c7f900ba4869a9ca3f7
offsetof(type, foo.bar) is (arguably) a GCCism
r63210 b456eab2ea77eda51c8d5c06c24d195a6a2932e1
variable.c: fix thread + fork errors in autoload
r63215 af72dcd91b70e94b0f8299ab0464dafe884d2695
thread_sync: redo r62934 to use fork_gen
Thanks.
Updated by nbeyer@gmail.com (Nathan Beyer) over 6 years ago
I've been unable to create a reduced example. I have not tried the suggested backports mentioned above, but if I get a chance, I will.
I did want to post that I worked around the issue by using Phusion Passengers smart spawning hooks: https://www.phusionpassenger.com/library/indepth/ruby/spawn_methods/#smart-spawning-hooks. In the bit of code where I start the sub-system, which in turn creates the threads that were at issue, the code now registers with the smart spawning hook and when the hook is invoked, it restarts the sub-system, which in turn restarts the threads. I'm not sure why the hooks are required to restart the threads, rather than just dynamically doing so when detecting the thread is dead, but it resolves the issue for now.
Updated by jmgarnier (Jean-Michel Garnier) over 6 years ago
I had the exact the same situation as Nathan.
Segfaults happen in https://github.com/codegram/futuroscope gem which initalizes a pool of threads at startup. When passenger smart spawning forks the process, it segfaults.
I tested the same code with ruby 2.6 preview and no segfaults.
Updated by normalperson (Eric Wong) over 6 years ago
Backport request for 2.5.2 is here, sorry for the breakage:
https://bugs.ruby-lang.org/issues/14852
Updated by jeremyevans0 (Jeremy Evans) about 5 years ago
- Status changed from Open to Closed