Project

General

Profile

Bug #14659

segfault in ConditionVariable#broadcast and ConditionVariable#signal

Added by nbeyer@gmail.com (Nathan Beyer) 7 months ago. Updated 4 months ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]
[ruby-core:86436]

Description

I'm encountering a consistent segfault within a Rails application running Phusion Passenger on Ruby 2.5.0 and Ruby 2.5.1 when invoking either the #broadcast or #signal method on a ConditionVariable.

Here's what the code that interacts with the ConditionVariable looks like:

    def add(event)
      unless @job.alive?
        @lock.synchronize do
          unless @job.alive?
            @job = Thread.new { process }
          end
        end
      end

      @lock.synchronize do
        @events << event

        # this invocation causes the segfault
        @ready.broadcast
      end
    end

I can swap out the call to #broadcast with #signal and it causes the same segfault. What I've narrowed it down to based on the C level backtrace is that when using #broadcast, it segfaults on this line of code in thread_sync.c inside the wakeup_all(struct list_head *head) function:

    if (cur->th->status != THREAD_KILLED) {

When using #signal, it segfaults on similar line of code, but from the wakeup_one(struct list_head *head) function:

    if (cur->th->status != THREAD_KILLED) {

Here are the links to those lines of code in GitHub (for reference) https://github.com/ruby/ruby/blob/trunk/thread_sync.c#L46, https://github.com/ruby/ruby/blob/trunk/thread_sync.c#L30.

In looking at the changes, it looks like the condition variable code was notably rewritten with this commit (https://github.com/ruby/ruby/commit/ea1ce47fd7f2bc9023e9a1391dbadcfaf9e892ce) and that only made it into the 2.5.x line of code. This segfault doesn't happen with anything prior to 2.5.0.

I'm wondering if there is some relation to how Phusion Passenger forks processes for the purposes of Smart Spawning (https://www.phusionpassenger.com/library/indepth/ruby/spawn_methods/). My code is restarting any dead threads, but I don't think there's anything else it can do.

I'm still working on trimming the code down to get a very small, reproducible example, but wanted to post all of this information, as I was hoping someone that's familiar with the internals of thread_sync.c might be able to point out some additional evidence.

History

#1 [ruby-core:86635] Updated by normalperson (Eric Wong) 6 months ago

nbeyer@gmail.com wrote:

Bug #14659: segfault in ConditionVariable#broadcast and ConditionVariable#signal
https://bugs.ruby-lang.org/issues/14659

I suspect this is identical to https://bugs.ruby-lang.org/issues/14634

Can you try backporting at least r62934?

r62934 a2d63ea2fb84c962abddae877e9493fc57cfce1a
thread_sync.c: avoid reaching across stacks of dead threads

Optional, but the following should be more fixes and improvements:

r63124 85e9f2879373aa496c5c3c7f900ba4869a9ca3f7
offsetof(type, foo.bar) is (arguably) a GCCism

r63210 b456eab2ea77eda51c8d5c06c24d195a6a2932e1
variable.c: fix thread + fork errors in autoload

r63215 af72dcd91b70e94b0f8299ab0464dafe884d2695
thread_sync: redo r62934 to use fork_gen

Thanks.

#2 [ruby-core:87401] Updated by nbeyer@gmail.com (Nathan Beyer) 4 months ago

I've been unable to create a reduced example. I have not tried the suggested backports mentioned above, but if I get a chance, I will.

I did want to post that I worked around the issue by using Phusion Passengers smart spawning hooks: https://www.phusionpassenger.com/library/indepth/ruby/spawn_methods/#smart-spawning-hooks. In the bit of code where I start the sub-system, which in turn creates the threads that were at issue, the code now registers with the smart spawning hook and when the hook is invoked, it restarts the sub-system, which in turn restarts the threads. I'm not sure why the hooks are required to restart the threads, rather than just dynamically doing so when detecting the thread is dead, but it resolves the issue for now.

#3 [ruby-core:87641] Updated by jmgarnier (Jean-Michel Garnier) 4 months ago

I had the exact the same situation as Nathan.

Segfaults happen in https://github.com/codegram/futuroscope gem which initalizes a pool of threads at startup. When passenger smart spawning forks the process, it segfaults.

I tested the same code with ruby 2.6 preview and no segfaults.

#4 [ruby-core:87646] Updated by normalperson (Eric Wong) 4 months ago

Backport request for 2.5.2 is here, sorry for the breakage:
https://bugs.ruby-lang.org/issues/14852

Also available in: Atom PDF