Bug #14432

Ruby crashes with "[BUG] pthread_mutex_destroy: Device or resource busy (EBUSY)"

Added by gfx (Goro FUJI) 10 months ago. Updated 8 months ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 2.4.2p198 (2017-09-14 revision 59899) [x86_64-linux]
[ruby-core:85337]

Description

We have noticed that Sidekiq (Sidekiq Pro) workers crash via rb_bug_errno().

The line that raises the error is here: https://github.com/ruby/ruby/blob/v2_4_2/thread_pthread.c#L260

static void
native_mutex_destroy(pthread_mutex_t *lock)
{
    int r = pthread_mutex_destroy(lock);
    mutex_debug("destroy", lock);
    if (r != 0) {
        rb_bug_errno("pthread_mutex_destroy", r); // HERE!
    }
}

log/sidekiq.log:

[BUG] pthread_mutex_destroy: Device or resource busy (EBUSY)
ruby 2.4.2p198 (2017-09-14 revision 59899) [x86_64-linux]
-- Control frame information -----------------------------------------------
-- C level backtrace information -------------------------------------------
/usr/local/rbenv/versions/2.4.2/bin/ruby(rb_vm_bugreport+0xcf4) [0x558eb7577584] vm_dump.c:684
/usr/local/rbenv/versions/2.4.2/bin/ruby(rb_bug+0xd0) [0x558eb756bcf0] error.c:491
/usr/local/rbenv/versions/2.4.2/bin/ruby(rb_bug_errno+0x3a) [0x558eb756be9a] error.c:520
/usr/local/rbenv/versions/2.4.2/bin/ruby(thread_start_func_2+0x98d) [0x558eb74a0e9d] thread_pthread.c:260
/usr/local/rbenv/versions/2.4.2/bin/ruby(thread_start_func_1+0xd0) [0x558eb74a0fd0] thread_pthread.c:887
/lib/x86_64-linux-gnu/libpthread.so.0(start_thread+0xca) [0x7f9c1bf536ba]
/lib/x86_64-linux-gnu/libc.so.6(__clone+0x6d) [0x7f9c1b54441d]
-- Other runtime information -----------------------------------------------
* Loaded script: sidekiq 5.0.5 kibela [1 of 5 busy]
* Loaded features:
    0 enumerator.so
    1 thread.rb
    2 rational.so
    3 complex.so
    4 /usr/local/rbenv/versions/2.4.2/lib/ruby/2.4.0/x86_64-linux/enc/encdb.so
    5 /usr/local/rbenv/versions/2.4.2/lib/ruby/2.4.0/x86_64-linux/enc/trans/transdb.so
    6 /usr/local/rbenv/versions/2.4.2/lib/ruby/2.4.0/unicode_normalize.rb
    7 /usr/local/rbenv/versions/2.4.2/lib/ruby/2.4.0/x86_64-linux/rbconfig.rb
    8 /usr/local/rbenv/versions/2.4.2/lib/ruby/2.4.0/rubygems/compatibility.rb
    9 /usr/local/rbenv/versions/2.4.2/lib/ruby/2.4.0/rubygems/defaults.rb
    ... omit about 5000 lines

Unfortunately, it occurs only in the production environment, so the triggering conditions are not clear. We also reported it to the Sidekiq Pro support team, but it seems to be a bug in CRuby.

versions:

  • ruby 2.4.2
  • rails 5.1.4
  • sidekiq 5.0.5
  • sidekiq-pro 3.7.0

sidekiq's concurrency: 1 (see https://github.com/mperham/sidekiq/wiki/Advanced-Options for details)

History

#1 Updated by gfx (Goro FUJI) 10 months ago

  • Description updated (diff)

#2 Updated by gfx (Goro FUJI) 10 months ago

  • Description updated (diff)

#3 [ruby-core:85339] Updated by normalperson (Eric Wong) 10 months ago

gfuji@cpan.org wrote:

The line that raises the error is here: https://github.com/ruby/ruby/blob/v2_4_2/thread_pthread.c#L260

OK, that means a pthread_mutex is held while destroy is
happening. Unfortunately we can't look at the Sidekiq-pro
source code...

If there are many Mutex objects being destroyed and you can try
Ruby 2.5, that might fix the problem, because I reimplemented the
Ruby Mutex class to not rely on native mutexes at all.

Otherwise, there's th->interrupt_lock which might get destroyed
when the thread dies.

Can you get a core dump and see if it's th->interrupt_lock or
something else being destroyed while locked?

Do you get this during normal running or during shutdown?

Are there a lot of signals hitting the process when this happens?

/usr/local/rbenv/versions/2.4.2/bin/ruby(thread_start_func_2+0x98d) [0x558eb74a0e9d] thread_pthread.c:260
/usr/local/rbenv/versions/2.4.2/bin/ruby(thread_start_func_1+0xd0) [0x558eb74a0fd0] thread_pthread.c:887
/lib/x86_64-linux-gnu/libpthread.so.0(start_thread+0xca) [0x7f9c1bf536ba]

Not enough info there about what's getting destroyed, unfortunately...
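
For anyone trying to narrow this down outside production, a stress script along the following lines might help exercise the two suspects mentioned above: short-lived threads that create and drop Mutex objects, and signals arriving while those threads exit. This is only a sketch and not a confirmed reproduction. The method name, the counts, and the choice of SIGUSR1 are all assumptions made for illustration.

```ruby
# Sketch of a stress test: many short-lived threads exit while signals arrive.
# Hypothetical helper; names and counts are illustrative assumptions only.
def stress_thread_exit_under_signals(thread_count: 200, signal_every: 10)
  received = 0
  # Keep the handler trivial: only a counter increment runs in trap context.
  previous = Signal.trap("USR1") { received += 1 }

  finished = 0
  count_lock = Mutex.new

  thread_count.times do |i|
    t = Thread.new do
      m = Mutex.new                       # per-thread Mutex, garbage afterwards
      m.synchronize { i * 2 }             # lock and unlock it once
      count_lock.synchronize { finished += 1 }
    end
    # Send a signal to this process around the time the thread is exiting.
    Process.kill("USR1", Process.pid) if (i % signal_every).zero?
    t.join
  end

  { finished: finished, signals_seen: received }
ensure
  # Restore whatever handler was installed before the test started.
  Signal.trap("USR1", previous || "DEFAULT")
end

if $PROGRAM_NAME == __FILE__
  p stress_thread_exit_under_signals
end
```

Signals are sent from inside the process here for simplicity. Sending them from another process (for example a shell loop running `kill -USR1 <pid>`) may be closer to what a production worker sees. On an affected build, a crash would show up as the same [BUG] report; a clean run proves nothing either way.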

#4 [ruby-core:86375] Updated by gfx (Goro FUJI) 8 months ago

This issue no longer reproduces after upgrading to Ruby 2.5.0, as normalperson predicted. Thanks.
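
Until an upgrade is possible, a small startup check such as the one below could flag workers that still run on an older Ruby. It is a minimal sketch: the method name is hypothetical, and the 2.5.0 threshold reflects only what was observed in this report, not a confirmed list of affected versions.

```ruby
require "rubygems" # Gem::Version; normally loaded by default

# Assumption: 2.5.0 is the first version on which this report stopped reproducing.
FIRST_VERSION_WITHOUT_CRASH = Gem::Version.new("2.5.0")

# Hypothetical helper: true when the given Ruby version predates 2.5.0.
def possibly_affected_by_bug_14432?(version = RUBY_VERSION)
  Gem::Version.new(version) < FIRST_VERSION_WITHOUT_CRASH
end

if possibly_affected_by_bug_14432?
  warn "Ruby #{RUBY_VERSION} may hit the pthread_mutex_destroy EBUSY crash (Bug #14432)"
end
```

Calling `possibly_affected_by_bug_14432?("2.4.2")` returns true for the version in the original report.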
