Bug #4266

Timeouts in threads cause "ThreadError: deadlock; recursive locking"

Added by Christopher Bottaro over 3 years ago. Updated over 1 year ago.

[ruby-core:34362]
Status:Closed
Priority:High
Assignee:Motohiro KOSAKI
Category:core
Target version:2.0.0
ruby -v:ruby 1.9.2p136 (2010-12-25 revision 30365) [x86_64-darwin10.6.0] Backport:

Description

=begin
Run the attached file (or this pastie http://pastie.org/1448542) a few times and you'll eventually get:

ThreadError: deadlock; recursive locking:
internal:prelude:8:in lock'
<internal:prelude>:8:in
synchronize'
bin/deadlocktest.rb:86:in block (4 levels) in <main>'
/Users/cjbottaro/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/timeout.rb:57:in
timeout'
bin/deadlock
test.rb:85:in block (3 levels) in <main>'
bin/deadlock_test.rb:83:in
times'
bin/deadlock_test.rb:83:in `block (2 levels) in '

I've had the script run successfully over 5 times in a row before getting the errors, so if it doesn't happen the first few times... keep trying.

The problem doesn't happen in 1.8.7 or Jruby, but does happen in 1.9.1.
=end

deadlock_test.rb Magnifier (554 Bytes) Christopher Bottaro, 01/12/2011 01:59 AM

diffdiff - Add Mutex.own? method (1.38 KB) Motohiro KOSAKI, 01/12/2011 02:45 PM

mutex-synchronize-use-c-implementation.patch Magnifier (1.28 KB) Motohiro KOSAKI, 01/12/2011 04:45 PM

monitor-synchronize-use-c-implementation.patch Magnifier - Fix for MonitorMixin/monitor synchronize using similar approach as mutex synchronize patch (4.6 KB) Brian Gugliemetti, 05/05/2011 09:01 AM


Related issues

Related to ruby-trunk - Bug #4285: Ruby don't have asynchrounous exception safe syntax and I... Closed 01/17/2011
Related to ruby-trunk - Bug #4289: Timeouts in threads cause SEGV Closed 01/18/2011

Associated revisions

Revision 37724
Added by Motohiro KOSAKI over 1 year ago

  • prelude.rb: Moved Mutex#synchronize to
  • thread.c (rbmutexsynchronize_m): here. [Bug #4266]

History

#1 Updated by Charles Nutter over 3 years ago

=begin
FYI, JRuby does not use timeout.rb anymore (as of JRuby 1.3ish I think). We use a native implementation based on Java's timed, thread-pooling executor in java.util.concurrent. The original timeout has many race conditions that are very difficult to fix.
=end

#2 Updated by Charles Nutter over 3 years ago

=begin
A question was raised about whether #to should understand context, as in knowing a top-level object being coerced to YAML should include the --- header. The answer seems to come from Marshal.

Marshal.dump(object) knows how to produce the marshal header, linking, and so on. Marshal is the master of this format. But Marshal defers to the objects themselves for actual content to go into that marshaled output, calling dump or marshaldump on each object in turn. With the #to protocol, Marshal.dump could be implemented as:

def Marshal.dump(obj)
emit_header
obj.to(Marshal)
end

In the same way that marshal_dump is used today.

Another concern raised is whether #to should be expected on all objects all the time, rather than just always calling MyClass.coerce(obj). The answer is simple: you want individual source types to control how they coerce to a target type, rather than expecting the target type to know about all possible sources. The default protocol where #to calls type.coerce would simply be a default behavior.
=end

#3 Updated by Charles Nutter over 3 years ago

=begin
Sorry about that, wrong bug. Too many bug windows open.
=end

#4 Updated by Charles Nutter over 3 years ago

=begin
Sorry about that, wrong bug. Too many bug windows open.
=end

#5 Updated by Motohiro KOSAKI over 3 years ago

=begin
This test program also makes ruby crash on my envronment (trunk + linux x86_64).

Related [Bug #3880]?

internal:prelude:8: [BUG] Segmentation fault
ruby 1.9.3dev (2011-01-09 trunk 30500) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0011 p:---- s:0036 b:0036 l:000035 d:000035 CFUNC :exception
c:0010 p:---- s:0034 b:0034 l:000033 d:000033 CFUNC :lock
c:0009 p:0011 s:0031 b:0031 l:000030 d:000030 METHOD internal:prelude:8
c:0008 p:0011 s:0028 b:0028 l:000bb8 d:000027 BLOCK deadlocktest.rb:21
c:0007 p:0109 s:0026 b:0026 l:001ad0 d:001ad0 METHOD /usr/local/lib/ruby/1.9.1/timeout.rb:57
c:0006 p:0019 s:0014 b:0014 l:000bb8 d:002580 BLOCK deadlock
test.rb:20
c:0005 p:---- s:0011 b:0011 l:000010 d:000010 FINISH
c:0004 p:---- s:0009 b:0009 l:000008 d:000008 CFUNC :times
c:0003 p:0010 s:0006 b:0006 l:000bb8 d:0012c8 BLOCK deadlock_test.rb:18
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:---- s:0002 b:0002 l:000001 d:000001 TOP

-- Ruby level backtrace information ----------------------------------------
deadlocktest.rb:18:in block (2 levels) in <main>'
deadlock_test.rb:18:in
times'
deadlock
test.rb:20:in block (3 levels) in <main>'
/usr/local/lib/ruby/1.9.1/timeout.rb:57:in
timeout'
deadlock_test.rb:21:in block (4 levels) in <main>'
<internal:prelude>:8:in
synchronize'
internal:prelude:8:in lock'
<internal:prelude>:8:in
exception'

-- C level backtrace information -------------------------------------------
./ruby() [0x51ff25] vmdump.c:797
./ruby() [0x564296] error.c:249
./ruby(rb
bug+0xb1) [0x564431] error.c:266
./ruby() [0x4aec70] signal.c:624
/lib64/libpthread.so.0() [0x382140f440]
./ruby(stlookup+0xe) [0x4b63ce] st.c:326
./ruby() [0x50d3af] vm
method.c:402
./ruby(rbfuncall2+0x2e) [0x51ccee] vmeval.c:227
./ruby(rbclassnewinstance+0x30) [0x44f490] object.c:1570
./ruby() [0x516e33] vm
eval.c:79
./ruby() [0x517304] vmeval.c:290
./ruby() [0x417691] eval.c:564
./ruby(rb
excraise+0x26) [0x417886] eval.c:473
./ruby() [0x52487a] thread.c:1301
./ruby(rb
mutexlock+0x44a) [0x527cfa] thread.c:3266
./ruby() [0x50f0b7] vm
insnhelper.c:403
./ruby() [0x510a36] insns.def:1010
./ruby() [0x5157eb] vm.c:1150
./ruby(rbyield+0x66) [0x51e116] vm.c:591
./ruby() [0x4469d1] numeric.c:3217
./ruby() [0x50f0b7] vm
insnhelper.c:403
./ruby() [0x510a36] insns.def:1010
./ruby() [0x5157eb] vm.c:1150
./ruby() [0x5186df] vm.c:607
./ruby() [0x5272d1] thread.c:453
./ruby() [0x5273e0] thread_pthread.c:522
/lib64/libpthread.so.0() [0x3821407761]
/lib64/libc.so.6(clone+0x6d) [0x3820ce151d]

=end

#6 Updated by Motohiro KOSAKI over 3 years ago

=begin
Hm, the crash issue seems different and unrelated one from this. becuase ruby19_2 can reproduce
recursive deadlock issue, but can't reproduce crash.

=end

#7 Updated by Motohiro KOSAKI over 3 years ago

  • File diffdiff added
  • Priority changed from Normal to High

=begin
Old days, C# had similar issue. and they changed Monitor.enter(= our Mutex.lock) semantics.

http://www.bluebytesoftware.com/blog/2007/01/30/MonitorEnterThreadAbortsAndOrphanedLocks.aspx

So, Can we take similar way? The attached patch kill a race in mutex.synchronize. At least,
my ruby19_2 branch don't reproduce the issue anymore.

Thanks.
=end

#8 Updated by Motohiro KOSAKI over 3 years ago

  • Assignee set to Koichi Sasada

=begin
And, I think asynchronous exception should be blocked and delayed while running ensure block.
Otherwise Mutex.unlock can be bypassed.

ko1, What do you think?
=end

#9 Updated by Motohiro KOSAKI over 3 years ago

=begin
That said, our Mutex.synchronize() method has two unsafe point.


def synchronize
self.lock
// (1)
begin
yield
ensure
// (2)
self.unlock rescue nil
end
end


If the mutex got thread#raise at (1), it doesn't call unlock because
it haven't entered -begin- block. And if got at (2), it also doesn't call
unlock because we are no longer in -begin- block.

My patch fixed only (1).

Thanks.
=end

#10 Updated by Motohiro KOSAKI over 3 years ago

=begin
Alternative one is here. This one fixes both (1) and (2). But it only fixes Mutex#synchronize. That said, timeout{} still might not call ruby level ensure block.

=end

#11 Updated by Motohiro KOSAKI about 3 years ago

=begin
Hi,

I plan to commit mutex-synchronize-use-c-implementation.patch at this weekend. So if anyone have objection, please let me know soon.

=end

#12 Updated by Shyouhei Urabe about 3 years ago

  • Status changed from Open to Assigned

#13 Updated by Eric Wong about 3 years ago

=begin
I have a short-term fix for the related issue in Bug #4289
http://redmine.ruby-lang.org/attachments/1572/0001-timeout.rb-avoid-introducing-new-class-for-every-tim.patch

It works both with and without the mutex-synchronize-use-c-implementation.patch
posted here.

=end

#14 Updated by Brian Gugliemetti almost 3 years ago

=begin
This issue also applies to MonitorMixin (monitor.rb) as it uses the same lock/begin yield/ensure block.
=end

#16 Updated by Hiroshi Nakamura almost 3 years ago

  • Target version changed from 1.9.2 to 1.9.3

#17 Updated by Koichi Sasada almost 3 years ago

  • Target version changed from 1.9.3 to 2.0.0

Let's talk it after 1.9.3 release.

#18 Updated by Jon Tara over 2 years ago

Brian Gugliemetti wrote:

File monitor-synchronize-use-c-implementation.patch added

There is a parameter mis-match between the declaration and definition of rbmonsynchronize. Compilation fails on at least OSX.

#19 Updated by Ming Vonsalis almost 2 years ago

jtara (Jon Tara) wrote:

Brian Gugliemetti wrote:

File monitor-synchronize-use-c-implementation.patch added

There is a parameter mis-match between the declaration and definition of rbmonsynchronize. Compilation fails on at least OSX.

http://originsofinuitart.posterous.com/

#20 Updated by Christian Parpart over 1 year ago

We just ran into this very ugly bug in production, too. We would really appreciate a fix in 1.9.4, a pure bugfix release of the 1.9 branch since this is a major bug. :)

#21 Updated by Koichi Sasada over 1 year ago

  • Assignee changed from Koichi Sasada to Motohiro KOSAKI

I passed this ticket to thread specialist.

#22 Updated by Motohiro KOSAKI over 1 year ago

Brian,

Can you please provide us a reproducer for you monitor.rb issue? And can you please make a new ticket for your issue to prevent cross line discussion? I plan to close this ticket soon and I also hope to don't forget your issue.

Thanks.

#23 Updated by Motohiro KOSAKI over 1 year ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r37724.
Christopher, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • prelude.rb: Moved Mutex#synchronize to
  • thread.c (rbmutexsynchronize_m): here. [Bug #4266]

Also available in: Atom PDF