Bug #6634

Deadlock with join and ConditionVariable

Added by meh. I don't care almost 2 years ago. Updated 12 months ago.

[ruby-core:45798]
Status:Rejected
Priority:Normal
Assignee:Yusuke Endoh
Category:core
Target version:2.0.0
ruby -v:ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]
Backport:

Description

I'm getting a fatal deadlock in one of my gems, a simple threadpool implementation.

The library works both in Rubinius and JRuby, so I guess it's a bug.

The gem is here: https://github.com/meh/ruby-threadpool

The example that crashes is attached.

Basically, it raises a fatal deadlock if you join a thread and then call ConditionVariable#wait. I'm not 100% sure whether the bug is in ConditionVariable or somewhere else; all I know is that it happens in that situation and that it works on Rubinius and JRuby.

lol.rb (134 Bytes) meh. I don't care, 06/23/2012 11:49 PM

noname (500 Bytes) Anonymous, 06/24/2012 05:54 AM

reduced.rb - Reduced testcase (170 Bytes) meh. I don't care, 06/26/2012 01:59 AM

lol2.rb (189 Bytes) Yusuke Endoh, 11/06/2012 12:16 AM

thread_deadlock_error_test.rb - Test code to show how this error occurs (1.47 KB) nhm tanveeer hossain khan, 04/30/2013 12:12 AM

History

#1 Updated by Anonymous almost 2 years ago

I can't seem to reproduce this error:

http://www.youtube.com/watch?v=8J_eBXZ7ud4

Can you reduce the error to a self contained example that reliably
fails?

--
Aaron Patterson
http://tenderlovemaking.com/

#2 Updated by meh. I don't care almost 2 years ago

Always happens, on Arch Linux x86_64.

$ ruby reduced.rb
reduced.rb:13:in `join': deadlock detected (fatal)
        from reduced.rb:13:in `<main>'

#3 Updated by Motohiro KOSAKI almost 2 years ago

  • Status changed from Open to Assigned
  • Assignee set to Motohiro KOSAKI

#4 Updated by Motohiro KOSAKI almost 2 years ago

  • Status changed from Assigned to Feedback

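mutex = Mutex.new
cond = ConditionVariable.new

thread = Thread.new {
  mutex.synchronize {
    cond.wait(mutex) # nothing ever signals cond, so this waits forever
  }
}
thread.join # the only other thread is asleep, so Ruby reports a deadlock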

This is a true deadlock. The thread.join above has no chance of returning successfully, because no other thread exists to signal the condition variable.
Could you please elaborate on your point?
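For contrast, here is a minimal sketch (not taken from the reporter's gem) of the same wait-then-join pattern when a second thread does signal the condition variable. The `ready` flag and the `signaler` thread are assumptions added for illustration; the flag guards against the signal arriving before the waiter starts waiting.

```ruby
require 'thread'

mutex = Mutex.new
cond  = ConditionVariable.new
ready = false # hypothetical predicate; protects against lost or spurious wakeups

waiter = Thread.new do
  mutex.synchronize do
    # Re-check the predicate after every wakeup instead of waiting once.
    cond.wait(mutex) until ready
  end
  :woken
end

signaler = Thread.new do
  mutex.synchronize do
    ready = true
    cond.signal
  end
end

signaler.join
result = waiter.value # joins the waiter and returns its block's value
```

Because a live thread can still wake the waiter, the join here is not a deadlock, which is the situation the reporter says his gem is in.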

#5 Updated by meh. I don't care almost 2 years ago

Then I can't come up with a reduced testcase. All I know is that it triggers a fatal deadlock in my gem when there is actually no deadlock.

It works both in JRuby and Rubinius.

#6 Updated by Motohiro KOSAKI almost 2 years ago

Unfortunately, we don't have ESP. "The library works both in Rubinius and JRuby, so I guess it's a bug." doesn't give me any hint. Sorry.

#7 Updated by meh. I don't care almost 2 years ago

The library is just ~250 lines.

The issue is that Ruby thinks it's deadlocking when another thread is actually going to shut down the threadpool. The shutdown broadcasts on the cond, so the waiting threads will wake up and it is not a deadlock.
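A rough sketch of the pattern described here, with workers waiting on a condition variable until a shutdown broadcast wakes them, might look like the following. This is not the threadpool gem's code; `queue`, `results`, `shutdown`, and the worker count are all hypothetical names and values chosen for illustration.

```ruby
require 'thread'

mutex    = Mutex.new
cond     = ConditionVariable.new
queue    = []    # pending tasks (hypothetical)
results  = []    # collected task results (hypothetical)
shutdown = false

workers = Array.new(3) do
  Thread.new do
    loop do
      task = mutex.synchronize do
        # Sleep only while there is nothing to do and no shutdown request.
        cond.wait(mutex) while queue.empty? && !shutdown
        queue.shift # nil once the queue is drained and shutdown is set
      end
      break if task.nil?

      value = task.call
      mutex.synchronize { results << value }
    end
  end
end

# Enqueue some work and wake the workers.
mutex.synchronize do
  5.times { |i| queue << -> { i * i } }
  cond.broadcast
end

# Request shutdown; the broadcast wakes every worker still waiting.
mutex.synchronize do
  shutdown = true
  cond.broadcast
end

workers.each(&:join)
```

In this sketch the joins always return, because the broadcast is issued before the main thread blocks in `join`. Whether the gem orders its shutdown and join the same way is not something this thread establishes.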

#8 Updated by Motohiro KOSAKI over 1 year ago

  • Assignee deleted (Motohiro KOSAKI)

#9 Updated by Yusuke Endoh over 1 year ago

  • File lol2.rb added
  • Status changed from Feedback to Assigned
  • Assignee set to Motohiro KOSAKI
  • Target version set to 2.0.0

I succeeded in reproducing the issue by adding set_trace_func to lol.rb, redirecting the output to a file, and repeating the invocation until the error occurs.
It looks like a very timing-sensitive issue.
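The contents of lol2.rb are not shown in this thread, so the following is only a guess at the general technique: installing a tracer with `Kernel#set_trace_func` slows every call and perturbs thread timing, which can make a rare race more likely to surface. The `events` array and the traced workload are made up for illustration.

```ruby
events = []

# The tracer receives six arguments for every traced event.
tracer = proc do |event, file, line, id, _binding, klass|
  events << format('%-8s %s:%d %s#%s', event, file, line, klass, id)
end

set_trace_func(tracer)
t = Thread.new { [1, 2, 3].map { |x| x * 2 } }
t.join
set_trace_func(nil) # passing nil turns tracing off again

puts events.first(5)
```

In a reproduction script the tracer would typically print each event directly, which is why the output is redirected to a file in the session below.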

$ gem install threadpool

$ ./ruby -v
ruby 2.0.0dev (2012-11-05 trunk 37474) [x86_64-linux]

$ ruby -e 'loop { system("./ruby lol2.rb > t") || break }'
/home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:183:in `join': No live threads left. Deadlock?
        from /home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:183:in `join'
        from lol.rb:9:in `<main>'

$ ruby -e 'loop { system("./ruby lol2.rb > t") || break }'
<internal:prelude>:8:in `lock': deadlock; recursive locking (ThreadError)
        from <internal:prelude>:8:in `synchronize'
        from /home/mame/work/local/lib/ruby/2.0.0/thread.rb:69:in `wait'
        from /home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:234:in `block (3 levels) in spawn_thread'
        from <internal:prelude>:10:in `synchronize'
        from /home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:222:in `block (2 levels) in spawn_thread'
        from /home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:249:in `loop'
        from /home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:249:in `block in spawn_thread'
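For reference, the "deadlock; recursive locking" ThreadError in the second backtrace is what `Mutex#lock` normally raises when a thread tries to lock a mutex it already holds. A minimal sketch of that ordinary case (unrelated to the gem's code; the `error` variable is just for illustration):

```ruby
require 'thread'

mutex = Mutex.new
error = nil

mutex.synchronize do
  begin
    mutex.lock # the current thread already owns the mutex
  rescue ThreadError => e
    error = e
  end
end

puts error.message
```

Seeing this error from inside ConditionVariable#wait is surprising because the waiting code is not expected to relock a mutex it still holds.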

I reviewed the source of the threadpool gem, but I could find no problem.
To be precise, it may attempt to call an undefined method named "reason", but that is clearly irrelevant to this issue.

Kosaki-san, could you try to reproduce it? The core behavior does look strange to me (too subtle to explain in English, sorry), but I failed to find the bug.

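(The following was originally written in Japanese, for convenience.)
It reproduces rarely (about once in 100 runs in my environment?), and since I'm not fluent with gdb I fought it with printf debugging. I did get the impression that the core is behaving suspiciously.
It should have locked the mutex inside the CV, yet somehow the mutex inside threadpool seems to be the one that is locked, or maybe not.
I have a hunch that this is a major timing bug (though it may be a GC issue). If it can be reproduced in your environment, Kosaki-san, I think we've won, so could you give it a try?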

Yusuke Endoh mame@tsg.ne.jp

#10 Updated by Motohiro KOSAKI over 1 year ago

  • Assignee changed from Motohiro KOSAKI to Koichi Sasada

#11 Updated by Motohiro KOSAKI over 1 year ago

Hi mame-san,

ko1 found that the second case (shown below) is a regression he introduced in October. He told me he plans to fix it soon.

$ ruby -e 'loop { system("./ruby lol2.rb > t") || break }'
<internal:prelude>:8:in `lock': deadlock; recursive locking (ThreadError)
        from <internal:prelude>:8:in `synchronize'
        from /home/mame/work/local/lib/ruby/2.0.0/thread.rb:69:in `wait'

I couldn't reproduce this second case at commit r37074 (Oct 3), which predates the regression. So I think we haven't reproduced the original issue yet.

#12 Updated by Koichi Sasada over 1 year ago

  • Assignee changed from Koichi Sasada to Yusuke Endoh

This second problem may be fixed in r37647.
mame-san, could you check it?

#13 Updated by Yusuke Endoh over 1 year ago

  • Status changed from Assigned to Feedback

Worked. Thank you!

Then, can anyone reproduce the original problem? Meh, can you still reproduce it?

Yusuke Endoh mame@tsg.ne.jp

#14 Updated by Yusuke Endoh about 1 year ago

  • Status changed from Feedback to Rejected

Marking this as rejected due to lack of feedback by the submitter.

Yusuke Endoh mame@tsg.ne.jp

#15 Updated by nhm tanveeer hossain khan 12 months ago

Hi there,

I've faced a similar problem with ruby 2.0.0p0 (2013-02-24 revision 39474) x86_64-darwin12.1.0.

Please check out my attached code. Let me know if I can help further, or if I'm doing something dumb :)
