Bug #6634

closed

Deadlock with join and ConditionVariable

Added by meh. (meh. I don't care) over 12 years ago. Updated over 10 years ago.

Status: Rejected
Target version:
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]
Backport:
[ruby-core:45798]

Description

I'm getting a fatal deadlock in one of my gems, a simple threadpool implementation.

The library works both in Rubinius and JRuby, so I guess it's a bug.

The gem is here: https://github.com/meh/ruby-threadpool

The example that crashes is attached.

Basically, it raises a fatal deadlock if you join a thread and then call ConditionVariable#wait. I'm not 100% sure whether the bug is in ConditionVariable or somewhere else; all I know is that it happens in that situation and that it works on Rubinius and JRuby.


Files

lol.rb (134 Bytes), meh. (meh. I don't care), 06/23/2012 11:49 PM
noname (500 Bytes), Anonymous, 06/24/2012 05:54 AM
reduced.rb (170 Bytes), Reduced testcase, meh. (meh. I don't care), 06/26/2012 01:59 AM
lol2.rb (189 Bytes), mame (Yusuke Endoh), 11/06/2012 12:16 AM
thread_deadlock_error_test.rb (1.47 KB), Test code to show how this error occurs, we4tech (nhm tanveeer hossain khan), 04/30/2013 12:12 AM
t.rb (996 Bytes), nikkoara (L Nicoara), 05/01/2014 09:51 PM

Updated by Anonymous over 12 years ago

I can't seem to reproduce this error:

http://www.youtube.com/watch?v=8J_eBXZ7ud4

Can you reduce the error to a self contained example that reliably
fails?

--
Aaron Patterson
http://tenderlovemaking.com/

Updated by meh. (meh. I don't care) over 12 years ago

Always happens, on Arch Linux x86_64.

$ ruby reduced.rb
reduced.rb:13:in `join': deadlock detected (fatal)
        from reduced.rb:13:in `<main>'

Updated by kosaki (Motohiro KOSAKI) over 12 years ago

  • Status changed from Open to Assigned
  • Assignee set to kosaki (Motohiro KOSAKI)

Updated by kosaki (Motohiro KOSAKI) over 12 years ago

  • Status changed from Assigned to Feedback

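require 'thread'

mutex = Mutex.new
cond  = ConditionVariable.new

thread = Thread.new {
  mutex.synchronize {
    cond.wait(mutex)   # no other thread ever signals cond
  }
}
thread.join            # so this join can never return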

This is a true deadlock. The above thread.join has no chance to exit successfully.
Can you please elaborate on your point?
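For contrast, here is a minimal sketch of the same shape with a second thread that does signal the condition variable, so the join can complete. The `done` flag and the `signaler` thread are assumptions added for illustration; they do not appear in the attached test cases.

```ruby
require 'thread'

mutex = Mutex.new
cond  = ConditionVariable.new
done  = false   # hypothetical predicate guarding the wait

waiter = Thread.new do
  mutex.synchronize do
    # Re-check the predicate in a loop: this covers spurious wakeups and
    # the case where the broadcast happens before the wait starts.
    cond.wait(mutex) until done
  end
  :woken
end

signaler = Thread.new do
  mutex.synchronize do
    done = true
    cond.broadcast   # wakes every thread waiting on cond
  end
end

signaler.join
result = waiter.value   # Thread#value joins and returns the block's result
puts result             # prints "woken"
```

Without the `signaler` thread, every live thread would be blocked, which is the situation the deadlock detector reports.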

Updated by meh. (meh. I don't care) over 12 years ago

Then I can't come up with a reduced testcase. All I know is that it triggers a fatal deadlock in my gem when it's actually not a deadlock.

It works both in JRuby and Rubinius.

Updated by kosaki (Motohiro KOSAKI) over 12 years ago

Unfortunately, we don't have ESP. "The library works both in Rubinius and JRuby, so I guess it's a bug." doesn't give me any hint. Sorry.

Updated by meh. (meh. I don't care) over 12 years ago

The library is just ~250 lines.

The issue is that Ruby thinks it's deadlocking when another thread is actually going to shut down the threadpool (which broadcasts on the cond, so it is not a deadlock).
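The pattern described here (workers waiting on a condition variable until a shutdown broadcast arrives) can be sketched roughly as follows. `TinyPool` and all of its methods are hypothetical names for illustration; this is not the threadpool gem's actual code or API.

```ruby
require 'thread'

# Hypothetical minimal pool, for illustration only.
class TinyPool
  def initialize(size)
    @mutex    = Mutex.new
    @cond     = ConditionVariable.new
    @queue    = []
    @shutdown = false
    @workers  = Array.new(size) { Thread.new { work_loop } }
  end

  # Enqueue a job and wake one waiting worker.
  def post(&job)
    @mutex.synchronize do
      @queue << job
      @cond.signal
    end
  end

  # Mark the pool as shut down and wake all waiting workers.
  def shutdown
    @mutex.synchronize do
      @shutdown = true
      @cond.broadcast
    end
  end

  def join
    @workers.each(&:join)
  end

  private

  def work_loop
    loop do
      job = @mutex.synchronize do
        @cond.wait(@mutex) while @queue.empty? && !@shutdown
        @queue.shift   # nil once the queue is drained after shutdown
      end
      break if job.nil?
      job.call
    end
  end
end

results = Queue.new
pool = TinyPool.new(2)
3.times { |i| pool.post { results << i * 2 } }
pool.shutdown   # without this broadcast, pool.join below could never return
pool.join

collected = []
collected << results.pop until results.empty?
puts collected.sort.inspect   # prints [0, 2, 4]
```

In this sketch the join is safe only because `shutdown` is called first; if the shutdown came from a thread that is itself blocked, all threads would be waiting and the detector would fire.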

Updated by kosaki (Motohiro KOSAKI) about 12 years ago

  • Assignee deleted (kosaki (Motohiro KOSAKI))

Updated by mame (Yusuke Endoh) about 12 years ago

  • File lol2.rb added
  • Status changed from Feedback to Assigned
  • Assignee set to kosaki (Motohiro KOSAKI)
  • Target version set to 2.0.0

I succeeded in reproducing the issue by adding set_trace_func to lol.rb, redirecting the output to a file, and repeating the invocation until the error occurs.
It looks like a very timing-sensitive issue.

$ gem install threadpool

$ ./ruby -v
ruby 2.0.0dev (2012-11-05 trunk 37474) [x86_64-linux]

$ ruby -e 'loop { system("./ruby lol2.rb > t") || break }'
/home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:183:in `join': No live threads left. Deadlock?
        from /home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:183:in `join'
        from lol.rb:9:in `<main>'

$ ruby -e 'loop { system("./ruby lol2.rb > t") || break }'
<internal:prelude>:8:in `lock': deadlock; recursive locking (ThreadError)
        from <internal:prelude>:8:in `synchronize'
        from /home/mame/work/local/lib/ruby/2.0.0/thread.rb:69:in `wait'
        from /home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:234:in `block (3 levels) in spawn_thread'
        from <internal:prelude>:10:in `synchronize'
        from /home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:222:in `block (2 levels) in spawn_thread'
        from /home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:249:in `loop'
        from /home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:249:in `block in spawn_thread'
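The repeat-until-failure one-liner used above can be expanded into a small harness that also reports how many runs it took. This is only a sketch: `run_until_failure`, its `max_runs` cap, and discarding stdout via `File::NULL` are assumptions, not part of the original commands.

```ruby
require 'rbconfig'

# Runs +command+ (an argv array) repeatedly with stdout discarded.
# Returns the 1-based number of the first failing run, or nil if
# all +max_runs+ runs exit successfully.
def run_until_failure(command, max_runs: 1000)
  1.upto(max_runs) do |attempt|
    ok = system(*command, out: File::NULL)
    return attempt unless ok
  end
  nil
end

# Usage with a stand-in command that always fails on the first run.
# For this issue, the command would be something like ['./ruby', 'lol2.rb'].
first_failure = run_until_failure([RbConfig.ruby, '-e', 'exit 1'], max_runs: 5)
puts first_failure   # prints 1
```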

I reviewed the source of the threadpool gem, but I could find no problem.
To be precise, it may attempt to call an undefined method named "reason", but that is clearly irrelevant.

Kosaki-san, could you try to reproduce it? The core behavior does look strange to me (too subtle to explain in English, sorry), but I failed to find the bug.

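(Translated from Japanese.)
It is hard to reproduce (roughly 1 in 100 runs in my environment?), and since I'm not fluent with gdb I fought it with printf debugging, but core did seem to be behaving suspiciously.
It looked as if the mutex inside threadpool was somehow locked, even though it was the mutex inside the CV that should have been locked; or maybe not.
I have a hunch this is a major timing bug (though it may be a GC issue). If it can be reproduced in Kosaki-san's environment we have won, so could you give it a try?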

--
Yusuke Endoh

Updated by kosaki (Motohiro KOSAKI) about 12 years ago

  • Assignee changed from kosaki (Motohiro KOSAKI) to ko1 (Koichi Sasada)

Updated by kosaki (Motohiro KOSAKI) about 12 years ago

Hi mame-san,

ko1 found that the second case (i.e. the one below) is a regression of his from October. He told me he plans to fix it soon.

$ ruby -e 'loop { system("./ruby lol2.rb > t") || break }'
<internal:prelude>:8:in `lock': deadlock; recursive locking (ThreadError)
        from <internal:prelude>:8:in `synchronize'
        from /home/mame/work/local/lib/ruby/2.0.0/thread.rb:69:in `wait'

And I couldn't reproduce this issue at commit r37074 (Oct 3). So I think we haven't reproduced the original issue yet.
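For reference, "deadlock; recursive locking" is the ThreadError that Ruby raises when a thread tries to lock a Mutex it already holds. In the trace above it came from a regression in core rather than from user code, but the message itself can be triggered deliberately, as in this small sketch:

```ruby
require 'thread'

mutex = Mutex.new
message = nil

begin
  mutex.synchronize do
    mutex.lock   # the current thread already owns the mutex
  end
rescue ThreadError => e
  message = e.message
end

puts message   # prints "deadlock; recursive locking"
```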

Updated by ko1 (Koichi Sasada) about 12 years ago

  • Assignee changed from ko1 (Koichi Sasada) to mame (Yusuke Endoh)

This second problem may have been fixed in r37647.
mame-san, could you check it?

Updated by mame (Yusuke Endoh) about 12 years ago

  • Status changed from Assigned to Feedback

Worked. Thank you!

Then, can anyone reproduce the original problem? Meh, can you still reproduce it?

--
Yusuke Endoh

Updated by mame (Yusuke Endoh) almost 12 years ago

  • Status changed from Feedback to Rejected

Marking this as rejected due to lack of feedback by the submitter.

--
Yusuke Endoh

Updated by we4tech (nhm tanveeer hossain khan) over 11 years ago

Hi there,

I've faced a similar problem with ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin12.1.0] (installed with rvm).

Please check out my attached code. Let me know if I can help you more, or if I'm doing something dumb :)

Updated by nikkoara (L Nicoara) over 10 years ago

Hey, I have the same problem. I took the test case you posted, reduced it further, and fiddled with the numbers of threads, etc. See attached. It crashed reliably for me, always right after launching it.

If we are using Ruby threads the wrong way, please let us know. If not, could you please take another look at this issue and possibly reactivate it?

Thanks.

Updated by nikkoara (L Nicoara) over 10 years ago

For the record, the test case is malformed. Bummer. I think the one I based it on (from khan) is malformed as well. My apologies if you spent time on it.

Updated by kosaki (Motohiro KOSAKI) over 10 years ago

NP :)
