Bug #15383
closed
Reproducible crash: crash.rb:6: [BUG] unexpected THREAD_KILLED
Description
Hi,
I'm reporting a reliable crash of the Ruby interpreter on contested mutexes that are accessed in child processes.
I currently think this happens because the child process's main thread may be waiting for a sibling thread from the parent process that was holding the mutex at the time of the fork. After the fork, all sibling threads are dead, and the mutex detects the attempt to wait for a dead thread and bails out.
This is similar, but not identical, to the case here: https://bugs.ruby-lang.org/issues/14578
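A minimal sketch of the scenario described above, for illustration only. It is not the crash.rb from the gist; the names `holder`, `waiters`, and `release` are hypothetical, and the setup (one sibling thread holding the mutex, two more blocked on it at fork time) is an assumption about what "contested" means here.

```ruby
# Hypothetical reproduction sketch; NOT the crash.rb from the gist.
mutex   = Mutex.new
release = Queue.new

# Sibling thread that holds the mutex across the fork.
holder = Thread.new { mutex.synchronize { release.pop } }
sleep 0.01 until mutex.locked?

# Extra threads that block on the mutex, so it is contested at fork time.
waiters = Array.new(2) { Thread.new { mutex.synchronize {} } }
sleep 0.01 until waiters.all? { |t| t.status == "sleep" }

# Only the forking thread exists in the child; holder and waiters do not.
pid = fork do
  mutex.synchronize {} # the child tries to take the mutex
  exit!(0)             # skip at_exit handlers inherited from the parent
end

_, status = Process.waitpid2(pid)
puts "child exited cleanly: #{status.success?}"

# Let the parent's threads finish.
release << true
[holder, *waiters].each(&:join)
```

If the conclusion above is right, an affected interpreter would be expected to abort in the child, while a fixed one should let the child take the mutex and exit with status 0.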
Here is a gist with some more test results on various platforms: https://gist.github.com/mbj/e6795ee5e0583c5541ee250e9942279a
I'm happy to get my hands dirty, but would need some pointers on whether my conclusion above points in the right direction.
Best,
Markus
Updated by normalperson (Eric Wong) almost 6 years ago
- Backport changed from 2.4: UNKNOWN, 2.5: UNKNOWN to 2.4: UNKNOWN, 2.5: REQUIRED
Updated by normalperson (Eric Wong) almost 6 years ago
- Status changed from Open to Closed
Applied in changeset trunk|r66230.
thread_sync.c (mutex_ptr): handle mutexes held by parent threads in children
Mutexes may be held by threads which only exist in the parent
process, so their waitqueues may be populated with references
to other dead threads. We must reset them at fork.
I am a moron for introducing this bug :<
[ruby-core:90312] [Bug #15383]
Updated by normalperson (Eric Wong) almost 6 years ago
Thanks, it affects trunk; just more difficult to reproduce
because of thread cache.
I'm a moron for not noticing this when I fixed other bugs :<
r66230 should fix it in trunk and should be backported
(but r66229 is independently broken and I just reverted it for now)
Updated by mbjs (Markus Schirp) almost 6 years ago
- Subject changed from Reproducible crash: crash.sh:6: [BUG] unexpected THREAD_KILLED to Reproducible crash: crash.rb:6: [BUG] unexpected THREAD_KILLED
Updated by mbjs (Markus Schirp) almost 6 years ago
Thanks for the quick fix, and also for marking it for backport.
Just curious, is there an associated CI build for these changes?
Updated by normalperson (Eric Wong) almost 6 years ago
mbj@schirp-dso.com wrote:
> Just curious, is there an associated CI build for these changes?
I check https://rubyci.org/ and http://ci.rvm.jp/ (and get
automated mails from the latter).
There's also TravisCI; but I don't use JavaScript; so I rely
on others giving me URLs to the raw logs.
Updated by nagachika (Tomoyuki Chikanaga) almost 6 years ago
- Related to Bug #14634: Queue#push seems to crash after fork added
Updated by normalperson (Eric Wong) almost 6 years ago
> r66230 should fix it in trunk and should be backported
No, actually. r66230 hides an existing problem in the fix
for https://bugs.ruby-lang.org/issues/14578
...
Still working on this and my head hurts :<