https://bugs.ruby-lang.org/https://bugs.ruby-lang.org/favicon.ico?17113305112011-08-03T11:52:23ZRuby Issue Tracking SystemBackport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=200252011-08-03T11:52:23Zkosaki (Motohiro KOSAKI)kosaki.motohiro@gmail.com
<ul><li><strong>Category</strong> set to <i>core</i></li><li><strong>Status</strong> changed from <i>Open</i> to <i>Assigned</i></li><li><strong>Assignee</strong> set to <i>kosaki (Motohiro KOSAKI)</i></li></ul> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=200272011-08-03T12:15:26Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul></ul><p>Here's gdb output from attaching to a process (OpenBSD-current, not 4.9), and using ruby instead of miniruby:</p>
<p>$ ruby19 -e 'Thread.new{Thread.pass}'<br>
$ gdb ruby19 17940<br>
(gdb) bt<br>
#0 0x0000000205b31da7 in pthread_cond_broadcast (cond=0x20b4bb040) at /usr/src/lib/libpthread/uthread/uthread_cond.c:655<br>
#1 0x0000000204bb3709 in native_cond_broadcast () from /usr/local/lib/libruby19.so.1.91<br>
#2 0x0000000204bb4424 in rb_thread_schedule_limits () from /usr/local/lib/libruby19.so.1.91<br>
#3 0x0000000204bb476b in rb_thread_schedule () from /usr/local/lib/libruby19.so.1.91<br>
#4 0x0000000204bb4a6e in rb_thread_terminate_all () from /usr/local/lib/libruby19.so.1.91<br>
#5 0x0000000204abc8da in ruby_cleanup () from /usr/local/lib/libruby19.so.1.91<br>
<a class="issue tracker-1 status-5 priority-4 priority-default closed behind-schedule" title="Bug: sprintf() of %f on Windows(MSVCRT) (Closed)" href="https://bugs.ruby-lang.org/issues/6">#6</a> 0x0000000204abc996 in ruby_run_node () from /usr/local/lib/libruby19.so.1.91<br>
#7 0x0000000000400c42 in main ()<br>
(gdb) info threads<br>
4 process 17940, thread 0x208e0f000 (RUNNING) 0x0000000205b31da7 in pthread_cond_broadcast (cond=0x20b4bb040) at /usr/src/lib/libpthread/uthread/uthread_cond.c:655<br>
3 process 17940, thread 0x2034be000 (SELECT_WAIT) _thread_kern_sched (scp=0x0) at /usr/src/lib/libpthread/uthread/uthread_kern.c:495<br>
2 process 17940, thread 0x206059000 (COND_WAIT) _thread_kern_sched (scp=0x0) at /usr/src/lib/libpthread/uthread/uthread_kern.c:495<br>
1 process 17940, thread 0x20a045000 (RUNNING) _thread_kern_sched (scp=0x0) at /usr/src/lib/libpthread/uthread/uthread_kern.c:495</p>
<p>I'll try to spend some time debugging this tomorrow.</p> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=200352011-08-03T19:14:07Zkosaki (Motohiro KOSAKI)kosaki.motohiro@gmail.com
<ul><li><strong>Status</strong> changed from <i>Assigned</i> to <i>Feedback</i></li></ul> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=200362011-08-03T19:23:28Zkosaki (Motohiro KOSAKI)kosaki.motohiro@gmail.com
<ul><li><strong>ruby -v</strong> changed from <i>ruby 1.9.4dev (2011-07-31 trunk 32788) [x86_64-openbsd4.9]</i> to <i>-</i></li></ul><blockquote>
<p>$ ruby19 -e 'Thread.new{Thread.pass}'<br>
$ gdb ruby19 17940<br>
(gdb) bt<br>
#0 x0000000205b31da7 in pthread_cond_broadcast (cond=0x20b4bb040) at /usr/src/lib/libpthread/uthread/uthread_cond.c:655</p>
</blockquote>
<p>I'm surprised this stack. pthread_cond_broadcast don't have to be hang<br>
up even <em>if</em> the<br>
argument is wrong. I suspect it's OpenBSD bug.<br>
Jeremy, Could you please handle this issue?</p>
<blockquote>
<p>#1 x0000000204bb3709 in native_cond_broadcast () from /usr/local/lib/libruby19.so.1.91<br>
#2 x0000000204bb4424 in rb_thread_schedule_limits () from /usr/local/lib/libruby19.so.1.91<br>
#3 x0000000204bb476b in rb_thread_schedule () from /usr/local/lib/libruby19.so.1.91<br>
#4 x0000000204bb4a6e in rb_thread_terminate_all () from /usr/local/lib/libruby19.so.1.91</p>
</blockquote> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=200642011-08-04T05:30:26Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul><li><strong>File</strong> <a href="/attachments/1959">ruby193_thread_debug.log</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/1959/ruby193_thread_debug.log">ruby193_thread_debug.log</a> added</li></ul><p>I built ruby with -DTHREAD_DEBUG, and am attaching a partial log for ruby19 -e 'Thread.new{Thread.pass}'. It's partial because it goes into an infinite loop at the end. I've included the first two iterations of the loop. I'll be doing more research on this issue, but if you have any suggestions, please let me know.</p> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=200652011-08-04T07:21:08Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul><li><strong>File</strong> <a href="/attachments/1960">thread.c.diff</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/1960/thread.c.diff">thread.c.diff</a> added</li></ul><p>Looking at the thread debug log I uploaded earlier, I guessed that this problem might be caused because a thread was starting after rb_thread_terminate_all was called. rb_thread_terminate_all sets vm->inhibit_thread_creation = 1, which prohibits new threads from being created. However, it doesn't prevent created-but-not-yet-started threads from starting in thread_start_func_2. My theory is that thread_start_func_2 should return without starting a thread if vm->inhibit_thread_creation = 1.</p>
<p>The attached patch is a crude form of what I think should happen. It fixes the Thread.new{Thread.pass} case and the related bootstrap test. It currently passes all make check tests except for one. The one test failure is:</p>
<ol start="46">
<li>Error:<br>
test_signal_requiring(TestSignal):<br>
EOFError: end of file reached<br>
/usr/obj/ports/ruby-1.9.3-preview1/ruby-1.9.3-preview1/test/ruby/test_signal.rb:218:in <code>load' /usr/obj/ports/ruby-1.9.3-preview1/ruby-1.9.3-preview1/test/ruby/test_signal.rb:218:in </code>block in test_signal_requiring'<br>
/usr/obj/ports/ruby-1.9.3-preview1/ruby-1.9.3-preview1/test/ruby/test_signal.rb:204:in <code>popen' /usr/obj/ports/ruby-1.9.3-preview1/ruby-1.9.3-preview1/test/ruby/test_signal.rb:204:in </code>test_signal_requiring'</li>
</ol>
<p>The current patch is probably missing something small in terms of thread cleanup, but as I'm not familiar with the thread code, I'm not sure what that might be.</p> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=203702011-08-24T09:31:15Znormalperson (Eric Wong)normalperson@yhbt.net
<ul><li><strong>File</strong> <a href="/attachments/2015">thread_2.diff</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/2015/thread_2.diff">thread_2.diff</a> added</li></ul><p>Jeremy: I think your st_delete call is incorrect, can you try my updated patch?</p>
<p>I also rearranged rb_register_sigaltstack() to build with SIGALTSTACK.</p> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=203752011-08-24T11:23:08Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p>Eric Wong <a href="mailto:normalperson@yhbt.net" class="email">normalperson@yhbt.net</a> wrote:</p>
<blockquote>
<p>File thread_2.diff added</p>
</blockquote>
<p>I think both thread*.diff are racy since inhibit_thread_creation is<br>
checked outside of GVL and I'm not sure if the idea with these<br>
patches is correct.</p>
<p>Unfortunately I cannot reproduce this issue on Linux, but I would<br>
add debug prints around loops where native_cond_wait() is called<br>
in gvl_yield() and also before sched_yield().</p>
<p>I also remember sched_yield() doesn't work on some OSes, maybe try to<br>
change that to nanosleep()/select()?</p> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=203762011-08-24T11:25:52Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul></ul><p>Eric Wong wrote:</p>
<blockquote>
<p>Jeremy: I think your st_delete call is incorrect, can you try my updated patch?</p>
<p>I also rearranged rb_register_sigaltstack() to build with SIGALTSTACK.</p>
</blockquote>
<p>No change after replacing my patch with yours, still getting the "end of file reached" EOFError on that signal test.</p> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=218422011-11-03T09:27:44Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul></ul><p>It's been a couple of months since this issue was last updated, and about three months since this issue was last updated by a ruby committer. Can a ruby committer look at either Eric's patch or my patch and let us know if this is an appropriate fix? I can't think of a reason why you would want to start a thread if the interpreter is shutting down, so the patch makes sense to me. Eric is right that this doesn't fix the race condition, but it should at least reduce the probability of it happening.</p>
<p>Looking at the thread log I sent earlier, the thread that starts after rb_thread_terminate_all is called calls rb_thread_schedule_limits, but gvl_yield apparently never changes the thread pointer, so it always yields to itself.</p>
<p>My theory is that when rb_thread_terminate_all calls terminate_i on a thread that has been created but not started, the thread never gets the terminate signal, so it continues to run indefinitely. So I think the patch makes sense. Alternatively thread_start_func_2 could check th->thrown_errinfo or th->status instead of GET_VM()->inhibit_thread_creation. I haven't tried that, but it could be a more general fix.</p> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=219572011-11-07T15:07:22Znaruse (Yui NARUSE)naruse@airemix.jp
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Assigned</i></li><li><strong>ruby -v</strong> changed from <i>-</i> to <i>ruby 2.0.0dev (2011-11-07 trunk 33648) [x86_64-openbsd5.0]</i></li></ul> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=220522011-11-10T01:54:03Zkosaki (Motohiro KOSAKI)kosaki.motohiro@gmail.com
<ul></ul><p>First, we can't apply your nor Eric's patch because it's racy. Your patch care following scenario.</p>
<ol>
<li>thread-A create thread-B</li>
<li>thread-A call terminate_all</li>
<li>thread-B start</li>
</ol>
<p>but, it doesn't care following scenario.</p>
<ol>
<li>thread-A create thread-B</li>
<li>thread-B start<br>
(just after)</li>
<li>thread-A call terminate_all</li>
</ol>
<p>Moreover, we shouldn't need a special care about thread starting. it should works as well.<br>
I don't plan to apply your ugly bandaid patch.</p>
<p>Second, your analysis is not accurate. The complete opnebsd failed scenario is here.<br>
(OMG, I needed to install openbsd).</p>
<a name="thread-A-thread-B"></a>
<h2 >thread-A thread-B<a href="#thread-A-thread-B" class="wiki-anchor">¶</a></h2>
<p>-> gvl_yield<br>
vm->gvl.wait_yield = 1;<br>
-> sched_yield<br>
-> gvl_yield<br>
-> cond_wait(switch_wait_cond) # wait until waking up sched_yield()<br>
-> task-state change to COND_WAIT<br>
<- sched_yield<br>
vm->gvl.wait_yield = 0;<br>
-> pthread_cond_broadcast()<br>
-> task-state of thread-B change to RUNNING<br>
<- gvl_yield</p>
<p>-> gvl_yield<br>
vm->gvl.wait_yield = 1;<br>
-> sched_yield<br>
<- cond_wait(switch_wait_cond)</p>
<pre><code> *** Oh, wait_yield is 1. try to sleep again.
-> cond_wait(switch_wait_cond)
</code></pre>
<p>That's all.<br>
rb_thread_terminate_all() called rb_thread_schedule() repeatedly. and waking-up cond_wait(switch_wait_cond) failed<br>
repeatedly. OpenBSD's user land pthread library makes semi deterministic scheduling. and then, the test code can't<br>
bail out forever.</p>
<p>btw, sched_yield() works as expected on OpenBSD. then, I don't change sched_yield() call at least now.</p> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=220532011-11-10T02:10:51Zkosaki (Motohiro KOSAKI)kosaki.motohiro@gmail.com
<ul><li><strong>Status</strong> changed from <i>Assigned</i> to <i>Closed</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>This issue was solved with changeset r33693.<br>
Yui, thank you for reporting this issue.<br>
Your contribution to Ruby is greatly appreciated.<br>
May Ruby be with you.</p>
<hr>
<ul>
<li>thread_pthread.c (gvl_yield): don't prevent concurrent sched_yield().<br>
[Bug <a class="issue tracker-4 status-5 priority-4 priority-default closed" title="Backport: Thread.pass sticks on OpenBSD (Closed)" href="https://bugs.ruby-lang.org/issues/5130">#5130</a>] <a href="/issues/5130">[ruby-core:38647]</a></li>
</ul> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=220542011-11-10T02:12:30Zkosaki (Motohiro KOSAKI)kosaki.motohiro@gmail.com
<ul><li><strong>Tracker</strong> changed from <i>Bug</i> to <i>Backport</i></li><li><strong>Project</strong> changed from <i>Ruby master</i> to <i>Backport193</i></li><li><strong>Category</strong> deleted (<del><i>core</i></del>)</li><li><strong>Status</strong> changed from <i>Closed</i> to <i>Assigned</i></li><li><strong>Assignee</strong> changed from <i>kosaki (Motohiro KOSAKI)</i> to <i>yugui (Yuki Sonoda)</i></li></ul><p>Can anyone please review this patch?</p> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=220592011-11-10T02:31:37Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul></ul><p>Motohiro KOSAKI wrote:</p>
<blockquote>
<p>First, we can't apply your nor Eric's patch because it's racy. Your patch care following scenario.</p>
<ol>
<li>thread-A create thread-B</li>
<li>thread-A call terminate_all</li>
<li>thread-B start</li>
</ol>
<p>but, it doesn't care following scenario.</p>
<ol>
<li>thread-A create thread-B</li>
<li>thread-B start<br>
(just after)</li>
<li>thread-A call terminate_all</li>
</ol>
</blockquote>
<p>True. As my earlier message says, the patch only reduces the chance losing the race.</p>
<blockquote>
<p>Moreover, we shouldn't need a special care about thread starting. it should works as well.<br>
I don't plan to apply your ugly bandaid patch.</p>
</blockquote>
<p>Fair enough. I certainly agree that my patch is an ugly bandaid.</p>
<blockquote>
<p>Second, your analysis is not accurate. The complete opnebsd failed scenario is here.<br>
(OMG, I needed to install openbsd).</p>
<a name="thread-A-thread-B"></a>
<h2 >thread-A thread-B<a href="#thread-A-thread-B" class="wiki-anchor">¶</a></h2>
<p>-> gvl_yield<br>
vm->gvl.wait_yield = 1;<br>
-> sched_yield<br>
-> gvl_yield<br>
-> cond_wait(switch_wait_cond) # wait until waking up sched_yield()<br>
-> task-state change to COND_WAIT<br>
<- sched_yield<br>
vm->gvl.wait_yield = 0;<br>
-> pthread_cond_broadcast()<br>
-> task-state of thread-B change to RUNNING<br>
<- gvl_yield</p>
<p>-> gvl_yield<br>
vm->gvl.wait_yield = 1;<br>
-> sched_yield<br>
<- cond_wait(switch_wait_cond)</p>
<pre><code> *** Oh, wait_yield is 1. try to sleep again.
-> cond_wait(switch_wait_cond)
</code></pre>
<p>That's all.<br>
rb_thread_terminate_all() called rb_thread_schedule() repeatedly. and waking-up cond_wait(switch_wait_cond) failed<br>
repeatedly. OpenBSD's user land pthread library makes semi deterministic scheduling. and then, the test code can't<br>
bail out forever.</p>
</blockquote>
<p>So the underlying problem is that wait_yield is always set to 1 when thread 2 wakes up, which causes thread 2 to sleep again? Is that a bug in ruby or OpenBSD?</p>
<blockquote>
<p>btw, sched_yield() works as expected on OpenBSD. then, I don't change sched_yield() call at least now.</p>
</blockquote>
<p>That's good.</p>
<p>Thank you for taking the time to look into the issue in more detail. If I can be of any help, please let me know.</p> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=220612011-11-10T04:22:55Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul></ul><p>Motohiro KOSAKI wrote:</p>
<blockquote>
<p>Can anyone please review this patch?</p>
</blockquote>
<p>I'm not qualified to determine correctness, but I can confirm that it fixes the threading bootstrap test failure on OpenBSD amd64 and i386.</p> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=220642011-11-10T07:08:22Zkosaki (Motohiro KOSAKI)kosaki.motohiro@gmail.com
<ul></ul><blockquote>
<p>So the underlying problem is that wait_yield is always set to 1 when thread 2 wakes up, which causes thread 2 to sleep again? Is that a bug<br>
in ruby or OpenBSD?</p>
</blockquote>
<p>Difficult question. OpenBSD has a posix compliance pthread implementation. Ruby has reasonable OS assumption. But.... the result was pretty nasty unfortunate thing. Aghh.<br>
Anyway, I don't think OpenBSD need to fix anything.</p>
<p>Thank you.</p> Backport193 - Backport #5130: Thread.pass sticks on OpenBSDhttps://bugs.ruby-lang.org/issues/5130?journal_id=230422012-01-03T09:09:24Zkosaki (Motohiro KOSAKI)kosaki.motohiro@gmail.com
<ul><li><strong>Status</strong> changed from <i>Assigned</i> to <i>Closed</i></li></ul><p>This issue was solved with changeset r34179.<br>
Yui, thank you for reporting this issue.<br>
Your contribution to Ruby is greatly appreciated.<br>
May Ruby be with you.</p>
<hr>
<p>merge revision(s) 33693:</p>
<pre><code>* thread_pthread.c (gvl_yield): don't prevent concurrent sched_yield().
[Bug #5130] <a href="/issues/5130">[ruby-core:38647]</a>
</code></pre>