Ruby Issue Tracking System (https://bugs.ruby-lang.org/) Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207092011-09-10T09:34:20Zkosaki (Motohiro KOSAKI)kosaki.motohiro@gmail.com
<ul></ul><p>I think it's duplicated with 5299.</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207112011-09-10T11:28:36Zcfis (Charlie Savage)
<ul></ul><p>Unfortunately it is not. That was the first problem - and resulted in segmentation faults. We manually backported the fix for <a class="issue tracker-4 status-5 priority-4 priority-default closed" title="Backport: Segmentation fault when using TweetStream gem in ruby 1.9.3 (Closed)" href="https://bugs.ruby-lang.org/issues/5299">#5299</a> to our local copy of ruby 193. Once we did that, it fixed the segmentation faults, but resulted in this problem.</p>
<p>So this is a new problem with that particular commit.</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207122011-09-10T12:53:06Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p>Charlie Savage <a href="mailto:cfis@savagexi.com" class="email">cfis@savagexi.com</a> wrote:</p>
<blockquote>
<p>It causes eventmachine to hang on CentOS 5.5. Not sure what the issue<br>
is, but it's easily reproduced by running the test<br>
eventmachine/tests/test_epoll.rb.</p>
</blockquote>
<p>I have CentOS 5.4, x86_64, kernel 2.6.18-164.11.1.el5</p>
<p>rake compile<br>
ruby -I .:lib:tests/ tests/test_epoll.rb</p>
<p>Works for me on an unpacked eventmachine-1.0.0.beta.3 tree with<br>
ruby_1_9_3 branch. However, only 2 tests appeared enabled.</p>
<blockquote>
<p>We noticed this because it also causes the tweetstream gem to hang.</p>
<p>The same setup works on Fedora 14 and an up-to-date arch linux.<br>
Specific version information included below.</p>
</blockquote>
<p>Yes, unable to reproduce on a more modern Debian testing machine<br>
(x86_64)</p>
<blockquote>
<p>Linux app1.zerista.com 2.6.18-238.19.1.el5.centos.plus #1 SMP Mon Jul<br>
18 10:05:09 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux</p>
</blockquote>
<p>I'll try to find a machine closer to the above.</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207132011-09-10T12:59:31Zcfis (Charlie Savage)
<ul></ul><p>Hi Eric,</p>
<blockquote>
<blockquote>
<p>It causes eventmachine to hang on CentOS 5.5.</p>
</blockquote>
</blockquote>
<p>Sorry, these machines are actually CentOS 5.6. The latest patches were applied via yum update about a week ago, so it's pretty up-to-date.</p>
<blockquote>
<p>I have CentOS 5.4, x86_64, kernel 2.6.18-164.11.1.el5</p>
<p>rake compile<br>
ruby -I .:lib:tests/ tests/test_epoll.rb</p>
<p>Works for me on an unpacked eventmachine-1.0.0.beta.3 tree with<br>
ruby_1_9_3 branch. However, only 2 tests appeared enabled.</p>
</blockquote>
<p>So what we see is this test hanging:</p>
<p>def test_datagrams<br>
$in = $out = ""<br>
EM.run {<br>
EM.open_datagram_socket "127.0.0.1", @port, TestDatagramServer<br>
EM.open_datagram_socket "127.0.0.1", 0, TestDatagramClient, @port<br>
}<br>
assert_equal( "1234567890", $in )<br>
assert_equal( "abcdefghij", $out )<br>
end</p>
<p>It hangs on the first EM.open_datagram_socket call.</p>
<p>Here is another one, this time from test_pure_ruby.rb (which in fact seems misnamed, it is using the C code):</p>
<p>def test_connrefused<br>
assert_nothing_raised do<br>
EM.run {<br>
setup_timeout(2)<br>
EM.connect "127.0.0.1", @port, TestConnrefused<br>
}<br>
end</p>
<p>In this one, it's the EM.connect call that hangs.</p>
<blockquote>
<p>I'll try to find a machine closer to the above.</p>
</blockquote>
<p>Probably a yum update will get you there...</p>
<p>Let me know if there is anything we can do to help debug this. It happens across 8 servers (all of which are at the same CentOS release, although they did start as the same VM image a while back).</p>
<p>Charlie</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207152011-09-10T15:29:13Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p>Charlie Savage <a href="mailto:cfis@savagexi.com" class="email">cfis@savagexi.com</a> wrote:</p>
<blockquote>
<p>Sorry, these machines are actually CentOS 5.6. The latest patches were<br>
applied via yum update about a week ago, so it's pretty up-to-date.</p>
</blockquote>
<p>OK, I'm closer with 2.6.18-238.9.1.el5xen but still can't reproduce it.</p>
<p>I don't have permission to upgrade kernels on CentOS images,<br>
unfortunately. It's the weekend so the folks that do have permission<br>
aren't around...</p>
<blockquote>
<p>So what we see is this test hanging:</p>
<p>def test_datagrams<br>
$in = $out = ""<br>
EM.run {<br>
EM.open_datagram_socket "127.0.0.1", @port, TestDatagramServer<br>
EM.open_datagram_socket "127.0.0.1", 0, TestDatagramClient, @port<br>
}<br>
assert_equal( "1234567890", $in )<br>
assert_equal( "abcdefghij", $out )<br>
end</p>
<p>It hangs on the first EM.open_datagram_socket call.</p>
</blockquote>
<p>Can you show us "strace -f -v" output from that test?</p>
<p>Maybe sprinkle some `fprintf(stderr, "%s:%d\n", __FILE__, __LINE__);'<br>
or similar inside EventMachine_t::OpenDatagramSocket and see where it<br>
gets to? It shouldn't hit gethostbyname()...</p>
<blockquote>
<p>Here is another one, this time from test_pure_ruby.rb (which in fact seems misnamed, it is using the C code):</p>
<p>def test_connrefused<br>
assert_nothing_raised do<br>
EM.run {<br>
setup_timeout(2)<br>
EM.connect "127.0.0.1", @port, TestConnrefused<br>
}<br>
end</p>
<p>In this one, it's the EM.connect call that hangs.</p>
</blockquote>
<p>I can't reproduce this, either...</p>
<p>Also, can you extract these tests and run with a hand-picked port?</p>
<blockquote>
<p>Let me know if there is anything we can do to help debug this. It<br>
happens across 8 servers (all of which are at the same CentOS release,<br>
albeit they did start as the same VM image a while back).</p>
</blockquote>
<p>I assume you tried a clean build/install of Ruby to make sure all<br>
objects got rebuilt and reinstalled?</p>
<p>Can you also try running `pmap $PID' on the hung processes to make sure<br>
it's loading the correct libs + versions?</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207192011-09-10T17:00:57Zcfis (Charlie Savage)
<ul><li><strong>File</strong> <a href="/attachments/2066">strace_hangs.log</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/2066/strace_hangs.log">strace_hangs.log</a> added</li><li><strong>File</strong> <a href="/attachments/2067">strace_completes.log</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/2067/strace_completes.log">strace_completes.log</a> added</li><li><strong>File</strong> <a href="/attachments/2068">strace_pure.log</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/2068/strace_pure.log">strace_pure.log</a> added</li><li><strong>File</strong> <a href="/attachments/2069">pmap.log</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/2069/pmap.log">pmap.log</a> added</li></ul><p>Ok, on the first test, strange results. Running this command:</p>
<p>strace -f -v ruby -I.:lib:tests tests/test_epoll.rb -n test_datagrams</p>
<p>Hangs the test as expected. But running this command:</p>
<p>strace -f -v ruby -I.:lib:tests tests/test_epoll.rb -n test_datagrams &> /tmp/strace1.log</p>
<p>Causes the test to run to completion. And then, annoyingly enough, that one particular test works after that. If I reboot the machine, then the test hangs again.</p>
<p>I have attached 2 logs, strace_completes.log and strace_hangs.log. strace_hangs.log is only the last few hundred lines (the rest scrolled off the top), but what I saw matches strace_completes.log up to line 2,271. After that, the two diverge.</p>
<p>The story is different for the second test, it always hangs:</p>
<p>strace -v -v ruby -I.:lib:tests tests/test_pure.rb -n test_connrefused 2>&1 | tee /tmp/strace_pure.log</p>
<p>That log is attached.</p>
<p>As for your other questions:</p>
<blockquote>
<p>Also, can you extract these tests and run with a hand-picked port?</p>
</blockquote>
<p>Sure. The connection refused one is intentionally picking the first unused port. It turns out to be 9001.</p>
<blockquote>
<p>I assume you tried a clean build/install of Ruby to make sure all objects got rebuilt and reinstalled?</p>
</blockquote>
<p>Yes.</p>
<p>$cd /usr/src/ruby<br>
$git pull (on the ruby 193 branch)<br>
$git clean -fx<br>
$autoconf<br>
$./configure --prefix=/usr --enable-shared=true<br>
$make<br>
$make install</p>
<blockquote>
<p>Can you also try running `pmap $PID' on the hung processes to make sure it's loading the correct libs + versions?</p>
</blockquote>
<p>$ps -ef | grep ruby<br>
cfis 16185 15381 4 01:51 pts/1 00:00:00 ruby -I.:lib:tests</p>
<p>$pmap 16185<br>
(see attached log)</p>
<p>Hope this info helps.</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207202011-09-10T17:17:30Zcfis (Charlie Savage)
<ul></ul><p>And a bit more info. Running the datagrams test under GDB.</p>
<p>$gdb --args ruby -I.:lib:tests tests/test_epoll.rb -n test_datagrams<br>
(gdb) run</p>
<p>... hangs ...<br>
hit ctrl+c</p>
<p>Program received signal SIGINT, Interrupt.<br>
0x000000375200d91b in read () from /lib64/libpthread.so.0</p>
<p>(gdb) bt<br>
#0 0x000000375200d91b in read () from /lib64/libpthread.so.0<br>
#1 0x00002aaaae9ea3ce in EventMachine_t::_ReadLoopBreaker (this=0xd61b50)<br>
at em.cpp:998<br>
#2 0x00002aaaae9ebc9a in EventMachine_t::_RunSelectOnce (this=0xd61b50)<br>
at em.cpp:935<br>
#3 0x00002aaaae9ec4f5 in EventMachine_t::_RunOnce (this=0x9) at em.cpp:498<br>
#4 0x00002aaaae9ee183 in EventMachine_t::Run (this=0xd61b50) at em.cpp:478<br>
#5 0x00002aaaae9e86a9 in t_run_machine_without_threads (self=9)<br>
at rubymain.cpp:219<br>
#6 0x00002aaaaac1b2d0 in vm_call_cfunc (th=0x602520, cfp=0x2aaaae5c7778,<br>
num=0, blockptr=0x1, flag=24, id=0, me=0x8e9f90, recv=9127040)<br>
at vm_insnhelper.c:404<br>
etc.</p>
<p>(gdb) frame 1<br>
#1 0x00002aaaae9ea3ce in EventMachine_t::_ReadLoopBreaker (this=0xd61ce0)<br>
at em.cpp:998<br>
998 read (LoopBreakerReader, buffer, sizeof(buffer));<br>
(gdb) list<br>
993 /* The loop breaker has selected readable.<br>
994 * Read it ONCE (it may block if we try to read it twice)<br>
995 * and send a loop-break event back to user code.<br>
996 */<br>
997 char buffer [1024];<br>
998 read (LoopBreakerReader, buffer, sizeof(buffer));<br>
999 if (EventCallback)</p>
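<p>The backtrace shows a blocking read() on the loop-breaker pipe. As a purely illustrative sketch (this is not EventMachine's code, and the actual fix for this ticket landed in Ruby's thread.c), a reader that must never hang can put the descriptor into non-blocking mode so that a spurious wakeup costs one failed read() instead of a stall. The helper names <code>set_nonblocking</code> and <code>drain_once</code> are hypothetical.</p>

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode.
 * Returns 0 on success, -1 on failure. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: read the pipe once.
 * Returns the number of bytes read, 0 if nothing is available
 * (or at end of file), and -1 on any other error. */
static long drain_once(int fd)
{
    char buffer[1024];
    ssize_t n = read(fd, buffer, sizeof(buffer));
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        return 0;   /* false positive: no data, but we did not block */
    }
    return (long)n;
}
```

<p>With this arrangement, a false "readable" report from the select layer would make <code>drain_once</code> return 0 rather than hang the event loop.</p>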
<p>Running the other test, gdb --args ruby -I.:lib:tests tests/test_pure.rb -n test_connrefused, shows the same backtrace in gdb.</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207272011-09-11T09:30:15Znormalperson (Eric Wong)normalperson@yhbt.net
<ul><li><strong>File</strong> <a href="/attachments/2070">0001-thread.c-rb_thread_select-mark-original-fd_sets-prop.patch</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/2070/0001-thread.c-rb_thread_select-mark-original-fd_sets-prop.patch">0001-thread.c-rb_thread_select-mark-original-fd_sets-prop.patch</a> added</li></ul><p>Thanks for the straces, I was able to tell the EM pipe was stuck on a<br>
false-positive and calling a blocking read() on a pipe that had no data.</p>
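<p>To make the false positive concrete, here is a minimal sketch using plain POSIX select() (not Ruby's internals). select() overwrites the set it is given so that only ready descriptors remain. If a caller inspects a stale copy of what it asked for instead of the result, an empty pipe still looks readable. The function names are hypothetical.</p>

```c
#include <sys/select.h>
#include <unistd.h>

/* Returns 1 if fd is readable right now, 0 if not, -1 on error.
 * select() rewrites `set` in place, leaving only ready descriptors. */
static int readable_now(int fd, fd_set *set)
{
    struct timeval tv = {0, 0};   /* zero timeout: poll, do not block */
    FD_ZERO(set);
    FD_SET(fd, set);
    if (select(fd + 1, set, NULL, NULL, &tv) < 0) return -1;
    return FD_ISSET(fd, set) ? 1 : 0;
}

/* Hypothetical demo: returns 1 when a stale copy of the request set still
 * claims an empty pipe is readable while the real result says it is not. */
static int stale_set_reports_false_positive(void)
{
    int fds[2];
    fd_set requested, result;
    int ready, stale;

    if (pipe(fds) != 0) return -1;
    FD_ZERO(&requested);
    FD_SET(fds[0], &requested);          /* what the caller asked about */
    ready = readable_now(fds[0], &result);
    stale = FD_ISSET(fds[0], &requested) ? 1 : 0;
    close(fds[0]);
    close(fds[1]);
    return (ready == 0 && stale == 1) ? 1 : 0;
}
```

<p>A caller that trusts the stale set would go on to call read() on the empty pipe and block, which matches the hang in the backtrace above.</p>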
<p>Attached is a patch which should fix the issue, sorry for the bug :x</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207292011-09-11T11:59:07Zkosaki (Motohiro KOSAKI)kosaki.motohiro@gmail.com
<ul><li><strong>ruby -v</strong> changed from <i>ruby 1.9.3dev (2011-09-09 revision 33236) [x86_64-linux]</i> to <i>-</i></li></ul><p>2011/9/11 Eric Wong <a href="mailto:normalperson@yhbt.net" class="email">normalperson@yhbt.net</a>:</p>
<blockquote>
<p>Issue <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: Application Hangs Due to Recent rb_thread_select Changes (Closed)" href="https://bugs.ruby-lang.org/issues/5306">#5306</a> has been updated by Eric Wong.</p>
<p>File 0001-thread.c-rb_thread_select-mark-original-fd_sets-prop.patch added</p>
<p>Thanks for the straces, I was able to tell the EM pipe was stuck on a<br>
false-positive and calling a blocking read() on a pipe that had no data.</p>
<p>Attached is a patch which should fix the issue, sorry for the bug :x</p>
</blockquote>
<p>Your patch will break non-Linux platforms. I can't apply it. :x</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207302011-09-11T14:56:57Zcfis (Charlie Savage)
<ul></ul><p>Ok, some questions so I can understand this code:</p>
<p>How is the false-positive happening?</p>
<p>Why does this break on non-linux platforms?</p>
<p>And then obviously, what is the next step?</p>
<p>Thanks for looking into this and the quick responses.</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207312011-09-11T15:12:09Znormalperson (Eric Wong)normalperson@yhbt.net
<ul><li><strong>File</strong> <a href="/attachments/2071">0001-thread.c-rb_thread_select-mark-original-fd_sets-prop.patch</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/2071/0001-thread.c-rb_thread_select-mark-original-fd_sets-prop.patch">0001-thread.c-rb_thread_select-mark-original-fd_sets-prop.patch</a> added</li></ul><p>Hopefully a better patch is attached. I have no way of testing on non-Linux,<br>
but I did test successfully without HAVE_RB_FD_INIT defined. _WIN32 tester (and<br>
potential fixer) is needed.</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207322011-09-11T15:23:12Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p>Charlie Savage <a href="mailto:cfis@savagexi.com" class="email">cfis@savagexi.com</a> wrote:</p>
<blockquote>
<p>Ok, some questions so I can understand this code:</p>
<p>How is the false-positive happening?</p>
</blockquote>
<p>rb_thread_select() needs to modify the arguments passed to it (and clear<br>
out not-ready descriptors). My patch fixed that for Linux and other<br>
platforms with NFDBITS && HAVE_RB_FD_INIT.</p>
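<p>The idea of copying results back to the caller can be sketched with plain POSIX select(). This is an illustration of the technique only, not the code from the attached patch; <code>select_with_copy_back</code> and <code>demo_copy_back</code> are hypothetical names. The wrapper works on a private copy and then overwrites the caller's set, so descriptors that are not ready get cleared.</p>

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical wrapper: run select() on a private copy, then copy the
 * result back so the caller's set holds only the ready descriptors. */
static int select_with_copy_back(int nfds, fd_set *read_set, struct timeval *tv)
{
    fd_set work = *read_set;            /* private working copy */
    int n = select(nfds, &work, NULL, NULL, tv);
    if (n >= 0) *read_set = work;       /* the "reverse copy" step */
    return n;
}

/* Hypothetical demo: one empty pipe and one pipe holding a byte.
 * Returns 1 if only the pipe with data remains set afterwards. */
static int demo_copy_back(void)
{
    int empty[2], full[2], max, n, ok = 0;
    fd_set rs;
    struct timeval tv = {0, 0};         /* poll without blocking */

    if (pipe(empty) != 0) return -1;
    if (pipe(full) != 0) { close(empty[0]); close(empty[1]); return -1; }
    if (write(full[1], "x", 1) == 1) {
        FD_ZERO(&rs);
        FD_SET(empty[0], &rs);
        FD_SET(full[0], &rs);
        max = empty[0] > full[0] ? empty[0] : full[0];
        n = select_with_copy_back(max + 1, &rs, &tv);
        ok = (n == 1) && !FD_ISSET(empty[0], &rs) && FD_ISSET(full[0], &rs);
    }
    close(empty[0]); close(empty[1]);
    close(full[0]); close(full[1]);
    return ok;
}
```

<p>Without the copy-back line, both descriptors would still be set after the call, which is the kind of false positive reported in this ticket.</p>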
<blockquote>
<p>Why does this break on non-Linux platforms?</p>
</blockquote>
<p>I missed the !(NFDBITS && HAVE_RB_FD_INIT) code paths completely.</p>
<blockquote>
<p>And then obviously, what is the next step?</p>
</blockquote>
<p>I am testing a patch, I manually disabled the HAVE_RB_FD_INIT code<br>
paths to test, but I cannot test _WIN32 path.</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207332011-09-11T15:40:00Zcfis (Charlie Savage)
<ul></ul><p>Thanks for the explanations.</p>
<p>I can test on windows - I have mswin and mingw builds. How to test though? Are there any tests in the test suite I should run to verify? Would love to run the whole test suite, but sadly that doesn't work on windows.</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207342011-09-11T15:53:07Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p>Charlie Savage <a href="mailto:cfis@savagexi.com" class="email">cfis@savagexi.com</a> wrote:</p>
<blockquote>
<p>I can test on windows - I have mswin and mingw builds. How to test<br>
though? Are there any tests in the test suite I should run to verify?<br>
Would love to run the whole test suite, but sadly that doesn't work on<br>
windows.</p>
</blockquote>
<p>./ruby -I .ext/$PLATFORM test/-ext-/old_thread_select/test_old_thread_select.rb</p>
<p>For me, I have PLATFORM=x86_64-linux</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207352011-09-11T15:56:23Zcfis (Charlie Savage)
<ul></ul><p>Hmm, I take it this is against head? On the 1.9.3 branch there is already this method (line 2384):</p>
<p>void<br>
rb_fd_copy(rb_fdset_t *dst, const fd_set *src, int max)</p>
<p>The patch then adds this right below it (line 2399):</p>
<p>static void<br>
rb_fd_rcopy(fd_set *dst, rb_fdset_t *src)</p>
<p>And then lower down (line 2690):</p>
<p>if (read) {<br>
rfds = &fdsets[0];<br>
rb_fd_init(rfds);<br>
rb_fd_copy(rfds, read, max);<br>
}</p>
<p>So that rb_fd_copy call would no longer work.</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207362011-09-11T16:23:13Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p>Charlie Savage <a href="mailto:cfis@savagexi.com" class="email">cfis@savagexi.com</a> wrote:</p>
<blockquote>
<p>Issue <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: Application Hangs Due to Recent rb_thread_select Changes (Closed)" href="https://bugs.ruby-lang.org/issues/5306">#5306</a> has been updated by Charlie Savage.</p>
</blockquote>
<blockquote>
<p>Hmm, I take it this is against head? On the 1.9.3 branch there is<br>
already this method (line 2384):</p>
</blockquote>
<p>It should apply cleanly to r33236 (ruby_1_9_3)</p>
<blockquote>
<p>void<br>
rb_fd_copy(rb_fdset_t *dst, const fd_set *src, int max)</p>
<p>The patch then adds this right below it (line 2399):</p>
<p>static void<br>
rb_fd_rcopy(fd_set *dst, rb_fdset_t *src)</p>
</blockquote>
<p>The new function is "rcopy" (reverse copy). I named it based on memrchr()<br>
vs memchr(). Maybe someone can think of a better name?</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207372011-09-11T17:15:08Zcfis (Charlie Savage)
<ul></ul><p>Ah, totally missed that r - it's not obvious if you aren't looking for it.</p>
<p>Patch doesn't compile on Windows:</p>
<p>thread.c<br>
./../thread.c(2466) : error C2143: syntax error : missing ')' before ';'<br>
NMAKE : fatal error U1077: '"c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\cl.EXE"' : return code '0x2'<br>
Stop.</p>
<p>As the error says, it's easy to fix: it's missing a ) at the end of the line.</p>
<p>The test passes on mswin. One fails on mingw:</p>
<ol>
<li>Failure:<br>
test_old_select_false_positive(TestOldThreadSelect) [ruby/test/-ext-/old_thread_select/test_old_thread_select.rb:34]:<br>
<a href="/issues/5306">[ruby-core:39435]</a>.<br>
<[5]> expected but was<br>
<[3, 5]>.<br>
4 tests, 12 assertions, 1 failures, 0 errors, 0 skips</li>
</ol> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207442011-09-12T02:27:18Zkosaki (Motohiro KOSAKI)kosaki.motohiro@gmail.com
<ul><li><strong>File</strong> <a href="/attachments/2072">old_thread_select.patch</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/2072/old_thread_select.patch">old_thread_select.patch</a> added</li></ul><blockquote>
<p>static void<br>
rb_fd_rcopy(fd_set *dst, rb_fdset_t *src)<br>
{<br>
size_t size = howmany(rb_fd_max(src), NFDBITS) * sizeof(fd_mask);<br>
if (size < sizeof(fd_set)) size = sizeof(fd_set);<br>
memcpy(dst, rb_fd_ptr(src), size);<br>
}</p>
</blockquote>
<p>If size > sizeof(fd_set), this code causes memory corruption.</p>
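<p>A minimal sketch of the guard this review comment asks for, written against plain POSIX types rather than Ruby's rb_fdset_t. The helper <code>copy_into_fd_set</code> is hypothetical and is not the committed code. It refuses to copy when the source bitmap is larger than the fixed-size destination, instead of letting memcpy() write past the end of dst.</p>

```c
#include <stddef.h>
#include <string.h>
#include <sys/select.h>

/* Hypothetical helper: copy `src_size` bytes of descriptor bitmap into a
 * fixed-size fd_set. Returns 0 on success, -1 if the source does not fit. */
static int copy_into_fd_set(fd_set *dst, const void *src, size_t src_size)
{
    if (src_size > sizeof(fd_set)) {
        return -1;                      /* refuse rather than overflow dst */
    }
    memset(dst, 0, sizeof(fd_set));     /* clear any bits beyond src_size */
    memcpy(dst, src, src_size);
    return 0;
}
```

<p>How the caller reports the failure (an error return, an exception, or an abort) is a separate design choice.</p>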
<blockquote>
<p>static void<br>
rb_fd_rcopy(fd_set *dst, rb_fdset_t *src)<br>
{<br>
memcpy(dst->fd_array, src->fdset->fd_array,<br>
dst->fd_count * sizeof(dst->fd_array[0]);<br>
dst->fd_count = src->fdset->fd_count;<br>
}</p>
</blockquote>
<p>Bad indentation, which is a coding style violation.<br>
Also, if src->fdset->fd_count > FD_SETSIZE, we should return an error or raise an exception.</p>
<p>I attached a new patch. It works on both Linux and Windows. Can you please review it?</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207452011-09-12T08:23:08Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p>Motohiro KOSAKI <a href="mailto:kosaki.motohiro@gmail.com" class="email">kosaki.motohiro@gmail.com</a> wrote:</p>
<blockquote>
<p>I attached a new patch. It works on both Linux and Windows. Can you please<br>
review it?</p>
</blockquote>
<p>Thanks! I can confirm it's good on Linux, Charlie?</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207472011-09-12T20:36:08Zkosaki (Motohiro KOSAKI)kosaki.motohiro@gmail.com
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>This issue was solved with changeset r33256.<br>
Charlie, thank you for reporting this issue.<br>
Your contribution to Ruby is greatly appreciated.<br>
May Ruby be with you.</p>
<hr>
<ul>
<li>
<p>thread.c (rb_thread_select): fix to ignore an argument<br>
modification of rb_thread_fd_select().<br>
based on a patch by Eric Wong. [Bug <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: Application Hangs Due to Recent rb_thread_select Changes (Closed)" href="https://bugs.ruby-lang.org/issues/5306">#5306</a>] <a href="/issues/5306">[ruby-core:39435]</a></p>
</li>
<li>
<p>thread.c (rb_fd_rcopy): New. for reverse fd copy.</p>
</li>
<li>
<p>test/-ext-/old_thread_select/test_old_thread_select.rb<br>
(test_old_select_false_positive): test for bug5306.</p>
</li>
<li>
<p>ext/-test-/old_thread_select/old_thread_select.c (fdset2array):<br>
New. convert fdsets to array.</p>
</li>
<li>
<p>ext/-test-/old_thread_select/old_thread_select.c (old_thread_select):<br>
return 'read', 'write', 'except' argument of rb_thread_select()<br>
to ruby script.</p>
</li>
</ul> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207482011-09-12T20:42:32Zkosaki (Motohiro KOSAKI)kosaki.motohiro@gmail.com
<ul></ul><p>Committed to both trunk and ruby_1_9_3.</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207512011-09-13T01:29:10Zcfis (Charlie Savage)
<ul><li><strong>File</strong> <a href="/attachments/2074">mingw_backtrace.txt</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/2074/mingw_backtrace.txt">mingw_backtrace.txt</a> added</li></ul><p>Thanks for all the effort. But sorry, not fixed yet. This version segfaults on MinGW. Trace attached. Will check mswin next.</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207522011-09-13T03:53:18Zusa (Usaku NAKAMURA)usa@garbagecollect.jp
<ul></ul><p>Hello,</p>
<p>In message "<a href="/issues/5306">[ruby-core:39483]</a> [Ruby 1.9 - Bug <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: Application Hangs Due to Recent rb_thread_select Changes (Closed)" href="https://bugs.ruby-lang.org/issues/5306">#5306</a>] Application Hangs Due to Recent rb_thread_select Changes"<br>
on Sep.13,2011 01:29:28, <a href="mailto:cfis@savagexi.com" class="email">cfis@savagexi.com</a> wrote:</p>
<blockquote>
<p>File mingw_backtrace.txt added</p>
<p>Thanks for all the effort. But sorry, not fixed yet. This version segfaults on MinGW. Trace attached. Will check mswin next.</p>
</blockquote>
<p>Hmm, did you do make install before running the test?<br>
C level backtrace information of your trace says that the ruby<br>
core dll is c:\MinGW\local\ruby\bin\msvcrt-ruby191.dll .<br>
I guess that your build path is c:/MinGW/local/src/ruby, and<br>
the built ruby core dll is c:/MinGW/local/src/ruby/msvcrt-ruby191.dll .</p>
<p>If you want to test safely, run as follows:<br>
make test-all TESTS="-- -ext-/old_thread_select"</p>
<p>BTW, I've checked kosaki-san's patch with x64-mswin64.<br>
No problem was reported in test.</p>
<p>P.S.<br>
kosaki-san, I want to add a guard to your patch.<br>
<br>
--- thread.c.bak 2011-09-13 03:40:05.948172400 +0900<br>
+++ thread.c 2011-09-13 03:40:24.308222500 +0900<br>
@@ -2469,7 +2469,9 @@ rb_fd_rcopy(fd_set *dst, rb_fdset_t *src<br>
{<br>
int max = rb_fd_max(src);</p>
<ul>
<li>if (max > FD_SETSIZE) {</li>
</ul>
<ul>
<li>/* we assume src is the result of select() with dst, so dst should be</li>
<li>
<pre><code>* larger or equal than src. */
</code></pre>
</li>
<li>if (max > FD_SETSIZE || max > dst->fd_count) {<br>
rb_raise(rb_eArgError, "too large fdsets");<br>
}</li>
</ul>
<h2>
<br>
Regards,</h2>
<p>U.Nakamura <a href="mailto:usa@garbagecollect.jp" class="email">usa@garbagecollect.jp</a></p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207532011-09-13T05:38:55Zcfis (Charlie Savage)
<ul></ul><p>Ok, I rebuilt everything from scratch and did not encounter any errors - sorry for the false alarm. mswin also checked out fine.</p>
<p>We will next test this fix on the original servers where we encountered the problem. If any issues remain, I will reopen the ticket.</p>
<p>Thanks again for the help.</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=207682011-09-13T17:30:16Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p>Motohiro KOSAKI <a href="mailto:kosaki.motohiro@gmail.com" class="email">kosaki.motohiro@gmail.com</a> wrote:</p>
<blockquote>
<p>File old_thread_select.patch added</p>
<blockquote>
<p>static void<br>
rb_fd_rcopy(fd_set *dst, rb_fdset_t *src)<br>
{<br>
size_t size = howmany(rb_fd_max(src), NFDBITS) * sizeof(fd_mask);<br>
if (size < sizeof(fd_set)) size = sizeof(fd_set);<br>
memcpy(dst, rb_fd_ptr(src), size);<br>
}</p>
</blockquote>
<p>If size > sizeof(fd_set), this code causes memory corruption.</p>
</blockquote>
<p>I just thought of this again and think rb_bug() is better than<br>
rb_raise() here. While unlikely to hit either case, rb_raise()<br>
will leak memory since the rb_fd_term() call gets skipped.</p> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=208222011-09-14T12:23:13Zkosaki (Motohiro KOSAKI)kosaki.motohiro@gmail.com
<ul></ul><blockquote>
<p>BTW, I've checked kosaki-san's patch with x64-mswin64.<br>
No problem was reported in test.</p>
<p>P.S.<br>
kosaki-san, I want to add a guard to your patch.</p>
<p>--- thread.c.bak     2011-09-13 03:40:05.948172400 +0900<br>
+++ thread.c   2011-09-13 03:40:24.308222500 +0900<br>
@@ -2469,7 +2469,9 @@ rb_fd_rcopy(fd_set *dst, rb_fdset_t *src<br>
 {<br>
  int max</p>
</blockquote> Ruby master - Bug #5306: Application Hangs Due to Recent rb_thread_select Changeshttps://bugs.ruby-lang.org/issues/5306?journal_id=208232011-09-14T12:23:13Zkosaki (Motohiro KOSAKI)kosaki.motohiro@gmail.com
<ul></ul><p>2011/9/13 Eric Wong <a href="mailto:normalperson@yhbt.net" class="email">normalperson@yhbt.net</a>:</p>
<blockquote>
<p>Motohiro KOSAKI <a href="mailto:kosaki.motohiro@gmail.com" class="email">kosaki.motohiro@gmail.com</a> wrote:</p>
<blockquote>
<p>File old_thread_select.patch added</p>
<blockquote>
<p>static void<br>
rb_fd_rcopy(fd_set *dst, rb_fdset_t *src)<br>
{<br>
  size_t size</p>
</blockquote>
</blockquote>
</blockquote>