https://bugs.ruby-lang.org/https://bugs.ruby-lang.org/favicon.ico?17113305112017-09-06T20:54:21ZRuby Issue Tracking SystemRuby master - Bug #13875: segfault in Enumerable#zip after GChttps://bugs.ruby-lang.org/issues/13875?journal_id=665162017-09-06T20:54:21Zkernigh (George Koehler)xkernigh@netscape.net
<ul></ul><p>The problem is with <code>VALUE tmp;</code> in enum.c zip_i(). The garbage collector frees tmp too early. I tried to protect it with RB_GC_GUARD(tmp), but that doesn't fix the bug: Ruby still crashes.</p>
<pre><code class="diff syntaxhl" data-language="diff"><span class="gh">diff --git a/enum.c b/enum.c
index 4613ab733c..bca63dab5e 100644
</span><span class="gd">--- a/enum.c
</span><span class="gi">+++ b/enum.c
</span><span class="p">@@ -2593,6 +2593,7 @@</span> zip_i(RB_BLOCK_CALL_FUNC_ARGLIST(val, memoval))
}
RB_GC_GUARD(args);
<span class="gi">+ RB_GC_GUARD(tmp);
</span>
return Qnil;
}
</code></pre>
<p>I removed my RB_GC_GUARD(tmp) and added a trick with rb_ivar_set() to make a reference from another Ruby object to tmp. This seems to prevent the bug. Ruby doesn't crash.</p>
<pre><code class="diff syntaxhl" data-language="diff"><span class="gh">diff --git a/enum.c b/enum.c
index 4613ab733c..5e5c50e37d 100644
</span><span class="gd">--- a/enum.c
</span><span class="gi">+++ b/enum.c
</span><span class="p">@@ -2568,6 +2568,7 @@</span> zip_i(RB_BLOCK_CALL_FUNC_ARGLIST(val, memoval))
int i;
tmp = rb_ary_new2(RARRAY_LEN(args) + 1);
<span class="gi">+ rb_ivar_set(args, rb_intern("@tmp"), tmp);
</span> rb_ary_store(tmp, 0, rb_enum_values_pack(argc, argv));
for (i=0; i&lt;RARRAY_LEN(args); i++) {
if (NIL_P(RARRAY_AREF(args, i))) {
</code></pre>
<p>But this trick might not be the correct fix. I fear a problem with fibers, because there is a fiber switch when zip_i() calls Enumerator#next. Perhaps the GC can't find tmp while the other fiber is running.</p> Ruby master - Bug #13875: segfault in Enumerable#zip after GChttps://bugs.ruby-lang.org/issues/13875?journal_id=665182017-09-06T23:08:26Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p><a href="mailto:xkernigh@netscape.net" class="email">xkernigh@netscape.net</a> wrote:</p>
<blockquote>
<p><a href="https://bugs.ruby-lang.org/issues/13875#change-66516" class="external">https://bugs.ruby-lang.org/issues/13875#change-66516</a></p>
</blockquote>
<blockquote>
<p>But this trick might not be the correct fix. I fear a problem<br>
with fibers, because there is a fiber switch when zip_i()<br>
calls Enumerator#next. Perhaps the GC can't find tmp while the<br>
other fiber is running.</p>
</blockquote>
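<p>A note on the quoted concern: as far as enum.c goes, Enumerable#zip only reaches Enumerator#next when an argument is not an array, and Enumerator#next runs the producer in a fiber of its own. The following is a minimal Ruby sketch of that fiber switch; the <code>producer</code> enumerator and the variable names are hypothetical and are not taken from the reporter's script.</p>

```ruby
# Fiber.current lived in the 'fiber' extension on older Rubies.
require 'fiber' unless Fiber.respond_to?(:current)

main_fiber = Fiber.current
producer_fibers = []

# Hypothetical producer that records which fiber its block runs in.
producer = Enumerator.new do |y|
  producer_fibers << Fiber.current
  y << :a
  y << :b
end

# Range has no zip of its own, so this is Enumerable#zip. The argument is
# not an array, so it is consumed through external enumeration (next).
pairs = (1..2).zip(producer)
ran_in_other_fiber = !producer_fibers.first.equal?(main_fiber)

# Internal iteration with each stays on the calling fiber, for contrast.
producer.each { |_| }
ran_in_main_fiber = producer_fibers.last.equal?(main_fiber)
```

<p>While the producer's fiber is running, the zip_i() frame that holds tmp belongs to the suspended caller, which is why the marking of suspended fiber stacks matters here.</p>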
<p>Is FIBER_USE_NATIVE enabled in cont.c?<br>
Does the problem go away if you flip that?</p>
<p>Also, which compiler + non-standard CFLAGS do you use?</p>
<p>We don't have a lot of OpenBSD users here; +cc Jeremy...</p>
<p>Does this affect older versions of Ruby, too? We've had<br>
some recent movement in cont.c in trunk and maybe broke<br>
something in Fiber stack marking...</p>
<p>I'm also curious which RB_GC_GUARD implementation you<br>
use (it's compiler-dependent). The rb_gc_guarded_ptr_val<br>
one in include/ruby/ruby.h should be strongest since it can't<br>
be inlined (but slowest). Perhaps try that one if you're<br>
not using it...</p>
<p>Thanks.</p> Ruby master - Bug #13875: segfault in Enumerable#zip after GChttps://bugs.ruby-lang.org/issues/13875?journal_id=665212017-09-07T01:04:42Zkernigh (George Koehler)xkernigh@netscape.net
<ul></ul><p>In reply to Eric Wong:</p>
<p>FIBER_USE_NATIVE is 0. Flipping it to 1 causes compiler errors; OpenBSD doesn't have &lt;ucontext.h&gt;.</p>
<p>I'm using the system gcc, which is <code>gcc (GCC) 4.2.1 20070719</code>. I configured Ruby with <code>../ruby/configure --prefix=$HOME/prefix --with-baseruby=ruby23</code> and didn't add any extra CFLAGS. I have two other compilers (a newer gcc and clang), but I have not tried them with Ruby.</p>
<p>I forgot to try older Ruby versions. I have Ruby 2.3 and 2.4 from OpenBSD packages. My script doesn't reproduce the crash in 2.3 or 2.4, so the bug is only in trunk.</p>
<pre><code>$ ruby24 -v
ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-openbsd]
$ ruby23 -v
ruby 2.3.3p222 (2016-11-21 revision 56859) [x86_64-openbsd]
</code></pre>
<p>Ruby was using the <code>__GNUC__</code> version of RB_GC_GUARD. I then edited include/ruby/ruby.h so it uses the rb_gc_guarded_ptr_val version and put my RB_GC_GUARD(tmp) in zip_i() again. My script still crashes Ruby, so rb_gc_guarded_ptr_val doesn't fix this bug.</p> Ruby master - Bug #13875: segfault in Enumerable#zip after GChttps://bugs.ruby-lang.org/issues/13875?journal_id=665242017-09-07T01:53:48Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul></ul><p>normalperson (Eric Wong) wrote:</p>
<blockquote>
<p><a href="mailto:xkernigh@netscape.net" class="email">xkernigh@netscape.net</a> wrote:</p>
<blockquote>
<p><a href="https://bugs.ruby-lang.org/issues/13875#change-66516" class="external">https://bugs.ruby-lang.org/issues/13875#change-66516</a></p>
</blockquote>
<blockquote>
<p>But this trick might not be the correct fix. I fear a problem<br>
with fibers, because there is a fiber switch when zip_i()<br>
calls Enumerator#next. Perhaps the GC can't find tmp while the<br>
other fiber is running.</p>
</blockquote>
<p>Is FIBER_USE_NATIVE enabled in cont.c?</p>
</blockquote>
<p>No. It is set to 0 on OpenBSD.</p>
<blockquote>
<p>Does the problem go away if you flip that?</p>
</blockquote>
<p>It doesn't compile, as ucontext.h is not available on OpenBSD:</p>
<p>cont.c:68:10: fatal error: 'ucontext.h' file not found</p>
<blockquote>
<p>Does this affect older versions of Ruby, too? We've had<br>
some recent movement in cont.c in trunk and maybe broke<br>
something in Fiber stack marking...</p>
</blockquote>
<p>I can't reproduce on OpenBSD-current or OpenBSD 6.1 using:</p>
<p>ruby 2.2.7p470 (2017-03-28 revision 58194) [x86_64-openbsd]<br>
ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-openbsd]<br>
ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-openbsd]</p>
<p>Note that OpenBSD-current uses clang 4.0.0 as the system compiler, as opposed to OpenBSD 6.1 and previous versions, which use gcc 4.2.1.</p>
<p>I can reproduce on both OpenBSD-current and OpenBSD 6.1 using:</p>
<p>ruby 2.5.0dev (2017-09-06 trunk 59764) [x86_64-openbsd]</p>
<blockquote>
<p>I'm also curious which RB_GC_GUARD implementation you<br>
use (it's compiler-dependent). The rb_gc_guarded_ptr_val<br>
one in include/ruby/ruby.h should be strongest since it can't<br>
be inlined (but slowest). Perhaps try that one if you're<br>
not using it...</p>
</blockquote>
<p>OpenBSD defaults to the first branch (rb_gc_guarded_ptr), even when compiling with clang. Forcing it to use the last branch (rb_gc_guarded_ptr_val) with the following diff still results in the same segfault when running the code provided by kernigh:</p>
<pre><code>--- a/include/ruby/ruby.h
+++ b/include/ruby/ruby.h
@@ -534,7 +534,7 @@ static inline int rb_type(VALUE obj);
((type) == RUBY_T_FLOAT) ? RB_FLOAT_TYPE_P(obj) : \
(!RB_SPECIAL_CONST_P(obj) && RB_BUILTIN_TYPE(obj) == (type)))
-#ifdef __GNUC__
+#ifndef __GNUC__
#define RB_GC_GUARD(v) \
(*__extension__ ({ \
volatile VALUE *rb_gc_guarded_ptr = &(v); \
</code></pre> Ruby master - Bug #13875: segfault in Enumerable#zip after GChttps://bugs.ruby-lang.org/issues/13875?journal_id=665442017-09-07T19:41:31Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p>Thanks, I can reproduce the bug on GNU/Linux with:</p>
<pre><code>--- a/cont.c
+++ b/cont.c
@@ -57,6 +57,7 @@
 # define FIBER_USE_NATIVE 1
 # endif
 #endif
+#undef FIBER_USE_NATIVE
 #if !defined(FIBER_USE_NATIVE)
 #define FIBER_USE_NATIVE 0
 #endif
</code></pre>
<p>Now, I'm testing the following patch:</p>
<p><a href="https://80x24.org/spew/20170907193559.27639-1-e@80x24.org/raw" class="external">https://80x24.org/spew/20170907193559.27639-1-e@80x24.org/raw</a></p>
<p>And I no longer get segfaults with the new test.</p>
<p>However, test/ruby/test_io.rb seems stuck when FIBER_USE_NATIVE is 0<br>
on my system...</p> Ruby master - Bug #13875: segfault in Enumerable#zip after GChttps://bugs.ruby-lang.org/issues/13875?journal_id=665492017-09-08T02:48:56Zkernigh (George Koehler)xkernigh@netscape.net
<ul></ul><p>Jeremy Evans wrote:</p>
<blockquote>
<p>Note that OpenBSD-current uses clang 4.0.0 as the system compiler, as opposed to OpenBSD 6.1 and previous versions, which use gcc 4.2.1.</p>
</blockquote>
<p>Thanks for testing with clang. You showed that the bug wasn't only with gcc 4.2.1.</p>
<p>Eric Wong wrote:</p>
<blockquote>
<p><a href="https://80x24.org/spew/20170907193559.27639-1-e@80x24.org/raw" class="external">https://80x24.org/spew/20170907193559.27639-1-e@80x24.org/raw</a></p>
</blockquote>
<p>This patch also prevents the segfault for me.</p>
<blockquote>
<p>However, test/ruby/test_io.rb seems stuck when FIBER_USE_NATIVE is 0 on my system...</p>
</blockquote>
<p>This file (test/ruby/test_io.rb) and a few other tests usually get stuck in OpenBSD. The cause is a bug that I reported to OpenBSD (fifo plus threads equals stuck: <a href="https://marc.info/?l=openbsd-bugs&m=146276089610123&w=2" class="external">https://marc.info/?l=openbsd-bugs&m=146276089610123&w=2</a>). I have local edits to those tests so they fail and don't get stuck when I run make test-all or make test-spec.</p>
<p>You can see my local edits here:<br>
<a href="https://gist.github.com/kernigh/5770f8b90427ce6ede535dae729cb960" class="external">https://gist.github.com/kernigh/5770f8b90427ce6ede535dae729cb960</a></p>
<p>Your patch, with my OpenBSD machine, didn't cause any more tests (in test/ruby/test_io.rb or elsewhere) to get stuck. If you run Linux, you probably don't have the OpenBSD bug. So your stuck test might be different from my stuck test. You might have found a bug in Ruby that happens in GNU/Linux but I can't reproduce in OpenBSD.</p> Ruby master - Bug #13875: segfault in Enumerable#zip after GChttps://bugs.ruby-lang.org/issues/13875?journal_id=665502017-09-08T04:51:24Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul></ul><p>kernigh (George Koehler) wrote:</p>
<blockquote>
<p>Eric Wong wrote:</p>
<blockquote>
<p>However, test/ruby/test_io.rb seems stuck when FIBER_USE_NATIVE is 0 on my system...</p>
</blockquote>
<p>This file (test/ruby/test_io.rb) and a few other tests usually get stuck in OpenBSD. The cause is a bug that I reported to OpenBSD (fifo plus threads equals stuck: <a href="https://marc.info/?l=openbsd-bugs&m=146276089610123&w=2" class="external">https://marc.info/?l=openbsd-bugs&m=146276089610123&w=2</a>). I have local edits to those tests so they fail and don't get stuck when I run make test-all or make test-spec.</p>
</blockquote>
<p>Some additional background: the OpenBSD ports for ruby also skip this test. The fifo pthread fdlock bug has been in OpenBSD probably since it moved from userland threads to kernel threads, and there has been a failing regress test for it since 2012 (<a href="https://github.com/openbsd/src/blob/master/regress/lib/libpthread/blocked_fifo/blocked_fifo.c" class="external">https://github.com/openbsd/src/blob/master/regress/lib/libpthread/blocked_fifo/blocked_fifo.c</a>). There was an attempt to fix it (<a href="https://github.com/openbsd/src/commit/4ca9b96f0bca4f64040c5f77f0c29ccfac8bd418#diff-3701716ce89e506e5b445acbe4095ee6" class="external">https://github.com/openbsd/src/commit/4ca9b96f0bca4f64040c5f77f0c29ccfac8bd418#diff-3701716ce89e506e5b445acbe4095ee6</a>), but it was backed out shortly after being committed due to regressions (<a href="https://github.com/openbsd/src/commit/4185654479fabb05682e85a51de78cbd2fa8dc5c#diff-3701716ce89e506e5b445acbe4095ee6" class="external">https://github.com/openbsd/src/commit/4185654479fabb05682e85a51de78cbd2fa8dc5c#diff-3701716ce89e506e5b445acbe4095ee6</a>).</p>
<p>If you don't want to skip the test in test_io.rb, the workaround is fairly simple:</p>
<pre><code>- open("fifo", "r") {|r|
+ open("fifo", "r+") {|r|
</code></pre>
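<p>For readers unfamiliar with the difference between the two modes: opening a FIFO read-only normally blocks until some process opens it for writing, whereas opening it read-write returns immediately on common systems (POSIX leaves that case undefined), presumably which is why the one-character change sidesteps the hang. A minimal sketch, assuming a platform where read-write FIFO opens are supported; the temporary path and the "ping" payload are hypothetical and not part of test_io.rb.</p>

```ruby
require 'tmpdir'

reply = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "fifo") # hypothetical FIFO path
  File.mkfifo(path)

  # "r+" opens both ends at once, so open does not wait for a peer.
  File.open(path, "r+") do |f|
    f.syswrite("ping")   # goes into the pipe buffer
    reply = f.sysread(4) # read the same bytes back from the FIFO
  end
end
```

<p>With "r" instead of "r+", the same open call would wait until another thread or process opened the FIFO for writing.</p>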
<blockquote>
<p>Your patch, with my OpenBSD machine, didn't cause any more tests (in test/ruby/test_io.rb or elsewhere) to get stuck. If you run Linux, you probably don't have the OpenBSD bug. So your stuck test might be different from my stuck test. You might have found a bug in Ruby that happens in GNU/Linux but I can't reproduce in OpenBSD.</p>
</blockquote>
<p>I also tested the patch using OpenBSD-current with clang 4.0.0, and it fixes the issue here too.</p> Ruby master - Bug #13875: segfault in Enumerable#zip after GChttps://bugs.ruby-lang.org/issues/13875?journal_id=665512017-09-08T06:41:52Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p>Thank you both for the extra info, I think there's a different<br>
bug for FIBER_USE_NATIVE=0 on my GNU/Linux system...</p>
<p>Anyways, for this segfault here is an updated v2 patch (for<br>
r59776) which I'll commit soonish:</p>
<p><a href="https://80x24.org/spew/20170908062817.GA9144@dcvr/raw" class="external">https://80x24.org/spew/20170908062817.GA9144@dcvr/raw</a></p>
<p>I'll try to work on tracking down the test_io.rb stuckage with<br>
FIBER_USE_NATIVE=0 tomorrow. It seems to affect 2.4.1, even.</p> Ruby master - Bug #13875: segfault in Enumerable#zip after GChttps://bugs.ruby-lang.org/issues/13875?journal_id=665592017-09-08T23:51:20ZAnonymous
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul><p>Applied in changeset trunk|r59785.</p>
<hr>
<p>fiber: fix machine stack marking when FIBER_USE_NATIVE is 0</p>
<ul>
<li>cont.c (cont_mark): mark Fiber machine stack correctly when<br>
FIBER_USE_NATIVE is 0</li>
<li>test/ruby/test_fiber.rb (test_mark_fiber): new test<br>
[Bug <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: segfault in Enumerable#zip after GC (Closed)" href="https://bugs.ruby-lang.org/issues/13875">#13875</a>] <a href="/issues/13875">[ruby-core:82681]</a></li>
</ul>
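<p>The scenario this changeset guards against can be exercised from Ruby roughly as follows. This is a hedged sketch, not the test_mark_fiber test added by the changeset: it zips two enumerators so that zip_i() has to call Enumerator#next, and turns on GC.stress so that collections happen while the other fiber runs. The variable names are hypothetical.</p>

```ruby
up = 1.upto(10)     # Enumerator receiver, so Enumerable#zip is used
down = 10.downto(1) # non-array argument, consumed via Enumerator#next
sums = []

begin
  GC.stress = true  # run the collector at every opportunity
  up.zip(down) { |a, b| sums << a + b }
ensure
  GC.stress = false # always restore normal GC behaviour
end

all_eleven = sums.all? { |s| s == 11 }
```

<p>On an affected build with FIBER_USE_NATIVE set to 0, a script of this shape is the kind that would be expected to crash rather than finish; on a fixed build every sum is 11.</p>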
<p>This bug appears to be introduced with r59557.<br>
("refactoring Fiber status")</p> Ruby master - Bug #13875: segfault in Enumerable#zip after GChttps://bugs.ruby-lang.org/issues/13875?journal_id=665982017-09-11T05:51:17Zwanabe (_ wanabe)s.wanabe@gmail.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-5 priority-4 priority-default closed" href="/issues/13887">Bug #13887</a>: test/ruby/test_io.rb may get stuck with FIBER_USE_NATIVE=0 on Linux</i> added</li></ul>