<p>Ruby Issue Tracking System: <a href="https://bugs.ruby-lang.org/" class="external">https://bugs.ruby-lang.org/</a></p>
<p>Ruby master - Bug #13757: TestBacktrace#test_caller_lev segaults on PPC</p>
<p>vo.x (Vit Ondruch), v.ondruch@tiscali.cz, 2017-08-25T11:45:43Z, <a href="https://bugs.ruby-lang.org/issues/13757?journal_id=66279" class="external">https://bugs.ruby-lang.org/issues/13757?journal_id=66279</a></p>
<ul><li><strong>Related to</strong> <i><a class="issue tracker-2 status-5 priority-4 priority-default closed" href="/issues/13637">Feature #13637</a>: [PATCH] tool/runruby.rb: test with smallest possible machine stack</i> added</li></ul>
<p>vo.x (Vit Ondruch), v.ondruch@tiscali.cz, 2017-08-25T11:54:52Z, <a href="https://bugs.ruby-lang.org/issues/13757?journal_id=66281" class="external">https://bugs.ruby-lang.org/issues/13757?journal_id=66281</a></p>
<ul><li><strong>Assignee</strong> set to <i>normalperson (Eric Wong)</i></li></ul><p>Ok, so I did my homework :) It started with r59047, and it is an issue on ppc64/ppc64le. No other Fedora architectures are affected. Since that revision, the test suite always gets stuck at TestBacktrace#test_caller_lev. Later, since r59159, the test no longer gets stuck but segfaults instead. I am not sure how to fix this, but for the moment I am going to revert r59047 for Fedora ...</p>
<p>vo.x (Vit Ondruch), v.ondruch@tiscali.cz, 2017-08-25T12:01:10Z, <a href="https://bugs.ruby-lang.org/issues/13757?journal_id=66282" class="external">https://bugs.ruby-lang.org/issues/13757?journal_id=66282</a></p>
<ul></ul><p>BTW, very likely unrelated, but I noticed that on PPC64 (BE) the C level backtrace is not collected correctly. It looks like:</p>
<pre><code>-- C level backtrace information -------------------------------------------
[0x4ef7a734]
[0x4ef7a824]
[0x4ef72928]
[0x4ee67564]
linux-vdso64.so.1 [0x3fffa9e404d8]
[0x4ee69d0c]
[0x4ee6d10c]
[0x4eef3640]
[0x4eef4154]
[0x4eef422c]
[0x4eef44a4]
[0x4eecd2e4]
[0x4eedbcd8]
[0x4eeedda4]
[0x4eee4008]
... snip ...
</code></pre>
<p>Is it some big-endian issue?</p>
<p>normalperson (Eric Wong), normalperson@yhbt.net, 2017-09-19T08:51:37Z, <a href="https://bugs.ruby-lang.org/issues/13757?journal_id=66768" class="external">https://bugs.ruby-lang.org/issues/13757?journal_id=66768</a></p>
<ul></ul><p><a href="mailto:v.ondruch@tiscali.cz" class="email">v.ondruch@tiscali.cz</a> wrote:</p>
<blockquote>
<p>Issue <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: TestBacktrace#test_caller_lev segaults on PPC (Closed)" href="https://bugs.ruby-lang.org/issues/13757">#13757</a> has been updated by vo.x (Vit Ondruch).</p>
<p>Assignee set to normalperson (Eric Wong)</p>
</blockquote>
<p>Sorry, I did not notice earlier. Feel free to Cc: me directly<br>
in the future.</p>
<blockquote>
<p>Ok, so I did my homework :) It started with r59047, and it is an<br>
issue on ppc64/ppc64le. No other Fedora architectures are<br>
affected. Since that revision, the test suite always gets stuck<br>
at TestBacktrace#test_caller_lev. Later, since r59159, the<br>
test is not stuck, but segfaults instead. Not sure how to fix<br>
this, but for the moment I am going to revert r59047 for<br>
Fedora ...</p>
</blockquote>
<p>If you have time, can you try to find larger values changed by<br>
r59047 (RUBY_THREAD_MACHINE_STACK_SIZE and<br>
RUBY_FIBER_MACHINE_STACK_SIZE) which do not get stuck/segfault<br>
on PPC?</p>
<p>Worst case is we define larger minimum values in vm_core.h for PPC.<br>
Best case is we find stack usage bloat on PPC and fix it :></p>
<p>ppc64le? I don't know much about PPC and did not know there is<br>
little-endian PPC. So it's PPC-specific and not BE-specific...<br>
<em>scratches head</em></p>
<p>vo.x (Vit Ondruch), v.ondruch@tiscali.cz, 2017-09-21T05:15:22Z, <a href="https://bugs.ruby-lang.org/issues/13757?journal_id=66812" class="external">https://bugs.ruby-lang.org/issues/13757?journal_id=66812</a></p>
<ul></ul><p>normalperson (Eric Wong) wrote:</p>
<blockquote>
<p>If you have time, can you try to find larger values changed by<br>
r59047 (RUBY_THREAD_MACHINE_STACK_SIZE and<br>
RUBY_FIBER_MACHINE_STACK_SIZE) which do not get stuck/segfault<br>
on PPC?</p>
</blockquote>
<p>Any idea what a reasonable value would be? I started naively with 2 and then continued with 5, 10, 100, 1000, and 10000, and it is still crashing :/</p>
<blockquote>
<p>ppc64le? I don't know much about PPC and did not know there is<br>
little-endian PPC.</p>
</blockquote>
<p>Yep, LE is available since Power8 if I am not mistaken ...</p>
<blockquote>
<p>So it's PPC-specific and not BE-specific...<br>
<em>scratches head</em></p>
</blockquote>
<p>Should I try to get somebody from Fedora secondary arches involved?</p>
<p>normalperson (Eric Wong), normalperson@yhbt.net, 2017-09-21T08:31:43Z, <a href="https://bugs.ruby-lang.org/issues/13757?journal_id=66814" class="external">https://bugs.ruby-lang.org/issues/13757?journal_id=66814</a></p>
<ul></ul><p><a href="mailto:v.ondruch@tiscali.cz" class="email">v.ondruch@tiscali.cz</a> wrote:</p>
<blockquote>
<p>normalperson (Eric Wong) wrote:</p>
<blockquote>
<p>If you have time, can you try to find larger values changed by<br>
r59047 (RUBY_THREAD_MACHINE_STACK_SIZE and<br>
RUBY_FIBER_MACHINE_STACK_SIZE) which do not get stuck/segfault<br>
on PPC?</p>
</blockquote>
<p>Any idea what a reasonable value would be? I started naively with 2 and then continued with 5, 10, 100, 1000, and 10000, and it is still crashing :/</p>
</blockquote>
<p>Oh, check the *_STACK_SIZE #defines in vm_core.h for what the<br>
defaults are, and maybe binary search until you find something<br>
which works/breaks. I made them all `1' for the tests so it<br>
would automatically pick the *_MIN values.</p>
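<p>A minimal sketch of such a binary search, for illustration only. The passes block is a hypothetical stand-in: a real one would run test_caller_lev with the stack size set to the given value and report whether it survived. The bounds, the step, and the pretend threshold are arbitrary and not taken from vm_core.h.</p>

```ruby
# Find the smallest stack size (in bytes) for which the test passes,
# given one size known to fail and one size known to pass.
# `passes` is a hypothetical callable: it should run the test with the
# given size and return true when it neither hangs nor segfaults.
def smallest_passing_size(failing, passing, step: 4096, &passes)
  raise ArgumentError, "failing must be below passing" unless failing < passing
  while passing - failing > step
    mid = (failing + passing) / 2
    if passes.call(mid)
      passing = mid   # mid works: the threshold is at or below mid
    else
      failing = mid   # mid breaks: the threshold is above mid
    end
  end
  passing
end

# Stand-in predicate for demonstration only; the threshold is invented.
PRETEND_THRESHOLD = 200_000
found = smallest_passing_size(1, 1_048_576) { |size| size >= PRETEND_THRESHOLD }
puts found
```

<p>The search stops once the passing and failing sizes are within one step of each other, so the result is a passing size at most one step above the real threshold.</p>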
<blockquote>
<blockquote>
<p>ppc64le? I don't know much about PPC and did not know there is<br>
little-endian PPC.</p>
</blockquote>
<p>Yep, LE is available since Power8 if I am not mistaken ...</p>
<blockquote>
<p>So it's PPC-specific and not BE-specific...<br>
<em>scratches head</em></p>
</blockquote>
<p>Should I try to get somebody from Fedora secondary arches involved?</p>
</blockquote>
<p>Sure! I'm not sure if we have many PPC folks here and could use<br>
the extra help.</p>
<p>Meanwhile...</p>
<p>Can you also try commenting out parts of test_caller_lev and<br>
narrowing down where in that test it fails?</p>
<p>And give scripts/checkstack.pl in the Linux kernel source a try<br>
to find big stack users:</p>
<p><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/scripts/checkstack.pl" class="external">https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/scripts/checkstack.pl</a></p>
<p>Usage: objdump -d /path/to/ruby | /path/to/checkstack.pl</p>
<p>I've found some big offenders with that over the years, but the<br>
dynamic alloca cases are trickier to find...</p>
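<p>For illustration, a rough Ruby sketch of the same idea as checkstack.pl, not a replacement for it. It assumes ppc64 disassembly in which a prologue of the form stdu r1,-N(r1) allocates an N-byte frame, with N printed in decimal; the regexes, the big_frames helper, and the sample input are all written for this sketch and are not taken from a real Ruby build.</p>

```ruby
# Report the largest fixed-size stack frames found in `objdump -d` output.
# Assumption: ppc64 disassembly, where `stdu r1,-N(r1)` in a function
# prologue allocates an N-byte frame. Dynamic alloca() use is not detected.
FUNC_RE = /^\h+ <(.+)>:$/
STDU_RE = /\bstdu\s+r1,-(\d+)\(r1\)/

def big_frames(disassembly, min_bytes: 256)
  frames = Hash.new(0)
  current = nil
  disassembly.each_line do |line|
    if (m = FUNC_RE.match(line.chomp))
      current = m[1]                       # entering a new function
    elsif current && (m = STDU_RE.match(line))
      size = m[1].to_i
      frames[current] = size if size > frames[current]
    end
  end
  frames.select { |_, size| size >= min_bytes }
        .sort_by { |_, size| -size }
end

# Made-up sample input, for demonstration only.
SAMPLE = <<~ASM
  0000000010001000 <small_fn>:
      10001000:  f8 21 ff 91   stdu    r1,-112(r1)
  0000000010002000 <big_fn>:
      10002000:  f8 21 f7 01   stdu    r1,-2304(r1)
ASM

big_frames(SAMPLE).each { |name, size| puts "#{size}\t#{name}" }
```

<p>Real input would come from objdump -d /path/to/ruby, read from standard input instead of the sample string.</p>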
<p>Thanks.</p>
<p>vo.x (Vit Ondruch), v.ondruch@tiscali.cz, 2017-09-22T14:28:13Z, <a href="https://bugs.ruby-lang.org/issues/13757?journal_id=66837" class="external">https://bugs.ruby-lang.org/issues/13757?journal_id=66837</a></p>
<ul></ul><p>normalperson (Eric Wong) wrote:</p>
<blockquote>
<p><a href="mailto:v.ondruch@tiscali.cz" class="email">v.ondruch@tiscali.cz</a> wrote:</p>
<blockquote>
<p>normalperson (Eric Wong) wrote:</p>
<blockquote>
<p>If you have time, can you try to find larger values changed by<br>
r59047 (RUBY_THREAD_MACHINE_STACK_SIZE and<br>
RUBY_FIBER_MACHINE_STACK_SIZE) which do not get stuck/segfault<br>
on PPC?</p>
</blockquote>
<p>Any idea what a reasonable value would be? I started naively with 2 and then continued with 5, 10, 100, 1000, and 10000, and it is still crashing :/</p>
</blockquote>
<p>Oh, check the *_STACK_SIZE #defines in vm_core.h for what the<br>
defaults are, and maybe binary search until you find something<br>
which works/breaks. I made them all `1' for the tests so it<br>
would automatically pick the *_MIN values.</p>
</blockquote>
<p>It passes with 262144 but fails with 131072. Should I try to find some value in between?</p>
<p>normalperson (Eric Wong), normalperson@yhbt.net, 2017-09-23T20:36:12Z, <a href="https://bugs.ruby-lang.org/issues/13757?journal_id=66847" class="external">https://bugs.ruby-lang.org/issues/13757?journal_id=66847</a></p>
<ul></ul><p><a href="mailto:v.ondruch@tiscali.cz" class="email">v.ondruch@tiscali.cz</a> wrote:</p>
<blockquote>
<p>normalperson (Eric Wong) wrote:</p>
<blockquote>
<p>Oh, check the *_STACK_SIZE #defines in vm_core.h for what the<br>
defaults are, and maybe binary search until you find something<br>
which works/breaks. I made them all `1' for the tests so it<br>
would automatically pick the *_MIN values.</p>
</blockquote>
<p>It passes with 262144 but fails with 131072. Should I try to find some value in between?</p>
</blockquote>
<p>Yes, exactly. Thank you.</p>
<p>vo.x (Vit Ondruch), v.ondruch@tiscali.cz, 2017-11-28T07:51:08Z, <a href="https://bugs.ruby-lang.org/issues/13757?journal_id=67967" class="external">https://bugs.ruby-lang.org/issues/13757?journal_id=67967</a></p>
<ul><li><strong>Has duplicate</strong> <i><a class="issue tracker-1 status-5 priority-4 priority-default closed" href="/issues/14131">Bug #14131</a>: test_backtrace.rb: Fails on ppc64le</i> added</li></ul>
<p>leitao (Breno Leitao), 2017-11-28T17:09:24Z, <a href="https://bugs.ruby-lang.org/issues/13757?journal_id=67980" class="external">https://bugs.ruby-lang.org/issues/13757?journal_id=67980</a></p>
<ul></ul><p>I was able to narrow this issue down to the following code:</p>
<pre><code>max = 20
rec = lambda { |n|
  if n > 0
    rec[n - 1]
  end
}
# rec[max]  # recursing outside of a Fiber does not crash
Fiber.new {
  rec[max]
}.resume
</code></pre>
<p>Run it with:</p>
<pre><code> # ./miniruby tool/runruby.rb lambda.rb
</code></pre>
<p>Some further findings:</p>
<ul>
<li>
<p>The problem is not reproducible if we use function recursion instead of lambda recursion.</p>
</li>
<li>
<p>The problem does not happen if we run the recursion outside of a Fiber.</p>
</li>
</ul>
<p>leitao (Breno Leitao), 2017-12-04T19:00:47Z, <a href="https://bugs.ruby-lang.org/issues/13757?journal_id=68192" class="external">https://bugs.ruby-lang.org/issues/13757?journal_id=68192</a></p>
<ul></ul><p>Hi,</p>
<p>I've been investigating this issue, and I think I found the solution:</p>
<p>stack_check() checks the stack used by the lambda recursion against a very small stack size limit (8 KB). PowerPC usually has bigger stack frames, which causes this test case to fail. My suggestion is to increase the stack size limits for powerpc64, since the current ones allow only very few levels of recursion for lambdas.</p>
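<p>A back-of-the-envelope sketch of that argument, for illustration only. The 8 KB limit and the depth of 20 come from this thread; the per-level stack costs are invented and are not measurements from any real build.</p>

```ruby
# How many recursion levels fit under a stack limit, given an average
# stack cost per level. The per-level costs below are hypothetical.
def levels_that_fit(limit_bytes, bytes_per_level)
  limit_bytes / bytes_per_level
end

LIMIT = 8 * 1024   # the 8 KB limit mentioned above
NEEDED = 20        # recursion depth used by the reduced test case

{ "smaller frames (hypothetical)" => 256,
  "bigger frames (hypothetical)"  => 1024 }.each do |label, cost|
  fit = levels_that_fit(LIMIT, cost)
  verdict = fit >= NEEDED ? "enough" : "too few"
  puts "#{label}: #{fit} levels, #{verdict} for depth #{NEEDED}"
end
```

<p>With the same limit, a larger cost per level leaves room for fewer levels, which is why a limit that is adequate on one architecture can be too small on another.</p>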
<p><a href="https://github.com/ruby/ruby/pull/1768" class="external">https://github.com/ruby/ruby/pull/1768</a></p>
<p>hsbt (Hiroshi SHIBATA), hsbt@ruby-lang.org, 2017-12-05T01:19:57Z, <a href="https://bugs.ruby-lang.org/issues/13757?journal_id=68194" class="external">https://bugs.ruby-lang.org/issues/13757?journal_id=68194</a></p>
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul><p>Fixed at r61020</p>