Bug #6344

1.9.3 p125, p194 ruby causes SEGV with test_massign.rb on ppc/ppc64

Added by Mamoru Tasaka almost 2 years ago. Updated over 1 year ago.

[ruby-dev:45554]
Status:Rejected
Priority:Normal
Assignee:-
Category:-
Target version:1.9.3
ruby -v: ruby 1.9.3p125
Backport:

Description

Ruby 1.9.3 p125 and p194 cause a SEGV with test_massign.rb on ppc/ppc64.
A short reproducer and backtrace with Ruby 1.9.3 p125:

[tasaka@localhost ruby-1.9.3-p125]$ cat rubydev-32581.rb
a,s=[],"aaa"
300.times { a<...
Reading symbols from /home/tasaka/rpmbuild/BUILD/ruby-1.9.3-p125/miniruby...done.
(gdb) run -Ilib -I. --disable-gems ./rubydev-32581.rb
Starting program: /home/tasaka/rpmbuild/BUILD/ruby-1.9.3-p125/miniruby -Ilib -I. --disable-gems ./rubydev-32581.rb
[Thread debugging using libthread_db enabled]
[New Thread 0xfffb1dff200 (LWP 1759)]

Program received signal SIGSEGV, Segmentation fault.
0x00000080db556b20 in .__makecontext () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install nss-softokn-freebl-3.12.9-3.el7.ppc64
(gdb) thread apply all bt

Thread 2 (Thread 0xfffb1dff200 (LWP 1759)):
#0 0x00000080db5fe054 in .select () from /lib64/libc.so.6
#1 0x000000001018911c in thread_timer (p=0x10280f68) at thread_pthread.c:1155
#2 0x00000080db70b330 in start_thread (arg=0xfffb1dff200) at pthread_create.c:299
#3 0x00000080db6076ec in .__clone () from /lib64/libc.so.6

Thread 1 (Thread 0x80db4d7010 (LWP 1756)):
#0 0x00000080db556b20 in .__makecontext () from /lib64/libc.so.6
#1 0x000000001018ff80 in fiber_initialize_machine_stack_context (fib=0x103ab970, size=65536) at cont.c:606
#2 0x0000000010190094 in fiber_setcontext (newfib=0x103ab970, oldfib=0x103ac940) at cont.c:623
#3 0x0000000010190214 in fiber_store (next_fib=0x103ab970) at cont.c:1234
#4 0x00000000101903f8 in fiber_switch (fibval=271105960, argc=, argv=0xfffb1e00098) at cont.c:1319
#5 rb_fiber_resume (fibval=271105960, argc=, argv=0xfffb1e00098) at cont.c:1347
#6 0x00000000101905e4 in rb_fiber_m_resume (argc=, argv=, fib=) at cont.c:1404
#7 0x000000001016c754 in call_cfunc (func=@0x10255a90: 0x101905c0 <rb_fiber_m_resume>, recv=271105960, len=, argc=, argv=) at vm_insnhelper.c:326
#8 0x0000000010171c74 in vm_call_cfunc (th=0x10281560, cfp=0xfffb1effe00, num=, blockptr=, flag=0, id=, me=0x1039b8f0, recv=271105960) at vm_insnhelper.c:404
#9 vm_call_method (th=0x10281560, cfp=0xfffb1effe00, num=, blockptr=, flag=0, id=, me=0x1039b8f0, recv=271105960) at vm_insnhelper.c:534
#10 0x00000000101734f4 in vm_exec_core (th=0x10281560, initial=) at insns.def:1015
#11 0x0000000010178da8 in vm_exec (th=0x10281560) at vm.c:1220
#12 0x0000000010179480 in eval_string_with_cref (self=271477440, src=271373360, scope=4, cref=0x0, file=0x101b7fd8 "(eval)", line=1) at vm_eval.c:1050
#13 0x0000000010179b20 in eval_string (argc=, argv=, self=271477440) at vm_eval.c:1091
#14 rb_f_eval (argc=, argv=, self=271477440) at vm_eval.c:1139
#15 0x000000001016c754 in call_cfunc (func=@0x10254660: 0x101799a0 , recv=271477440, len=, argc=, argv=) at vm_insnhelper.c:326
#16 0x0000000010171c74 in vm_call_cfunc (th=0x10281560, cfp=0xfffb1efff08, num=, blockptr=, flag=8, id=, me=0x1030f710, recv=271477440) at vm_insnhelper.c:404
#17 vm_call_method (th=0x10281560, cfp=0xfffb1efff08, num=, blockptr=, flag=8, id=, me=0x1030f710, recv=271477440) at vm_insnhelper.c:534
#18 0x00000000101734f4 in vm_exec_core (th=0x10281560, initial=) at insns.def:1015
#19 0x0000000010178da8 in vm_exec (th=0x10281560) at vm.c:1220
#20 0x0000000010179078 in rb_iseq_eval_main (iseqval=271386440) at vm.c:1461
#21 0x00000000100559a8 in ruby_exec_internal (n=0x102d0748) at eval.c:204
#22 0x00000000100559f8 in ruby_exec_node (n=value has been optimized out) at eval.c:251
#23 0x0000000010057650 in ruby_run_node (n=0x102d0748) at eval.c:244
#24 0x0000000010015664 in main (argc=5, argv=0xffffffff528) at main.c:38

Ruby 1.9.3 p194 causes the same SEGV.

The attached patch seems to suppress this SEGV.

ruby-1.9.3-p125-increase-stack-allocation-size-ppc.patch - Proposal patch (418 Bytes) Mamoru Tasaka, 04/23/2012 11:26 PM

Associated revisions

Revision 35694
Added by Motohiro KOSAKI almost 2 years ago

  • cont.c: bump up fiber machine stack size when running on 64bit arch. [Bug #6344]

Revision 35697
Added by Motohiro KOSAKI almost 2 years ago

decrease fiber stack size. 1MB is too large for Windows. [Bug #6344]

History

#1 Updated by Yusuke Endoh almost 2 years ago

  • Status changed from Open to Feedback

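I have a feeling you can read Japanese, so I'm writing in Japanese.

Unfortunately, I believe there has been no platform maintainer
for ppc.
If you send a convincing patch, I'm happy to merge it,
but this patch looks very much like a workaround, and I'm not
convinced by it.

I'm sorry to ask this, since it looks like a troublesome area, but I'd
be grateful if you could properly track down the root cause and send a patch.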

Yusuke Endoh mame@tsg.ne.jp

#2 Updated by Mamoru Tasaka almost 2 years ago

(Well, as Red Hat people are also seeing this...)
First of all, I think the current value "0x10000" has
little basis, and in principle this value should be
determined dynamically. By the way, Vit says that this SEGV happens
also on 2.0.0dev (2012-04-23 trunk 35432)
x86_64-linux, and that the above patch
seems to fix it.

Do you know how the current value was chosen in the first
place?

#3 Updated by Yusuke Endoh almost 2 years ago

mtasaka (Mamoru Tasaka) wrote:

(Well, as Red Hat people are also seeing this...)
First of all, I think the current value "0x10000" has
little basis, and in principle this value should be
determined dynamically. By the way, Vit says that this SEGV happens
also on 2.0.0dev (2012-04-23 trunk 35432)
x86_64-linux, and that the above patch
seems to fix it.

What ticket (or mail) are you talking about?
Please give me a pointer.

Yusuke Endoh mame@tsg.ne.jp

#5 Updated by Yusuke Endoh almost 2 years ago

  • Status changed from Feedback to Assigned
  • Assignee set to Koichi Sasada

TL;DR :-)

Assigning this to ko1, as I heard ko1 is planning to change the code so that the value is determined dynamically.
But I don't know whether ko1 can work on it soon. It might help if someone could explain the mechanism of the issue in more detail.

Yusuke Endoh mame@tsg.ne.jp

#6 Updated by Motohiro KOSAKI almost 2 years ago

Increasing the machine stack size definitely reduces the maximum number of fibers.
But of course, RISC needs more stack than CISC, and 64-bit needs more stack than 32-bit, so I'm not surprised this patch solved the
ppc issue. So I suspect the best approach is:

#if SIZEOF_VOIDP == 8
#define FIBER_MACHINE_STACK_ALLOCATION_SIZE (0x20000)
#else
#define FIBER_MACHINE_STACK_ALLOCATION_SIZE (0x10000)
#endif

or something similar. I don't think a dynamic fiber stack size feature fits the 1.9.3 branch.

#7 Updated by Motohiro KOSAKI almost 2 years ago

I mean, if a patch has a negative impact on 32-bit x86, I can't agree to it. Keep in mind that 80%+ of people are using 32-bit x86.

#8 Updated by Yusuke Endoh almost 2 years ago

  • Assignee changed from Koichi Sasada to Motohiro KOSAKI

Hello,

Okay, Kosaki-san, I leave this up to you.
But personally I'm against changing the parameter blindly, without a
credible explanation of, and evidence for, the mechanism of the problem.

2012/4/24, kosaki (Motohiro KOSAKI) kosaki.motohiro@gmail.com:

Increasing the machine stack size definitely reduces the maximum number of fibers.
But of course, RISC needs more stack than CISC, and 64-bit needs more stack
than 32-bit, so I'm not surprised this patch solved the
ppc issue.

OP said the same issue occurred on x86_64-linux, but I couldn't reproduce it.
Can you?

Yusuke Endoh mame@tsg.ne.jp

#9 Updated by Motohiro KOSAKI almost 2 years ago

OP said the same issue occurred on x86_64-linux, but I couldn't reproduce it.
Can you?

No, I can't. We need more feedback, I think.

#10 Updated by Motohiro KOSAKI almost 2 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r35694.
Mamoru, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • cont.c: bump up fiber machine stack size when running on 64bit arch. [Bug #6344]

#11 Updated by Motohiro KOSAKI almost 2 years ago

  • Status changed from Closed to Feedback

If anyone sends me feedback with ppc test results, I'll backport this.

#12 Updated by Vit Ondruch almost 2 years ago

Weird, it still/again crashes with ruby 2.0.0dev (2012-06-25 trunk 36213) [powerpc-linux]

http://ppc.koji.fedoraproject.org/koji/getfile?taskID=597002&name=build.log

#13 Updated by Motohiro KOSAKI over 1 year ago

  • Assignee deleted (Motohiro KOSAKI)

#14 Updated by Yusuke Endoh over 1 year ago

  • Status changed from Feedback to Rejected

After all, I think increasing the value is not a fundamental fix.

Again, there is no powerpc maintainer. I'm closing this ticket.
Feel free to reopen or open a new ticket if you can make a patch.

Yusuke Endoh mame@tsg.ne.jp
