Ruby 1.8.6p230 SEGV on OpenSolaris
I have a Rails application that causes Ruby 1.8.6 p230 to seg fault and dump core on OpenSolaris running on an AMD64 chip. I was able to find the cause of the crash, but I don't know quite how to fix it.
Here are my findings.
Ruby seg faults while bringing up a Rails application. The core dump happens at line 5913 of eval.c, in the function rb_call0. This line is "*local_vars++ = (VALUE)body;"
The Ruby source line where the problem happens (based on DTrace and the Ruby core dump) is line 1050 of prototype_helper.rb in actionpack 2.1.0. But I don't believe that this line causes the problem, since I've been able to move the crash around by placing some "print" statements in the Ruby code.
rb_call0 is called by rb_call, and I believe that rb_call makes the call to rb_call0 using an uninitialized reference "*body". Inside rb_call0, a call is made to alloca with the argument (sizeof(VALUE) * (body->u1.tbl[0] + 1)).
Since *body is uninitialized, the value read through body->u1.tbl is out of range, in the hundreds of megabytes. This makes the argument to alloca much larger than can be allocated on the stack, which causes alloca to return a bad pointer, and the bad pointer causes the seg fault at line 5913.
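To make the size computation concrete, here is a minimal sketch in C of the arithmetic described above. It is not the Ruby source: struct fake_node, locals_bytes, fits_on_stack and the 1 MB STACK_BUDGET are hypothetical names and values chosen for illustration, and VALUE is only a stand-in typedef. The sketch computes the byte count instead of calling alloca, so it is safe to run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t VALUE;                 /* stand-in for Ruby's VALUE type */

/* Hypothetical stand-in for the node; only the field from the report. */
struct fake_node {
    struct { unsigned long *tbl; } u1;   /* tbl[0] holds the local count */
};

/* Assumed stack budget, for illustration only (not a Solaris limit). */
#define STACK_BUDGET ((size_t)1 << 20)

/* Bytes that would be requested: sizeof(VALUE) * (tbl[0] + 1). */
static size_t locals_bytes(const struct fake_node *body)
{
    assert(body != NULL && body->u1.tbl != NULL);
    return sizeof(VALUE) * ((size_t)body->u1.tbl[0] + 1);
}

/* Nonzero when the request stays within the assumed budget. */
static int fits_on_stack(const struct fake_node *body)
{
    return locals_bytes(body) <= STACK_BUDGET;
}
```

With a sane table such as {3, ...} the request is only four VALUE slots. With a garbage count in the hundreds of millions, the request far exceeds any realistic stack, which is consistent with alloca handing back an unusable pointer.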
Ruby 1.8.6 p230
OpenSolaris Nevada build 95, and I suspect other versions of Solaris too.
This happens regardless of which compiler or compiler flags are used.
Where is the application?
It's a social calendar. It's too big to attach to this bug report, but I believe the problem description is good enough without the application. If someone needs it, I'll look for a way to transfer the Rails application.
#2 Updated by prashant (Prashant Srinivasan) almost 10 years ago
Thanks for the note - I hadn't noticed that p287 was released. The problem has gone away with p287.
The eval.c code in the area of the SEGV does not seem to have changed, but hopefully a change in some other part of the code now prevents the uninitialized *body from being passed into rb_call0?