Bug #10626

BUS error from nesting lambdas and calls to methods defined with define_method

Added by jaroslawr (Jarosław Rzeszótko) about 4 years ago. Updated about 4 years ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
ruby -v: 2.1.5
[ruby-core:67002]

Description

I get a BUS error from executing the following Ruby program: https://gist.github.com/jaroslawr/8579678d7c68a49208f0

I am on Gentoo Linux with Ruby 2.1.5, and have also tried Ruby 2.1.4, 2.1.3, ..., down to 2.1.0. My colleagues [...] The problem seems to be that stack space is consumed very rapidly; it goes away when the stack size limit is increased with ulimit -s. For the real-world context behind this, see the corresponding Rails issue I opened:

https://github.com/rails/rails/issues/18011
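
Since the crash depends on the stack size limit, it can help to record the limit the Ruby process actually sees. The following is a minimal sketch, not part of the original report; describe_limit is a hypothetical helper name, and the sketch assumes a Unix-like system where Process::RLIMIT_STACK is available.

```ruby
# Format a limit given in bytes for printing (hypothetical helper).
def describe_limit(bytes)
  bytes == Process::RLIM_INFINITY ? "unlimited" : "#{bytes / 1024} KiB"
end

# Process.getrlimit returns the [soft, hard] limits, in bytes for RLIMIT_STACK.
soft, hard = Process.getrlimit(Process::RLIMIT_STACK)

puts "soft stack limit: #{describe_limit(soft)}"
puts "hard stack limit: #{describe_limit(hard)}"
```

Running this before and after changing ulimit -s in the same shell should show whether the new limit reached the Ruby process.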


Related issues

Is duplicate of Ruby trunk - Bug #10460: Segfault instead of stack level too deep (Open)

History

Updated by jaroslawr (Jarosław Rzeszótko) about 4 years ago

By the way, even stranger things happen if you replace the simple call to test with something like:

t1 = Thread.new { test }
t2 = Thread.new { test }
t1.join
t2.join

I then get the following error:

test.rb:6: [BUG] object allocation during garbage collection phase

Although for me this part is a purely theoretical exercise.

Updated by nobu (Nobuyoshi Nakada) about 4 years ago

  • Is duplicate of Bug #10460: Segfault instead of stack level too deep added

Updated by jaroslawr (Jarosław Rzeszótko) about 4 years ago

I have seen #10460 and would not say this is an obvious duplicate. Here, it is not only that you do not get a stack overflow error; the C stack also seems to grow abnormally quickly when you nest lambdas and method calls, compared to just nesting method calls. If I run Ruby under GDB, the number of stack frames is not really all that big: in other situations Ruby can often easily handle stacks 3-4 times as big in terms of the pure number of call frames, both on the Ruby and on the C level.

Also, the symptoms (error message, backtrace etc.) are different from what people report in #10460.

Updated by jaroslawr (Jarosław Rzeszótko) about 4 years ago

Here is a stripped down and easier to understand test case:

https://gist.github.com/anonymous/a2a784c9f37b1fc6b753

Basically, the bigger M is, the lower the N needed to trigger the crash. On my computer, just nesting 100 lambdas is enough to trigger a crash if you allocate a lot of memory at the same time.
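
For readers without access to the gist, the shape of such a test can be sketched as follows. This is a hypothetical reconstruction, not the code from the gist: build_nested, n (nesting depth, standing in for N) and m (allocations per level, standing in for M) are names chosen here for illustration.

```ruby
# Build n lambdas nested inside each other. Each level allocates an
# array of m random floats and then calls the next lambda down.
def build_nested(n, m)
  innermost = lambda { 0 }
  (1..n).inject(innermost) do |inner, _level|
    lambda do
      Array.new(m) { rand } # allocation pressure at every nesting level
      1 + inner.call        # descend one level and count it
    end
  end
end

# The return value is the number of levels that were actually entered.
puts build_nested(100, 1000).call
```

On an unaffected interpreter this simply prints the nesting depth; whether it reproduces the crash described here depends on the Ruby build and platform.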

Updated by jaroslawr (Jarosław Rzeszótko) about 4 years ago

Some more findings. You can run the above test case under gdb like this:

# gdb ruby
(gdb) set disable-randomization off
(gdb) run test.rb

The test program does not crash with randomization disabled in gdb (which is gdb's default, hence the setting above), nor does it crash when run under valgrind. Where the program crashes varies from run to run, and sometimes it does not crash at all. At the assembly level, it always crashes on this call:

call   0xb75169a0 <__x86.get_pc_thunk.bx>

Which is basically (http://gcc.gnu.org/ml/gcc-help/2010-12/msg00131.html):

movl (%esp), %e##reg;

And indeed, in info registers I get for example:

esp            0xbfc08fe0       0xbfc08fe0

And then:

(gdb) x 0xbfc08fe0
0xbfc08fe0:     Cannot access memory at address 0xbfc08fe0

So the stack pointer is somehow broken. In this case the start of the stack is:

(gdb) info proc stat
...
Start of stack: 0xbfc3cd90

Doing the math, the stack in total occupies:

(0xbfc3cd90 - 0xbfc08fe0) bytes / 1024 = about 207 kbytes (the addresses are byte addresses, so the difference is already in bytes)

Which is way lower than the default ulimit -s of 8192 bytes.
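
The arithmetic can be checked with a few lines of Ruby. This is only a worked calculation using the two addresses quoted above; it treats them as byte addresses, so their difference is already a byte count (about 207 KiB; multiplying by a 4-byte word size would give about 829 KiB instead). The limit of 8192 is taken in the 1024-byte units that ulimit -s reports. Either way the stack in use is a small fraction of the limit.

```ruby
stack_start   = 0xbfc3cd90 # "Start of stack" reported by gdb
stack_pointer = 0xbfc08fe0 # value of %esp at the time of the crash

# The stack grows downwards, so the bytes in use are start minus pointer.
used_bytes = stack_start - stack_pointer
used_kib   = used_bytes / 1024.0
limit_kib  = 8192 # default ulimit -s, in units of 1024 bytes

puts format("stack in use: %d bytes (%.1f KiB)", used_bytes, used_kib)
puts format("share of the limit: %.1f%%", 100.0 * used_kib / limit_kib)
```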

Updated by jaroslawr (Jarosław Rzeszótko) about 4 years ago

Correction: ... way lower than the default ulimit -s of 8192 kilobytes. I wish this bug tracker supported editing.

Updated by jaroslawr (Jarosław Rzeszótko) about 4 years ago

Given that this is a BUS error, here is perhaps a particularly interesting backtrace you can get if you are "lucky":

#0  0xaf9e3187 in _int_memalign (av=av@entry=0xafb00420 <main_arena>, alignment=alignment@entry=16384, bytes=bytes@entry=16364) at malloc.c:4359
#1  0xaf9e42e1 in _mid_memalign (alignment=alignment@entry=16384, bytes=bytes@entry=16364, address=0xafcd6acc <heap_assign_page+188>) at malloc.c:3095
#2  0xaf9e5d6d in __posix_memalign (memptr=memptr@entry=0xbfacd0c0, alignment=alignment@entry=16384, size=size@entry=16364) at malloc.c:4980
#3  0xafcd6acc in aligned_malloc (size=16364, alignment=16384) at gc.c:5909
#4  heap_page_allocate (objspace=0xb1381e90) at gc.c:1035
#5  heap_page_create (objspace=0xb1381e90) at gc.c:1121
#6  heap_assign_page (objspace=0xb1381e90, heap=0xb1381e98) at gc.c:1143
#7  0xafcdafdf in heap_increment (heap=0xb1381e98, objspace=0xb1381e90) at gc.c:1191
#8  heap_prepare_freepage (heap=0xb1381e98, objspace=0xb1381e90) at gc.c:1212
#9  heap_get_freeobj_from_next_freepage (heap=0xb1381e98, objspace=0xb1381e90) at gc.c:1237
#10 heap_get_freeobj (heap=0xb1381e98, objspace=0xb1381e90) at gc.c:1259
#11 newobj_of (klass=klass@entry=2973259080, flags=flags@entry=36, v1=v1@entry=0, v2=v2@entry=0, v3=v3@entry=0) at gc.c:1303
#12 0xafcdb127 in rb_newobj_of (klass=2973259080, flags=flags@entry=36) at gc.c:1356
#13 0xafd1f813 in rb_float_new_in_heap (d=0.61034828073270286) at numeric.c:639
#14 0xafd5fbd0 in rb_float_new_inline (d=<optimized out>) at internal.h:591
#15 rb_f_rand (argc=0, argv=0xaf6efa70, obj=2973266420) at random.c:1212
#16 0xafe1da7e in call_cfunc_m1 (func=0xafd5fa80 <rb_f_rand>, recv=2973266420, argc=0, argv=0xaf6efa70) at vm_insnhelper.c:1317
#17 0xafe24491 in vm_call_cfunc_with_frame (th=0xb1381be0, reg_cfp=0xaf76c678, ci=0xb15bf7b0) at vm_insnhelper.c:1489
#18 0xafe2c5e7 in vm_exec_core (th=0xb1381be0, initial=initial@entry=0) at insns.def:1028
#19 0xafe328f7 in vm_exec (th=th@entry=0xb1381be0) at vm.c:1398
#20 0xafe25ab5 in invoke_block_from_c (th=<optimized out>, block=<optimized out>, self=2973266420, argc=argc@entry=1, argv=argv@entry=0xbfacd870, 
    blockptr=blockptr@entry=0x0, cref=cref@entry=0x0, defined_class=2973268920) at vm.c:817
#21 0xafe3b238 in vm_yield (argv=<optimized out>, argc=<optimized out>, th=<optimized out>) at vm.c:856
#22 rb_yield_0 (argv=<optimized out>, argc=<optimized out>) at vm_eval.c:938
#23 rb_yield (val=697) at vm_eval.c:948
#24 0xafc5c8db in rb_ary_collect (ary=2796886860) at array.c:2677
#25 0xafe1da9e in call_cfunc_0 (func=0xafc5c880 <rb_ary_collect>, recv=2796886860, argc=0, argv=0xaf6efa5c) at vm_insnhelper.c:1323
#26 0xafe24491 in vm_call_cfunc_with_frame (th=0xb1381be0, reg_cfp=0xaf76c6c8, ci=0xb15bfcf8) at vm_insnhelper.c:1489
#27 0xafe2cd8b in vm_exec_core (th=0xb1381be0, initial=initial@entry=0) at insns.def:999
#28 0xafe328f7 in vm_exec (th=th@entry=0xb1381be0) at vm.c:1398
#29 0xafe25ab5 in invoke_block_from_c (th=th@entry=0xb1381be0, block=block@entry=0xb15c9b68, self=2973266420, argc=0, argv=0xaf6efa48, blockptr=0x0, cref=cref@entry=0x0, 
    defined_class=2973268920) at vm.c:817
#30 0xafe26964 in vm_invoke_proc (th=0xb1381be0, proc=0xb15c9b68, self=2973266420, defined_class=2973268920, argc=0, argv=0xaf6efa48, blockptr=0x0) at vm.c:881
#31 0xafe26a1a in rb_vm_invoke_proc (th=<optimized out>, proc=<optimized out>, proc@entry=0xb15c9b68, argc=argc@entry=0, argv=argv@entry=0xaf6efa48, blockptr=0x0)
    at vm.c:900
#32 0xafcc0dcd in proc_call (argc=0, argv=0xaf6efa48, procval=2975625740) at proc.c:713
#33 0xafe1da7e in call_cfunc_m1 (func=0xafcc0d70 <proc_call>, recv=2975625740, argc=0, argv=0xaf6efa48) at vm_insnhelper.c:1317
#34 0xafe24491 in vm_call_cfunc_with_frame (th=0xb1381be0, reg_cfp=0xaf76c718, ci=0xb15bfd38) at vm_insnhelper.c:1489
#35 0xafe2c5e7 in vm_exec_core (th=0xb1381be0, initial=initial@entry=0) at insns.def:1028

As always, in this case too the stack pointer (%esp) is pointing to an invalid address, for what it is worth.

Updated by jaroslawr (Jarosław Rzeszótko) about 4 years ago

An even simpler test case for apparently the same problem:

https://gist.github.com/anonymous/a86f5eb0198acc10ae1e

It really isn't simply an unhandled stack overflow. If you decrease the number of allocations, the program runs just fine.

Updated by jaroslawr (Jarosław Rzeszótko) about 4 years ago

Maybe someone can now rename this issue to something that better reflects the actual problem. It seems to be a pretty general memory allocation bug that can cause many different code patterns to crash. I have also reproduced the same issue on Ruby 2.2.0-rc1.

Sorry for the large amount of somewhat disorganized writing. I have spent a huge amount of time debugging this issue, starting from a complex Rails app, and would very much like to find out what is at the bottom of it. This is still ongoing research.
