Bug #18464 closed
RUBY_INTERNAL_EVENT_NEWOBJ tracepoint causes an interpreter crash when combined with Ractors
Added by kjtsanaktsidis (KJ Tsanaktsidis) about 3 years ago.
Updated over 1 year ago.
ruby -v :
ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-darwin20]
[ruby-core:107005]
Description
When a Ractor is created whilst a tracepoint for RUBY_INTERNAL_EVENT_NEWOBJ is active (registered with rb_tracepoint_new/rb_tracepoint_enable), the interpreter crashes with a null pointer dereference with the following backtrace:
[BUG] Segmentation fault at 0x0000000000000000
ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-darwin20]
...
-- C level backtrace information -------------------------------------------
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_print_backtrace+0xf) [0x10a15fadd] vm_dump.c:759
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_vm_bugreport) vm_dump.c:1045
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_vm_bugreport) (null):0
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(bug_report_end+0x0) [0x109f96b81] error.c:820
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_bug_for_fatal_signal) error.c:820
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(sigsegv+0x52) [0x10a0be3a2] signal.c:964
/usr/lib/system/libsystem_platform.dylib(_sigtramp+0x1d) [0x7fff20934d7d]
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(gc_event_hook_body+0x4) [0x109fb9d21] gc.c:2214
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_slowpath) gc.c:2486
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_slowpath_wb_unprotected) gc.c:2507
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_fill+0x0) [0x109fac92e] gc.c:2543
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_of0) gc.c:2553
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_of) gc.c:2552
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_wb_unprotected_newobj_of) gc.c:2567
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(io_alloc+0x12) [0x109fd341c] io.c:1047
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(prep_io) io.c:8483
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(prep_stdio) io.c:8514
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_io_prep_stdin) io.c:8532
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(thread_start_func_2+0xf7) [0x10a1058a7] thread.c:802
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_native_cond_initialize+0x0) [0x10a1055fb] ./thread_pthread.c:1047
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(register_cached_thread_and_wait) ./thread_pthread.c:1099
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(thread_start_func_1) ./thread_pthread.c:1054
/usr/lib/system/libsystem_pthread.dylib(_pthread_start+0xe0) [0x7fff208ef8fc]
(full output is attached).
This seems to be because the new Ractor sets up its stdio objects (rb_io_prep_stdin et al.), which allocates Ruby objects, before rb_ec_initialize_vm_stack is called to set up the initial stack frame. The NEWOBJ hook therefore fires on an execution context that has no control frame yet.
I've attached a patch which works around this by not firing GC event hooks if there is no control frame on the execution context. The patch also includes a test which reproduces the issue using the objspace extension; creating a Ractor within an ObjectSpace.trace_object_allocations block is enough to trigger the crash. The patch seems to fix things, but if you folks prefer, I can also try swapping the order of prep_stdio and rb_ec_initialize_vm_stack.
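For anyone who wants to check a given Ruby build, the reproduction described above can be sketched roughly as follows. This is not the test from the attached patch, just a minimal illustration. It runs the Ractor-inside-trace_object_allocations snippet in a child interpreter so that a segfault there does not take down the calling script. The constant REPRO_SCRIPT and the method ractor_survives_allocation_tracing? are hypothetical names chosen for this sketch.

```ruby
require "rbconfig"

# Hypothetical helper script: creates a Ractor while objspace's allocation
# tracing (which enables a NEWOBJ internal event hook) is active.
REPRO_SCRIPT = <<~'RUBY'
  require "objspace"
  Warning[:experimental] = false # silence the "Ractor is experimental" warning
  ObjectSpace.trace_object_allocations do
    r = Ractor.new { :ok }
    # Newer Rubies provide Ractor#value; older ones provide Ractor#take.
    r.respond_to?(:value) ? r.value : r.take
  end
RUBY

# Returns true if the child interpreter exits cleanly, false if it crashes
# or exits non-zero (for example on a Ruby without Ractor support).
def ractor_survives_allocation_tracing?
  system(RbConfig.ruby, "-e", REPRO_SCRIPT, out: File::NULL, err: File::NULL) == true
end

puts(ractor_survives_allocation_tracing? ? "no crash" : "child crashed or failed")
```

On an affected build the child is expected to abort with the [BUG] report shown above; on a build that includes the fix it should exit normally.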
Status changed from Open to Assigned
Assignee set to ko1 (Koichi Sasada)
Just checked, this is still an issue with 3.2.0-preview1. Is there any feedback on the patch I posted? Any other way you would suggest going about a solution? Thanks!
If it helps, here's a Linux-based backtrace:
-- C level backtrace information -------------------------------------------
/usr/local/lib/libruby.so.3.1(rb_print_backtrace+0x11) [0x7f75e6678aa8] vm_dump.c:759
/usr/local/lib/libruby.so.3.1(rb_vm_bugreport) vm_dump.c:1045
/usr/local/lib/libruby.so.3.1(rb_bug_for_fatal_signal+0xf0) [0x7f75e6477750] error.c:821
/usr/local/lib/libruby.so.3.1(sigsegv+0x49) [0x7f75e65ced19] signal.c:964
/lib/x86_64-linux-gnu/libpthread.so.0(__restore_rt+0x0) [0x7f75e636e140]
/usr/local/lib/libruby.so.3.1(gc_event_hook_body+0x1b) [0x7f75e649010b] gc.c:2217
/usr/local/lib/libruby.so.3.1(gc_enter+0x1f) [0x7f75e64a495f] gc.c:9194
/usr/local/lib/libruby.so.3.1(gc_enter) gc.c:9165
/usr/local/lib/libruby.so.3.1(gc_sweep_continue) gc.c:5743
/usr/local/lib/libruby.so.3.1(heap_prepare) gc.c:2193
/usr/local/lib/libruby.so.3.1(heap_next_freepage) gc.c:2388
/usr/local/lib/libruby.so.3.1(ractor_cache_slots) gc.c:2424
/usr/local/lib/libruby.so.3.1(newobj_slowpath) gc.c:2484
/usr/local/lib/libruby.so.3.1(newobj_slowpath_wb_unprotected) gc.c:2510
/usr/local/lib/libruby.so.3.1(newobj_fill+0x0) [0x7f75e64a4cf9] gc.c:2546
/usr/local/lib/libruby.so.3.1(newobj_of) gc.c:2556
/usr/local/lib/libruby.so.3.1(rb_wb_unprotected_newobj_of) gc.c:2570
/usr/local/lib/libruby.so.3.1(io_alloc+0x5) [0x7f75e64ccbca] io.c:1047
/usr/local/lib/libruby.so.3.1(prep_io) io.c:8479
/usr/local/lib/libruby.so.3.1(prep_stdio) io.c:8510
/usr/local/lib/libruby.so.3.1(rb_io_prep_stdin) io.c:8528
/usr/local/lib/libruby.so.3.1(thread_start_func_2+0x165) [0x7f75e6619965] thread.c:802
/usr/local/lib/libruby.so.3.1(register_cached_thread_and_wait+0x0) [0x7f75e661a6f9] thread_pthread.c:1047
/usr/local/lib/libruby.so.3.1(thread_start_func_1) thread_pthread.c:1054
/lib/x86_64-linux-gnu/libpthread.so.0(0x8ea7) [0x7f75e6362ea7]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f75e607fdef]
Interestingly, my crash happened on RUBY_INTERNAL_EVENT_GC_ENTER (you can see my stack includes an attempt to garbage collect), but I believe the fix would work for this situation as well.
The PR to fix this has been merged (https://github.com/ruby/ruby/pull/5990).
Would it be possible for the fix to be backported to 3.0/3.1/3.2? There are a few features in the ddtrace gem that can trigger this crash and that we've had to disable for these Rubies.
Backport changed from 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED
The fix was merged as 7bd7aee02e303de27d2cddfc5ef47e612d6782cb
Status changed from Assigned to Closed
Backport changed from 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: DONE, 3.2: REQUIRED
ruby_3_1 bdbe6053853c11ffe9b8737eb4da50ed84c9dbd6 merged revision(s) 7bd7aee02e303de27d2cddfc5ef47e612d6782cb.
Backport changed from 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: DONE, 3.2: REQUIRED to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: DONE, 3.2: DONE
ruby_3_2 b422c3523c419b88c6da23a4022ae8864f411b84 merged revision(s) 7bd7aee02e303de27d2cddfc5ef47e612d6782cb.
Thanks again @nagachika (Tomoyuki Chikanaga)!
Can I bother you with a backport to 3.0 as well? I know that one is getting "long in the tooth" in terms of support, but having it fixed would mean this crash would not happen on any of the Ruby releases which support Ractors (3.0/3.1/3.2/...) which would make our usage of tracepoints in the ddtrace gem simpler :)
Aaaahh. It's a shame, but I can understand 😓 . Thanks for the clarification :)