Bug #18464

RUBY_INTERNAL_EVENT_NEWOBJ tracepoint causes an interpreter crash when combined with Ractors

Added by kjtsanaktsidis (KJ Tsanaktsidis) about 2 years ago. Updated 9 months ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-darwin20]
[ruby-core:107005]

Description

When a Ractor is created whilst a tracepoint for RUBY_INTERNAL_EVENT_NEWOBJ is active (registered with rb_tracepoint_new/rb_tracepoint_enable), the interpreter crashes with a null pointer dereference with the following backtrace:

[BUG] Segmentation fault at 0x0000000000000000
ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-darwin20]

...

-- C level backtrace information -------------------------------------------
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_print_backtrace+0xf) [0x10a15fadd] vm_dump.c:759
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_vm_bugreport) vm_dump.c:1045
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_vm_bugreport) (null):0
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(bug_report_end+0x0) [0x109f96b81] error.c:820
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_bug_for_fatal_signal) error.c:820
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(sigsegv+0x52) [0x10a0be3a2] signal.c:964
/usr/lib/system/libsystem_platform.dylib(_sigtramp+0x1d) [0x7fff20934d7d]
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(gc_event_hook_body+0x4) [0x109fb9d21] gc.c:2214
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_slowpath) gc.c:2486
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_slowpath_wb_unprotected) gc.c:2507
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_fill+0x0) [0x109fac92e] gc.c:2543
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_of0) gc.c:2553
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_of) gc.c:2552
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_wb_unprotected_newobj_of) gc.c:2567
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(io_alloc+0x12) [0x109fd341c] io.c:1047
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(prep_io) io.c:8483
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(prep_stdio) io.c:8514
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_io_prep_stdin) io.c:8532
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(thread_start_func_2+0xf7) [0x10a1058a7] thread.c:802
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_native_cond_initialize+0x0) [0x10a1055fb] ./thread_pthread.c:1047
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(register_cached_thread_and_wait) ./thread_pthread.c:1099
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(thread_start_func_1) ./thread_pthread.c:1054
/usr/lib/system/libsystem_pthread.dylib(_pthread_start+0xe0) [0x7fff208ef8fc]

(full output is attached).

This seems to be because the new Ractor sets up stdio objects (rb_io_prep_stdin et al.), which in turn allocate Ruby objects, before rb_ec_initialize_vm_stack is called to set up the initial stack frame.

I've attached a patch which works around this by not firing GC event hooks if there is no control frame on the execution context. The patch also includes a test which reproduces the issue using the objspace extension; creating a Ractor within an ObjectSpace.trace_object_allocations block is enough to trigger the crash. The patch seems to fix things, but if you folk prefer I can also try swapping around the order of prep_stdio and rb_ec_initialize_vm_stack.
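
For reference, a minimal sketch of the reproduction described above, assuming the objspace extension is available. The helper name `ractor_under_allocation_tracing` is hypothetical and is not taken from the patch or its test. On an affected Ruby this would be expected to crash the interpreter; on a Ruby that includes the fix it should run to completion.

```ruby
require "objspace"

# Hypothetical helper (not from the patch): start a Ractor while
# allocation tracing is active, so a NEWOBJ hook is registered when the
# new Ractor's thread allocates its stdio objects.
def ractor_under_allocation_tracing
  result = nil
  ObjectSpace.trace_object_allocations do
    r = Ractor.new { :ok }
    # Newer Rubies replaced Ractor#take with Ractor#value; use whichever exists.
    result = r.respond_to?(:value) ? r.value : r.take
  end
  result
end

result = ractor_under_allocation_tracing
puts "Ractor finished with #{result.inspect}"
```

The Ractor body is deliberately trivial: per the analysis above, the crash happens during Ractor startup, before any user code in the block runs.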


Files

0001-Fix-interpreter-crash-caused-by-RUBY_INTERNAL_EVENT_.patch (1.91 KB), kjtsanaktsidis (KJ Tsanaktsidis), 01/08/2022 04:34 AM
crash.log (26.1 KB), kjtsanaktsidis (KJ Tsanaktsidis), 01/08/2022 04:35 AM
ruby_2022-01-08-151326_8927-ktsanaktsidis.crash (18.8 KB), kjtsanaktsidis (KJ Tsanaktsidis), 01/08/2022 04:37 AM

Updated by nobu (Nobuyoshi Nakada) about 2 years ago

  • Status changed from Open to Assigned
  • Assignee set to ko1 (Koichi Sasada)

Updated by kjtsanaktsidis (KJ Tsanaktsidis) almost 2 years ago

Just checked, this is still an issue with 3.2.0-preview1. Is there any feedback on the patch I posted? Any other way you would suggest going about a solution? Thanks!

Updated by kjtsanaktsidis (KJ Tsanaktsidis) almost 2 years ago

I opened a PR with this patch. Happy to try fixing it a different way but this at least stops the crash. https://github.com/ruby/ruby/pull/5990

Updated by ivoanjo (Ivo Anjo) over 1 year ago

If it helps, here's a Linux-based backtrace:

-- C level backtrace information -------------------------------------------
/usr/local/lib/libruby.so.3.1(rb_print_backtrace+0x11) [0x7f75e6678aa8] vm_dump.c:759
/usr/local/lib/libruby.so.3.1(rb_vm_bugreport) vm_dump.c:1045
/usr/local/lib/libruby.so.3.1(rb_bug_for_fatal_signal+0xf0) [0x7f75e6477750] error.c:821
/usr/local/lib/libruby.so.3.1(sigsegv+0x49) [0x7f75e65ced19] signal.c:964
/lib/x86_64-linux-gnu/libpthread.so.0(__restore_rt+0x0) [0x7f75e636e140]
/usr/local/lib/libruby.so.3.1(gc_event_hook_body+0x1b) [0x7f75e649010b] gc.c:2217
/usr/local/lib/libruby.so.3.1(gc_enter+0x1f) [0x7f75e64a495f] gc.c:9194
/usr/local/lib/libruby.so.3.1(gc_enter) gc.c:9165
/usr/local/lib/libruby.so.3.1(gc_sweep_continue) gc.c:5743
/usr/local/lib/libruby.so.3.1(heap_prepare) gc.c:2193
/usr/local/lib/libruby.so.3.1(heap_next_freepage) gc.c:2388
/usr/local/lib/libruby.so.3.1(ractor_cache_slots) gc.c:2424
/usr/local/lib/libruby.so.3.1(newobj_slowpath) gc.c:2484
/usr/local/lib/libruby.so.3.1(newobj_slowpath_wb_unprotected) gc.c:2510
/usr/local/lib/libruby.so.3.1(newobj_fill+0x0) [0x7f75e64a4cf9] gc.c:2546
/usr/local/lib/libruby.so.3.1(newobj_of) gc.c:2556
/usr/local/lib/libruby.so.3.1(rb_wb_unprotected_newobj_of) gc.c:2570
/usr/local/lib/libruby.so.3.1(io_alloc+0x5) [0x7f75e64ccbca] io.c:1047
/usr/local/lib/libruby.so.3.1(prep_io) io.c:8479
/usr/local/lib/libruby.so.3.1(prep_stdio) io.c:8510
/usr/local/lib/libruby.so.3.1(rb_io_prep_stdin) io.c:8528
/usr/local/lib/libruby.so.3.1(thread_start_func_2+0x165) [0x7f75e6619965] thread.c:802
/usr/local/lib/libruby.so.3.1(register_cached_thread_and_wait+0x0) [0x7f75e661a6f9] thread_pthread.c:1047
/usr/local/lib/libruby.so.3.1(thread_start_func_1) thread_pthread.c:1054
/lib/x86_64-linux-gnu/libpthread.so.0(0x8ea7) [0x7f75e6362ea7]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f75e607fdef]

Updated by ivoanjo (Ivo Anjo) over 1 year ago

Interestingly, my crash happened on RUBY_INTERNAL_EVENT_GC_ENTER (you can see my stack includes an attempt to garbage collect) but I believe the fix would work for this situation as well.

Updated by ivoanjo (Ivo Anjo) about 1 year ago

The PR to fix this has been merged ( https://github.com/ruby/ruby/pull/5990 ).

Would it be possible for the fix to be backported to 3.0/3.1/3.2? There are a few features in the ddtrace gem that can trigger this crash and that we've had to disable for these Rubies.

Updated by byroot (Jean Boussier) about 1 year ago

  • Backport changed from 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED

The fix was merged as 7bd7aee02e303de27d2cddfc5ef47e612d6782cb

Updated by byroot (Jean Boussier) about 1 year ago

  • Status changed from Assigned to Closed

Updated by nagachika (Tomoyuki Chikanaga) 12 months ago

  • Backport changed from 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: DONE, 3.2: REQUIRED

ruby_3_1 bdbe6053853c11ffe9b8737eb4da50ed84c9dbd6 merged revision(s) 7bd7aee02e303de27d2cddfc5ef47e612d6782cb.

Updated by nagachika (Tomoyuki Chikanaga) 9 months ago

  • Backport changed from 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: DONE, 3.2: REQUIRED to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: DONE, 3.2: DONE

ruby_3_2 b422c3523c419b88c6da23a4022ae8864f411b84 merged revision(s) 7bd7aee02e303de27d2cddfc5ef47e612d6782cb.

Updated by ivoanjo (Ivo Anjo) 9 months ago

Thanks again @nagachika (Tomoyuki Chikanaga)!

Can I bother you with a backport to 3.0 as well? I know that one is getting "long in the tooth" in terms of support, but having it fixed would mean this crash would not happen on any of the Ruby releases which support Ractors (3.0/3.1/3.2/...) which would make our usage of tracepoints in the ddtrace gem simpler :)

Updated by jeremyevans0 (Jeremy Evans) 9 months ago

ivoanjo (Ivo Anjo) wrote in #note-12:

Can I bother you with a backport to 3.0 as well?

Ruby 3.0 is in security maintenance mode, and this does not appear to be a security issue: https://www.ruby-lang.org/en/downloads/branches/

Updated by ivoanjo (Ivo Anjo) 9 months ago

Aaaahh. It's a shame, but I can understand 😓 . Thanks for the clarification :)
