Bug #20016

closed

3.3.0dev `rb_postponed_job_register_one` crashes when `RUBY_MN_THREADS=1`

Added by byroot (Jean Boussier) 5 months ago. Updated 4 months ago.

Status: Closed
Target version: -
ruby -v: ruby 3.3.0dev (2023-11-22T17:01:13Z shopify c1fc1a00ea) +MN [x86_64-linux]
[ruby-core:115458]

Description

I discovered this while running our internal CI with MaNy enabled. Our application crashes when it tries to profile with StackProf:

[BUG] Segmentation fault at 0x0000000000000020
ruby 3.3.0dev (2023-11-22T17:01:13Z shopify c1fc1a00ea) +MN [x86_64-linux]

-- Machine register context ------------------------------------------------
 RIP: 0x000055df5fe38489 RBP: 0x00007f517bc59000 RSP: 0x00007f5123f3c6d0
 RAX: 0x0000000000000000 RBX: 0x00007f51596554e0 RCX: 0x00007f517bdb6b40
 RDX: 0x0000000000000001 RDI: 0x00007f517bc59000 RSI: 0x0000000000000000
  R8: 0x0000000000000000  R9: 0x00000000ffffffff R10: 0x0000000000000000
 R11: 0x0000000000000246 R12: 0x0000000000000000 R13: 0x0000000000000000
 R14: 0x0000000000001ea5 R15: 0x00007f517bc59104 EFL: 0x0000000000010202

-- C level backtrace information -------------------------------------------
/usr/local/ruby/bin/ruby(rb_print_backtrace+0x14) [0x55df5fe328e1] /tmp/ruby-build/ruby-3.3.0-c1fc1a00ea9633961153451d0e927db49c1b268d/vm_dump.c:812
/usr/local/ruby/bin/ruby(rb_vm_bugreport) /tmp/ruby-build/ruby-3.3.0-c1fc1a00ea9633961153451d0e927db49c1b268d/vm_dump.c:1143
/usr/local/ruby/bin/ruby(rb_bug_for_fatal_signal+0xfc) [0x55df5ffe509c] /tmp/ruby-build/ruby-3.3.0-c1fc1a00ea9633961153451d0e927db49c1b268d/error.c:1065
/usr/local/ruby/bin/ruby(sigsegv+0x4d) [0x55df5fd7f19d] /tmp/ruby-build/ruby-3.3.0-c1fc1a00ea9633961153451d0e927db49c1b268d/signal.c:920
/lib/x86_64-linux-gnu/libc.so.6(0x7f517c277520) [0x7f517c277520]
/usr/local/ruby/bin/ruby(rbimpl_atomic_or+0x0) [0x55df5fe38489] /tmp/ruby-build/ruby-3.3.0-c1fc1a00ea9633961153451d0e927db49c1b268d/vm_trace.c:1691
/usr/local/ruby/bin/ruby(postponed_job_register) /tmp/ruby-build/ruby-3.3.0-c1fc1a00ea9633961153451d0e927db49c1b268d/vm_trace.c:1693
/usr/local/ruby/bin/ruby(postponed_job_register) /tmp/ruby-build/ruby-3.3.0-c1fc1a00ea9633961153451d0e927db49c1b268d/vm_trace.c:1675
/usr/local/ruby/bin/ruby(rb_postponed_job_register_one) /tmp/ruby-build/ruby-3.3.0-c1fc1a00ea9633961153451d0e927db49c1b268d/vm_trace.c:1746
/tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/lib/stackprof/stackprof.so(stackprof_signal_handler+0x2d) [0x7f5159655434] /tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/ext/stackprof/stackprof.c:763
/tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/lib/stackprof/stackprof.so(stackprof_signal_handler) /tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/ext/stackprof/stackprof.c:722
/lib/x86_64-linux-gnu/libc.so.6(0x7f517c277520) [0x7f517c277520]
/lib/x86_64-linux-gnu/libc.so.6(0x7f517c2c6117) [0x7f517c2c6117]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x211) [0x7f517c2c8a41]
/usr/local/ruby/bin/ruby(rb_native_cond_wait+0xb) [0x55df5fdc75fb] /tmp/ruby-build/ruby-3.3.0-c1fc1a00ea9633961153451d0e927db49c1b268d/thread_pthread.c:214
/usr/local/ruby/bin/ruby(ractor_sched_deq) /tmp/ruby-build/ruby-3.3.0-c1fc1a00ea9633961153451d0e927db49c1b268d/thread_pthread.c:1230
/usr/local/ruby/bin/ruby(nt_start) /tmp/ruby-build/ruby-3.3.0-c1fc1a00ea9633961153451d0e927db49c1b268d/thread_pthread.c:2209
/lib/x86_64-linux-gnu/libc.so.6(0x7f517c2c9ac3) [0x7f517c2c9ac3]

Ref: https://github.com/tmm1/stackprof/issues/221

Updated by jhawthorn (John Hawthorn) 4 months ago

I opened a PR with a proposed fix: https://github.com/ruby/ruby/pull/9311

The issue is that, under M:N threads, `rb_vm_main_ractor_ec` now returns NULL when all threads in the main Ractor's shared thread are sleeping. `rb_postponed_job_register_one` needs to always be able to find a working EC.


Updated by jhawthorn (John Hawthorn) 4 months ago

  • Status changed from Open to Closed

Applied in changeset git|1f0304218cf00e05a4a126196676ba221ebf91f6.


Use main_thread->ec from rb_vm_main_ractor_ec

rb_vm_main_ractor_ec was introduced to allow rb_postponed_job_* to work
when fired on non-Ruby threads, which have no EC set, and that is its
only use.

When RUBY_MN_THREADS=1 is set ractor->threads.running_ec is NULL when
the shared thread is sleeping. This instead grabs the EC directly from
the main thread which seems to always be set.

Fixes [Bug #20016]

Co-authored-by: Dustin Brown
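
To illustrate the change described in the commit message, here is a minimal toy model in C. It is a sketch only and is not the actual Ruby source: every `toy_*` type, function, and constant is hypothetical, and the real VM structures are far larger. It assumes only what the commit message states, namely that the running EC of the main Ractor can be NULL while the shared thread sleeps, and that the main thread's own EC seems to always be set.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the VM structures. */
typedef struct toy_ec {
    unsigned int interrupt_flag;
} toy_ec_t;

typedef struct toy_thread {
    toy_ec_t *ec;              /* the thread's own execution context */
} toy_thread_t;

typedef struct toy_ractor {
    toy_ec_t *running_ec;      /* NULL while the shared thread is sleeping */
    toy_thread_t *main_thread; /* assumed to keep a valid ec */
} toy_ractor_t;

/* Hypothetical bit used to signal a pending postponed job. */
#define TOY_POSTPONED_JOB_BIT 0x04u

/* Old lookup: depends on some thread currently running. */
toy_ec_t *toy_main_ractor_ec_before(const toy_ractor_t *r)
{
    return r->running_ec;
}

/* New lookup: take the EC directly from the main thread. */
toy_ec_t *toy_main_ractor_ec_after(const toy_ractor_t *r)
{
    return r->main_thread->ec;
}

/*
 * Toy registration: sets a flag on the EC. Returns 1 on success and
 * 0 when no EC is available. This guard exists only so the sketch can
 * show the failure safely; in the report the NULL EC led to a segfault.
 */
int toy_register_job(toy_ec_t *ec)
{
    if (ec == NULL) {
        return 0;
    }
    ec->interrupt_flag |= TOY_POSTPONED_JOB_BIT;
    return 1;
}
```

With `running_ec` set to NULL, the "before" lookup gives the registration nothing to work with, while the "after" lookup still reaches the main thread's EC.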
