Project

General

Profile

Actions

Bug #20017

closed

3.3.0dev `rb_thread_profile_frames` crashes when `RUBY_MN_THREADS=1`

Added by byroot (Jean Boussier) 5 months ago. Updated 4 months ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 3.3.0dev (2023-11-19T03:01:05Z shopify 9aee12cc28) +MN [x86_64-linux]
[ruby-core:115459]

Description

I discovered this while running our internal CI with MaNy enabled, our application crash when trying to profile with StackProf (with postponed jobs disabled):

[BUG] Segmentation fault at 0x0000000000000010
ruby 3.3.0dev (2023-11-19T03:01:05Z shopify 9aee12cc28) +MN [x86_64-linux]

-- C level backtrace information -------------------------------------------
/usr/local/ruby/bin/ruby(rb_print_backtrace+0x14) [0x55c8f16333e1] /tmp/ruby-build/ruby-3.3.0-9aee12cc28cbca40306784e54e38558688caa9f7/vm_dump.c:812
/usr/local/ruby/bin/ruby(rb_vm_bugreport) /tmp/ruby-build/ruby-3.3.0-9aee12cc28cbca40306784e54e38558688caa9f7/vm_dump.c:1143
/usr/local/ruby/bin/ruby(rb_bug_for_fatal_signal+0xfc) [0x55c8f17e559c] /tmp/ruby-build/ruby-3.3.0-9aee12cc28cbca40306784e54e38558688caa9f7/error.c:1065
/usr/local/ruby/bin/ruby(sigsegv+0x4d) [0x55c8f158091d] /tmp/ruby-build/ruby-3.3.0-9aee12cc28cbca40306784e54e38558688caa9f7/signal.c:920
/lib/x86_64-linux-gnu/libc.so.6(0x7ff3db333520) [0x7ff3db333520]
/usr/local/ruby/bin/ruby(thread_profile_frames+0x10) [0x55c8f162efd0] /tmp/ruby-build/ruby-3.3.0-9aee12cc28cbca40306784e54e38558688caa9f7/vm_backtrace.c:1587
/tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/lib/stackprof/stackprof.so(stackprof_buffer_sample+0x28) [0x7ff3b7d2435c] /tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/ext/stackprof/stackprof.c:622
/tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/lib/stackprof/stackprof.so(stackprof_buffer_sample) /tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/ext/stackprof/stackprof.c:604
/tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/lib/stackprof/stackprof.so(stackprof_buffer_sample) (null):0
/tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/lib/stackprof/stackprof.so(stackprof_signal_handler+0x5) [0x7ff3b7d24545] /tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/ext/stackprof/stackprof.c:767
/tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/lib/stackprof/stackprof.so(stackprof_signal_handler) /tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/ext/stackprof/stackprof.c:722
/lib/x86_64-linux-gnu/libc.so.6(0x7ff3db333520) [0x7ff3db333520]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x24a) [0x7ff3db384a7a]
/usr/local/ruby/bin/ruby(rb_native_cond_wait+0xb) [0x55c8f15c8cbb] /tmp/ruby-build/ruby-3.3.0-9aee12cc28cbca40306784e54e38558688caa9f7/thread_pthread.c:214
/usr/local/ruby/bin/ruby(ractor_sched_deq) /tmp/ruby-build/ruby-3.3.0-9aee12cc28cbca40306784e54e38558688caa9f7/thread_pthread.c:1230
/usr/local/ruby/bin/ruby(nt_start) /tmp/ruby-build/ruby-3.3.0-9aee12cc28cbca40306784e54e38558688caa9f7/thread_pthread.c:2209

For additional context, as far as I know rb_thread_profile_frames isn't officially async signal safe, but it happens to be since 3.0 for the interpreter, and 3.3 for YJIT, so StackProf tries not to use postponed jobs for profiling when it can as it leads to much more accurate profiling results.

Ref: https://github.com/tmm1/stackprof/issues/221

Updated by jhawthorn (John Hawthorn) 4 months ago

I opened a PR with a proposal to fix this https://github.com/ruby/ruby/pull/9310

The issue is that under M:N threads the current thread's EC can become NULL (when running on a shared native thread if there are no more threads to schedule). This PR fixes the issue by returning 0 in that case.

Actions #2

Updated by jhawthorn (John Hawthorn) 4 months ago

  • Status changed from Open to Closed

Applied in changeset git|ffa5f16273f46c97bfca56e4549b0b38b9322d63.


Make rb_profile_frames return 0 for NULL ec

When using M:N threads, EC is set to NULL in the shared native thread
when nothing is scheduled. This previously caused a segfault when we try
to examine the EC.

Returning 0 instead means we may miss profiling information, but a
profiler relying on this isn't thread aware anyways, and observing that
"nothing" is running is probably correct.

Fixes [Bug #20017]

Co-authored-by: Dustin Brown

Actions

Also available in: Atom PDF

Like1
Like0Like0