Bug #21150

closed

Segfault in MacOS libunwind (c backtrace info) when called from a ractor

Added by luke-gru (Luke Gruber) 4 days ago. Updated 2 days ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:121127]

Description

On macOS arm64 with LLVM 18, libunwind crashes with a SEGV when called from within a Ractor. As a result, the bug
report itself fails with a SEGV and quits right before printing the C backtrace information. It looks like:

-- Ruby level backtrace information ----------------------------------------
../ruby/test.rb:49:in 'block in <main>'
<internal:ractor>:902:in 'fail_assert'

-- Threading information ---------------------------------------------------
Total ractor count: 2
Ruby thread count for this ractor: 1

-- C level backtrace information -------------------------------------------
<internal:ractor>:902: [BUG] Segmentation fault at 0xfffffffffffffff8

It looks like it tried to dereference the address -8.

To reproduce:

test.rb:

r = Ractor.new do
  Ractor.fail_assert # to produce a bug report
end
r.take

ractor.rb:

  def self.fail_assert
    __builtin_cexpr! %q{
      VM_ASSERT(0), Qfalse
    }
  end

System info:

clang --version:
Homebrew clang version 18.1.8
Target: arm64-apple-darwin24.3.0
Thread model: posix

otool -L miniruby:
/opt/homebrew/opt/llvm@18/lib/libunwind.1.dylib (compatibility version 1.0.0, current version 1.0.0)

I haven't tried to reproduce it on another system, but I did try with clang 16 and got the same results.

Possible Causes

This is just a guess, but I think the coroutine context switching is messing up libunwind's stack unwinding heuristic.

Other issues that this causes

Right now, if Ruby receives a SEGV in a ractor, it tries to print the bug report and then receives another SEGV while
running the libunwind code. This hangs the program: the sigaction for the SEGV signal was installed without SA_NODEFER, so
the second SEGV is blocked (masked) while the handler is running, and the program can't make any forward progress. The solution
here is just to install the fatal handlers with SA_NODEFER. There is already code that checks whether the bug report has already
been started, and in that case it just aborts the process.
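To illustrate the SA_NODEFER point, here is a minimal sketch of a fatal handler installed so that a second fault inside the handler is delivered instead of being masked. The names `fatal_handler`, `install_fatal_handler` and `report_in_progress` are hypothetical and are not taken from Ruby's signal code; only `sigaction` and `SA_NODEFER` are the real POSIX API.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical guard: set once the bug report has started printing. */
static volatile sig_atomic_t report_in_progress = 0;

/* Hypothetical fatal handler standing in for the bug reporter. */
static void fatal_handler(int signo)
{
    (void)signo;
    if (report_in_progress) {
        /* We faulted again while reporting: abort instead of hanging. */
        abort();
    }
    report_in_progress = 1;
    /* ... print the bug report here; this part may itself fault ... */
    abort();
}

/* Install the handler with SA_NODEFER so that the same signal is not
 * blocked while the handler runs. Returns 0 on success, -1 on error. */
int install_fatal_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = fatal_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NODEFER;
    return sigaction(signo, &sa, NULL);
}
```

Calling `install_fatal_handler(SIGSEGV)` once at startup would be enough for this sketch. With the flag set, a nested SEGV re-enters the handler, hits the guard, and the process aborts rather than hanging.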

Actions #1

Updated by luke-gru (Luke Gruber) 4 days ago

  • Subject changed from Segfault in Ractor messes up libunwind (c backtrace info) to Segfault in libunwind (c backtrace info) when called from a ractor

Actions #2

Updated by luke-gru (Luke Gruber) 4 days ago

  • Subject changed from Segfault in libunwind (c backtrace info) when called from a ractor to Segfault in MacOS libunwind (c backtrace info) when called from a ractor

Actions #3

Updated by luke-gru (Luke Gruber) 4 days ago · Edited

I printed the backtrace after each iteration of unw_step(&cursor). The one printed right before the crashing call to unw_get_reg is:

/Users/luke/workspace/ruby-build/miniruby(rb_assert_failure_detail) (null):0
/Users/luke/workspace/ruby-build/miniruby(builtin_inline_class_908+0x0) [0x10447c854] ../ruby/ractor.rb:903
/Users/luke/workspace/ruby-build/miniruby(builtin_inline_class_902) (null):0
/Users/luke/workspace/ruby-build/miniruby(invoke_bf+0x40) [0x104585e1c] ../ruby/vm_insnhelper.c:7394
/Users/luke/workspace/ruby-build/miniruby(vm_invoke_builtin_delegate) ../ruby/vm_insnhelper.c:7418
/Users/luke/workspace/ruby-build/miniruby(vm_exec_core+0x8258) [0x1045612e4] ../ruby/insns.def:1657
/Users/luke/workspace/ruby-build/miniruby(rb_vm_exec+0x184) [0x104557ac4] ../ruby/vm.c:2580
/Users/luke/workspace/ruby-build/miniruby(invoke_iseq_block_from_c+0x138) [0x104570be8] ../ruby/vm.c:1611
/Users/luke/workspace/ruby-build/miniruby(invoke_block_from_c_proc) ../ruby/vm.c:1705
/Users/luke/workspace/ruby-build/miniruby(vm_invoke_proc) ../ruby/vm.c:1735
/Users/luke/workspace/ruby-build/miniruby(thread_do_start_proc+0x148) [0x1045291e8] ../ruby/thread.c:584
/Users/luke/workspace/ruby-build/miniruby(thread_do_start+0xa0) [0x104528a34] ../ruby/thread.c:626
/Users/luke/workspace/ruby-build/miniruby(thread_start_func_2) ../ruby/thread.c:677
/Users/luke/workspace/ruby-build/miniruby(call_thread_start_func_2+0x18) [0x1045295dc] ../ruby/thread_pthread.c:2175
/Users/luke/workspace/ruby-build/miniruby(co_start) ../ruby/thread_pthread_mn.c:453
/Users/luke/workspace/ruby-build/miniruby(co_start+0x0) [0x10452954c] ../ruby/thread_pthread.c:126
/Users/luke/workspace/ruby-build/miniruby(thread_cleanup_func_before_exec) (null):0 <------- This is clearly wrong: that function is never called, and the address is bad
<internal:ractor>:902: [BUG] Segmentation fault at 0xfffffffffffffff8

So it appears that the coroutine switching is what confuses the unwinder. One way to fix this is:

        if (unw_get_reg(&cursor, UNW_REG_IP, &ip) == 0) {
            // Strip Arm64's pointer authentication.
            // https://developer.apple.com/documentation/security/preparing_your_app_to_work_with_pointer_authentication
            // I wish I could use "ptrauth_strip()" but I get an error:
            // "this target does not support pointer authentication"
            trace[n++] = (void *)(ip & 0x7fffffffffffull);
            extern COROUTINE co_start(struct coroutine_context *from, struct coroutine_context *self);
            // Apple's libunwind can't handle our coroutine switching code
            if ((void*)ip == (void*)co_start) {
                break;
            }
        }

This works fine, but I don't know if other platforms also need this check. A better fix that lets us see the entire backtrace would be a lot nicer.
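The stopping logic of the snippet above can be sketched in isolation, without libunwind. Here an array of addresses stands in for the values that successive unw_get_reg calls would return, and `collect_trace` and `strip_pac` are hypothetical helper names. The mask is the one from the snippet. One difference is an assumption on my part: this sketch compares the stripped address against the stop address, whereas the snippet compares the raw `ip`.

```c
#include <stddef.h>
#include <stdint.h>

/* Mask from the snippet above: keep the low 47 bits of the address. */
#define PAC_STRIP_MASK 0x7fffffffffffull

/* Hypothetical helper: drop the pointer authentication bits. */
static uint64_t strip_pac(uint64_t ip)
{
    return ip & PAC_STRIP_MASK;
}

/*
 * Copy frames from `ips` into `trace`, stopping after the frame whose
 * stripped address equals `stop_at` (e.g. the address of co_start).
 * Returns the number of frames recorded, at most `max`.
 */
size_t collect_trace(const uint64_t *ips, size_t n_ips,
                     uint64_t stop_at, uint64_t *trace, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < n_ips && n < max; i++) {
        uint64_t ip = strip_pac(ips[i]);
        trace[n++] = ip;                 /* record the frame first */
        if (ip == strip_pac(stop_at)) {
            break;                       /* do not unwind past this frame */
        }
    }
    return n;
}
```

The stop frame is recorded before the loop breaks, so it still appears in the trace and only the bogus frames after it are dropped.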

Actions #4

Updated by nobu (Nobuyoshi Nakada) 2 days ago

  • Status changed from Open to Closed

Applied in changeset git|1bc57b5e0e3cd15e8702c8856a276e98b6e46ba8.


[Bug #21150] macOS: Temporary workaround at unwinding coroutine

On arm64 macOS, libunwind (both of system library and homebrew
llvm-18) seems not to handle our coroutine switching code.
