Project

General

Profile

Actions

Bug #19991

closed

rb_register_postponed_job async-signal-unsafety causes crash in GC

Added by kjtsanaktsidis (KJ Tsanaktsidis) about 1 year ago. Updated 11 months ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:115286]

Description

Our production application experienced an interpreter crash today that I’m fairly sure is the result of a bug in the rb_register_postponed_job infrastructure.

Diagnosis

I’ve attached a more complete version of the crash dump, but the key information I think is the following backtrace:

<internal:gc>:35: [BUG] Segmentation fault at 0x00000000000000c8
....
/usr/lib/libruby.so.3.1(rb_vm_bugreport+0x60c) [0xffffaaf57a6c] vm_dump.c:759
/usr/lib/libruby.so.3.1(rb_bug_for_fatal_signal+0xd8) [0xffffaad6d120] error.c:821
/usr/lib/libruby.so.3.1(sigsegv+0x58) [0xffffaaeb4350] signal.c:964
/usr/lib/libruby.so.3.1(sigill) (null):0
linux-vdso.so.1(__kernel_rt_sigreturn+0x0) [0xffffab20f78c]
/usr/lib/libruby.so.3.1(rbimpl_atomic_exchange+0x0) [0xffffaad94018] gc.c:4081
/usr/lib/libruby.so.3.1(gc_finalize_deferred) gc.c:4081
/usr/lib/libruby.so.3.1(rb_postponed_job_flush+0x218) [0xffffaaf5dd60] vm_trace.c:1728
/usr/lib/libruby.so.3.1(rb_threadptr_execute_interrupts+0x328) [0xffffaaef7520] thread.c:2444
/usr/lib/libruby.so.3.1(vm_exec_core+0x4e54) [0xffffaaf41c6c] vm.inc:588
/usr/lib/libruby.so.3.1(rb_vm_exec+0x144) [0xffffaaf427cc] vm.c:2211

I disassembled gc_finalize_deferred in GDB (different execution, so the addresses don’t line up - this is NOT from a core dump):

(gdb) disassemble gc_finalize_deferred
Dump of assembler code for function gc_finalize_deferred:
   0x0000fffff7cc3ff8 <+0>:  stp x29, x30, [sp, #-48]!
   0x0000fffff7cc3ffc <+4>:  mov w1, #0x1 // #1
   0x0000fffff7cc4000 <+8>:  mov x29, sp
   0x0000fffff7cc4004 <+12>: stp x19, x20, [sp, #16]
   0x0000fffff7cc4008 <+16>: mov x19, x0
   0x0000fffff7cc400c <+20>: add x20, x0, #0xc8
   0x0000fffff7cc4010 <+24>: ldaxr w0, [x20]
   .... continues ....

Based on the line numbers from the debuginfo, the faulting instruction is likely to be gc_finalize_deferred+24 (which is the inlined rbimpl_atomic_exchange). That would attempt a load of 0xc8 if x0 was zero - i.e. if gc_finalize_deferred was called with a NULL objspace.

The enqueuing of gc_finalize_deferred with postponed job only happens in one place (in gc_sweep_page, here) and if objspace was null there then the crash would have had to have already happened in gc_sweep_page. Thus, I think that gc_finalize_deferred was enqueued into the postponed job list with a not-NULL argument, but the argument was corrupted whilst it was in vm->postponed_job_buffer, and if objspace was null there then the crash would have had to have already happened in gc_sweep_page. Thus, I think that gc_finalize_deferred was enqueued into the postponed job list with a not-NULL argument, but the argument was corrupted whilst it was in vm->postponed_job_buffer.

I had a look at the postponed job code, which is of course very tricky because it needs to be async-signal-safe. More specifically:

  • It needs to work if run from a thread not holding the GVL
  • It needs to work if run from a thread, whilst another thread is actually executing rb_postponed_job_flush
  • It needs to work if run from a signal caught in a thread that is currently executing rb_postponed_job_flush (this rules out trivial mutex-based solutions)

We use the Datadog continuous profiler in our application (CC: @ivoanjo ;), which calls rb_postponed_job_register to capture profiling samples. Thus, I think our application is likely to hit all of those scenarios semi-reguarly.

My best guess at a plausible sequence of events, to generate this particular crash, is that:

  1. rb_postponed_job_flush was running on thread T1.
  2. There is a queued call to gc_finalize_deferred sitting in vm->postponed_job_buffer[vm->postponed_job_index-1].
  3. T1 executed the ATOMIC_CAS at vm_trace.c:1800, decrementing vm->postponed_job_index (which now equals index - 1) and determining that a job at index index should be run.
  4. Thread T2 received a signal, and the Datadog continuous profiler called rb_postponed_job_register_one(0, sample_from_postponed_job, NULL)
  5. T2 executed the ATOMIC_CAS at vm_trace.c:1679, re-incrementing vm->postponed_job_index. It’s now equal to index from T1 again.
  6. T2 then executes the sets on pjob->func and pjob->data at vm_trace.c:1687. It sets ->func to sample_from_postponed_job (from ddtrace), and ->data to 0.
  7. T1 then goes to call (*pjob->func)(pjob->data) at vm_trace.c:1802
  8. Since there is no memory barrier between 6 & 7, T1 is allowed to see the effects of the set on pjob->data and not see that of pjob->func.
    9, T1 thus calls gc_finalize_deferred (which it was meant to do) with an argument of 0 (which it was not).

Solution

Managing a thread-safe list of too-big-to-be-atomic objects (like rb_postponed_job_t) is really tricky. I think it might be best for this code if we use a combination of masking signals (to prevent manipulation of the postponed job list occurring during rb_postponed_job_flush, and using a semaphore to protect the list (since semaphores are required to be async-signal-safe on POSIX systems). I've implemented this in a PR here: https://github.com/ruby/ruby/pull/8856

It seems slightly slower to do it this way - semaphores require system calls in the uncontended case, which is why they're async-signal-safe but also makes them more expensive than pthread mutexes, which don't, on most systems. I ran my branch through yjit bench:

With my patch:

interp: ruby 3.3.0dev (2023-11-07T08:14:09Z ktsanaktsidis/post.. 342f30f566) [arm64-darwin22]
yjit: ruby 3.3.0dev (2023-11-07T08:14:09Z ktsanaktsidis/post.. 342f30f566) +YJIT [arm64-darwin22]

--------------  -----------  ----------  ---------  ----------  ------------  -----------
bench           interp (ms)  stddev (%)  yjit (ms)  stddev (%)  yjit 1st itr  interp/yjit
activerecord    31.2         3.4         17.0       5.7         1.29          1.83
chunky-png      543.5        0.5         367.0      0.7         1.40          1.48
erubi-rails     1044.2       0.6         564.7      1.3         1.69          1.85
hexapdf         1517.6       3.1         917.6      1.2         1.46          1.65
liquid-c        37.1         1.3         28.9       1.4         0.89          1.29
liquid-compile  39.0         1.4         29.9       1.6         0.76          1.30
liquid-render   89.9         1.8         39.6       1.4         1.37          2.27
lobsters        598.2        1.7         435.4      5.2         0.63          1.37
mail            79.8         3.1         52.5       1.0         0.79          1.52
psych-load      1441.5       1.7         885.4      0.5         1.60          1.63
railsbench      1010.8       1.0         609.3      1.3         1.24          1.66
ruby-lsp        40.9         3.4         29.2       30.0        0.66          1.40
sequel          39.8         1.8         33.0       2.4         1.18          1.21
--------------  -----------  ----------  ---------  ----------  ------------  -----------

Without the patch:

interp: ruby 3.3.0dev (2023-11-07T07:56:43Z master 5a2779d40f) [arm64-darwin22]
yjit: ruby 3.3.0dev (2023-11-07T07:56:43Z master 5a2779d40f) +YJIT [arm64-darwin22]

--------------  -----------  ----------  ---------  ----------  ------------  -----------
bench           interp (ms)  stddev (%)  yjit (ms)  stddev (%)  yjit 1st itr  interp/yjit
activerecord    31.3         3.3         16.7       5.5         1.36          1.88
chunky-png      521.6        0.6         348.8      0.7         1.40          1.50
erubi-rails     1038.9       0.9         566.3      1.2         1.70          1.83
hexapdf         1501.9       1.1         951.7      3.9         1.42          1.58
liquid-c        36.7         1.2         29.3       1.7         0.86          1.25
liquid-compile  38.8         1.1         29.7       3.7         0.73          1.31
liquid-render   92.2         0.9         38.3       1.0         1.47          2.40
lobsters        582.5        2.0         429.8      5.6         0.59          1.36
mail            77.9         1.3         54.8       0.9         0.76          1.42
psych-load      1419.1       0.7         887.7      0.5         1.60          1.60
railsbench      1017.8       1.1         609.9      1.2         1.24          1.67
ruby-lsp        41.0         2.2         28.8       28.8        0.64          1.43
sequel          36.0         1.5         30.4       1.8         1.11          1.18
--------------  -----------  ----------  ---------  ----------  ------------  -----------

Maybe this is within the noise floor, but I thought I should bring it up.


Files

crash.txt (3.69 KB) crash.txt kjtsanaktsidis (KJ Tsanaktsidis), 11/07/2023 09:38 AM

Updated by ivoanjo (Ivo Anjo) about 1 year ago

Thanks for sharing this one 😅.

The datadog profiler tries to mimic what stackprof does for interruption (signal + postponed_job_register) but indeed the race you're describing seems feasible, thus making the rb_postponed_job_register actually not signal-safe as it's intended.

Updated by ivoanjo (Ivo Anjo) about 1 year ago

I guess this can be closed since https://github.com/ruby/ruby/pull/8949 was merged? ;)

Actions #4

Updated by jeremyevans0 (Jeremy Evans) about 1 year ago

  • Status changed from Open to Closed

Updated by ivoanjo (Ivo Anjo) 11 months ago

I've also documented how to workaround this issue in the Datadog Ruby profiler in https://docs.datadoghq.com/profiler/profiler_troubleshooting/ruby/#segmentation-faults-in-gc_finalize_deferred-in-ruby-versions-26-to-32 although we're still considering putting in a workaround in the profiler itself.

Actions

Also available in: Atom PDF

Like2
Like0Like0Like0Like0Like0