Bug #19234


[3.2.0dev] YJIT code GC can lead to crashes

Added by byroot (Jean Boussier) over 1 year ago. Updated over 1 year ago.

Target version:
ruby -v:
ruby 3.2.0dev (2022-12-13T16:07:29Z master a66a69865d) +YJIT [x86_64-linux]


Filing this bug here in case others have observed it too and have more information, and also to keep track of it for the upcoming 3.2.0 release.

After changing some settings on our CI to make sure YJIT's code_gc would trigger, we discovered that it sometimes causes crashes.

The crash can take many different forms (e.g. [BUG] Segmentation fault at 0x00005604a8e78006 or [BUG] Illegal instruction at 0x0000aaaacc0ce4c0), and happens on both x86 and arm64.

It happens very consistently on our CI, but only after running for 15 to 20 minutes, and we haven't been able to reduce it to a local reproduction script.

When it happens, however, the backtrace isn't really helpful:

-- C level backtrace information -------------------------------------------
/usr/local/ruby/bin/real-ruby(rb_print_backtrace+0x11) [0x5604a8a6df7d] vm_dump.c:770
/usr/local/ruby/bin/real-ruby(rb_vm_bugreport) vm_dump.c:1065
/usr/local/ruby/bin/real-ruby(rb_bug_for_fatal_signal+0xee) [0x5604a8ba927e] error.c:813
/usr/local/ruby/bin/real-ruby(sigsegv+0x4d) [0x5604a89c3ded] signal.c:964
/lib/x86_64-linux-gnu/ [0x7fb5b4285420]

Like regular GC bugs, it is likely that the code GC needs to trigger at a very specific place for the bug to happen. Our attempts to trigger it manually with RubyVM::YJIT.code_gc, or to set the executable memory very low so that it triggers more often, didn't yield a simpler reproduction.
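For readers who want to try the manual approach described above, here is a minimal sketch. It assumes a Ruby build where RubyVM::YJIT.code_gc exists and YJIT is enabled (e.g. started with --yjit, optionally with a small --yjit-exec-mem-size). The helper names yjit_code_gc_available? and stress_code_gc and the workload itself are hypothetical; this is not the CI workload from this report, and as noted above this kind of loop did not reproduce the crash.

```ruby
# Sketch only: the workload is hypothetical and is not known to reproduce the bug.

# True only when YJIT is present, enabled, and exposes code_gc.
def yjit_code_gc_available?
  !!(defined?(RubyVM::YJIT) &&
     RubyVM::YJIT.enabled? &&
     RubyVM::YJIT.respond_to?(:code_gc))
end

# Repeatedly compile fresh methods and then ask YJIT to discard compiled code.
# Returns false without doing anything when code GC is unavailable.
def stress_code_gc(rounds: 100)
  return false unless yjit_code_gc_available?

  rounds.times do |i|
    # Define a fresh method each round so YJIT has new code to compile.
    klass = Class.new
    klass.class_eval("def work(x); x + #{i}; end")
    # Call it repeatedly, which is likely enough to pass the default call threshold.
    50.times { klass.new.work(1) }
    # Manually trigger code GC.
    RubyVM::YJIT.code_gc
  end
  true
end

puts "code GC exercised: #{stress_code_gc(rounds: 20)}"
```

When YJIT is not enabled the script simply reports false, so it is safe to run on any interpreter.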

Both @k0kubun (Takashi Kokubun) and @alanwu (Alan Wu) are investigating it right now.

Actions #1

Updated by byroot (Jean Boussier) over 1 year ago

  • Backport changed from 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN to 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED
Actions #2

Updated by alanwu (Alan Wu) over 1 year ago

  • Status changed from Open to Closed

Applied in changeset git|5fa608ed79645464bf80fa318d89745159301471.

YJIT: Fix code GC freeing stubs with a trampoline (#6937)

Stubs we generate for invalidation don't necessarily co-locate with the
code that jumps to the stub. Since we rely on co-location to keep stubs
alive as they are in the outlined code block, it used to be possible for
code GC inside branch_stub_hit() to free the stub that's its direct
caller, leading us to return to freed code after.

Stubs used to look like:

mov arg0, branch_ptr
mov arg1, target_idx
mov arg2, ec
call branch_stub_hit
jmp return_reg

Since the call and the jump after the call are the same for all stubs, we
can extract them and use a static trampoline for them. That makes
branch_stub_hit() always return to static code. Stubs now look like:

mov arg0, branch_ptr
mov arg1, target_idx
jmp trampoline

Where the trampoline is:

mov arg2, ec
call branch_stub_hit
jmp return_reg

Code GC can now free stubs without problems since we'll always return
to the trampoline, which we generate once on boot and which lives forever.

This might save a small bit of memory due to factoring out the static
part of stubs, but it's probably minor.

[Bug #19234]

Co-authored-by: Takashi Kokubun
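
The commit message above can be illustrated with a toy model. This is only a sketch: ToyCodeHeap, FreedCodeError, toy_branch_stub_hit, old_stub_hit and new_stub_hit are hypothetical names, not YJIT internals, and the model assumes for illustration that every stub hit triggers code GC. It shows why returning into a GC-managed stub can fail while returning into a static trampoline cannot.

```ruby
# Toy model only: these classes and methods are hypothetical, not YJIT internals.
class FreedCodeError < StandardError; end

class ToyCodeHeap
  def initialize
    @live = {}      # address => [label, static?]
    @next_addr = 0
  end

  # Allocate a piece of "code" and return its address.
  def alloc(label, static: false)
    @next_addr += 1
    @live[@next_addr] = [label, static]
    @next_addr
  end

  # Models code GC: frees everything that is not static.
  def code_gc
    @live.delete_if { |_addr, (_label, static)| !static }
  end

  # Models the CPU returning to addr after a call.
  def return_to(addr)
    entry = @live[addr]
    raise FreedCodeError, "returned to freed code at address #{addr}" unless entry
    entry.first
  end
end

# Stands in for branch_stub_hit(); in this model it always runs code GC.
def toy_branch_stub_hit(heap)
  heap.code_gc
end

# Old layout: the call lives in the stub, so the return address is in the stub.
def old_stub_hit(heap)
  stub = heap.alloc(:stub)
  toy_branch_stub_hit(heap)
  heap.return_to(stub)
end

# New layout: the stub jumps to a static trampoline, which makes the call.
def new_stub_hit(heap, trampoline)
  heap.alloc(:stub)
  toy_branch_stub_hit(heap)
  heap.return_to(trampoline)
end

heap = ToyCodeHeap.new
trampoline = heap.alloc(:trampoline, static: true) # generated once "on boot"

begin
  old_stub_hit(heap)
rescue FreedCodeError => e
  puts "old layout: #{e.message}"
end
puts "new layout: returned to #{new_stub_hit(heap, trampoline)}"
```

In the old layout the return address points into the stub that code GC just freed; in the new layout the stub can be freed because control returns to the trampoline, which is never collected.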

