Bug #21257
open
YJIT can generate infinite loop when OOM
Description
We've found an edge case where YJIT can generate an infinite loop (jump to the same address) when it's out-of-memory.
Reproduction:
def first
  second
end

def second
  ::File
end

# Make `second` side exit on its first instruction
trace = TracePoint.new(:line) { }
trace.enable(target: method(:second))

32.times do |i|
  puts i
  first
  if i == 29
    # We've JITed the methods now - trigger the bug
    # Trigger a constant cache miss in rb_vm_opt_getconstant_path (in `second`) next time it's called
    module InvalidateConstantCache
      File = nil
    end
    # nb. this only works in yjit dev mode
    RubyVM::YJIT.simulate_oom!
  end
end
This hangs indefinitely when run with YJIT (./configure --enable-yjit=dev is required for simulate_oom!).
If we attach a debugger to the Ruby process at this point, it's stuck in an infinite loop:
$ lldb -p 9753
(lldb) process attach --pid 9753
Process 9753 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x0000000104b202b8
-> 0x104b202b8: b 0x104b202b8
0x104b202bc: nop
0x104b202c0: nop
0x104b202c4: nop
Target 0: (ruby) stopped.
Executable module set to "/Users/rian/opt/ruby/bin/ruby".
Architecture set to: arm64-apple-macosx-.
We've reproduced this on:
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) [x86_64-linux]
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin23]
ruby 3.5.0dev (2025-04-08T06:46:45Z master b68fe530f1) +PRISM [arm64-darwin23]
Updated by rianmcguire (Rian McGuire) 5 days ago · Edited
YJIT compiles the first and second methods to this (on x86_64-linux):
# regenerate_branch
# Block: first@infinite-jmp.rb:2 (chain_depth: 1)
# reg_temps: 00000001
# Insn: 0001 opt_send_without_block (stack_size: 1)
# call to Object#second
# guard known object with singleton class
0x5571d6436187: movabs rax, 0x7f6ea166c400
0x5571d6436191: cmp rsi, rax
0x5571d6436194: jne 0x5571d6438181
# stack overflow check
0x5571d643619a: lea rax, [rbx + 0x80]
0x5571d64361a1: cmp r13, rax
0x5571d64361a4: jbe 0x5571d64381a1
# store caller sp
0x5571d64361aa: lea rax, [rbx]
0x5571d64361ad: mov qword ptr [r13 + 8], rax
# save PC to CFP
0x5571d64361b1: movabs rax, 0x557205c1ce58
0x5571d64361bb: mov qword ptr [r13], rax
0x5571d64361bf: lea rax, [rbx + 0x20]
# push cme, specval, frame type
0x5571d64361c3: movabs rcx, 0x7f6e9decba30
0x5571d64361cd: mov qword ptr [rax - 0x18], rcx
0x5571d64361d1: mov qword ptr [rax - 0x10], 0
0x5571d64361d9: mov qword ptr [rax - 8], 0x11110003
# push callee control frame
0x5571d64361e1: mov qword ptr [r13 - 0x30], rax
0x5571d64361e5: movabs rcx, 0x7f6e9decbe50
0x5571d64361ef: mov qword ptr [r13 - 0x28], rcx
0x5571d64361f3: mov qword ptr [r13 - 0x20], rsi
0x5571d64361f7: mov qword ptr [r13 - 0x10], 0
# spill_temps: 00000001 -> 00000000
0x5571d64361ff: mov qword ptr [rbx], rsi
0x5571d6436202: mov rbx, rax
0x5571d6436205: sub rax, 8
0x5571d6436209: mov qword ptr [r13 - 0x18], rax
# update cfp->jit_return
0x5571d643620d: movabs rax, 0x5571d64381c5
0x5571d6436217: mov qword ptr [r13 - 8], rax
# switch to new CFP
0x5571d643621b: sub r13, 0x38
0x5571d643621f: mov qword ptr [r12 + 0x10], r13
# gen_direct_jmp: fallthrough
# Block: second@infinite-jmp.rb:6
# reg_temps: 00000000
# exit to interpreter on trace_opt_getconstant_path
0x5571d6436224: movabs rax, 0x557205c1d580
0x5571d643622e: mov qword ptr [r13], rax
0x5571d6436232: pop rbx
0x5571d6436233: pop r12
0x5571d6436235: pop r13
0x5571d6436237: mov eax, 0x24
0x5571d643623c: ret
Notably:
- the first method is a fallthrough to the second - the branch is BranchGenFn::JumpToTarget0 and BranchShape::Next0, so the branch is effectively empty (see gen_direct_jmp).
- the second method exits to the interpreter on its first instruction
After the methods have been compiled, the reproduction causes rb_yjit_constant_ic_update and invalidate_block_version to be called for the second method, which generates the infinite loop:
Invalidating block from second@infinite-jmp.rb:6, ISEQ offsets [0, 0)
# gen_direct_jmp: fallthrough
# Block: second@infinite-jmp.rb:6
# reg_temps: 00000000
# exit to interpreter on trace_opt_getconstant_path
# regenerate_branch
0x5571d6436224: jmp 0x5571d6436224
invalidate_block_version skips patching the block to jump to block.entry_exit, because the block already exits on entry:
if block_start == block_entry_exit {
    // Some blocks exit on entry. Patching a jump to the entry at the
    // entry makes an infinite loop.
} else {
It then rewrites the incoming branch from the first method. As we're OOM, gen_branch_stub returns None, and we fall back to using the invalidated block's exit for the branch target, rather than a new stub:
// Create a stub for this branch target
let stub_addr = gen_branch_stub(block.ctx, block.iseq.get(), ocb, branchref.as_ptr() as usize, target_idx as u32);
// In case we were unable to generate a stub (e.g. OOM). Use the block's
// exit instead of a stub for the block. It's important that we
// still patch the branch in this situation so stubs are unique
// to branches. Think about what could go wrong if we run out of
// memory in the middle of this loop.
let stub_addr = stub_addr.unwrap_or(block_entry_exit);
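The fallback above can be sketched in isolation. This is a toy model, not YJIT's code: `Addr`, `gen_stub`, `branch_target`, and the 16-byte stub size are all assumptions made for illustration. It only shows that when stub generation yields None, the branch target silently becomes the invalidated block's entry exit.

```rust
/// Toy stand-in for a code address (assumption; YJIT uses its own pointer type).
type Addr = u64;

/// Hypothetical stub allocator: returns None when too little executable
/// memory is left, which is the situation simulate_oom! forces.
fn gen_stub(free_bytes: usize, next_free: Addr) -> Option<Addr> {
    const STUB_SIZE: usize = 16; // assumed size, for illustration only
    if free_bytes >= STUB_SIZE {
        Some(next_free)
    } else {
        None
    }
}

/// Mirrors the `unwrap_or(block_entry_exit)` fallback quoted above:
/// use a fresh stub if one could be generated, else the block's entry exit.
fn branch_target(free_bytes: usize, next_free: Addr, block_entry_exit: Addr) -> Addr {
    gen_stub(free_bytes, next_free).unwrap_or(block_entry_exit)
}
```

With memory available, `branch_target(64, 0x2000, 0x1000)` yields the new stub address `0x2000`; with none left, `branch_target(0, 0x2000, 0x1000)` yields the entry exit `0x1000`.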
The invalidated block immediately follows the branch (it's a fallthrough). We detect this and update the branch shape to BranchShape::Default:
// Check if the invalidated block immediately follows
let target_next = block.start_addr == branch.end_addr.get();
if target_next {
    // The new block will no longer be adjacent.
    // Note that we could be enlarging the branch and writing into the
    // start of the block being invalidated.
    branch.gen_fn.set_shape(BranchShape::Default);
}
This means that when the branch is regenerated, we emit a jmp to the block exit address. The original branch code was a zero-length fallthrough, so this jmp is written over the start of the invalidated block (this is allowed). However, because that block exits on entry, the jmp target is the start address of that same block, and we end up with an infinite loop.
It feels like the invalid assumption here is that, if target_next is true, the new branch target will no longer be adjacent. This is normally true, as the target is a newly generated stub, but it breaks down when gen_branch_stub fails (because we're OOM).
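To make the sequence concrete, here is a minimal toy model of the invalidation path described above. It is a sketch under stated assumptions, not YJIT's implementation: the types `Shape`, `Emitted`, and `Branch`, the functions `regenerate`, `invalidate`, and `is_self_jump`, and the `guard` flag are all hypothetical. The guard shows one possible check (keep the fallthrough when the fallback target is still adjacent); it is not necessarily the fix that would be applied upstream.

```rust
/// Toy stand-in for a code address (assumption).
type Addr = u64;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Shape {
    Next0,   // target directly follows the branch: emit nothing
    Default, // emit an explicit jump
}

#[derive(Debug, PartialEq)]
enum Emitted {
    Nothing,
    Jmp { at: Addr, to: Addr },
}

struct Branch {
    start_addr: Addr,
    end_addr: Addr,
    shape: Shape,
}

/// Re-emit the branch code for its current shape.
fn regenerate(branch: &Branch, target: Addr) -> Emitted {
    match branch.shape {
        Shape::Next0 => Emitted::Nothing,
        Shape::Default => Emitted::Jmp { at: branch.start_addr, to: target },
    }
}

/// Rewrite an incoming branch of an invalidated block.
/// `stub` is None when stub generation failed (OOM).
/// `guard` enables a hypothetical adjacency check.
fn invalidate(
    branch: &mut Branch,
    block_start: Addr,
    block_entry_exit: Addr,
    stub: Option<Addr>,
    guard: bool,
) -> Emitted {
    let target = stub.unwrap_or(block_entry_exit);
    let target_next = block_start == branch.end_addr;
    // Hypothetical guard: the fallback target may still be adjacent.
    let still_adjacent = guard && target == branch.end_addr;
    if target_next && !still_adjacent {
        branch.shape = Shape::Default;
    }
    regenerate(branch, target)
}

/// True when the emitted code is a jump to its own address.
fn is_self_jump(e: &Emitted) -> bool {
    matches!(e, Emitted::Jmp { at, to } if at == to)
}
```

In this model, a zero-length Next0 branch at address A, followed by a block that starts at A and whose entry exit is also A, produces `Jmp { at: A, to: A }` when `stub` is None and `guard` is false, which matches the `jmp 0x5571d6436224` at `0x5571d6436224` shown earlier. With a stub available the jump goes to the stub, and with the guard enabled the branch stays a fallthrough into the block's existing exit code.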
Updated by hsbt (Hiroshi SHIBATA) 5 days ago
- Status changed from Open to Assigned
- Assignee set to yjit