Bug #21257: YJIT can generate infinite loop when OOM - Ruby - Ruby Issue Tracking System

Added by rianmcguire (Rian McGuire) 9 months ago. Updated 8 months ago.

Status: Closed
Assignee: jit
Target version:
ruby -v:
Backport: 3.2: DONTNEED, 3.3: DONE, 3.4: DONE

[ruby-core:121597]

Description

We've found an edge case where YJIT can generate an infinite loop (jump to the same address) when it's out-of-memory.

Reproduction:

def first
  second
end

def second
  ::File
end

# Make `second` side exit on its first instruction
trace = TracePoint.new(:line) { }
trace.enable(target: method(:second))

32.times do |i|
  puts i
  first

  if i == 29
    # We've JITed the methods now - trigger the bug

    # Trigger a constant cache miss in rb_vm_opt_getconstant_path (in `second`) next time it's called
    module InvalidateConstantCache
      File = nil
    end

    # nb. this only works in yjit dev mode
    RubyVM::YJIT.simulate_oom!
  end
end

This hangs indefinitely when run with YJIT (./configure --enable-yjit=dev is required for simulate_oom!).
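Because the failure mode is a hang rather than a crash, it can help to run the reproduction under a watchdog. The following is a minimal sketch, not part of the original report: `hangs?` is a hypothetical helper name, and the script name `infinite-jmp.rb` is taken from the disassembly comments further down.

```ruby
require "rbconfig"
require "timeout"

# Hypothetical helper: runs a command and reports whether it was still
# running after `seconds`. A process that never finishes is killed.
def hangs?(cmd, seconds:)
  pid = Process.spawn(*cmd, out: File::NULL, err: File::NULL)
  begin
    Timeout.timeout(seconds) { Process.wait(pid) }
    false
  rescue Timeout::Error
    Process.kill("KILL", pid)
    Process.wait(pid)
    true
  end
end

# Assumed usage against a dev-mode YJIT build:
#   hangs?([RbConfig.ruby, "--yjit", "infinite-jmp.rb"], seconds: 10)
```

The timeout value is arbitrary; the reproduction normally finishes almost instantly, so any process still alive after a few seconds is very likely spinning on the self-jump.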

If we attach a debugger to the Ruby process at this point, it's stuck in an infinite loop:

$ lldb -p 9753
(lldb) process attach --pid 9753
Process 9753 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x0000000104b202b8
->  0x104b202b8: b      0x104b202b8
    0x104b202bc: nop
    0x104b202c0: nop
    0x104b202c4: nop
Target 0: (ruby) stopped.
Executable module set to "/Users/rian/opt/ruby/bin/ruby".
Architecture set to: arm64-apple-macosx-.

We've reproduced this on:

ruby 3.3.6 (2024-11-05 revision 75015d4c1f) [x86_64-linux]
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin23]
ruby 3.5.0dev (2025-04-08T06:46:45Z master b68fe530f1) +PRISM [arm64-darwin23]

Updated by rianmcguire (Rian McGuire) 9 months ago · Edited
#1 [ruby-core:121598]

YJIT compiles the first and second methods to this (on x86_64-linux):

# regenerate_branch
# Block: first@infinite-jmp.rb:2 (chain_depth: 1)
# reg_temps: 00000001
# Insn: 0001 opt_send_without_block (stack_size: 1)
# call to Object#second
# guard known object with singleton class
0x5571d6436187: movabs rax, 0x7f6ea166c400
0x5571d6436191: cmp rsi, rax
0x5571d6436194: jne 0x5571d6438181
# stack overflow check
0x5571d643619a: lea rax, [rbx + 0x80]
0x5571d64361a1: cmp r13, rax
0x5571d64361a4: jbe 0x5571d64381a1
# store caller sp
0x5571d64361aa: lea rax, [rbx]
0x5571d64361ad: mov qword ptr [r13 + 8], rax
# save PC to CFP
0x5571d64361b1: movabs rax, 0x557205c1ce58
0x5571d64361bb: mov qword ptr [r13], rax
0x5571d64361bf: lea rax, [rbx + 0x20]
# push cme, specval, frame type
0x5571d64361c3: movabs rcx, 0x7f6e9decba30
0x5571d64361cd: mov qword ptr [rax - 0x18], rcx
0x5571d64361d1: mov qword ptr [rax - 0x10], 0
0x5571d64361d9: mov qword ptr [rax - 8], 0x11110003
# push callee control frame
0x5571d64361e1: mov qword ptr [r13 - 0x30], rax
0x5571d64361e5: movabs rcx, 0x7f6e9decbe50
0x5571d64361ef: mov qword ptr [r13 - 0x28], rcx
0x5571d64361f3: mov qword ptr [r13 - 0x20], rsi
0x5571d64361f7: mov qword ptr [r13 - 0x10], 0
# spill_temps: 00000001 -> 00000000
0x5571d64361ff: mov qword ptr [rbx], rsi
0x5571d6436202: mov rbx, rax
0x5571d6436205: sub rax, 8
0x5571d6436209: mov qword ptr [r13 - 0x18], rax
# update cfp->jit_return
0x5571d643620d: movabs rax, 0x5571d64381c5
0x5571d6436217: mov qword ptr [r13 - 8], rax
# switch to new CFP
0x5571d643621b: sub r13, 0x38
0x5571d643621f: mov qword ptr [r12 + 0x10], r13

# gen_direct_jmp: fallthrough
# Block: second@infinite-jmp.rb:6
# reg_temps: 00000000
# exit to interpreter on trace_opt_getconstant_path
0x5571d6436224: movabs rax, 0x557205c1d580
0x5571d643622e: mov qword ptr [r13], rax
0x5571d6436232: pop rbx
0x5571d6436233: pop r12
0x5571d6436235: pop r13
0x5571d6436237: mov eax, 0x24
0x5571d643623c: ret

Notably:

the first method is a fallthrough to the second - the branch is BranchGenFn::JumpToTarget0 and BranchShape::Next0, so the branch is effectively empty (see gen_direct_jmp).
the second method exits to the interpreter on its first instruction

After the methods have been compiled, the reproduction causes rb_yjit_constant_ic_update and invalidate_block_version to be called for the second method, which generates the infinite loop:

Invalidating block from second@infinite-jmp.rb:6, ISEQ offsets [0, 0)
  # gen_direct_jmp: fallthrough
  # Block: second@infinite-jmp.rb:6
  # reg_temps: 00000000
  # exit to interpreter on trace_opt_getconstant_path
  # regenerate_branch
  0x5571d6436224: jmp 0x5571d6436224
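To see why this single instruction never makes progress, here is a small sketch of how such a jump could be encoded. It assumes the 5-byte x86-64 near jump form (opcode E9 followed by a 32-bit displacement); the log above does not show which encoding YJIT actually emitted, and `encode_jmp_rel32` is a name made up for this example.

```ruby
# Encodes an x86-64 near jump: opcode E9 plus a signed 32-bit displacement
# measured from the end of the 5-byte instruction.
def encode_jmp_rel32(at:, target:)
  rel = target - (at + 5)
  raise RangeError, "target out of rel32 range" unless rel.between?(-2**31, 2**31 - 1)
  [0xE9, rel].pack("Cl<")
end

bytes = encode_jmp_rel32(at: 0x5571d6436224, target: 0x5571d6436224)
puts bytes.unpack1("H*")  # prints e9fbffffff
```

A displacement of -5 sends control back to the first byte of the same instruction, which is the x86-64 counterpart of the `b 0x104b202b8` at `0x104b202b8` seen in the arm64 debugger session.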

invalidate_block_version skips patching the block to jump to block.entry_exit, because the block already exits on entry:

        if block_start == block_entry_exit {
            // Some blocks exit on entry. Patching a jump to the entry at the
            // entry makes an infinite loop.
        } else {

It then rewrites the incoming branch from the first method. As we're OOM, gen_branch_stub returns None, and we fall back to using the invalidated block's exit for the branch target, rather than a new stub:

        // Create a stub for this branch target
        let stub_addr = gen_branch_stub(block.ctx, block.iseq.get(), ocb, branchref.as_ptr() as usize, target_idx as u32);

        // In case we were unable to generate a stub (e.g. OOM). Use the block's
        // exit instead of a stub for the block. It's important that we
        // still patch the branch in this situation so stubs are unique
        // to branches. Think about what could go wrong if we run out of
        // memory in the middle of this loop.
        let stub_addr = stub_addr.unwrap_or(block_entry_exit);

The invalidated block immediately follows the branch (it's a fallthrough). We detect this and update the branch shape to BranchShape::Default:

        // Check if the invalidated block immediately follows
        let target_next = block.start_addr == branch.end_addr.get();

        if target_next {
            // The new block will no longer be adjacent.
            // Note that we could be enlarging the branch and writing into the
            // start of the block being invalidated.
            branch.gen_fn.set_shape(BranchShape::Default);
        }

This means when the branch is regenerated, we emit a jmp to the block exit address. The original branch code was a zero-length fallthrough, so this jmp is written over the start of the invalidated block (this is allowed). However, because that block exits on entry, the jmp target is the start address of that block, and we end up with an infinite loop.

The invalid assumption here seems to be that, if target_next is true, the new branch target will no longer be adjacent. This normally holds, because the target is a newly generated stub, but it breaks down when gen_branch_stub fails (because we're OOM).
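The sequence above can be played through with a toy model. This is only a sketch in Ruby, not YJIT's real Rust data structures: `Block`, `Branch`, `rewrite_incoming_branch`, `self_jump?` and all addresses are invented for illustration, and `gen_branch_stub` here is a stand-in that simply returns nil when OOM.

```ruby
# Toy model of the branch-patching decision; all names and addresses are made up.
Block = Struct.new(:start_addr, :entry_exit)
Branch = Struct.new(:start_addr, :end_addr, :shape, :target)

# Stand-in for gen_branch_stub: returns nil when out of memory.
def gen_branch_stub(oom:, fresh_addr:)
  oom ? nil : fresh_addr
end

# Rewrites an incoming branch of an invalidated block and returns
# [address the jmp is written at, address it jumps to].
def rewrite_incoming_branch(block, branch, oom:, fresh_addr: 0x9000)
  # Fall back to the block's exit when no stub could be generated.
  stub_addr = gen_branch_stub(oom: oom, fresh_addr: fresh_addr) || block.entry_exit
  target_next = block.start_addr == branch.end_addr
  branch.shape = :default if target_next # fallthrough becomes an explicit jmp
  branch.target = stub_addr
  [branch.start_addr, branch.target]
end

def self_jump?(block, branch, oom:)
  jmp_at, jmp_to = rewrite_incoming_branch(block, branch, oom: oom)
  jmp_at == jmp_to
end

# `second` exits on entry, so its entry exit is its own start address.
second = Block.new(0x1000, 0x1000)
# Zero-length fallthrough from `first`: it starts and ends where `second` begins.
fallthrough = -> { Branch.new(0x1000, 0x1000, :next0, 0x1000) }

puts self_jump?(second, fallthrough.call, oom: false) # prints false
puts self_jump?(second, fallthrough.call, oom: true)  # prints true
```

Only the combination of a zero-length fallthrough, a block that exits on entry, and a failed stub allocation makes the jmp location and the jmp target coincide.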

Updated by hsbt (Hiroshi SHIBATA) 9 months ago
#2 [ruby-core:121600]

Status changed from Open to Assigned
Assignee set to jit

Updated by rianmcguire (Rian McGuire) 9 months ago
#3 [ruby-core:121741]

I've had a swing at fixing this in https://github.com/ruby/ruby/pull/13186

Updated by rianmcguire (Rian McGuire) 9 months ago
#4

Status changed from Assigned to Closed

Applied in changeset git|80a1a1bb8ae8435b916ae4f66a483e91ad31356a.

YJIT: Fix potential infinite loop when OOM (GH-13186)

Avoid generating an infinite loop in the case where:

Block first is adjacent to block second, and the branch from first to
second is a fallthrough, and
Block second immediately exits to the interpreter, and
Block second is invalidated and YJIT is OOM

While pondering how to fix this, I think I've stumbled on another related edge case:

Block incoming_one and incoming_two both branch to block second. Block
incoming_one has a fallthrough
Block second immediately exits to the interpreter (so it starts with its exit)
When Block second is invalidated, the incoming fallthrough branch from
incoming_one might be rewritten first, which overwrites the start of block
second with a jump to a new branch stub.
YJIT runs out of memory
The incoming branch from incoming_two is then rewritten, but because we're
OOM we can't generate a new stub, so we use second's exit as the branch
target. However second's exit was already overwritten with a jump to the
branch stub for incoming_one, so incoming_two will end up jumping to
incoming_one's branch stub.

Fixes [Bug #21257]
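The second edge case described in the commit message can also be walked through with a toy model. This sketch is not YJIT code: memory is modelled as a hash from address to either a jump target or a symbolic label, and `SECOND_START`, `follow` and `incoming_two_destination` are hypothetical names with made-up addresses.

```ruby
SECOND_START = 0x1000 # `second` exits on entry, so this is also its exit address

# Follows jumps through the modelled memory and returns the label reached.
def follow(memory, addr)
  addr = memory[addr] while memory[addr].is_a?(Integer)
  memory[addr]
end

# Invalidates `second` with a limited number of stubs available and reports
# where the rewritten branch from incoming_two finally lands.
def incoming_two_destination(stubs_left:)
  memory = { SECOND_START => :exit_to_interpreter }
  next_free = 0x9000

  alloc_stub = lambda do |owner|
    next nil if stubs_left.zero? # models gen_branch_stub failing when OOM
    stubs_left -= 1
    addr = next_free
    next_free += 0x100
    memory[addr] = :"stub_for_#{owner}"
    addr
  end

  # incoming_one is a zero-length fallthrough, so its regenerated jmp is
  # written at SECOND_START, over the exit that used to live there.
  # (The self-jump case is deliberately left out of this sketch.)
  one_target = alloc_stub.call(:incoming_one) || SECOND_START
  memory[SECOND_START] = one_target unless one_target == SECOND_START

  # incoming_two lives elsewhere; with no stub left it targets second's exit.
  two_target = alloc_stub.call(:incoming_two) || SECOND_START
  follow(memory, two_target)
end

p incoming_two_destination(stubs_left: 2) # prints :stub_for_incoming_two
p incoming_two_destination(stubs_left: 1) # prints :stub_for_incoming_one
```

With only one stub available, incoming_two's fallback target has already been overwritten, so it ends up in the stub that belongs to incoming_one.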

Updated by alanwu (Alan Wu) 9 months ago
#5

Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN to 3.2: DONTNEED, 3.3: REQUIRED, 3.4: REQUIRED

Updated by k0kubun (Takashi Kokubun) 8 months ago
#6 [ruby-core:122075]

Backport changed from 3.2: DONTNEED, 3.3: REQUIRED, 3.4: REQUIRED to 3.2: DONTNEED, 3.3: REQUIRED, 3.4: DONE

ruby_3_4 50b1759be00713535c41f5650feb3967c533450a.

Updated by nagachika (Tomoyuki Chikanaga) 8 months ago
#7 [ruby-core:122183]

Backport changed from 3.2: DONTNEED, 3.3: REQUIRED, 3.4: DONE to 3.2: DONTNEED, 3.3: DONE, 3.4: DONE

ruby_3_3 f57dd4470b9ba1e2e0007e814f94e8bb4fd2ab6f merged revision(s) 80a1a1bb8ae8435b916ae4f66a483e91ad31356a.
