Feature #22138
Open
Add `RB_NOGVL_PENDING_INTERRUPT_FAIL` flag for `rb_nogvl`.
Description
Add a new flag, RB_NOGVL_PENDING_INTERRUPT_FAIL, to rb_nogvl(). When set, rb_nogvl() does not enter the blocking region (and so does not call the supplied function) if the current thread has pending interrupts, including interrupts that are currently masked by Thread.handle_interrupt. In that case it returns 0 with errno == 0, the same result as the existing RB_NOGVL_INTR_FAIL skip path.
This gives selector / event-loop extensions a reliable way to detect that a thread has work to do (a pending interrupt) before committing to a potentially unbounded native wait, so they can unwind and let Ruby process the interrupt instead of hanging.
Background and motivation
rb_nogvl() already supports a "fail before blocking" mode via RB_NOGVL_INTR_FAIL (0x1). With that flag, if the VM has been interrupted (RUBY_VM_INTERRUPTED_ANY) before the blocking region is entered, the function is skipped and rb_nogvl() returns 0. This is used by rb_thread_call_without_gvl2().
However, RB_NOGVL_INTR_FAIL only reacts to deliverable interrupts. It does not account for interrupts that the current thread has deliberately deferred via Thread.handle_interrupt. There is an important class of bugs where:
- A thread has a pending interrupt (e.g. a Thread#raise, a timeout, a shutdown/termination request).
- That interrupt is masked by an enclosing Thread.handle_interrupt(... => :never) / :on_blocking region, which is typical in schedulers, supervisors and connection pools that want to control exactly where interrupts are delivered.
- The thread is about to enter a native wait (e.g. kqueue/epoll/select, or some other blocking syscall) with the GVL released.
- Because the interrupt is masked, the existing RB_NOGVL_INTR_FAIL check does not trip, the native wait is entered, and, if nothing else wakes it, the wait can block indefinitely. The pending interrupt is never observed.
This is not hypothetical: it was hit in production through async-container and io-event, where a scheduler entering a native selector wait could hang because a pending (masked) interrupt was not noticed before the wait began.
A blocking operation that is skipped because the thread has pending work should be treated the same way as an interrupted wait: return immediately without entering the wait, and let the caller unwind so Ruby can process the interrupt.
Proposal
Add the flag:
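As a sketch, the new definition would sit alongside the existing rb_nogvl() flags in include/ruby/thread.h. The 0x8 value is the one stated in this proposal; the values shown for the existing flags are those of current CRuby headers and should be checked against the target version.

```c
/* Existing rb_nogvl() flags (include/ruby/thread.h). */
#define RB_NOGVL_INTR_FAIL              (0x1)
#define RB_NOGVL_UBF_ASYNC_SAFE         (0x2)
#define RB_NOGVL_OFFLOAD_SAFE           (0x4)

/* Proposed flag: skip the blocking region when the current thread has
 * pending interrupts, including ones masked by Thread.handle_interrupt. */
#define RB_NOGVL_PENDING_INTERRUPT_FAIL (0x8)
```

Each flag occupies its own bit, so they can be combined with a bitwise OR.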
Semantics when the flag is set:
- If the current thread has pending interrupts (as reported by the thread's pending-interrupt queue, including interrupts masked by Thread.handle_interrupt), rb_nogvl() does not call func. It returns 0.
- Otherwise rb_nogvl() behaves as usual: it enters the blocking region, calls func, and preserves func's resulting errno.
The check is performed both as an early pre-check (before any of the blocking-region machinery runs) and again inside the blocking-region setup, to close the window between the pre-check and the point where the GVL is released.
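The semantics above can be illustrated with a small standalone model. This is not CRuby's implementation: every name prefixed with model_ is hypothetical, the pending-interrupt queue is reduced to a single boolean, and no GVL is involved. It only shows the two check points and the skip-path result (return value 0, errno == 0).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag value taken from the proposal; all model_ names are hypothetical. */
#define MODEL_PENDING_INTERRUPT_FAIL 0x8

/* Stand-in for a thread's pending-interrupt queue. */
struct model_thread {
    bool has_pending_interrupt; /* true even while the interrupt is masked */
};

typedef void *(*model_func_t)(void *);

static bool
model_should_skip(const struct model_thread *th, int flags)
{
    return (flags & MODEL_PENDING_INTERRUPT_FAIL) && th->has_pending_interrupt;
}

static void *
model_nogvl(struct model_thread *th, model_func_t func, void *data, int flags)
{
    /* Early pre-check, before any blocking-region machinery runs. */
    if (model_should_skip(th, flags)) {
        errno = 0;
        return NULL;
    }

    /* ... blocking-region setup would happen here ... */

    /* Second check, closing the window before the GVL would be released. */
    if (model_should_skip(th, flags)) {
        errno = 0;
        return NULL;
    }

    /* Normal path: call func and leave whatever errno it set untouched. */
    return func(data);
}

/* Example callback: counts how many times it was invoked. */
static void *
model_count_call(void *data)
{
    ++*(int *)data;
    return data;
}
```

With has_pending_interrupt set and the flag passed, model_count_call is never invoked; without the flag, or without a pending interrupt, it runs as usual.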
Relationship to existing flags
- RB_NOGVL_INTR_FAIL (0x1) reacts to deliverable VM interrupts (RUBY_VM_INTERRUPTED_ANY). It does not consider interrupts deferred by Thread.handle_interrupt.
- RB_NOGVL_PENDING_INTERRUPT_FAIL (0x8) reacts to pending interrupts in the thread's queue, including masked ones. This is the key difference: it lets a caller bail out before a native wait even when the interrupt is not currently deliverable, so the caller can unwind to a point where the interrupt can be handled safely.
The flags are independent and may be combined with each other and with RB_NOGVL_UBF_ASYNC_SAFE / RB_NOGVL_OFFLOAD_SAFE.
Example
A common pattern is to have the callback write its result into a struct that is pre-initialised to a sentinel (e.g. -1), then report failure once back under the GVL:
struct Arguments {
    int result;
};

static void *my_func(void *ptr) {
    struct Arguments *arguments = ptr;
    arguments->result = my_syscall();
    return NULL;
}

struct Arguments arguments = {.result = -1};
rb_nogvl(my_func, &arguments, ubf, &arguments, RB_NOGVL_PENDING_INTERRUPT_FAIL);
if (arguments.result == -1) {
    // Either the syscall ran and was interrupted (errno == EINTR), or it was
    // not run at all because the thread had pending interrupts (errno == 0).
    rb_sys_fail("my_syscall");
}
Note the errno caveat for this proposal: on the skip path errno is 0, and rb_sys_fail() must not be called with errno == 0 (CRuby reports that as an interpreter bug instead of raising a SystemCallError). A caller that wants to funnel both cases through rb_sys_fail() should normalise it:
if (arguments.result == -1) {
    if (errno == 0) errno = EINTR; // skip path leaves errno == 0
    rb_sys_fail("my_syscall");
}
See "Errno handling" below for what errno == EINTR vs errno == 0 tells you.
Extensions can feature-detect the flag at build time so they keep working on older Rubies:
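Because the flag is a preprocessor macro, a plain #ifdef is enough. The following is a minimal sketch; the MY_EXT_ names are hypothetical, and in a real extension these lines would follow #include <ruby/thread.h>.

```c
/* In a real extension this follows: #include <ruby/thread.h> */
#ifdef RB_NOGVL_PENDING_INTERRUPT_FAIL
/* Newer Ruby: ask rb_nogvl() to skip the wait on pending interrupts. */
#define MY_EXT_HAVE_PENDING_INTERRUPT_FAIL 1
#define MY_EXT_NOGVL_FLAGS RB_NOGVL_PENDING_INTERRUPT_FAIL
#else
/* Older Ruby: the flag does not exist, so pass no extra flags. */
#define MY_EXT_HAVE_PENDING_INTERRUPT_FAIL 0
#define MY_EXT_NOGVL_FLAGS 0
#endif
```

On a Ruby without the flag, MY_EXT_NOGVL_FLAGS is 0 and rb_nogvl() behaves as before; any additional fallback for masked interrupts is then up to the extension.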
Errno handling
On the pending-interrupt skip path, rb_nogvl() returns 0 with errno == 0. This matches the existing skip path used by RB_NOGVL_INTR_FAIL: when the function is never called, errno is restored to its initial value (0). The flag does not otherwise change rb_nogvl()'s errno behaviour, so this change is purely additive and carries no compatibility risk for existing extensions.
For a caller, when an operation reports "no result" (e.g. result == -1), the two interesting outcomes can be told apart by errno:
- errno == EINTR: the function did run and its syscall was actually interrupted (the conventional EINTR). Some side effects may have occurred.
- errno == 0: the function did not run. It was skipped before entering the blocking region, either by RB_NOGVL_PENDING_INTERRUPT_FAIL (pending, possibly masked, interrupts) or by RB_NOGVL_INTR_FAIL.
So for this proposal, result == -1 && (errno == EINTR || errno == 0) means the operation was not executed, or was interrupted — in both cases the caller should unwind and let Ruby process interrupts.
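The errno distinction can be captured in a small helper. This is an illustrative sketch only: the enum and function names are hypothetical, and it assumes the -1 sentinel convention from the example above, with errno read immediately after rb_nogvl() returns.

```c
#include <errno.h>

/* Hypothetical outcome classification for a sentinel-based callback. */
enum wait_outcome {
    WAIT_COMPLETED,   /* callback ran and produced a result */
    WAIT_SKIPPED,     /* callback never ran (skip path, errno == 0) */
    WAIT_INTERRUPTED, /* callback ran, syscall failed with EINTR */
    WAIT_FAILED       /* callback ran, syscall failed with another errno */
};

static enum wait_outcome
classify_wait(int result, int err)
{
    if (result != -1) return WAIT_COMPLETED;
    if (err == 0) return WAIT_SKIPPED;
    if (err == EINTR) return WAIT_INTERRUPTED;
    return WAIT_FAILED;
}
```

For both WAIT_SKIPPED and WAIT_INTERRUPTED the caller would unwind and let Ruby process interrupts; only WAIT_FAILED is a genuine error to report.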
An errno of 0 on a skipped callback is admittedly not very ergonomic (it forces the if (errno == 0) errno = EINTR; dance above before rb_sys_fail()). Making the skip paths report EINTR directly would remove that wrinkle, but it also erases the EINTR-vs-0 distinction above (you could no longer tell a truly-interrupted syscall from a never-run one). That trade-off is a separate concern and may be proposed independently; this proposal keeps the existing behaviour and stays focused on the new flag.
Implementation
A reference implementation is available:
RB_NOGVL_PENDING_INTERRUPT_FAIL flag: https://github.com/ruby/ruby/pull/17553
Downstream user with feature detection and a fallback for older Rubies:
The focused CRuby C-API specs (spec/ruby/optional/capi/thread_spec.rb) cover the new flag: the function is not called when the current thread has masked pending interrupts, and errno is 0.
Updated by ioquatix (Samuel Williams) 5 days ago
- Description updated
- Assignee set to ioquatix (Samuel Williams)