ko1: I noticed this while working on timer-thread elimination/lazy-spawning.
However, it looks like a bug we introduced in 2.5 with `ec'.
Can you check this patch? Thanks.
timer-thread may set trap interrupt with rb_threadptr_check_signal
at any time independent of GVL. This means timer-thread may set
the trap interrupt flag on the previous execution context; causing
the flag to be unnoticed until a future ec switch (or lost
completely if the ec is done).
Note: I avoid relying on th->interrupt_lock here and use
atomics because we won't be able to rely on it for proposed lazy
timer-thread [Misc #14937].
This regression affects Ruby 2.5 as it was introduced by moving
interrupt_flag to `ec' which is an unstable pointer. Ruby <= 2.4
was unaffected because vm->main_thread->interrupt_flag never
changed.
timer-thread may set trap interrupt with rb_threadptr_check_signal
at any time independent of GVL. This means timer-thread may set
the trap interrupt flag on the previous execution context; causing
the flag to be unnoticed until a future ec switch (or lost
completely if the ec is done).
Note: I avoid relying on th->interrupt_lock here and use
atomics because we won't be able to rely on it for proposed lazy
timer-thread [Misc #14937].
This regression affects Ruby 2.5 as it was introduced by moving
interrupt_flag to `ec' which is an unstable pointer. Ruby <= 2.4
was unaffected because vm->main_thread->interrupt_flag never
changed.
BTW, I forget that why the trap handler is limited to main thread.
I can think of at least three reasons:
Similar to this bug: other threads may exit soon,
main thread is long-lived so interrupt won't get lost
Typical trap handlers are established at program startup,
before other threads exist, so maybe some are reliant on
Thread.current[] values inside trap (perhaps unknowingly
through libraries).
Concurrency problem. Mutex isn't usable inside trap,
but there may be instances of reentrancy-safe code which
can't be made thread-safe.
Can we relax this limitation?
I don't think it's necessary; anybody who relies on trap will
likely structure main thread to deal with it (e.g. not block on
slow NFS IO#read :P). We can workaround 1), but I expect
compatibility and safety problems from 2) and 3).