Project

General

Profile

Bug #14939

[PATCH] cont.c (ec_switch): prevent delayed/missed trap interrupt race

Added by normalperson (Eric Wong) almost 2 years ago. Updated over 1 year ago.

Status:
Closed
Priority:
Normal
Target version:
-
[ruby-core:88110]

Description

ko1: I noticed this while working on timer-thread elimination/lazy-spawning.
However, it looks like a bug we introduced in 2.5 with `ec'.
Can you check this patch? Thanks.

cont.c (ec_switch): prevent delayed/missed trap interrupt race

timer-thread may set trap interrupt with rb_threadptr_check_signal
at any time independent of GVL. This means timer-thread may set
the trap interrupt flag on the previous execution context; causing
the flag to be unnoticed until a future ec switch (or lost
completely if the ec is done).

Note: I avoid relying on th->interrupt_lock here and use
atomics because we won't be able to rely on it for proposed lazy
timer-thread [Misc #14937].

This regression affects Ruby 2.5 as it was introduced by moving
interrupt_flag to `ec' which is an unstable pointer. Ruby <= 2.4
was unaffected because vm->main_thread->interrupt_flag never
changed.


Files

Updated by ko1 (Koichi Sasada) almost 2 years ago

Good point. Seems good. Thank you!

BTW, I forget that why the trap handler is limited to main thread.

trap(:INT) do
  p Thread.current
end

Thread.new{
  sleep 1
  Process.kill :INT, $$
  p [:child, Thread.current]
}.join
sleep 10
p [:main, Thread.current]

__END__

[:child, #<Thread:0x00000125801a3028@t.rb:5 run>]
#<Thread:0x00000125801e9c80 run>
# after 10 seconds
[:main, #<Thread:0x00000125801e9c80 run>]

Can we relax this limitation?

Updated by ko1 (Koichi Sasada) almost 2 years ago

Oops, sorry forget about sleep lines.

#3

Updated by normalperson (Eric Wong) almost 2 years ago

  • Status changed from Open to Closed

Applied in changeset trunk|r64062.


cont.c (ec_switch): prevent delayed/missed trap interrupt race

timer-thread may set trap interrupt with rb_threadptr_check_signal
at any time independent of GVL. This means timer-thread may set
the trap interrupt flag on the previous execution context; causing
the flag to be unnoticed until a future ec switch (or lost
completely if the ec is done).

Note: I avoid relying on th->interrupt_lock here and use
atomics because we won't be able to rely on it for proposed lazy
timer-thread [Misc #14937].

This regression affects Ruby 2.5 as it was introduced by moving
interrupt_flag to `ec' which is an unstable pointer. Ruby <= 2.4
was unaffected because vm->main_thread->interrupt_flag never
changed.

[ruby-core:88119] [Bug #14939]

Updated by normalperson (Eric Wong) almost 2 years ago

ko1@atdot.net wrote:

Good point. Seems good. Thank you!

OK, commited as r64062

BTW, I forget that why the trap handler is limited to main thread.

I can think of at least three reasons:

1) Similar to this bug: other threads may exit soon,
main thread is long-lived so interrupt won't get lost

2) Typical trap handlers are established at program startup,
before other threads exist, so maybe some are reliant on
Thread.current[] values inside trap (perhaps unknowingly
through libraries).

3) Concurrency problem. Mutex isn't usable inside trap,
but there may be instances of reentrancy-safe code which
can't be made thread-safe.

Can we relax this limitation?

I don't think it's necessary; anybody who relies on trap will
likely structure main thread to deal with it (e.g. not block on
slow NFS IO#read :P). We can workaround 1), but I expect
compatibility and safety problems from 2) and 3).

Updated by nagachika (Tomoyuki Chikanaga) over 1 year ago

  • Backport changed from 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: REQUIRED to 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: DONE

ruby_2_5 r64999 merged revision(s) 64062.

Also available in: Atom PDF