Project

General

Profile

Bug #14939

[PATCH] cont.c (ec_switch): prevent delayed/missed trap interrupt race

Added by normalperson (Eric Wong) about 1 year ago. Updated 11 months ago.

Status:
Closed
Priority:
Normal
Target version:
-
[ruby-core:88110]

Description

ko1: I noticed this while working on timer-thread elimination/lazy-spawning.
However, it looks like a bug we introduced in 2.5 with `ec'.
Can you check this patch? Thanks.

cont.c (ec_switch): prevent delayed/missed trap interrupt race

timer-thread may set trap interrupt with rb_threadptr_check_signal
at any time independent of GVL. This means timer-thread may set
the trap interrupt flag on the previous execution context; causing
the flag to be unnoticed until a future ec switch (or lost
completely if the ec is done).

Note: I avoid relying on th->interrupt_lock here and use
atomics because we won't be able to rely on it for proposed lazy
timer-thread [Misc #14937].

This regression affects Ruby 2.5 as it was introduced by moving
interrupt_flag to `ec' which is an unstable pointer. Ruby <= 2.4
was unaffected because vm->main_thread->interrupt_flag never
changed.


Files

Associated revisions

Revision 97538e81
Added by normal about 1 year ago

cont.c (ec_switch): prevent delayed/missed trap interrupt race

timer-thread may set trap interrupt with rb_threadptr_check_signal
at any time independent of GVL. This means timer-thread may set
the trap interrupt flag on the previous execution context; causing
the flag to be unnoticed until a future ec switch (or lost
completely if the ec is done).

Note: I avoid relying on th->interrupt_lock here and use
atomics because we won't be able to rely on it for proposed lazy
timer-thread [Misc #14937].

This regression affects Ruby 2.5 as it was introduced by moving
interrupt_flag to `ec' which is an unstable pointer. Ruby <= 2.4
was unaffected because vm->main_thread->interrupt_flag never
changed.

[ruby-core:88119] [Bug #14939]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64062 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 64062
Added by normalperson (Eric Wong) about 1 year ago

cont.c (ec_switch): prevent delayed/missed trap interrupt race

timer-thread may set trap interrupt with rb_threadptr_check_signal
at any time independent of GVL. This means timer-thread may set
the trap interrupt flag on the previous execution context; causing
the flag to be unnoticed until a future ec switch (or lost
completely if the ec is done).

Note: I avoid relying on th->interrupt_lock here and use
atomics because we won't be able to rely on it for proposed lazy
timer-thread [Misc #14937].

This regression affects Ruby 2.5 as it was introduced by moving
interrupt_flag to `ec' which is an unstable pointer. Ruby <= 2.4
was unaffected because vm->main_thread->interrupt_flag never
changed.

[ruby-core:88119] [Bug #14939]

Revision 64062
Added by normal about 1 year ago

cont.c (ec_switch): prevent delayed/missed trap interrupt race

timer-thread may set trap interrupt with rb_threadptr_check_signal
at any time independent of GVL. This means timer-thread may set
the trap interrupt flag on the previous execution context; causing
the flag to be unnoticed until a future ec switch (or lost
completely if the ec is done).

Note: I avoid relying on th->interrupt_lock here and use
atomics because we won't be able to rely on it for proposed lazy
timer-thread [Misc #14937].

This regression affects Ruby 2.5 as it was introduced by moving
interrupt_flag to `ec' which is an unstable pointer. Ruby <= 2.4
was unaffected because vm->main_thread->interrupt_flag never
changed.

[ruby-core:88119] [Bug #14939]

Revision 9fc4c79f
Added by nagachika (Tomoyuki Chikanaga) 11 months ago

merge revision(s) 64062: [Backport #14939]

    cont.c (ec_switch): prevent delayed/missed trap interrupt race

    timer-thread may set trap interrupt with rb_threadptr_check_signal
    at any time independent of GVL.  This means timer-thread may set
    the trap interrupt flag on the previous execution context; causing
    the flag to be unnoticed until a future ec switch (or lost
    completely if the ec is done).

    Note: I avoid relying on th->interrupt_lock here and use
    atomics because we won't be able to rely on it for proposed lazy
    timer-thread [Misc #14937].

    This regression affects Ruby 2.5 as it was introduced by moving
    interrupt_flag to `ec' which is an unstable pointer.  Ruby <= 2.4
    was unaffected because vm->main_thread->interrupt_flag never
    changed.

    [ruby-core:88119] [Bug #14939]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@64999 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 64999
Added by nagachika (Tomoyuki Chikanaga) 11 months ago

merge revision(s) 64062: [Backport #14939]

cont.c (ec_switch): prevent delayed/missed trap interrupt race

timer-thread may set trap interrupt with rb_threadptr_check_signal
at any time independent of GVL.  This means timer-thread may set
the trap interrupt flag on the previous execution context; causing
the flag to be unnoticed until a future ec switch (or lost
completely if the ec is done).

Note: I avoid relying on th->interrupt_lock here and use
atomics because we won't be able to rely on it for proposed lazy
timer-thread [Misc #14937].

This regression affects Ruby 2.5 as it was introduced by moving
interrupt_flag to `ec' which is an unstable pointer.  Ruby <= 2.4
was unaffected because vm->main_thread->interrupt_flag never
changed.

[ruby-core:88119] [Bug #14939]

History

Updated by ko1 (Koichi Sasada) about 1 year ago

Good point. Seems good. Thank you!

BTW, I forget that why the trap handler is limited to main thread.

trap(:INT) do
  p Thread.current
end

Thread.new{
  sleep 1
  Process.kill :INT, $$
  p [:child, Thread.current]
}.join
sleep 10
p [:main, Thread.current]

__END__

[:child, #<Thread:0x00000125801a3028@t.rb:5 run>]
#<Thread:0x00000125801e9c80 run>
# after 10 seconds
[:main, #<Thread:0x00000125801e9c80 run>]

Can we relax this limitation?

Updated by ko1 (Koichi Sasada) about 1 year ago

Oops, sorry forget about sleep lines.

#3

Updated by normalperson (Eric Wong) about 1 year ago

  • Status changed from Open to Closed

Applied in changeset trunk|r64062.


cont.c (ec_switch): prevent delayed/missed trap interrupt race

timer-thread may set trap interrupt with rb_threadptr_check_signal
at any time independent of GVL. This means timer-thread may set
the trap interrupt flag on the previous execution context; causing
the flag to be unnoticed until a future ec switch (or lost
completely if the ec is done).

Note: I avoid relying on th->interrupt_lock here and use
atomics because we won't be able to rely on it for proposed lazy
timer-thread [Misc #14937].

This regression affects Ruby 2.5 as it was introduced by moving
interrupt_flag to `ec' which is an unstable pointer. Ruby <= 2.4
was unaffected because vm->main_thread->interrupt_flag never
changed.

[ruby-core:88119] [Bug #14939]

Updated by normalperson (Eric Wong) about 1 year ago

ko1@atdot.net wrote:

Good point. Seems good. Thank you!

OK, commited as r64062

BTW, I forget that why the trap handler is limited to main thread.

I can think of at least three reasons:

1) Similar to this bug: other threads may exit soon,
main thread is long-lived so interrupt won't get lost

2) Typical trap handlers are established at program startup,
before other threads exist, so maybe some are reliant on
Thread.current[] values inside trap (perhaps unknowingly
through libraries).

3) Concurrency problem. Mutex isn't usable inside trap,
but there may be instances of reentrancy-safe code which
can't be made thread-safe.

Can we relax this limitation?

I don't think it's necessary; anybody who relies on trap will
likely structure main thread to deal with it (e.g. not block on
slow NFS IO#read :P). We can workaround 1), but I expect
compatibility and safety problems from 2) and 3).

Updated by nagachika (Tomoyuki Chikanaga) 11 months ago

  • Backport changed from 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: REQUIRED to 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: DONE

ruby_2_5 r64999 merged revision(s) 64062.

Also available in: Atom PDF