Bug #9835
closedIGNORE signal handler has possible race condition in Ruby 2.1.2
Description
I'm migrating an application from 1.8.7 to 2.1.1/2.1.2, so I'm not sure when this was introduced. Attached is a demo program with some notes about how the IGNORE option to Signal.trap seems to have a race condition whereby receiving that signal shortly after that call, it raises a SignalException with the message including only the name of the signal it received. However, if you trap the signal with an empty block, that race does not occur.
I've tested the attached program on 1.8.7 (no issue), 2.1.1 (issue) and 2.1.2 (issue) on a Linux system (Ubuntu lucid).
Files
Updated by rb2k (Marc Seeger) about 10 years ago
This script produces the same "Caught SignalException: SIGUSR1" message on Ubuntu Precise.
I couldn't get it to trigger on OSX
Updated by nobu (Nobuyoshi Nakada) about 10 years ago
- Status changed from Open to Rejected
It's your bug, just sending a signal before the children trap SIGUSR1
.
Updated by askreet (Kyle Smith) about 10 years ago
Nobuyoshi Nakada,
Please take a second look at the ticket and attached source, you failed to address two things:
-
Why is this code not an issue in 1.8.7. The children (as expected) inherit the signal handler from the parent and those who lose the race run the parent signal handlers in their place. Is this new behavior in 2.1?
-
Why does it work properly if I trap the signal with an empty block, in place of IGNORE? Please see the comments in the original source that highlight alternate codepaths that do not raise SignalException.
I still believe there is a defect here, or at least a change in signal handling with regard to fork() that I do not understand.
Thank you,
Kyle
Updated by normalperson (Eric Wong) about 10 years ago
It looks like a race condition in your code. You were lucky to not hit
it in 1.8.7 or with a different use of trap (because execution speeds
may be different different).
parent execution timeline
-----------------------------+
trap(USR1, show_children) |
|
fork ------------------------+-- child execution timeline -----
|
*** kill(USR1, -getpgid($$)) | trap(USR1, IGNORE)
|
Once you fork, parent and child run in parallel. So the order of
execution where '***' is highlighted is outside your control.
So to ensure your child ignores properly, you must setup the ignore
handler before forking. There will always be a moment in time where
the parent and child will have the same signal handler.
You also do not want to a signal in the parent which arrive immediately
before or after the fork, either. So I suggest something like the
following to defer signals in the parent:
# note: this count nals in general) is racy
usr1_queue = []
show_children = trap("USR1") { usr1_queue << nil } # defer signals
pid = fork
if pid # parent
trap("USR1", show_children) # restore immediate handling
# process deferred signals
usr1_queue.each { show_children.call }
usr1_queue.clear
...
else # child
trap("USR1", "IGNORE")
# do nothing with usr1_queue here, implicitly ignored
...
end
Updated by rb2k (Marc Seeger) about 10 years ago
Thanks for the answer, Eric!
If you look at the attached example (test.rb), you will see that there IS already a signal handler for USR1 registered (very bottom, before .run!).
So even before we set IGNORE in the child, there is an inherited handler present that should circumvent the SignalException.
(besides that, this would also fail to explain why an empty block fixes the issue and with IGNORE it keeps crashing.)
Updated by normalperson (Eric Wong) about 10 years ago
I could not reproduce the race, but I think the following is a fix.
http://bogomips.org/ruby.git/patch?id=3137d8389c32e4
signal.c: fix SIG_IGN race for #9835
A signal may arrive before SIG_IGN is set, but be handled
after trap_list[sig].cmd is cleared:
- signal_enque runs on signal arrival
- user calls trap(sig, "IGNORE"), setting SIG_IGN
- rb_signal_exec runs on queued signal
- signal.c (signal_exec): ignore nil for SIG_IGN
- signal.c (trap_handler): set cmd to nil for SIG_IGN
- signal.c (trap): generate "IGNORE" string for nil
Can you please try my patch? Thank you.
Updated by nobu (Nobuyoshi Nakada) about 10 years ago
As signal is asynchronous, it is not guaranteed essentially?
Anyway, the patch doesn't seem evil.
Updated by normalperson (Eric Wong) about 10 years ago
nobu@ruby-lang.org wrote:
As signal is asynchronous, it is not guaranteed essentially?
The problem is a race with the way we defer signal handling
Kyle expected two states for USR1
:
- print children: sighandler,
trap_list[sig].cmd === proc
- ignore:
SIG_IGN
,trap_list[sig].cmd == 0
However, the problem is we queued a SIGUSR1
at 1),
but queue and handle the signal after 2) passes
So the queued signal hits cmd==0
, leading to rb_threadptr_signal_raise
and SignalException
. Signals queued after 2) are safe from this
condition.
Anyway, the patch doesn't seem evil.
OK, I'll wait a bit for Kyle to confirm. However, I just managed to
reproduce the issue by limiting to one CPU on one of my machines:
schedtool -a 0x1 -e ruby test.rb
My proposed patch seems to fix it on my machine.
Updated by normalperson (Eric Wong) about 10 years ago
v2: updated to pass rubyspec for trap(sig, nil)
http://bogomips.org/ruby.git/patch?id=e19b6b1a5e0117
Updated by normalperson (Eric Wong) about 10 years ago
- Description updated (diff)
- Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN to 2.0.0: REQUIRED, 2.1: REQUIRED
update backport field for r46194
Updated by nagachika (Tomoyuki Chikanaga) about 10 years ago
- Status changed from Rejected to Closed
change status because there was a bugfix
.
Updated by nagachika (Tomoyuki Chikanaga) about 10 years ago
- Backport changed from 2.0.0: REQUIRED, 2.1: REQUIRED to 2.0.0: REQUIRED, 2.1: DONE
Backported into ruby_2_1
branch at r46520.
Updated by usa (Usaku NAKAMURA) about 10 years ago
- Backport changed from 2.0.0: REQUIRED, 2.1: DONE to 2.0.0: DONE, 2.1: DONE
backported into ruby_2_0_0
at r46575.