Project

General

Profile

Actions

Bug #9835

closed

IGNORE signal handler has possible race condition in Ruby 2.1.2

Added by askreet (Kyle Smith) almost 10 years ago. Updated almost 10 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-linux]
[ruby-core:62565]

Description

I'm migrating an application from 1.8.7 to 2.1.1/2.1.2, so I'm not sure when this was introduced. Attached is a demo program with some notes about how the IGNORE option to Signal.trap seems to have a race condition whereby receiving that signal shortly after that call, it raises a SignalException with the message including only the name of the signal it received. However, if you trap the signal with an empty block, that race does not occur.

I've tested the attached program on 1.8.7 (no issue), 2.1.1 (issue) and 2.1.2 (issue) on a Linux system (Ubuntu lucid).


Files

test.rb (1.37 KB) test.rb Test program demonstrating failure conditions. askreet (Kyle Smith), 05/13/2014 04:12 PM

Updated by rb2k (Marc Seeger) almost 10 years ago

This script produces the same "Caught SignalException: SIGUSR1" message on Ubuntu Precise.
I couldn't get it to trigger on OSX

Updated by nobu (Nobuyoshi Nakada) almost 10 years ago

  • Status changed from Open to Rejected

It's your bug, just sending a signal before the children trap SIGUSR1.

Updated by askreet (Kyle Smith) almost 10 years ago

Nobuyoshi Nakada,

Please take a second look at the ticket and attached source, you failed to address two things:

  1. Why is this code not an issue in 1.8.7. The children (as expected) inherit the signal handler from the parent and those who lose the race run the parent signal handlers in their place. Is this new behavior in 2.1?

  2. Why does it work properly if I trap the signal with an empty block, in place of IGNORE? Please see the comments in the original source that highlight alternate codepaths that do not raise SignalException.

I still believe there is a defect here, or at least a change in signal handling with regard to fork() that I do not understand.

Thank you,
Kyle

Updated by normalperson (Eric Wong) almost 10 years ago

It looks like a race condition in your code. You were lucky to not hit
it in 1.8.7 or with a different use of trap (because execution speeds
may be different different).

 parent execution timeline
 -----------------------------+
 trap(USR1, show_children)    |
                              |
 fork ------------------------+-- child execution timeline -----
                              |
 *** kill(USR1, -getpgid($$)) | trap(USR1, IGNORE)
                              |

Once you fork, parent and child run in parallel. So the order of
execution where '***' is highlighted is outside your control.

So to ensure your child ignores properly, you must setup the ignore
handler before forking. There will always be a moment in time where
the parent and child will have the same signal handler.

You also do not want to a signal in the parent which arrive immediately
before or after the fork, either. So I suggest something like the
following to defer signals in the parent:

 # note: this count nals in general) is racy
 usr1_queue = []
 show_children = trap("USR1") { usr1_queue << nil } # defer signals
 pid = fork
 if pid # parent
   trap("USR1", show_children) # restore immediate handling

   # process deferred signals
   usr1_queue.each { show_children.call }
   usr1_queue.clear
   ...
 else # child
   trap("USR1", "IGNORE")
   # do nothing with usr1_queue here, implicitly ignored
   ...
 end

Updated by rb2k (Marc Seeger) almost 10 years ago

Thanks for the answer, Eric!

If you look at the attached example (test.rb), you will see that there IS already a signal handler for USR1 registered (very bottom, before .run!).
So even before we set IGNORE in the child, there is an inherited handler present that should circumvent the SignalException.

(besides that, this would also fail to explain why an empty block fixes the issue and with IGNORE it keeps crashing.)

Updated by normalperson (Eric Wong) almost 10 years ago

I could not reproduce the race, but I think the following is a fix.

http://bogomips.org/ruby.git/patch?id=3137d8389c32e4

signal.c: fix SIG_IGN race for #9835

A signal may arrive before SIG_IGN is set, but be handled
after trap_list[sig].cmd is cleared:

  1. signal_enque runs on signal arrival
  2. user calls trap(sig, "IGNORE"), setting SIG_IGN
  3. rb_signal_exec runs on queued signal
  • signal.c (signal_exec): ignore nil for SIG_IGN
  • signal.c (trap_handler): set cmd to nil for SIG_IGN
  • signal.c (trap): generate "IGNORE" string for nil

Can you please try my patch? Thank you.

Updated by nobu (Nobuyoshi Nakada) almost 10 years ago

As signal is asynchronous, it is not guaranteed essentially?

Anyway, the patch doesn't seem evil.

Updated by normalperson (Eric Wong) almost 10 years ago

wrote:

As signal is asynchronous, it is not guaranteed essentially?

The problem is a race with the way we defer signal handling

Kyle expected two states for USR1:

  1. print children: sighandler, trap_list[sig].cmd === proc
  2. ignore: SIG_IGN, trap_list[sig].cmd == 0

However, the problem is we queued a SIGUSR1 at 1),
but queue and handle the signal after 2) passes

So the queued signal hits cmd==0, leading to rb_threadptr_signal_raise
and SignalException. Signals queued after 2) are safe from this
condition.

Anyway, the patch doesn't seem evil.

OK, I'll wait a bit for Kyle to confirm. However, I just managed to
reproduce the issue by limiting to one CPU on one of my machines:

schedtool -a 0x1 -e ruby test.rb

My proposed patch seems to fix it on my machine.

Updated by normalperson (Eric Wong) almost 10 years ago

  • Description updated (diff)
  • Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN to 2.0.0: REQUIRED, 2.1: REQUIRED

update backport field for r46194

Updated by nagachika (Tomoyuki Chikanaga) almost 10 years ago

  • Status changed from Rejected to Closed

change status because there was a bugfix.

Updated by nagachika (Tomoyuki Chikanaga) almost 10 years ago

  • Backport changed from 2.0.0: REQUIRED, 2.1: REQUIRED to 2.0.0: REQUIRED, 2.1: DONE

Backported into ruby_2_1 branch at r46520.

Updated by usa (Usaku NAKAMURA) almost 10 years ago

  • Backport changed from 2.0.0: REQUIRED, 2.1: DONE to 2.0.0: DONE, 2.1: DONE

backported into ruby_2_0_0 at r46575.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0