Bug #4777
Status: Closed
Ruby 1.9.2-p180 ignoring INT, TERM, and QUIT until it receives CONT
Description
We're having an intermittent but fairly frequent issue with a resque worker process that we daemonize with daemontools on Ubuntu 10.04 LTS.
When we send a QUIT or TERM signal to the process, it is not handled.
When we then send a CONT, the process traps and handles the previously ignored QUIT and TERM signals just before it handles the CONT.
It's as if the CONT frees the signals that were previously ignored.
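One way to see exactly when signals reach Ruby-level handlers is to record each delivery. The helper below is a hypothetical diagnostic, not part of resque or daemontools; the names `RECEIVED_SIGNALS`, `drain_signals`, and `report_signals` are made up for illustration. It traps TERM, QUIT, and CONT and only appends the signal name to an array, so the worker loop can print them later. Note that trapping TERM and QUIT replaces their default action, so a process using this helper no longer exits on them (INT and KILL still stop it).

```ruby
# Hypothetical diagnostic helper: record every TERM, QUIT, and CONT that
# reaches a Ruby-level handler. Trapping TERM and QUIT replaces their
# default action, so the process no longer exits on those signals.
RECEIVED_SIGNALS = []

%w[TERM QUIT CONT].each do |sig|
  # Trap handlers run on the main thread; keep them minimal.
  Signal.trap(sig) { RECEIVED_SIGNALS << sig }
end

# Remove and return the signal names recorded so far, oldest first.
def drain_signals
  RECEIVED_SIGNALS.shift(RECEIVED_SIGNALS.size)
end

# Print a timestamped line for each recorded signal; call this
# periodically, for example once per iteration of a worker loop.
def report_signals(io = $stdout)
  drain_signals.each { |sig| io.puts "#{Time.now} received SIG#{sig}" }
end
```

In a loop such as `loop { report_signals; sleep 1 }`, a `kill -TERM <pid>` on a healthy system should be reported within about a second; with the behaviour described above, the TERM line would presumably appear only after a CONT has been sent.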
Updated by sorah (Sorah Fukumori) over 13 years ago
- Priority changed from 6 to Normal
Could you post the output of "ruby -v"? (e.g. ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-linux])
Also, we don't consider this issue "Urgent", although you marked it that way.
Updated by kosaki (Motohiro KOSAKI) over 13 years ago
- Status changed from Open to Feedback
Unfortunately, we are not psychics (ESPers). Please consider writing a reproducer.
Right now we have no way to dig into your issue.
Updated by nathansobo (Nathan Sobo) over 13 years ago
"ruby -v" =>
ruby 1.9.2p180 (2011-02-18 revision 30909) [i686-linux]
We have a test case documented here:
https://github.com/carlhuda/bundler/issues/1200#comment_1269662
Thanks!
Updated by mame (Yusuke Endoh) over 13 years ago
- ruby -v changed from 1.9.2-p180 to -
Hello, Nathan
I think that this issue is similar to #4608.
In that ticket, a fault in the Ubuntu 11.04 kernel is suspected.
Jason Earl said that manually installing the Maverick (10.10) kernel
prevents that issue.
But nobody mentioned 10.04 LTS in that ticket.
So I'm not sure that this ticket is really related to #4608.
2011/6/1 Nathan Sobo nathansobo@gmail.com:
Bug #4777: Ruby 1.9.2-p180 ignoring INT, TERM, and QUIT until it receives CONT
http://redmine.ruby-lang.org/issues/4777
Author: Nathan Sobo
Status: Feedback
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: 1.9.2-p180
--
Yusuke Endoh mame@tsg.ne.jp
Updated by psanford (Peter Sanford) over 13 years ago
I have reproduced the same issue with Ubuntu 11.04 + ruby 1.9.2-p180 (I have not tried other Ubuntu versions). Here is a simple test script that reproduces the problem:
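#!/usr/bin/env ruby
# Run a child process via backticks (the order of these steps matters)
`/bin/true`
# Trigger and rescue a LoadError
begin
  require 'nonexisting/library'
rescue LoadError
end
# Idle forever; send TERM or QUIT to check whether the process exits
while true do
  puts "loop"
  sleep 1
end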
If I remove either the backtick call or the rescue, the problem goes away. It also goes away if I move the backtick call below the rescue.
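Because the problem is intermittent, it may help to run the reproducer many times and check whether it exits after TERM. The helper below is one hypothetical way to automate that; the method name `exits_on_term?`, the script name `repro.rb`, and the delay and timeout values are assumptions, not part of the original report. It spawns the command, waits for it to reach its loop, sends TERM, and polls with `Process.waitpid(pid, Process::WNOHANG)` until a deadline. If the child is still alive, it is killed with KILL and the helper returns false.

```ruby
# Hypothetical harness: does a child process exit promptly after SIGTERM?
# Returns true if it exited within `timeout` seconds, false otherwise
# (in which case the child is killed with SIGKILL so it does not linger).
def exits_on_term?(cmd, startup: 1.5, timeout: 3.0)
  pid = Process.spawn(*cmd, out: File::NULL, err: File::NULL)
  sleep startup                      # let the child reach its main loop
  Process.kill("TERM", pid)

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    # WNOHANG: waitpid returns nil while the child is still running.
    return true if Process.waitpid(pid, Process::WNOHANG)
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep 0.05
  end

  Process.kill("KILL", pid)
  Process.wait(pid)                  # reap the killed child
  false
end
```

For example, `10.times.count { !exits_on_term?(["ruby", "repro.rb"]) }` counts how many runs ignored TERM; on an unaffected system that count would be expected to be zero.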
Updated by kosaki (Motohiro KOSAKI) over 13 years ago
- Status changed from Feedback to Third Party's Issue
Nice information!!
Unfortunately, I've confirmed that ruby-1.9.2p180 + Fedora 15 + your test case does not reproduce the issue.
strace also doesn't show any fault in Ruby. We have to conclude that it's an Ubuntu-specific issue. I'm sorry.
Updated by psanford (Peter Sanford) over 13 years ago
OK. I upgraded to the 2.6.39 kernel (using the kernel PPA) and the problem went away. I know there were some issues with 2.6.38 + bash related to signal handling, although it's not clear to me whether any kernel change was made to address them.
Updated by kosaki (Motohiro KOSAKI) over 13 years ago
OK, Endoh-san and I investigated further. I bet the guilty Linux regression is the one below.
Ruby's timer thread uses pthread_cond_timedwait(), which uses futex_wait() internally, so a futex_wait() that loses its timeout when restarted after a signal can leave the timer thread blocked indefinitely.
More importantly, the fix had already been backported to 2.6.38.4.
Thus, unfortunately, the issue really is Ubuntu-specific.
See https://bugzilla.kernel.org/show_bug.cgi?id=32922
commit 0cd9c6494ee5c19aef085152bc37f3a4e774a9e1
Author: Darren Hart dvhart@linux.intel.com
Date: Thu Apr 14 15:41:57 2011 -0700
futex: Set FLAGS_HAS_TIMEOUT during futex_wait restart setup
The FLAGS_HAS_TIMEOUT flag was not getting set, causing the restart_block to
restart futex_wait() without a timeout after a signal.
Commit b41277dc7a18ee332d in 2.6.38 introduced the regression by accidentally
removing the the FLAGS_HAS_TIMEOUT assignment from futex_wait() during the setup
of the restart block. Restore the original behavior.
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=32922
Reported-by: Tim Smith <tsmith201104@yahoo.com>
Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/%3Cdaac0eb3af607f72b9a4d3126b2ba8fb5ed3b883.1302820917.git.dvhart%40linux.intel.com%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
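The commit describes a small state-handling bug: when a signal interrupts futex_wait(), the kernel sets up a restart block, and FLAGS_HAS_TIMEOUT tells the restarted call whether the saved deadline still applies. The Ruby sketch below is only a toy model of that logic, not kernel code; `RestartBlock`, `setup_restart`, and `restart_wait_seconds` are hypothetical names. It shows how dropping the flag turns a bounded wait into an unbounded one, which is consistent with the diagnosis above of a timer thread that stops waking up on its own.

```ruby
# Toy model (not kernel code) of restarting an interrupted futex_wait().
# A restart block remembers the absolute deadline and whether it applies.
RestartBlock = Struct.new(:has_timeout, :deadline)

# Build the restart block when a signal interrupts a wait.
# `fixed: true` models the patched code that sets the timeout flag;
# `fixed: false` models the regression that forgot to set it.
def setup_restart(deadline, fixed:)
  RestartBlock.new(fixed && !deadline.nil?, deadline)
end

# How long the restarted wait may block, in seconds; nil means "forever".
def restart_wait_seconds(block, now)
  return nil unless block.has_timeout
  [block.deadline - now, 0.0].max
end
```

With `fixed: true`, a wait with a deadline of t=10.0 that is restarted at t=9.5 may block for at most 0.5 more seconds; with `fixed: false`, `restart_wait_seconds` returns nil, i.e. the restarted wait has no bound at all.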
Updated by mame (Yusuke Endoh) over 13 years ago
Hello,
2011/6/5 Motohiro KOSAKI kosaki.motohiro@gmail.com:
And, more importantly, this fixing patch was already backported to 2.6.38.4.
Thus, the issue is really Ubuntu specific, unfortunately.
The patch was backported on 18 Apr., while Ubuntu Natty (11.04) was released
at the end of Apr. So it would be cruel to blame Ubuntu :-)
The patch has already been backported to the Ubuntu kernel tree:
So I think time will solve this problem.
If you want to solve this now, consider installing a newer
kernel from the kernel PPA or manually.
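As a rough first check, the upstream version number can flag kernels in the range discussed above: the regression came in with 2.6.38, and the fix reached 2.6.38.4 and 2.6.39. The helper below, `possibly_affected_kernel?`, is a hypothetical heuristic, not an official tool. Distribution kernels backport fixes without changing the upstream version (Ubuntu release strings look like `2.6.38-N-generic`), so a true result on such a kernel only means "check the distribution changelog", not that the kernel is definitely affected.

```ruby
# Hypothetical heuristic: is this upstream kernel version in the range
# affected by the futex_wait() restart regression (2.6.38 before 2.6.38.4)?
def possibly_affected_kernel?(release)
  # Take the leading dotted numbers, e.g. "2.6.38.2-foo" -> 2, 6, 38, 2.
  m = release.match(/\A(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?/)
  return false unless m

  version = [m[1].to_i, m[2].to_i, m[3].to_i]
  stable  = m[4] ? m[4].to_i : 0   # no fourth component => plain 2.6.38
  version == [2, 6, 38] && stable < 4
end
```

On the machine itself, the release string can be passed in with `` possibly_affected_kernel?(`uname -r`.strip) ``.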
--
Yusuke Endoh mame@tsg.ne.jp