Bug #4777

Ruby 1.9.2-p180 ignoring INT, TERM, and QUIT until it receives CONT

Added by Nathan Sobo almost 3 years ago. Updated almost 3 years ago.

[ruby-core:36447]
Status:Third Party's Issue
Priority:Normal
Assignee:-
Category:-
Target version:-
ruby -v:-
Backport:

Description

We're having an intermittent but fairly frequent issue with a resque worker process that we're daemonizing with daemontools on Ubuntu 10.04 LTS.

When we send a QUIT or TERM signal to the process, it is not handled.
When we then send a CONT, the process traps and handles the QUIT and TERM signals it had been ignoring, just before it handles the CONT.
It's as if the CONT frees the signals that were previously ignored.
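For reference, the behavior the report expects can be sketched as a worker that traps INT, TERM, and QUIT, records the request, and leaves its loop. This is a hypothetical skeleton: the `Worker` class and its methods are invented for illustration and are not resque's actual shutdown code.

```ruby
# Hypothetical worker skeleton (not resque's actual code) that is expected
# to stop promptly when it receives INT, TERM, or QUIT.
class Worker
  attr_reader :shutdown_signal

  def initialize
    @shutdown_signal = nil
  end

  def install_traps
    %w[INT TERM QUIT].each do |sig|
      # Keep the handler tiny: just record which signal asked us to stop.
      Signal.trap(sig) { @shutdown_signal = sig }
    end
  end

  def run
    until @shutdown_signal
      # ... one unit of work would go here ...
      sleep 0.1
    end
    puts "shutting down on SIG#{@shutdown_signal}"
  end
end
```

Run `Worker.new.tap(&:install_traps).run` and send `kill -TERM <pid>` from another shell; in the reported failure, the equivalent of `@shutdown_signal` would stay unset until a CONT arrived.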


Related issues

Related to ruby-trunk - Bug #4608: Ctrl-c to interrupt script causes hang and 100% cpu's cor... Third Party's Issue 04/25/2011

History

#1 Updated by Shota Fukumori almost 3 years ago

  • Priority changed from Urgent to Normal

Could you provide the output of ruby -v? (e.g. ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-linux])

Also, we don't consider this issue "Urgent", even though you marked it that way.

#2 Updated by Motohiro KOSAKI almost 3 years ago

  • Status changed from Open to Feedback

Unfortunately, we are not ESPers (we can't read minds). Please consider writing a reproducer;
right now we have no way to dig into your issue.

#3 Updated by Nathan Sobo almost 3 years ago

"ruby -v" =>
ruby 1.9.2p180 (2011-02-18 revision 30909) [i686-linux]

We have a test case documented here:
https://github.com/carlhuda/bundler/issues/1200#comment_1269662

Thanks!

#4 Updated by Yusuke Endoh almost 3 years ago

  • ruby -v changed from 1.9.2-p180 to -

Hello, Nathan

I think that this issue is similar to #4608.

In that ticket, a fault in the Ubuntu 11.04 kernel is suspected.
Jason Earl said that manually installing the Maverick (10.10) kernel
prevents the issue.

But nobody mentioned 10.04 LTS in that ticket,
so I'm not sure that this ticket is really related to #4608.

--
Yusuke Endoh mame@tsg.ne.jp

#5 Updated by Peter Sanford almost 3 years ago

I have reproduced the same issue with Ubuntu 11.04 + ruby 1.9.2-p180 (I have not tried other Ubuntu versions). Here is a simple test script that reproduces the problem:

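#!/usr/bin/env ruby

# Run an external command via backticks (this forks a child process).
`/bin/true`

# Try to load a library that does not exist, and swallow the LoadError.
begin
  require 'nonexisting/library'
rescue LoadError
end

# Loop forever; send QUIT or TERM to this process and see whether it exits.
while true do
  puts "loop"
  sleep 1
end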

If I remove either the backtick call or the rescue, I don't have the problem. If I move the backtick call below the rescue, I don't have the problem either.
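To check the reproducer without watching it by hand, one option is a small harness that spawns it, sends a signal, and waits for it to exit. This is a sketch: the method name `exits_on_signal?`, its `settle`/`timeout` defaults, and the file name `repro.rb` below are hypothetical choices, not part of the original report.

```ruby
require 'rbconfig'

# Spawn a Ruby child running `args`, send it `sig`, and report whether it
# exited within `timeout` seconds. `settle` gives the child time to start up.
# The default values are arbitrary choices for this sketch.
def exits_on_signal?(*args, sig: "TERM", settle: 1.0, timeout: 5.0)
  pid = Process.spawn(RbConfig.ruby, *args, out: File::NULL)
  sleep settle
  Process.kill(sig, pid)
  deadline = Time.now + timeout
  while Time.now < deadline
    # With WNOHANG, Process.wait returns nil while the child is still running.
    return true if Process.wait(pid, Process::WNOHANG)
    sleep 0.1
  end
  # The signal was not acted on in time: clean up with KILL and report failure.
  Process.kill("KILL", pid)
  Process.wait(pid)
  false
end
```

With the script above saved as `repro.rb`, `exits_on_signal?("repro.rb")` returning `false` would match the reported symptom. Since the original report describes the problem as intermittent, running the check in a loop is probably more informative than a single run.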

#6 Updated by Motohiro KOSAKI almost 3 years ago

  • Status changed from Feedback to Third Party's Issue

Nice information!

Unfortunately, though, I've confirmed that ruby-1.9.2p180 + Fedora 15 + your test case does not reproduce the issue.
strace also doesn't show any fault in Ruby. We have to conclude it's an Ubuntu-specific issue. I'm sorry.

#7 Updated by Peter Sanford almost 3 years ago

OK. I upgraded to the 2.6.39 kernel (using the kernel-ppa) and the problem went away. I know there were some issues with 2.6.38 + bash related to signal handling*, although it is not clear to me whether any kernel change was made to address them.

#8 Updated by Motohiro KOSAKI almost 3 years ago

OK, Endoh-san and I investigated further. I bet the guilty Linux regression is the one below.
Ruby's timer thread uses pthread_cond_timedwait(), which uses futex_wait() internally.

And, more importantly, this fixing patch was already backported to 2.6.38.4.
Thus, the issue is really Ubuntu specific, unfortunately.

See https://bugzilla.kernel.org/show_bug.cgi?id=32922


commit 0cd9c6494ee5c19aef085152bc37f3a4e774a9e1
Author: Darren Hart dvhart@linux.intel.com
Date: Thu Apr 14 15:41:57 2011 -0700

futex: Set FLAGS_HAS_TIMEOUT during futex_wait restart setup

The FLAGS_HAS_TIMEOUT flag was not getting set, causing the restart_block to
restart futex_wait() without a timeout after a signal.

Commit b41277dc7a18ee332d in 2.6.38 introduced the regression by accidentally
removing the FLAGS_HAS_TIMEOUT assignment from futex_wait() during the setup
of the restart block. Restore the original behavior.

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=32922

Reported-by: Tim Smith <tsmith201104@yahoo.com>
Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/%3Cdaac0eb3af607f72b9a4d3126b2ba8fb5ed3b883.1302820917.git.dvhart%40linux.intel.com%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
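To make the reasoning above concrete (a timer thread that depends on a timed wait, and a regression that restarts the wait without its timeout after a signal), here is a toy Ruby model. It is hypothetical throughout: `ToyTimerThread` and its methods are invented for illustration, Ruby's real timer thread is C code inside the VM, and `wake` only stands in for whatever event eventually wakes a stuck waiter. The discussion here does not establish which event that is in the real case.

```ruby
# Toy model (hypothetical; NOT Ruby's actual C timer thread) of a thread
# that wakes up periodically and dispatches queued "signals".
class ToyTimerThread
  # interval: seconds for the timed wait, or nil for an untimed wait
  # (modelling a timed wait that lost its timeout after a restart).
  def initialize(interval)
    @interval = interval
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @pending = []
    @handled = []
    @stop = false
  end

  # Like a C signal handler: only records the signal, does not wake anyone.
  def enqueue(sig)
    @mutex.synchronize { @pending << sig }
  end

  # Some unrelated event that explicitly wakes the waiting thread.
  def wake
    @mutex.synchronize { @cond.signal }
  end

  def handled
    @mutex.synchronize { @handled.dup }
  end

  def start
    @thread = Thread.new do
      @mutex.synchronize do
        until @stop
          # With a number, this returns after @interval seconds even if
          # nobody signals @cond; with nil it blocks until wake or stop.
          @cond.wait(@mutex, @interval)
          @handled.concat(@pending)
          @pending.clear
        end
      end
    end
    self
  end

  def stop
    @mutex.synchronize do
      @stop = true
      @cond.signal
    end
    @thread.join
  end
end
```

With a numeric interval, queued signals are dispatched within about one interval. With `nil`, they stay queued until `wake` (or `stop`) is called, which resembles the reported "ignored until CONT" behavior.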

#9 Updated by Yusuke Endoh almost 3 years ago

Hello,

2011/6/5 Motohiro KOSAKI kosaki.motohiro@gmail.com:

And, more importantly, this fixing patch was already backported to 2.6.38.4.
Thus, the issue is really Ubuntu specific, unfortunately.

The patch was backported on 18 Apr., while Ubuntu natty was released
at the end of Apr. So it would be unfair to blame Ubuntu :-)

The patch has already been backported to ubuntu kernel tree:

http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-natty.git;a=blob;f=kernel/futex.c;h=d5065e8283dac90ec20daf4eee4d530096d048ad;hb=HEAD#l1889

So, time will solve this problem, I think.
If you want to solve this now, consider installing
a newer kernel from the kernel PPA or manually.

--
Yusuke Endoh mame@tsg.ne.jp
