Bug #5343

Unexpected blocking behavior when interrupt Socket#accept

Added by Tomoyuki Chikanaga over 2 years ago. Updated over 2 years ago.

[ruby-core:39634]
Status:Closed
Priority:High
Assignee:Koichi Sasada
Category:ext
Target version:1.9.3
ruby -v:- Backport:

Description

In CentOS release 5.6 (Kernel: 2.6.18-238.12.1.el5, glibc 2.5),
the following sample script rarely (about once every 1000) blocks at Thread#join with 1.9.3-head.

require "socket"
require "thread"

queue = Queue.new

th = Thread.start {
s = TCPServer.new(10000)
queue.push(nil)
cli = s.accept
}

queue.pop
th.kill.join

Backtrace:
thread-1:
#0 0x0000003a3500aee9 in pthreadcondwait@@GLIBC2.3.2 ()
from /lib64/libpthread.so.0
#1 0x000000000052d81b in native
condwait (th=0x19551550, timeouttv=0x0)
at ../ruby-1.9.3/threadpthread.c:307
#2 native
sleep (th=0x19551550, timeouttv=0x0)
at ../ruby-1.9.3/thread
pthread.c:908
#3 0x000000000052f9ab in sleepforever (th=0x19551550, deadlockable=1)
at ../ruby-1.9.3/thread.c:855
#4 0x000000000052fa4d in thread
joinsleep (arg=140733838958496)
at ../ruby-1.9.3/thread.c:688
#5 0x0000000000417abb in rb
ensure (bproc=0x52fa00 <threadjoinsleep>,
data1=140733838958496, e
proc=0x5288e0 ,
data2=140733838958496) at ../ruby-1.9.3/eval.c:736
#6 0x000000000052a78e in threadjoin (argc=,
argv=, self=)
at ../ruby-1.9.3/thread.c:721
#7 thread
joinm (argc=, argv=,
self=) at ../ruby-1.9.3/thread.c:802
#8 0x0000000000524b0d in vm
callcfunc (th=0x19551550, cfp=0x2b38cc37df08,
num=, blockptr=,
flag=, id=, me=0x196663c0,
recv=427644560) at ../ruby-1.9.3/vm
insnhelper.c:404
#9 vmcallmethod (th=0x19551550, cfp=0x2b38cc37df08,
num=, blockptr=,
flag=, id=, me=0x196663c0,
recv=427644560) at ../ruby-1.9.3/vminsnhelper.c:530
#10 0x000000000051908d in vm
execcore (th=0x19551550,
initial=) at ../ruby-1.9.3/insns.def:1015
#11 0x000000000051ed7e in vm
exec (th=0x19551550) at ../ruby-1.9.3/vm.c:1220
#12 0x0000000000525f9f in rbiseqevalmain (iseqval=427473840)
at ../ruby-1.9.3/vm.c:1461
#13 0x0000000000414c22 in ruby
execinternal (n=0x197abbb0)
at ../ruby-1.9.3/eval.c:204
#14 0x00000000004172d4 in ruby
execnode (n=)
at ../ruby-1.9.3/eval.c:251
#15 ruby
run_node (n=) at ../ruby-1.9.3/eval.c:244
#16 0x0000000000414689 in main (argc=2, argv=0x7fff267aa588)
at ../ruby-1.9.3/main.c:38

thread-2:
#0 0x0000003a344cb696 in poll () from /lib64/libc.so.6
#1 0x00000000005301ba in ppoll (fd=,
events=, tv=0x0) at ../ruby-1.9.3/thread.c:2820
#2 rbwaitforsinglefd (fd=,
events=, tv=0x0) at ../ruby-1.9.3/thread.c:2849
#3 0x000000000053052c in rbthreadwaitfdrw (fd=5)
at ../ruby-1.9.3/thread.c:2686
#4 rbthreadwaitfd (fd=5) at ../ruby-1.9.3/thread.c:2699
#5 0x00002aaaab0b7b6f in rsock
saccept (klass=427743720, fd=5,
sockaddr=, len=0x40473a3c)
at ../../../ruby-1.9.3/ext/socket/init.c:499
#6 0x00002aaaab0c3310 in tcp
accept (sock=)
at ../../../ruby-1.9.3/ext/socket/tcpserver.c:55
#7 0x0000000000524b0d in vmcallcfunc (th=0x197ffe90, cfp=0x2aaaab3d3f08,
num=, blockptr=,
flag=, id=, me=0x1980ba70,
recv=427644480) at ../ruby-1.9.3/vminsnhelper.c:404
#8 vm
callmethod (th=0x197ffe90, cfp=0x2aaaab3d3f08,
num=, blockptr=,
flag=, id=, me=0x1980ba70,
recv=427644480) at ../ruby-1.9.3/vm
insnhelper.c:530
#9 0x000000000051908d in vmexeccore (th=0x197ffe90,
initial=) at ../ruby-1.9.3/insns.def:1015
#10 0x000000000051ed7e in vmexec (th=0x197ffe90) at ../ruby-1.9.3/vm.c:1220
#11 0x000000000051fad5 in invoke
blockfromc (th=0x197ffe90,
block=, self=, argc=0,
argv=, blockptr=, cref=0x0)
at ../ruby-1.9.3/vm.c:624
#12 0x000000000052026f in rbvminvokeproc (th=0x197ffe90, proc=0x196d72a0,
self=425420560, argc=0, argv=0x197d56c8, blockptr=0x0)
at ../ruby-1.9.3/vm.c:670
#13 0x000000000052f5c1 in thread
startfunc2 (th=0x197ffe90,
stackstart=) at ../ruby-1.9.3/thread.c:453
#14 0x000000000052f75e in thread
startfunc1 (thptr=0x197ffe90)
at ../ruby-1.9.3/thread
pthread.c:656
#15 0x0000003a3500673d in start_thread () from /lib64/libpthread.so.0
#16 0x0000003a344d44bd in clone () from /lib64/libc.so.6


Related issues

Related to Backport93 - Backport #5757: main threadがreadやselectで待っていると、^C でなかなか死なない Closed 12/13/2011

Associated revisions

Revision 33307
Added by Tomoyuki Chikanaga over 2 years ago

  • threadpthread.c (ubfselect): activate timer thread when interrupt blocking thread. A patch created by Koichi Sasada. [Bug #5343] to cover race condition, timer thread periodically send SIGVTARLM to threads in signal thread list. so you should activate timer thread when interrupt a thread.

Revision 34034
Added by Yui NARUSE over 2 years ago

Add test for [Bug #5343]

History

#1 Updated by Tomoyuki Chikanaga over 2 years ago

Hi,

I've found that this issue is not specific problem of TCPServer#accept.

The following script also rarely blocks forever.

require "thread"

queue = Queue.new
r, w = IO.pipe
th = Thread.start {
queue.push(nil)
r.read 1
}
queue.pop
th.kill.join

I guess this is because SIGVTALRM from send from ubf_select() is received by blocking thread after GVL release but before enter poll(2)/ppoll(2).

#2 Updated by Motohiro KOSAKI over 2 years ago

  • Status changed from Open to Assigned
  • Assignee set to Koichi Sasada
  • Priority changed from Normal to High

Aghh. This is regression since 1.9.3.

#3 Updated by Koichi Sasada over 2 years ago

  • ruby -v changed from ruby 1.9.3dev (2011-09-17 revision 33290) [i686-linux] to -

(2011/09/20 2:22), Tomoyuki Chikanaga wrote:

Issue #5343 has been updated by Tomoyuki Chikanaga.

Hi,

I've found that this issue is not specific problem of TCPServer#accept.

The following script also rarely blocks forever.

require "thread"

queue = Queue.new
r, w = IO.pipe
th = Thread.start {
queue.push(nil)
r.read 1
}
queue.pop
th.kill.join

I guess this is because SIGVTALRM from send from ubf_select() is received by blocking thread after GVL release but before enter poll(2)/ppoll(2).

Thanks. I understand. How about it?

Index: threadpthread.c
===================================================================
--- thread
pthread.c (revision 33272)
+++ threadpthread.c (working copy)
@@ -1013,6 +1013,7 @@
{
rb
threadt *th = (rbthreadt *)ptr;
add
signalthreadlist(th);
+ rbthreadwakeuptimerthread(); /* activate timer thread */
ubfselecteach(th);
}

--
// SASADA Koichi at atdot dot net

#4 Updated by Eric Wong over 2 years ago

SASADA Koichi ko1@atdot.net wrote:

(2011/09/20 2:22), Tomoyuki Chikanaga wrote:

I guess this is because SIGVTALRM from send from ubf_select() is received by blocking thread after GVL release but before enter poll(2)/ppoll(2).

Thanks. I understand. How about it?

I'm not sure issues like this are completely avoidable, but your patch
seems to help avoid it.

I needed ~15K tries to reproduce blocking on the original test case
(no SMP on this Xen VM). So far I'm at 65K loop interations and still
going... I will update if I notice it blocked again.

#5 Updated by Tomoyuki Chikanaga over 2 years ago

Hi, sasada san.
With your patch, over 60000 times trials pass successfully.
Your patch works fine for me too.
Thank you.

#6 Updated by Koichi Sasada over 2 years ago

Thanks Eric and Chikanaga-san.

Could you commit it? Maybe 1.9.3 also needs this patch. But I'm not
sure how to commit frozen 1.9.3 branch.

Regards,
Koichi

(2011/09/20 17:25), Tomoyuki Chikanaga wrote:

Issue #5343 has been updated by Tomoyuki Chikanaga.

Hi, sasada san.
With your patch, over 60000 times trials pass successfully.
Your patch works fine for me too.

Thank you.

Bug #5343: Unexpected blocking behavior when interrupt Socket#accept
http://redmine.ruby-lang.org/issues/5343

Author: Tomoyuki Chikanaga
Status: Assigned
Priority: High
Assignee: Koichi Sasada
Category: ext
Target version: 1.9.3
ruby -v: -

In CentOS release 5.6 (Kernel: 2.6.18-238.12.1.el5, glibc 2.5),
the following sample script rarely (about once every 1000) blocks at Thread#join with 1.9.3-head.

require "socket"
require "thread"

queue = Queue.new

th = Thread.start {
s = TCPServer.new(10000)
queue.push(nil)
cli = s.accept
}

queue.pop
th.kill.join

Backtrace:
thread-1:
#0 0x0000003a3500aee9 in pthreadcondwait@@GLIBC2.3.2 ()
from /lib64/libpthread.so.0
#1 0x000000000052d81b in native
condwait (th=0x19551550, timeouttv=0x0)
at ../ruby-1.9.3/threadpthread.c:307
#2 native
sleep (th=0x19551550, timeouttv=0x0)
at ../ruby-1.9.3/thread
pthread.c:908
#3 0x000000000052f9ab in sleepforever (th=0x19551550, deadlockable=1)
at ../ruby-1.9.3/thread.c:855
#4 0x000000000052fa4d in thread
joinsleep (arg=140733838958496)
at ../ruby-1.9.3/thread.c:688
#5 0x0000000000417abb in rb
ensure (bproc=0x52fa00 <threadjoinsleep>,
data1=140733838958496, e
proc=0x5288e0 ,
data2=140733838958496) at ../ruby-1.9.3/eval.c:736
#6 0x000000000052a78e in threadjoin (argc=,
argv=, self=)
at ../ruby-1.9.3/thread.c:721
#7 thread
joinm (argc=, argv=,
self=) at ../ruby-1.9.3/thread.c:802
#8 0x0000000000524b0d in vm
callcfunc (th=0x19551550, cfp=0x2b38cc37df08,
num=, blockptr=,
flag=, id=, me=0x196663c0,
recv=427644560) at ../ruby-1.9.3/vm
insnhelper.c:404
#9 vmcallmethod (th=0x19551550, cfp=0x2b38cc37df08,
num=, blockptr=,
flag=, id=, me=0x196663c0,
recv=427644560) at ../ruby-1.9.3/vminsnhelper.c:530
#10 0x000000000051908d in vm
execcore (th=0x19551550,
initial=) at ../ruby-1.9.3/insns.def:1015
#11 0x000000000051ed7e in vm
exec (th=0x19551550) at ../ruby-1.9.3/vm.c:1220
#12 0x0000000000525f9f in rbiseqevalmain (iseqval=427473840)
at ../ruby-1.9.3/vm.c:1461
#13 0x0000000000414c22 in ruby
execinternal (n=0x197abbb0)
at ../ruby-1.9.3/eval.c:204
#14 0x00000000004172d4 in ruby
execnode (n=)
at ../ruby-1.9.3/eval.c:251
#15 ruby
run_node (n=) at ../ruby-1.9.3/eval.c:244
#16 0x0000000000414689 in main (argc=2, argv=0x7fff267aa588)
at ../ruby-1.9.3/main.c:38

thread-2:
#0 0x0000003a344cb696 in poll () from /lib64/libc.so.6
#1 0x00000000005301ba in ppoll (fd=,
events=, tv=0x0) at ../ruby-1.9.3/thread.c:2820
#2 rbwaitforsinglefd (fd=,
events=, tv=0x0) at ../ruby-1.9.3/thread.c:2849
#3 0x000000000053052c in rbthreadwaitfdrw (fd=5)
at ../ruby-1.9.3/thread.c:2686
#4 rbthreadwaitfd (fd=5) at ../ruby-1.9.3/thread.c:2699
#5 0x00002aaaab0b7b6f in rsock
saccept (klass=427743720, fd=5,
sockaddr=, len=0x40473a3c)
at ../../../ruby-1.9.3/ext/socket/init.c:499
#6 0x00002aaaab0c3310 in tcp
accept (sock=)
at ../../../ruby-1.9.3/ext/socket/tcpserver.c:55
#7 0x0000000000524b0d in vmcallcfunc (th=0x197ffe90, cfp=0x2aaaab3d3f08,
num=, blockptr=,
flag=, id=, me=0x1980ba70,
recv=427644480) at ../ruby-1.9.3/vminsnhelper.c:404
#8 vm
callmethod (th=0x197ffe90, cfp=0x2aaaab3d3f08,
num=, blockptr=,
flag=, id=, me=0x1980ba70,
recv=427644480) at ../ruby-1.9.3/vm
insnhelper.c:530
#9 0x000000000051908d in vmexeccore (th=0x197ffe90,
initial=) at ../ruby-1.9.3/insns.def:1015
#10 0x000000000051ed7e in vmexec (th=0x197ffe90) at ../ruby-1.9.3/vm.c:1220
#11 0x000000000051fad5 in invoke
blockfromc (th=0x197ffe90,
block=, self=, argc=0,
argv=, blockptr=, cref=0x0)
at ../ruby-1.9.3/vm.c:624
#12 0x000000000052026f in rbvminvokeproc (th=0x197ffe90, proc=0x196d72a0,
self=425420560, argc=0, argv=0x197d56c8, blockptr=0x0)
at ../ruby-1.9.3/vm.c:670
#13 0x000000000052f5c1 in thread
startfunc2 (th=0x197ffe90,
stackstart=) at ../ruby-1.9.3/thread.c:453
#14 0x000000000052f75e in thread
startfunc1 (thptr=0x197ffe90)
at ../ruby-1.9.3/thread
pthread.c:656
#15 0x0000003a3500673d in start_thread () from /lib64/libpthread.so.0
#16 0x0000003a344d44bd in clone () from /lib64/libc.so.6

--
// SASADA Koichi at atdot dot net

#7 Updated by Tomoyuki Chikanaga over 2 years ago

Koichi Sasada wrote:

Could you commit it? Maybe 1.9.3 also needs this patch. But I'm not
sure how to commit frozen 1.9.3 branch.
I'd be happy to commit it. I'll check in it to trunk and create a backport request ticket.

BTW, let me summarize the reason why this patch works fine. If I misunderstand, please point out.
When interrupt other thread (cf. by Thread#kill, Thread#raise) in blocking region, unblocking function (ubf) is called to interrupt blocking system call like select/ppoll etc... The default ubf (ubfselect) call pthreadkill() to send SIGVTALRM to interrupted thread.
But There's race condition. If interrupted thread was just released GVL and received SIGVTALRM before entering blocking systemcall, signal was lost.
To cover this race condition, ubfselect() also call addsignalthreadlist() to register "signal thread list". Timer thread periodically send SIGVTALRM to threads in signal thread list. Recently timer thread is suspended when there's no more than one runnable thread. So when interrupt other thread and register to signal thread list, you should activate timer thread to polling-mode.

#8 Updated by Tomoyuki Chikanaga over 2 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r33307.
Tomoyuki, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • threadpthread.c (ubfselect): activate timer thread when interrupt blocking thread. A patch created by Koichi Sasada. [Bug #5343] to cover race condition, timer thread periodically send SIGVTARLM to threads in signal thread list. so you should activate timer thread when interrupt a thread.

#9 Updated by Koichi Sasada over 2 years ago

(2011/09/21 9:38), Tomoyuki Chikanaga wrote:

Could you commit it? Maybe 1.9.3 also needs this patch. But I'm not
sure how to commit frozen 1.9.3 branch.
I'd be happy to commit it. I'll check in it to trunk and create a backport request ticket.

Thanks.

BTW, let me summarize the reason why this patch works fine. If I misunderstand, please point out.
When interrupt other thread (cf. by Thread#kill, Thread#raise) in blocking region, unblocking function (ubf) is called to interrupt blocking system call like select/ppoll etc...
The default ubf (ubf_select)

No. ubfselect is not a default one. ubfselect is only for canceling
select() blocking. If you use it for other process, it may be a bug.

call pthreadkill() to send SIGVTALRM to interrupted thread.
But There's race condition.
If interrupted thread was just released GVL and received SIGVTALRM
before entering blocking systemcall, signal was lost.
To cover this race condition, ubf
select() also call
addsignalthread_list() to register "signal thread list".
Timer thread periodically send SIGVTALRM to threads in signal thread
list.
Recently timer thread is suspended when there's no more than one
runnable thread. So when interrupt other thread and register to
signal thread list, you should activate timer thread to
polling-mode.

Yes, absolutely.

--
// SASADA Koichi at atdot dot net

#10 Updated by Eric Wong over 2 years ago

Tomoyuki Chikanaga nagachika00@gmail.com wrote:

To cover this race condition, ubfselect() also call
add
signalthreadlist() to register "signal thread list". Timer
thread periodically send SIGVTALRM to threads in signal thread list.

Ah, I missed the second part of that earlier. My concern raised in
is no more, thank you.

Also available in: Atom PDF