Bug #4121

test_getpty_nonexistentで止まることがある

Added by Kouhei Yanagita over 3 years ago. Updated over 1 year ago.

[ruby-dev:42686]
Status:Closed
Priority:Normal
Assignee:Motohiro KOSAKI
Category:core
Target version:-
ruby -v:ruby 1.9.3dev (2010-12-06 trunk 30091) [x86_64-linux] Backport:

Description

=begin
CentOS 5.5 で、test_pty.rbが途中で止まることがあります。

毎回止まるわけではなく、正しくテストが実行されることも多いです。

while true; do make test-all TESTS='test_pty.rb'; done

などと連続で実行させると簡単に再現できます。

Debian lennyでは問題は発生しませんでした。

$ make test-all TESTS='testpty.rb'
./miniruby -I./lib -I.ext/common ./tool/runruby.rb --extout=.ext -- "./test/runner.rb" test
pty.rb
Run options:

# Running tests:

.....(ここで止まるのでC-cで中断する)F......

Finished tests in 2.729398s, 4.3966 tests/s, 10.6251 assertions/s.

1) Failure:
testgetptynonexistent(TestPTY) [/home/yanagi/tmp/ruby/test/testpty.rb:157]:

[Errno::ENOENT] exception expected, not
Class:
Message: <"">
---Backtrace---
/home/yanagi/tmp/ruby/test/test
pty.rb:159:in getbyte'
/home/yanagi/tmp/ruby/test/test_pty.rb:159:in
getpty'
/home/yanagi/tmp/ruby/test/testpty.rb:159:in `block (2 levels) in testgetpty_nonexistent'


12 tests, 29 assertions, 1 failures, 0 errors, 0 skips
make: *** [yes-test-all] Error 1
=end


Related issues

Related to ruby-trunk - Bug #4157: test_pty で、たまに出る Failure Assigned 12/14/2010

Associated revisions

Revision 30350
Added by Motohiro KOSAKI over 3 years ago

  • ext/pty/pty.c (chfunc): Added rbthreadatforkbeforeexec(). We must reinitialize GVL when new process creation. Otherwise we may meet an insane deadlock. [Bug #4121]

History

#1 Updated by Makoto Kishimoto over 3 years ago

=begin
うちの環境(4コアAMD64、FreeBSD8)でも起きます。

$ cpuset -l 0 make test-all TESTS='test_pty.rb'

のように CPU を制限すると確実に起きます。が

$ cpuset -l 0 make test-all TESTS='-n testgetptynonexistent test_pty.rb'

のように問題のテストだけ実行すると確実に起きなくなります。
=end

#2 Updated by Kouhei Yanagita over 3 years ago

=begin
CentOS 5.5/Xeon L3426の環境ですが、改めて調べてみると

make test-all TESTS="test_pty.rb"
→たまに止まる

make test-all TESTS="-n testgetptynonexistent test_pty.rb"
→止まることはない

taskset -c 0 make test-all TESTS="test_pty.rb"
→必ず止まる

taskset -c 0 make test-all TESTS="-n testgetptynonexistent test_pty.rb"
→止まることはない

という結果になりました。

=end

#3 Updated by Tomoyuki Chikanaga over 3 years ago

=begin
うちでも起きる環境と起きない環境がありました。

起きない
RHEL ES 3.8 Kernel 2.4.21-47.ELsmp, Xeon 5160(2core) x 2CPU
Ubuntu 9.04 Kernel 2.6.28-19-generic, Pentium4 x 1CPU

起きる
RHEL ES 4.8 Kernel 2.6.9-89.ELlargesmp, Xeon X5570(2core) x 4CPU

起きた時に子プロセスのほうに gdb attach して確認してみました。
スタックトレースを最後に貼ります。

gvlacquire での l.90 あたりの nativecond_wait で固まってますが、ここに入るのがおかしくて

(gdb) p vm->gvl.waitingthreads
$2 = (struct rb
threadstruct * volatile) 0x832f8b8
(gdb) p th
$3 = (rb
threadt *) 0x82117a0
(gdb) p vm->gvl.acquired
$4 = 0
(gdb) p th->thread
id
$5 = 4160292544
(gdb) p vm->gvl.waitingthreads->threadid
$6 = 4156005280
(gdb) info thread
2 Thread 4156771232 (LWP 18806) 0xffffe410 in _kernelvsyscall ()
* 1 Thread 4160292544 (LWP 18804) 0xffffe410 in _kernelvsyscall ()

(gdb) p vm->gvl.waitingthreads->nativethreaddata.gvlnext
$7 = (struct rbthreadstruct *) 0x82117a0
(gdb) p vm->gvl.waitingthreads->nativethreaddata.gvlnext->threadid
$8 = 4160292544
(gdb) p vm->gvl.waiting
threads->nativethreaddata.gvlnext->nativethreaddata.gvlnext
$9 = (struct rbthreadstruct *) 0x0

と、おそらく vm->gvlwaitingthreads が fork 後に破棄された rbthreadt を見ているような感じです。

以下、スタックトレースを貼ります。
#0 0xffffe410 in _kernelvsyscall ()
#1 0x00837cf6 in pthreadcondwait@@GLIBC2.3.2 ()
from /lib/tls/libpthread.so.0
#2 0x0814759b in native
condwait (cond=0x8211824, mutex=0x82114f4)
at thread
pthread.c:266
#3 0x08147297 in gvlacquire (vm=0x82114f0, th=0x82117a0)
at thread
pthread.c:90
#4 0x08149478 in blockingregionend (th=0x82117a0, region=0xffffa398)
at thread.c:1048
#5 0x0814a950 in doselect (n=11, read=0x0, write=0x84aa728, except=0x0,
timeout=0x0) at thread.c:2521
#6 0x0814aaba in rb
threadwaitfdrw (fd=10, read=0) at thread.c:2575
#7 0x0814ab1b in rb
threadfdwritable (fd=10) at thread.c:2597
#8 0x0806a4ba in iofflush (fptr=0x83365a0) at io.c:676
#9 0x0806b0aa in rb
ioflush (io=137483980) at io.c:1091
#10 0x080ac18e in rb
forkerr (status=0x0, chfunc=0xf7bfc599 ,
charg=0xffffa6ac, fds=4, errmsg=0xffffa68c "", errmsg
buflen=32)
at process.c:2575
#11 0xf7bfc768 in establishShell (argc=1, argv=0xf7c351bc, info=0xffffa710,
SlaveName=0xffffa700 "/dev/pts/21") at pty.c:248
#12 0xf7bfd261 in ptygetpty (argc=1, argv=0xf7c351bc, self=137648620)
at pty.c:545
#13 0x08134070 in call
cfunc (func=0xf7bfcf07 , recv=137648620,
len=-1, argc=1, argv=0xf7c351bc) at vminsnhelper.c:316
#14 0x081349b4 in vm
callcfunc (th=0x82117a0, regcfp=0xf7cb4bbc, num=1,
recv=137648620, blockptr=0x0, me=0x849b1a8) at vminsnhelper.c:403
#15 0x08134e7e in vm
callmethod (th=0x82117a0, cfp=0xf7cb4bbc, num=1,
blockptr=0x0, flag=0, id=16952, me=0x849b1a8, recv=137648620)
at vm
insnhelper.c:525
#16 0x081394c7 in vmexeccore (th=0x82117a0, initial=0) at insns.def:1010
#17 0x081442d0 in vmexec (th=0x82117a0) at vm.c:1150
#18 0x08142fd9 in invoke
blockfromc (th=0x82117a0, block=0xf7cb4d88,
self=138349680, argc=1, argv=0xffffb170, blockptr=0x0, cref=0x0)
at vm.c:561
#19 0x081430ea in vmyield (th=0x82117a0, argc=1, argv=0xffffb170) at vm.c:591
#20 0x081404af in rb
yield0 (argc=1, argv=0xffffb170) at vmeval.c:740
#21 0x081404e8 in rbyield (val=137536340) at vmeval.c:750
#22 0x081597f7 in rbarycollect (ary=137536260) at array.c:2160
#23 0x08134080 in callcfunc (func=0x8159756 <rbarycollect>, recv=137536260,
len=0, argc=0, argv=0xf7c35108) at vm
insnhelper.c:319
#24 0x081349b4 in vmcallcfunc (th=0x82117a0, regcfp=0xf7cb4d74, num=0,
recv=137536260, blockptr=0xf7cb4d88, me=0x8272d40) at vm
insnhelper.c:403
#25 0x08134e7e in vmcallmethod (th=0x82117a0, cfp=0xf7cb4d74, num=0,
blockptr=0xf7cb4d88, flag=0, id=1568, me=0x8272d40, recv=137536260)
at vminsnhelper.c:525
#26 0x081394c7 in vm
execcore (th=0x82117a0, initial=0) at insns.def:1010
#27 0x081442d0 in vm
exec (th=0x82117a0) at vm.c:1150
#28 0x08142fd9 in invokeblockfromc (th=0x82117a0, block=0xf7cb4e38,
self=138349680, argc=1, argv=0xffffbbd0, blockptr=0x0, cref=0x0)
at vm.c:561
#29 0x081430ea in vm
yield (th=0x82117a0, argc=1, argv=0xffffbbd0) at vm.c:591
#30 0x081404af in rbyield0 (argc=1, argv=0xffffbbd0) at vmeval.c:740
#31 0x081404e8 in rb
yield (val=137648400) at vmeval.c:750
#32 0x0815793d in rb
aryeach (ary=137546080) at array.c:1429
#33 0x08134080 in call
cfunc (func=0x81578cb , recv=137546080,
len=0, argc=0, argv=0xf7c350c0) at vminsnhelper.c:319
#34 0x081349b4 in vm
callcfunc (th=0x82117a0, regcfp=0xf7cb4e24, num=0,
recv=137546080, blockptr=0xf7cb4e38, me=0x82725a8) at vminsnhelper.c:403
#35 0x08134e7e in vm
callmethod (th=0x82117a0, cfp=0xf7cb4e24, num=0,
blockptr=0xf7cb4e38, flag=0, id=424, me=0x82725a8, recv=137546080)
at vm
insnhelper.c:525
#36 0x081394c7 in vmexeccore (th=0x82117a0, initial=0) at insns.def:1010
#37 0x081442d0 in vmexec (th=0x82117a0) at vm.c:1150
#38 0x08142fd9 in invoke
blockfromc (th=0x82117a0, block=0xf7cb4f40,
self=138349680, argc=1, argv=0xffffc630, blockptr=0x0, cref=0x0)
at vm.c:561
#39 0x081430ea in vmyield (th=0x82117a0, argc=1, argv=0xffffc630) at vm.c:591
#40 0x081404af in rb
yield0 (argc=1, argv=0xffffc630) at vmeval.c:740
#41 0x081404e8 in rbyield (val=137647980) at vmeval.c:750
#42 0x0815793d in rbaryeach (ary=137546320) at array.c:1429
#43 0x08134080 in callcfunc (func=0x81578cb <rbaryeach>, recv=137546320,
len=0, argc=0, argv=0xf7c35054) at vm
insnhelper.c:319
#44 0x081349b4 in vmcallcfunc (th=0x82117a0, regcfp=0xf7cb4f2c, num=0,
recv=137546320, blockptr=0xf7cb4f40, me=0x82725a8) at vm
insnhelper.c:403
#45 0x08134e7e in vmcallmethod (th=0x82117a0, cfp=0xf7cb4f2c, num=0,
blockptr=0xf7cb4f40, flag=0, id=424, me=0x82725a8, recv=137546320)
at vminsnhelper.c:525
#46 0x081394c7 in vm
execcore (th=0x82117a0, initial=0) at insns.def:1010
#47 0x081442d0 in vm
exec (th=0x82117a0) at vm.c:1150
#48 0x08144aa6 in rbiseqevalmain (iseqval=136494340) at vm.c:1391
#49 0x08059780 in ruby
execinternal (n=0x822bd04) at eval.c:225
#50 0x08059867 in ruby
execnode (n=0x822bd04) at eval.c:272
#51 0x08059841 in ruby
run_node (n=0x822bd04) at eval.c:265
#52 0x080583ae in main (argc=4, argv=0xffffd1b4) at main.c:38

=end

#4 Updated by Motohiro KOSAKI over 3 years ago

  • Category set to core
  • Status changed from Open to Assigned
  • Assignee set to Motohiro KOSAKI

=begin
わたしの手には負えないかもしれないですけど、誰も手を動かしていない気配があるのでとりあえず自分にアサインします。誰か手を動かしている人がいれば遠慮なく奪ってください

=end

#5 Updated by Motohiro KOSAKI over 3 years ago

=begin
誰が悪いのかはわかりませんが、以下の2つのコミットが間違っていることと、現在のtrunkだと
fork時に子プロセスがGVLを再初期化するまえにGVLをacquireするケースがあるので、forkが動くのは
ただのミラクルということが分かりました。

あと、Process#fork()以外のfork(IO#popenとか)だとrbthreadatforkが呼ばれてないので、
消滅したスレッドの掃除もされないし、GVLも再初期化されませんが、これもたぶんバグです。

たぶん直せそうな気がするのでクリスマス中にでもやっときます。
詳細なレポートをくれた、Chikanagaさんには感謝いっぱいです。


commit f54418b71f57e00d4a50bd86bf7d18b8b5195287
Author: matz matz@b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Date: Mon Nov 27 09:23:38 2000 +0000

 matz


 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@1053 b2dd03c8-39d4-4d8f-98ff-823fe69

+Mon Nov 27 00:10:08 2000 Yukihiro Matsumoto matz@ruby-lang.org
+
+ * io.c (rbioctl): call ioctl/fcntl for fptr->f2 too.
+
+ * process.c (rbffork): call rbthreadatfork() after creating
+ child process.
+
+ * eval.c (rbthreadatfork): kill all other threads immediately,
+ then turn the current thread into the main thread.
+


commit e4cf9c59fda102a80aa767c3a66a4738e26f6213
Author: ko1 ko1@b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Date: Sun Nov 28 13:00:49 2010 +0000

 * thread_pthread.c: remove pthread_atfork().



 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29963 b2dd03c8-39d4-4d8f-98ff-823fe6

=end

#6 Updated by Motohiro KOSAKI over 3 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

=begin
This issue was solved with changeset r30350.
Kouhei, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

=end

Also available in: Atom PDF