Bug #4121
closedtest_getpty_nonexistentで止まることがある
Added by kyanagi (Kouhei Yanagita) about 14 years ago. Updated about 12 years ago.
Description
=begin
CentOS 5.5 で、test_pty.rbが途中で止まることがあります。
毎回止まるわけではなく、正しくテストが実行されることも多いです。
while true; do make test-all TESTS='test_pty.rb'; done
などと連続で実行させると簡単に再現できます。
Debian lennyでは問題は発生しませんでした。
$ make test-all TESTS='test_pty.rb'
./miniruby -I./lib -I.ext/common ./tool/runruby.rb --extout=.ext -- "./test/runner.rb" test_pty.rb
Run options:
Running tests:¶
.....(ここで止まるのでC-cで中断する)F......
Finished tests in 2.729398s, 4.3966 tests/s, 10.6251 assertions/s.
- Failure:
test_getpty_nonexistent(TestPTY) [/home/yanagi/tmp/ruby/test/test_pty.rb:157]:
[ruby-dev:41965]
[Errno::ENOENT] exception expected, not
Class:
Message: <"">
---Backtrace---
/home/yanagi/tmp/ruby/test/test_pty.rb:159:ingetbyte' /home/yanagi/tmp/ruby/test/test_pty.rb:159:in
getpty'
/home/yanagi/tmp/ruby/test/test_pty.rb:159:in `block (2 levels) in test_getpty_nonexistent'
12 tests, 29 assertions, 1 failures, 0 errors, 0 skips
make: *** [yes-test-all] Error 1
=end
Updated by metanest (Makoto Kishimoto) about 14 years ago
=begin
うちの環境(4コアAMD64、FreeBSD8)でも起きます。
$ cpuset -l 0 make test-all TESTS='test_pty.rb'
のように CPU を制限すると確実に起きます。が
$ cpuset -l 0 make test-all TESTS='-n test_getpty_nonexistent test_pty.rb'
のように問題のテストだけ実行すると確実に起きなくなります。
=end
Updated by kyanagi (Kouhei Yanagita) about 14 years ago
=begin
CentOS 5.5/Xeon L3426の環境ですが、改めて調べてみると
make test-all TESTS="test_pty.rb"
→たまに止まる
make test-all TESTS="-n test_getpty_nonexistent test_pty.rb"
→止まることはない
taskset -c 0 make test-all TESTS="test_pty.rb"
→必ず止まる
taskset -c 0 make test-all TESTS="-n test_getpty_nonexistent test_pty.rb"
→止まることはない
という結果になりました。
=end
Updated by nagachika (Tomoyuki Chikanaga) about 14 years ago
=begin
うちでも起きる環境と起きない環境がありました。
起きない
RHEL ES 3.8 Kernel 2.4.21-47.ELsmp, Xeon 5160(2core) x 2CPU
Ubuntu 9.04 Kernel 2.6.28-19-generic, Pentium4 x 1CPU
起きる
RHEL ES 4.8 Kernel 2.6.9-89.ELlargesmp, Xeon X5570(2core) x 4CPU
起きた時に子プロセスのほうに gdb attach して確認してみました。
スタックトレースを最後に貼ります。
gvl_acquire での l.90 あたりの native_cond_wait で固まってますが、ここに入るのがおかしくて
(gdb) p vm->gvl.waiting_threads
$2 = (struct rb_thread_struct * volatile) 0x832f8b8
(gdb) p th
$3 = (rb_thread_t *) 0x82117a0
(gdb) p vm->gvl.acquired
$4 = 0
(gdb) p th->thread_id
$5 = 4160292544
(gdb) p vm->gvl.waiting_threads->thread_id
$6 = 4156005280
(gdb) info thread
2 Thread 4156771232 (LWP 18806) 0xffffe410 in __kernel_vsyscall ()
- 1 Thread 4160292544 (LWP 18804) 0xffffe410 in __kernel_vsyscall ()
(gdb) p vm->gvl.waiting_threads->native_thread_data.gvl_next
$7 = (struct rb_thread_struct *) 0x82117a0
(gdb) p vm->gvl.waiting_threads->native_thread_data.gvl_next->thread_id
$8 = 4160292544
(gdb) p vm->gvl.waiting_threads->native_thread_data.gvl_next->native_thread_data.gvl_next
$9 = (struct rb_thread_struct *) 0x0
と、おそらく vm->gvl_waiting_threads が fork 後に破棄された rb_thread_t を見ているような感じです。
以下、スタックトレースを貼ります。
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x00837cf6 in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/tls/libpthread.so.0
#2 0x0814759b in native_cond_wait (cond=0x8211824, mutex=0x82114f4)
at thread_pthread.c:266
#3 0x08147297 in gvl_acquire (vm=0x82114f0, th=0x82117a0)
at thread_pthread.c:90
#4 0x08149478 in blocking_region_end (th=0x82117a0, region=0xffffa398)
at thread.c:1048
#5 0x0814a950 in do_select (n=11, read=0x0, write=0x84aa728, except=0x0,
timeout=0x0) at thread.c:2521
#6 0x0814aaba in rb_thread_wait_fd_rw (fd=10, read=0) at thread.c:2575
#7 0x0814ab1b in rb_thread_fd_writable (fd=10) at thread.c:2597
#8 0x0806a4ba in io_fflush (fptr=0x83365a0) at io.c:676
#9 0x0806b0aa in rb_io_flush (io=137483980) at io.c:1091
#10 0x080ac18e in rb_fork_err (status=0x0, chfunc=0xf7bfc599 ,
charg=0xffffa6ac, fds=4, errmsg=0xffffa68c "", errmsg_buflen=32)
at process.c:2575
#11 0xf7bfc768 in establishShell (argc=1, argv=0xf7c351bc, info=0xffffa710,
SlaveName=0xffffa700 "/dev/pts/21") at pty.c:248
#12 0xf7bfd261 in pty_getpty (argc=1, argv=0xf7c351bc, self=137648620)
at pty.c:545
#13 0x08134070 in call_cfunc (func=0xf7bfcf07 <pty_getpty>, recv=137648620,
len=-1, argc=1, argv=0xf7c351bc) at vm_insnhelper.c:316
#14 0x081349b4 in vm_call_cfunc (th=0x82117a0, reg_cfp=0xf7cb4bbc, num=1,
recv=137648620, blockptr=0x0, me=0x849b1a8) at vm_insnhelper.c:403
#15 0x08134e7e in vm_call_method (th=0x82117a0, cfp=0xf7cb4bbc, num=1,
blockptr=0x0, flag=0, id=16952, me=0x849b1a8, recv=137648620)
at vm_insnhelper.c:525
#16 0x081394c7 in vm_exec_core (th=0x82117a0, initial=0) at insns.def:1010
#17 0x081442d0 in vm_exec (th=0x82117a0) at vm.c:1150
#18 0x08142fd9 in invoke_block_from_c (th=0x82117a0, block=0xf7cb4d88,
self=138349680, argc=1, argv=0xffffb170, blockptr=0x0, cref=0x0)
at vm.c:561
#19 0x081430ea in vm_yield (th=0x82117a0, argc=1, argv=0xffffb170) at vm.c:591
#20 0x081404af in rb_yield_0 (argc=1, argv=0xffffb170) at vm_eval.c:740
#21 0x081404e8 in rb_yield (val=137536340) at vm_eval.c:750
#22 0x081597f7 in rb_ary_collect (ary=137536260) at array.c:2160
#23 0x08134080 in call_cfunc (func=0x8159756 <rb_ary_collect>, recv=137536260,
len=0, argc=0, argv=0xf7c35108) at vm_insnhelper.c:319
#24 0x081349b4 in vm_call_cfunc (th=0x82117a0, reg_cfp=0xf7cb4d74, num=0,
recv=137536260, blockptr=0xf7cb4d88, me=0x8272d40) at vm_insnhelper.c:403
#25 0x08134e7e in vm_call_method (th=0x82117a0, cfp=0xf7cb4d74, num=0,
blockptr=0xf7cb4d88, flag=0, id=1568, me=0x8272d40, recv=137536260)
at vm_insnhelper.c:525
#26 0x081394c7 in vm_exec_core (th=0x82117a0, initial=0) at insns.def:1010
#27 0x081442d0 in vm_exec (th=0x82117a0) at vm.c:1150
#28 0x08142fd9 in invoke_block_from_c (th=0x82117a0, block=0xf7cb4e38,
self=138349680, argc=1, argv=0xffffbbd0, blockptr=0x0, cref=0x0)
at vm.c:561
#29 0x081430ea in vm_yield (th=0x82117a0, argc=1, argv=0xffffbbd0) at vm.c:591
#30 0x081404af in rb_yield_0 (argc=1, argv=0xffffbbd0) at vm_eval.c:740
#31 0x081404e8 in rb_yield (val=137648400) at vm_eval.c:750
#32 0x0815793d in rb_ary_each (ary=137546080) at array.c:1429
#33 0x08134080 in call_cfunc (func=0x81578cb <rb_ary_each>, recv=137546080,
len=0, argc=0, argv=0xf7c350c0) at vm_insnhelper.c:319
#34 0x081349b4 in vm_call_cfunc (th=0x82117a0, reg_cfp=0xf7cb4e24, num=0,
recv=137546080, blockptr=0xf7cb4e38, me=0x82725a8) at vm_insnhelper.c:403
#35 0x08134e7e in vm_call_method (th=0x82117a0, cfp=0xf7cb4e24, num=0,
blockptr=0xf7cb4e38, flag=0, id=424, me=0x82725a8, recv=137546080)
at vm_insnhelper.c:525
#36 0x081394c7 in vm_exec_core (th=0x82117a0, initial=0) at insns.def:1010
#37 0x081442d0 in vm_exec (th=0x82117a0) at vm.c:1150
#38 0x08142fd9 in invoke_block_from_c (th=0x82117a0, block=0xf7cb4f40,
self=138349680, argc=1, argv=0xffffc630, blockptr=0x0, cref=0x0)
at vm.c:561
#39 0x081430ea in vm_yield (th=0x82117a0, argc=1, argv=0xffffc630) at vm.c:591
#40 0x081404af in rb_yield_0 (argc=1, argv=0xffffc630) at vm_eval.c:740
#41 0x081404e8 in rb_yield (val=137647980) at vm_eval.c:750
#42 0x0815793d in rb_ary_each (ary=137546320) at array.c:1429
#43 0x08134080 in call_cfunc (func=0x81578cb <rb_ary_each>, recv=137546320,
len=0, argc=0, argv=0xf7c35054) at vm_insnhelper.c:319
#44 0x081349b4 in vm_call_cfunc (th=0x82117a0, reg_cfp=0xf7cb4f2c, num=0,
recv=137546320, blockptr=0xf7cb4f40, me=0x82725a8) at vm_insnhelper.c:403
#45 0x08134e7e in vm_call_method (th=0x82117a0, cfp=0xf7cb4f2c, num=0,
blockptr=0xf7cb4f40, flag=0, id=424, me=0x82725a8, recv=137546320)
at vm_insnhelper.c:525
#46 0x081394c7 in vm_exec_core (th=0x82117a0, initial=0) at insns.def:1010
#47 0x081442d0 in vm_exec (th=0x82117a0) at vm.c:1150
#48 0x08144aa6 in rb_iseq_eval_main (iseqval=136494340) at vm.c:1391
#49 0x08059780 in ruby_exec_internal (n=0x822bd04) at eval.c:225
#50 0x08059867 in ruby_exec_node (n=0x822bd04) at eval.c:272
#51 0x08059841 in ruby_run_node (n=0x822bd04) at eval.c:265
#52 0x080583ae in main (argc=4, argv=0xffffd1b4) at main.c:38
=end
Updated by kosaki (Motohiro KOSAKI) about 14 years ago
- Category set to core
- Status changed from Open to Assigned
- Assignee set to kosaki (Motohiro KOSAKI)
=begin
わたしの手には負えないかもしれないですけど、誰も手を動かしていない気配があるのでとりあえず自分にアサインします。誰か手を動かしている人がいれば遠慮なく奪ってください
=end
Updated by kosaki (Motohiro KOSAKI) about 14 years ago
=begin
誰が悪いのかはわかりませんが、以下の2つのコミットが間違っていることと、現在のtrunkだと
fork時に子プロセスがGVLを再初期化するまえにGVLをacquireするケースがあるので、forkが動くのは
ただのミラクルということが分かりました。
あと、Process#fork()以外のfork(IO#popenとか)だとrb_thread_atforkが呼ばれてないので、
消滅したスレッドの掃除もされないし、GVLも再初期化されませんが、これもたぶんバグです。
たぶん直せそうな気がするのでクリスマス中にでもやっときます。
詳細なレポートをくれた、Chikanagaさんには感謝いっぱいです。
commit f54418b71f57e00d4a50bd86bf7d18b8b5195287
Author: matz matz@b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Date: Mon Nov 27 09:23:38 2000 +0000
matz
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@1053 b2dd03c8-39d4-4d8f-98ff-823fe69
+Mon Nov 27 00:10:08 2000 Yukihiro Matsumoto matz@ruby-lang.org
+
-
* io.c (rb_io_ctl): call ioctl/fcntl for fptr->f2 too.
-
* process.c (rb_f_fork): call rb_thread_atfork() after creating
-
child process.
-
* eval.c (rb_thread_atfork): kill all other threads immediately,
-
then turn the current thread into the main thread.
commit e4cf9c59fda102a80aa767c3a66a4738e26f6213
Author: ko1 ko1@b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Date: Sun Nov 28 13:00:49 2010 +0000
* thread_pthread.c: remove pthread_atfork().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29963 b2dd03c8-39d4-4d8f-98ff-823fe6
=end
Updated by kosaki (Motohiro KOSAKI) about 14 years ago
- Status changed from Assigned to Closed
- % Done changed from 0 to 100
=begin
This issue was solved with changeset r30350.
Kouhei, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
=end