Bug #13882

Exception in `ensure` stops threads from exiting

Added by zanker (Zachary Anker) 15 days ago. Updated 9 days ago.

Status:
Assigned
Priority:
Normal
Target version:
-
[ruby-core:82720]

Description

When the Ruby process is exiting gracefully, if a thread raises an exception inside an ensure block, the process appears to forget that it is trying to exit and keeps running forever. Because a thread is still alive, rb_thread_terminate_all never finishes (vm_living_thread_num(vm) > 1 remains true), and the process does not exit until you kill -9 it.

I was able to reproduce this going back as far as MRI 2.0.0, but didn't have a 1.9.3 install to double-check with. Repro case:

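Thread.new do
  loop do
    puts "Loop start"

    begin
      begin
        # Sleep with no timeout; the thread stays here until it is woken or killed.
        sleep
      ensure
        # Raise from inside the ensure block.
        raise
      end
    rescue => e
      # Swallow the exception and go around the loop again.
      p e
    end
  end
end

# Give the thread time to reach its sleep, then ask the process to exit.
sleep 1
exit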

This results in two "Loop start" messages, after which the process hangs.

Running GDB against the process shows the spawned thread waiting in sleep_forever, which is expected, while the main thread is stuck in rb_thread_terminate_all:

(gdb) t a a bt

Thread 3 (Thread 0x7f956f36e700 (LWP 401088)):
#0  0x00000033d90df113 in poll () from /lib64/libc.so.6
#1  0x00007f9575863775 in timer_thread_sleep (p=0x7f957524c008) at thread_pthread.c:1460
#2  thread_timer (p=0x7f957524c008) at thread_pthread.c:1568
#3  0x00000033d9407aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x00000033d90e893d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f956f363700 (LWP 401117)):
#0  0x00000033d940b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f957585d959 in native_cond_wait (cond=<value optimized out>, mutex=<value optimized out>) at thread_pthread.c:343
#2  0x00007f95758675b8 in native_sleep (th=0x7f956c491800, timeout_tv=0x0) at thread_pthread.c:1147
#3  0x00007f95758682f2 in sleep_forever () at thread.c:1083
#4  rb_thread_sleep_forever () at thread.c:1157
#5  0x00007f95757e1ca5 in rb_f_sleep (argc=0, argv=0x7f956bce6a78) at process.c:4393
#6  0x00007f9575890bda in vm_call_cfunc_with_frame (th=0x7f956c491800, reg_cfp=0x7f956bde6940, calling=<value optimized out>, ci=0x7f956e98c910, cc=<value optimized out>)
    at vm_insnhelper.c:1752
#7  vm_call_cfunc (th=0x7f956c491800, reg_cfp=0x7f956bde6940, calling=<value optimized out>, ci=0x7f956e98c910, cc=<value optimized out>) at vm_insnhelper.c:1847
#8  0x00007f957589f12b in vm_exec_core (th=<value optimized out>, initial=<value optimized out>) at insns.def:1066
#9  0x00007f95758a470b in vm_exec (th=0x7f956c491800) at vm.c:1727
#10 0x00007f95758abc04 in invoke_block () at vm.c:969
#11 invoke_iseq_block_from_c () at vm.c:1014
#12 invoke_block_from_c_splattable () at vm.c:1032
#13 vm_yield () at vm.c:1074
#14 rb_yield_0 () at vm_eval.c:1010
#15 loop_i () at vm_eval.c:1088
#16 0x00007f9575755954 in rb_rescue2 (b_proc=0x7f95758ab8b0 <loop_i>, data1=0, r_proc=0x7f957588b2a0 <loop_stop>, data2=0) at eval.c:838
#17 0x00007f9575890bda in vm_call_cfunc_with_frame (th=0x7f956c491800, reg_cfp=0x7f956bde69a0, calling=<value optimized out>, ci=0x7f956c5090f0, cc=<value optimized out>)
    at vm_insnhelper.c:1752
#18 vm_call_cfunc (th=0x7f956c491800, reg_cfp=0x7f956bde69a0, calling=<value optimized out>, ci=0x7f956c5090f0, cc=<value optimized out>) at vm_insnhelper.c:1847
#19 0x00007f95758a726b in vm_call_method (th=0x7f956c491800, cfp=0x7f956bde69a0, calling=<value optimized out>, ci=<value optimized out>, cc=<value optimized out>)
    at vm_insnhelper.c:2295
#20 0x00007f957589f97c in vm_exec_core (th=<value optimized out>, initial=<value optimized out>) at insns.def:967
#21 0x00007f95758a470b in vm_exec (th=0x7f956c491800) at vm.c:1727
#22 0x00007f95758a5771 in invoke_block (th=0x7f956c491800, captured=<value optimized out>, self=140279789266160, argc=<value optimized out>, argv=<value optimized out>,
    passed_block_handler=<value optimized out>, cref=0x0, splattable=0, is_lambda=0) at vm.c:969
#23 invoke_iseq_block_from_c (th=0x7f956c491800, captured=<value optimized out>, self=140279789266160, argc=<value optimized out>, argv=<value optimized out>,
    passed_block_handler=<value optimized out>, cref=0x0, splattable=0, is_lambda=0) at vm.c:1014
#24 0x00007f95758a581f in invoke_block_from_c_unsplattable (th=<value optimized out>, block=<value optimized out>, self=<value optimized out>, argc=<value optimized out>,
    argv=<value optimized out>, passed_block_handler=<value optimized out>, is_lambda=<value optimized out>) at vm.c:1101
#25 0x00007f95758a595a in vm_invoke_proc (th=0x7f956c491800, proc=0x7f956e04da50, self=140279789266160, argc=0, argv=0x7f956c547c18, passed_block_handler=0) at vm.c:1126
#26 0x00007f9575864589 in thread_do_start (th=0x7f956c491800, stack_start=0x7f956f364000) at thread.c:577
#27 thread_start_func_2 (th=0x7f956c491800, stack_start=0x7f956f364000) at thread.c:619
#28 0x00007f95758649b6 in thread_start_func_1 (th_ptr=0x7f956c491800) at thread_pthread.c:887
#29 0x00000033d9407aa1 in start_thread () from /lib64/libpthread.so.0
#30 0x00000033d90e893d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f95756ab760 (LWP 401080)):
#0  0x00000033d940b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f957585d959 in native_cond_wait (cond=<value optimized out>, mutex=<value optimized out>) at thread_pthread.c:343
#2  0x00007f95758675b8 in native_sleep (th=0x7f9575229400, timeout_tv=0x0) at thread_pthread.c:1147
#3  0x00007f9575867e34 in rb_thread_terminate_all () at thread.c:494
#4  0x00007f9575758466 in ruby_cleanup (ex=6) at eval.c:186
#5  0x00007f9575758725 in ruby_run_node (n=0x7f956c54c6b8) at eval.c:300
#6  0x00000000004008eb in main (argc=2, argv=0x7fffccd8e7c8) at main.c:36
(gdb)

Inspecting the thread state shows that rb_threadptr_to_kill ran as expected, since to_kill is set, but errinfo has been reset to nil (the value 8 below is Qnil on this build):

(gdb) p ruby_current_thread
$1 = (rb_thread_t *) 0x7f8cd06eb800
(gdb) p ruby_current_thread->to_kill
$2 = 1
(gdb) p ruby_current_thread->status
$3 = THREAD_STOPPED
(gdb) p ruby_current_thread->errinfo
$4 = 8

I'm happy to contribute a patch, but I'm not quite sure what the appropriate fix would be. It looks like an exception raised in ensure overwrites the TAG_FATAL state in errinfo, so the thread no longer thinks it should exit. I think the right fix is to also check whether to_kill is set.
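To illustrate the suspected mechanism at the Ruby level, here is a minimal sketch that uses an ordinary exception as a stand-in for the pending termination. The PendingExit class and the swallow_pending_exit method are hypothetical names made up for this example. It does not touch to_kill, errinfo or TAG_FATAL; it only shows that an exception raised inside ensure replaces the one that was propagating, so a surrounding rescue sees only the new one.

```ruby
# Hypothetical stand-in for "this thread should exit"; not part of MRI.
class PendingExit < StandardError; end

def swallow_pending_exit
  begin
    begin
      # Plays the role of the pending termination request.
      raise PendingExit, "thread should exit"
    ensure
      # Raising here replaces the exception that was propagating.
      raise "error in ensure"
    end
  rescue => e
    # Only the ensure-time error reaches this rescue; the original
    # exception is reachable only through Exception#cause.
    e
  end
end

e = swallow_pending_exit
p e
p e.cause
```

In this sketch the rescue receives the RuntimeError from ensure, not PendingExit. If the internal kill state is replaced in a similar way, the loop in the repro case would carry on as though nothing had asked the thread to stop.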

History

#1 [ruby-core:82803] Updated by ko1 (Koichi Sasada) 9 days ago

  • Assignee set to ko1 (Koichi Sasada)
  • Status changed from Open to Assigned