Project

General

Profile

Actions

Bug #17671

closed

Possible deadlock with condition variables in Net::IMAP

Added by stanhu (Stan Hu) about 3 years ago. Updated about 3 years ago.


Description

We're using Net::IMAP via the MailRoom gem, and quite frequently we are seeing issues with the process not terminating even though we attempt to run #Thread#join with a 60-second timeout.

A GDB backtrace shows that this is stuck waiting for a conditional variable:

Thread 2 (Thread 0x7f2bf81ff700 (LWP 11)):
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x7f2bfa3df810) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f2bfa3df840, cond=0x7f2bfa3df7e8) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=cond@entry=0x7f2bfa3df7e8, mutex=mutex@entry=0x7f2bfa3df840) at pthread_cond_wait.c:655
#3  0x00007f2bffe343fd in rb_native_cond_wait (mutex=0x7f2bfa3df840, cond=0x7f2bfa3df7e8) at thread_pthread.c:1197
#4  native_cond_sleep (th=th@entry=0x7f2bfa3df780, rel=rel@entry=0x0) at thread_pthread.c:1197
#5  0x00007f2bffe346f4 in native_sleep (th=th@entry=0x7f2bfa3df780, rel=0x0) at thread_pthread.c:2113
#6  0x00007f2bffe376de in sleep_forever (fl=1, th=0x7f2bfa3df780) at thread.c:1252
#7  rb_thread_sleep_deadly_allow_spurious_wakeup () at thread.c:1343
#8  rb_mutex_sleep_forever (time=time@entry=8) at thread_sync.c:451
#9  0x00007f2bffcce134 in rb_ensure (b_proc=b_proc@entry=0x7f2bffe37660 <rb_mutex_sleep_forever>, data1=data1@entry=8, e_proc=e_proc@entry=0x7f2bffe36160 <mutex_lock_uninterruptible>,
    data2=data2@entry=139826840321280) at eval.c:1129
#10 0x00007f2bffe361bd in rb_mutex_sleep (self=139826840321280, timeout=<optimized out>) at thread_sync.c:477
#11 0x00007f2bffe796e1 in vm_call0_cfunc_with_frame (argv=0x7f2bf80fc858, cd=<optimized out>, calling=0x7f2bf80fc700, ec=0x7f2bfa209bd0) at vm_eval.c:91
#12 vm_call0_cfunc (argv=0x7f2bf80fc858, cd=<optimized out>, calling=0x7f2bf80fc700, ec=0x7f2bfa209bd0) at vm_eval.c:105
#13 vm_call0_body (ec=0x7f2bfa209bd0, calling=0x7f2bf80fc700, cd=<optimized out>, argv=0x7f2bf80fc858) at vm_eval.c:140
#14 0x00007f2bffe7bfd8 in rb_funcallv_with_cc (cd=0x7f2bfffa75c0 <rb_funcallv_data>, recv=139826840321280, mid=24193, argc=1, argv=<optimized out>) at vm_core.h:1805
#15 0x00007f2bffcce134 in rb_ensure (b_proc=b_proc@entry=0x7f2bffe2f360 <do_sleep>, data1=data1@entry=139826822105168, e_proc=e_proc@entry=0x7f2bffe2df60 <delete_from_waitq>, data2=data2@entry=139826822105184)
    at eval.c:1129
#16 0x00007f2bffe2f472 in rb_condvar_wait (argc=<optimized out>, argv=0x7f2bf80fcad0, self=139826858411400) at thread_sync.c:1407
#17 0x00007f2bffe796e1 in vm_call0_cfunc_with_frame (argv=0x7f2bf80fcad0, cd=<optimized out>, calling=0x7f2bf80fc980, ec=0x7f2bfa209bd0) at vm_eval.c:91
#18 vm_call0_cfunc (argv=0x7f2bf80fcad0, cd=<optimized out>, calling=0x7f2bf80fc980, ec=0x7f2bfa209bd0) at vm_eval.c:105
#19 vm_call0_body (ec=0x7f2bfa209bd0, calling=0x7f2bf80fc980, cd=<optimized out>, argv=0x7f2bf80fcad0) at vm_eval.c:140
#20 0x00007f2bffe79d5f in rb_vm_call0 (ec=ec@entry=0x7f2bfa209bd0, recv=recv@entry=139826858411400, id=id@entry=24353, argc=<optimized out>, argv=<optimized out>, me=me@entry=0x7f2bfef8b9d0, kw_splat=0)
    at vm_eval.c:52
#21 0x00007f2bffe79ff8 in rb_vm_call_kw (ec=0x7f2bfa209bd0, recv=139826858411400, id=24353, argc=<optimized out>, argc@entry=2, argv=<optimized out>, argv@entry=0x7f2bf80fcab0, me=0x7f2bfef8b9d0,
    kw_splat=<optimized out>) at vm_eval.c:268
#22 0x00007f2bffe7a453 in rb_call0 (ec=<optimized out>, recv=<optimized out>, mid=<optimized out>, argc=argc@entry=2, argv=argv@entry=0x7f2bf80fcab0, call_scope=call_scope@entry=CALL_FCALL,
    self=<optimized out>) at vm_eval.c:392
#23 0x00007f2bffe7ac7e in rb_call (scope=CALL_FCALL, argv=0x7f2bf80fcab0, argc=2, mid=<optimized out>, recv=<optimized out>) at vm_eval.c:718
#24 rb_funcallv (recv=<optimized out>, mid=<optimized out>, argc=argc@entry=2, argv=argv@entry=0x7f2bf80fcad0) at vm_eval.c:958
#25 0x00007f2bff2f62e1 in monitor_wait_for_cond_body (v=v@entry=139826822106112) at monitor.c:152
#26 0x00007f2bffcce134 in rb_ensure (b_proc=b_proc@entry=0x7f2bff2f62a0 <monitor_wait_for_cond_body>, data1=data1@entry=139826822106112, e_proc=e_proc@entry=0x7f2bff2f64a0 <monitor_enter_for_cond>,
    data2=data2@entry=139826822106112) at eval.c:1129
#27 0x00007f2bff2f641e in monitor_wait_for_cond (monitor=139826840321320, cond=<optimized out>, timeout=8) at monitor.c:180
#28 0x00007f2bffe66c59 in vm_call_cfunc_with_frame (empty_kw_splat=<optimized out>, cd=0x7f2bfe9c8b70, calling=<optimized out>, reg_cfp=0x7f2bf81fd268, ec=0x7f2bfa209bd0) at vm_insnhelper.c:2514
#29 vm_call_cfunc (ec=0x7f2bfa209bd0, reg_cfp=0x7f2bf81fd268, calling=<optimized out>, cd=0x7f2bfe9c8b70) at vm_insnhelper.c:2539
#30 0x00007f2bffe72092 in vm_sendish (block_handler=<optimized out>, method_explorer=<optimized out>, cd=<optimized out>, reg_cfp=<optimized out>, ec=<optimized out>) at vm_insnhelper.c:4023
#31 vm_exec_core (ec=0x7f2bfa3df810, initial=128) at insns.def:801
#32 0x00007f2bffe78365 in rb_vm_exec (ec=0x7f2bfa209bd0, mjit_enable_p=1) at vm.c:1929
#33 0x00007f2bffe82130 in invoke_block (captured=0x7f2bf81fd3d0, opt_pc=<optimized out>, type=<optimized out>, cref=<optimized out>, self=139826840321400, iseq=0x7f2bfef91600, ec=0x7f2bfa209bd0) at vm.c:1044
#34 invoke_iseq_block_from_c (me=0x0, is_lambda=0, cref=<optimized out>, passed_block_handler=<optimized out>, kw_splat=<optimized out>, argv=<optimized out>, argc=0, self=139826840321400,
    captured=0x7f2bf81fd3d0, ec=0x7f2bfa209bd0) at vm.c:1116
#35 invoke_block_from_c_bh (argc=<optimized out>, argv=<optimized out>, kw_splat=<optimized out>, passed_block_handler=<optimized out>, cref=<optimized out>, is_lambda=<optimized out>,
    force_blockarg=<optimized out>, block_handler=<optimized out>, ec=<optimized out>) at vm.c:1134
#36 vm_yield (argc=<optimized out>, argv=<optimized out>, kw_splat=<optimized out>, ec=<optimized out>) at vm.c:1179
#37 rb_yield_0 (argv=<optimized out>, argc=<optimized out>) at vm_eval.c:1227
#38 rb_yield_values (n=<optimized out>) at vm_eval.c:1252
#39 0x00007f2bffe82eef in vm_yield (kw_splat=0, argv=<optimized out>, argc=<optimized out>, ec=<optimized out>) at vm.c:1165
#40 rb_yield_0 (argv=<optimized out>, argc=<optimized out>) at vm_eval.c:1227
#41 rb_yield_values2 (argc=<optimized out>, argv=<optimized out>) at vm_eval.c:1273
#42 0x00007f2bffcce134 in rb_ensure (b_proc=0x7f2bff2f6380 <monitor_sync_body>, data1=139826840321320, e_proc=<optimized out>, data2=<optimized out>) at eval.c:1129
#43 0x00007f2bffe66c59 in vm_call_cfunc_with_frame (empty_kw_splat=<optimized out>, cd=0x7f2bfa4f19e0, calling=<optimized out>, reg_cfp=0x7f2bf81fd380, ec=0x7f2bfa209bd0) at vm_insnhelper.c:2514
#44 vm_call_cfunc (ec=0x7f2bfa209bd0, reg_cfp=0x7f2bf81fd380, calling=<optimized out>, cd=0x7f2bfa4f19e0) at vm_insnhelper.c:2539
#45 0x00007f2bffe72150 in vm_sendish (method_explorer=<optimized out>, block_handler=<optimized out>, cd=<optimized out>, reg_cfp=<optimized out>, ec=<optimized out>) at vm_insnhelper.c:4023
#46 vm_exec_core (ec=0x7f2bfa3df810, initial=128) at insns.def:782
#47 0x00007f2bffe78365 in rb_vm_exec (ec=0x7f2bfa209bd0, mjit_enable_p=1) at vm.c:1929
#48 0x00007f2bffe792bc in invoke_iseq_block_from_c (me=0x0, is_lambda=<optimized out>, cref=0x0, passed_block_handler=0, kw_splat=<optimized out>, argv=<optimized out>, argc=0, self=139826857723800,
    captured=<optimized out>, ec=0x7f2bfa209bd0) at vm.c:1116
#49 invoke_block_from_c_proc (me=0x0, is_lambda=<optimized out>, passed_block_handler=0, kw_splat=<optimized out>, argv=<optimized out>, argc=0, self=139826857723800, proc=0x7f2bfa209bd0, ec=0x7f2bfa209bd0)
    at vm.c:1216
#50 vm_invoke_proc (passed_block_handler=0, kw_splat=<optimized out>, argv=<optimized out>, argc=0, self=139826857723800, proc=0x7f2bfa209bd0, ec=0x7f2bfa209bd0) at vm.c:1238
#51 rb_vm_invoke_proc (ec=0x7f2bfa209bd0, proc=proc@entry=0x7f2bf98a4510, argc=0, argv=<optimized out>, kw_splat=<optimized out>, passed_block_handler=passed_block_handler@entry=0) at vm.c:1259
#52 0x00007f2bffe3646d in thread_do_start (th=0x7f2bfa3df780) at thread.c:697
#53 0x00007f2bffe384ef in thread_start_func_2 (th=0x7f2bfa3df780, stack_start=<optimized out>) at thread.c:745
#54 0x00007f2bffe38a34 in thread_start_func_1 (th_ptr=<optimized out>) at thread_pthread.c:969
#55 0x00007f2bff9c5fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#56 0x00007f2bff4574cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f2bff33ff80 (LWP 7)):
#0  0x00007f2bff44c916 in __GI_ppoll (fds=fds@entry=0x7fff52ea2f78, nfds=nfds@entry=1, timeout=<optimized out>, timeout@entry=0x7fff52ea2f80, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x00007f2bffe33880 in rb_sigwait_sleep (th=th@entry=0x7f2bfee2d000, sigwait_fd=sigwait_fd@entry=3, rel=rel@entry=0x7fff52ea3088) at hrtime.h:148
#2  0x00007f2bffe34589 in native_sleep (th=0x7f2bfee2d000, rel=0x7fff52ea3088) at thread_pthread.c:2099
#3  0x00007f2bffe37b64 in rb_thread_terminate_all () at thread.c:587
#4  0x00007f2bffccfbaa in rb_ec_cleanup (ec=ec@entry=0x7f2bfee09050, ex=<optimized out>) at eval.c:218
#5  0x00007f2bffccfed3 in ruby_run_node (n=0x7f2bfa503a98) at eval.c:336
#6  0x000055dabf0fe10b in main (argc=<optimized out>, argv=<optimized out>) at ./main.c:50

Net::IMAP uses several conditional variables:

  1. idle : https://github.com/ruby/ruby/blob/48f324e92f9b36edc267f9871e35039cbd1c2eb9/lib/net/imap.rb#L965
  2. get_tagged_response: https://github.com/ruby/ruby/blob/48f324e92f9b36edc267f9871e35039cbd1c2eb9/lib/net/imap.rb#L1215
  3. send_literal: https://github.com/ruby/ruby/blob/48f324e92f9b36edc267f9871e35039cbd1c2eb9/lib/net/imap.rb#L1368

The last two do NOT have a timeout, so it's possible we're getting stuck in one of those cases.

I noticed that send_literal only checks the state of @exception after the wait returns. Do we need to do this?

diff --git a/lib/net/imap.rb b/lib/net/imap.rb
index 720acbc86d..2e1fd6e7b9 100644
--- a/lib/net/imap.rb
+++ b/lib/net/imap.rb
@@ -1365,6 +1365,7 @@ def send_literal(str, tag = nil)
         @continued_command_tag = tag
         @continuation_request_exception = nil
         begin
+          raise @exception if @exception
           @continuation_request_arrival.wait
           e = @continuation_request_exception || @exception
           raise e if e

Updated by jeremyevans0 (Jeremy Evans) about 3 years ago

  • Status changed from Open to Assigned
  • Assignee set to shugo (Shugo Maeda)

Can you please file this issue in the net/imap repository: https://github.com/ruby/net-imap

Updated by stanhu (Stan Hu) about 3 years ago

Filed under https://github.com/ruby/net-imap/issues/14. Feel free to close this.

Actions #3

Updated by jeremyevans0 (Jeremy Evans) about 3 years ago

  • Status changed from Assigned to Closed
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0