Project

General

Profile

Bug #15320

IO.popen with MJIT worker thread may deadlock

Added by k0kubun (Takashi Kokubun) 9 months ago. Updated 9 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:89877]

Description

In r65785, we added mjit_pause on before_fork_ruby. We seem to have introduced a few deadlocks by that. This was discussed in https://bugs.ruby-lang.org/issues/15316, but I filed a separate ticket because this issue is independent from postponed_job race.

Example deadlock:

_pthread_mutex_unlock_usercnt (decr=1, mutex=mutex@entry=0x55a4ee3b7520 <mjit_engine_mutex>) at pthread_mutex_unlock.c:38
>>> Machine level backtrace

Thread 2 (Thread 0x7f9ea8981700 (LWP 20561)):
#0  0x00007f9ec2b4acf6 in __GI_ppoll (fds=fds@entry=0x7f9ea897fb08, nfds=nfds@entry=1, timeout=<optimized out>, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x000055a4edfd0f37 in rb_sigwait_sleep (th=th@entry=0x0, sigwait_fd=sigwait_fd@entry=3, rel=rel@entry=0x0) at /home/ko1/ruby/src/trunk-mjit-wait/thread_pthread.c:1969
#2  0x000055a4edf3b0dc in ruby_waitpid_locked (vm=vm@entry=0x55a4ee90b950, pid=pid@entry=20590, status=status@entry=0x7f9ea897fbfc, options=options@entry=0, cond=cond@entry=0x7f9ea897fc00) at /home/ko1/ruby/src/trunk-mjit-wait/process.c:1070
#3  0x000055a4eded2e82 in exec_process (path=<optimized out>, argv=argv@entry=0x7f9ebc0061e0) at /home/ko1/ruby/src/trunk-mjit-wait/mjit_worker.c:611
#4  0x000055a4eded4528 in link_o_to_so (so_file=0x7f9ea897fc70 "/tmp/_ruby_mjit_p27024u127.so", o_files=0x7f9ea897fd20) at /home/ko1/ruby/src/trunk-mjit-wait/mjit_worker.c:824
#5  convert_unit_to_func (unit=unit@entry=0x55a4eeb2f790, cc_entries=<optimized out>, is_entries=<optimized out>) at /home/ko1/ruby/src/trunk-mjit-wait/mjit_worker.c:1090
#6  0x000055a4eded4e52 in mjit_worker () at /home/ko1/ruby/src/trunk-mjit-wait/mjit_worker.c:1217
#7  0x000055a4edfcb15a in mjit_worker (arg=0x55a4eded4a20 <mjit_worker>) at /home/ko1/ruby/src/trunk-mjit-wait/thread_pthread.c:1876
#8  0x00007f9ec3a916db in start_thread (arg=0x7f9ea8981700) at pthread_create.c:463
#9  0x00007f9ec2b5788f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f9ec40df740 (LWP 27024)):
#0  __pthread_mutex_unlock_usercnt (decr=1, mutex=mutex@entry=0x55a4ee3b7520 <mjit_engine_mutex>) at pthread_mutex_unlock.c:38
#1  __GI___pthread_mutex_unlock (mutex=mutex@entry=0x55a4ee3b7520 <mjit_engine_mutex>) at pthread_mutex_unlock.c:345
#2  0x000055a4edfcf3e9 in rb_native_mutex_unlock (lock=lock@entry=0x55a4ee3b7520 <mjit_engine_mutex>) at /home/ko1/ruby/src/trunk-mjit-wait/thread_pthread.c:400
#3  0x000055a4eded6659 in CRITICAL_SECTION_FINISH (msg=0x55a4ee0d7226 "in stop_worker", level=3) at /home/ko1/ruby/src/trunk-mjit-wait/mjit_worker.c:412
#4  stop_worker () at /home/ko1/ruby/src/trunk-mjit-wait/mjit.c:689
#5  mjit_pause (wait_p=wait_p@entry=0) at /home/ko1/ruby/src/trunk-mjit-wait/mjit.c:719
#6  0x000055a4edf3e603 in before_fork_ruby () at /home/ko1/ruby/src/trunk-mjit-wait/process.c:1510
#7  rb_fork_ruby (status=status@entry=0x7ffde545cce8) at /home/ko1/ruby/src/trunk-mjit-wait/process.c:4011
#8  0x000055a4edeb1482 in pipe_open (execarg_obj=execarg_obj@entry=8, fmode=3, convconfig=convconfig@entry=0x7ffde545cdf0, modestr=<optimized out>) at /home/ko1/ruby/src/trunk-mjit-wait/io.c:6592
#9  0x000055a4edeb29e2 in rb_io_s_popen (argc=<optimized out>, argv=<optimized out>, klass=94166865734200) at /home/ko1/ruby/src/trunk-mjit-wait/io.c:6868
#10 0x000055a4ee00abc5 in vm_call_cfunc_with_frame (ci=0x55a4eef36ca0, cc=<optimized out>, calling=<optimized out>, reg_cfp=0x7f9ec40dd9b8, ec=0x55a4ee90c2e8) at /home/ko1/ruby/src/trunk-mjit-wait/vm_insnhelper.c:1908
#11 vm_call_cfunc (ec=0x55a4ee90c2e8, reg_cfp=0x7f9ec40dd9b8, calling=<optimized out>, ci=0x55a4eef36ca0, cc=<optimized out>) at /home/ko1/ruby/src/trunk-mjit-wait/vm_insnhelper.c:1924
#12 0x000055a4ee016033 in vm_call_method (ec=0x55a4ee90c2e8, cfp=0x7f9ec40dd9b8, calling=<optimized out>, ci=<optimized out>, cc=<optimized out>) at /home/ko1/ruby/src/trunk-mjit-wait/vm_insnhelper.c:2397
#13 0x000055a4ee01ce93 in vm_exec_core (ec=ec@entry=0x55a4ee90c2e8, initial=initial@entry=0) at ../../src/trunk-mjit-wait/insns.def:751
#14 0x000055a4ee01292c in rb_vm_exec (ec=0x55a4ee90c2e8, mjit_enable_p=1) at /home/ko1/ruby/src/trunk-mjit-wait/vm.c:1825
#15 0x00007f9ea8985491 in ?? ()
#16 0x000055a4ee90c2e8 in ?? ()
#17 0x00007ffde545d280 in ?? ()
#18 0x00007ffd00000000 in ?? ()
#19 0x0000000000000003 in ?? ()
#20 0x0000000000000000 in ?? ()
>>> Dump Ruby level backtrace
th: 0x55a4ee90be60, native_id: 0x7f9ec40df740
-- Control frame information -----------------------------------------------
c:0030 p:---- s:0190 e:000189 CFUNC  :popen
c:0029 p:0015 s:0184 e:000183 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/ruby/test_process.rb:1776
c:0028 p:0059 s:0177 e:000176 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:1181 [FINISH]
c:0027 p:0058 s:0171 e:000170 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/minitest/unit.rb:1291 [FINISH]
c:0026 p:0015 s:0162 e:000161 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit/testcase.rb:18
c:0025 p:0082 s:0157 e:000156 BLOCK  /home/ko1/ruby/src/trunk-mjit-wait/test/lib/minitest/unit.rb:952 [FINISH]
c:0024 p:---- s:0150 e:000149 CFUNC  :map
c:0023 p:0133 s:0146 e:000145 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/minitest/unit.rb:945
c:0022 p:0045 s:0134 e:000133 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:1044
c:0021 p:0011 s:0127 e:000125 BLOCK  /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:552 [FINISH]
c:0020 p:---- s:0121 e:000120 CFUNC  :each
c:0019 p:0057 s:0117 e:000116 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:550
c:0018 p:0009 s:0110 e:000109 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:587
c:0017 p:0460 s:0103 e:000101 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:512
c:0016 p:0048 s:0089 e:000088 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:548
c:0015 p:0009 s:0082 e:000081 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:587
c:0014 p:0155 s:0075 e:000074 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/minitest/unit.rb:888
c:0013 p:0018 s:0061 e:000060 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:981
c:0012 p:0006 s:0056 e:000055 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/minitest/unit.rb:1110
c:0011 p:0007 s:0052 e:000051 BLOCK  /home/ko1/ruby/src/trunk-mjit-wait/test/lib/minitest/unit.rb:1097 [FINISH]
c:0010 p:---- s:0048 e:000047 CFUNC  :each
c:0009 p:0058 s:0044 e:000043 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/minitest/unit.rb:1096
c:0008 p:0016 s:0039 e:000038 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/minitest/unit.rb:1084
c:0007 p:0009 s:0034 e:000033 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:790
c:0006 p:0009 s:0028 e:000027 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:620
c:0005 p:0017 s:0022 e:000021 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:33
c:0004 p:0037 s:0017 e:000016 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:1143
c:0003 p:0011 s:0013 e:000012 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:1147
c:0002 p:0261 s:0008 E:001458 EVAL   /home/ko1/ruby/src/trunk-mjit-wait/test/runner.rb:33 [FINISH]
c:0001 p:0000 s:0003 E:001230 (none) [FINISH]

As the backtrace looks similar to https://bugs.ruby-lang.org/issues/14867#note-82, I suspect IO.popen has race with MJIT worker for waiting_pids handling and RUBY_VM_CHECK_INTS inside before_fork_ruby might unexpectedly proceed SIGCHLD handler (for gcc process) while waitpid list is not prepared for it yet.


Related issues

Related to Ruby master - Bug #15316: rb_postponed_job_register not thread-safeClosedActions
Related to Ruby master - Bug #14867: Process.wait can wait for MJIT compiler processClosedActions

Associated revisions

Revision e205cd80
Added by k0kubun (Takashi Kokubun) 9 months ago

process.c: do not run signal handler before fork

to prevent from proceeding one for MJIT while it's not safe yet.
By that situation, MJIT worker could be waiting for compiler process forever
http://ci.rvm.jp/results/trunk-mjit@silicon-docker/1468033

[Bug #15320]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65807 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 65807
Added by k0kubun (Takashi Kokubun) 9 months ago

process.c: do not run signal handler before fork

to prevent from proceeding one for MJIT while it's not safe yet.
By that situation, MJIT worker could be waiting for compiler process forever
http://ci.rvm.jp/results/trunk-mjit@silicon-docker/1468033

[Bug #15320]

Revision 65807
Added by k0kubun (Takashi Kokubun) 9 months ago

process.c: do not run signal handler before fork

to prevent from proceeding one for MJIT while it's not safe yet.
By that situation, MJIT worker could be waiting for compiler process forever
http://ci.rvm.jp/results/trunk-mjit@silicon-docker/1468033

[Bug #15320]

Revision 3405411c
Added by k0kubun (Takashi Kokubun) 9 months ago

process.c: do not try to pause MJIT

while child handler is disabled.

trying to fix [Bug #15320]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65817 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 65817
Added by k0kubun (Takashi Kokubun) 9 months ago

process.c: do not try to pause MJIT

while child handler is disabled.

trying to fix [Bug #15320]

Revision 65817
Added by k0kubun (Takashi Kokubun) 9 months ago

process.c: do not try to pause MJIT

while child handler is disabled.

trying to fix [Bug #15320]

History

#1

Updated by k0kubun (Takashi Kokubun) 9 months ago

  • Related to Bug #15316: rb_postponed_job_register not thread-safe added
#2

Updated by k0kubun (Takashi Kokubun) 9 months ago

  • Related to Bug #14867: Process.wait can wait for MJIT compiler process added
#3

Updated by k0kubun (Takashi Kokubun) 9 months ago

  • Status changed from Open to Closed

Applied in changeset trunk|r65807.


process.c: do not run signal handler before fork

to prevent from proceeding one for MJIT while it's not safe yet.
By that situation, MJIT worker could be waiting for compiler process forever
http://ci.rvm.jp/results/trunk-mjit@silicon-docker/1468033

[Bug #15320]

#4

Updated by k0kubun (Takashi Kokubun) 9 months ago

  • Status changed from Closed to Open
  • Subject changed from IO.popen with MJIT worker thread may deadlock to Process.daemon in IO.popen block with MJIT worker thread may deadlock
#5

Updated by k0kubun (Takashi Kokubun) 9 months ago

  • Subject changed from Process.daemon in IO.popen block with MJIT worker thread may deadlock to IO.popen with MJIT worker thread may deadlock
#6

Updated by k0kubun (Takashi Kokubun) 9 months ago

  • Status changed from Open to Closed

Applied in changeset trunk|r65817.


process.c: do not try to pause MJIT

while child handler is disabled.

trying to fix [Bug #15320]

Also available in: Atom PDF