Backport #7134

Signal handling bug in Mac OS X

Added by Andy Kitchen almost 3 years ago. Updated over 2 years ago.

[ruby-core:47880]
Status:Closed
Priority:Normal
Assignee:Motohiro KOSAKI

Description

On Mac OS X, running the attached program causes the exception below about 1/10 times it is run.

$ ruby hup.rb
hup.rb:2:in write': deadlock; recursive locking (ThreadError)
from hup.rb:2:in
puts'
from hup.rb:2:in puts'
from hup.rb:2:in
block in '
from hup.rb:6:in call'
from hup.rb:6:in
write'
from hup.rb:6:in puts'
from hup.rb:6:in
puts'
from hup.rb:6:in `'

The expected output is:

In Hup Handler
Finished...

or

Finished...
In Hup Handler

My ruby is compiled with clang:

$ clang --version
Apple clang version 4.1 (tags/Apple/clang-421.11.65) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.2.0
Thread model: posix

hup.rb Magnifier - execution of this script causes strange behaviour (94 Bytes) Andy Kitchen, 10/10/2012 05:36 PM

mutex_notrap.patch Magnifier (4.32 KB) Motohiro KOSAKI, 11/06/2012 04:40 AM

Associated revisions

Revision 38398
Added by Motohiro KOSAKI over 2 years ago

  • io.c (internal_write_func2): new helper function for rb_write_internal2().
  • io.c (rb_write_internal2): new function. it uses rb_thread_call_without_gvl2() instaed of rb_thread_io_blocking_region().
  • io.c (rb_binwrite_string): uses rb_write_internal2 instead of rb_write_internal. [Bug #7134]

History

#1 Updated by Gustavo Anatoly Fernandes Velasquez Solis almost 3 years ago

This error it happens when kill is called:

https://gist.github.com/3906937

Followed by:

https://gist.github.com/3906907

But I'm trying to understand the bug causes.

#2 Updated by Gustavo Anatoly Fernandes Velasquez Solis almost 3 years ago

Hi,

Could somenone please, explain if is possible use recursive lock (PTHREAD_MUTEX_RECURSIVE) attributes in pthread.h, to fix this bug?

#3 Updated by Yusuke Endoh over 2 years ago

  • Assignee set to Motohiro KOSAKI
  • Status changed from Open to Assigned

Kosaki-san, do you know anything about this?

Yusuke Endoh mame@tsg.ne.jp

#4 Updated by Gustavo Anatoly Fernandes Velasquez Solis over 2 years ago

Hi, Yusuke.

I was studying this bug and I did a little bit changes to test

git://gist.github.com/4017432.git

because self lock is not allowed on ruby, please correct me if I wrong.

Thanks.

2012/11/5 mame (Yusuke Endoh) mame@tsg.ne.jp

Issue #7134 has been updated by mame (Yusuke Endoh).

Status changed from Open to Assigned
Assignee set to kosaki (Motohiro KOSAKI)

Kosaki-san, do you know anything about this?

Yusuke Endoh mame@tsg.ne.jp

Bug #7134: Signal handling bug in Mac OS X
https://bugs.ruby-lang.org/issues/7134#change-32417

Author: auastro (Andy Kitchen)
Status: Assigned
Priority: Normal
Assignee: kosaki (Motohiro KOSAKI)
Category:
Target version: 1.9.3
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin12]

On Mac OS X, running the attached program causes the exception below about
1/10 times it is run.

$ ruby hup.rb
hup.rb:2:in write': deadlock; recursive locking (ThreadError)
from hup.rb:2:in
puts'
from hup.rb:2:in puts'
from hup.rb:2:in
block in '
from hup.rb:6:in call'
from hup.rb:6:in
write'
from hup.rb:6:in puts'
from hup.rb:6:in
puts'
from hup.rb:6:in `'

The expected output is:

In Hup Handler
Finished...

or

Finished...
In Hup Handler

My ruby is compiled with clang:

$ clang --version
Apple clang version 4.1 (tags/Apple/clang-421.11.65) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.2.0
Thread model: posix

http://bugs.ruby-lang.org/

--
» Gustavo Anatoly Fernandes Velásquez Solís
» Information Systems - Bachelor
» Gmail: gustavoanatoly@gmail.com
» Skype: gustavoanatoly
» Cell Phone: +55 (83) 9918-1763

#5 Updated by Gustavo Anatoly Fernandes Velasquez Solis over 2 years ago

Sorry, I sent wrong gist link link: https://gist.github.com/4017432

2012/11/5 Gustavo Anatoly gustavoanatoly@gmail.com

Hi, Yusuke.

I was studying this bug and I did a little bit changes to test

git://gist.github.com/4017432.git

because self lock is not allowed on ruby, please correct me if I wrong.

Thanks.

2012/11/5 mame (Yusuke Endoh) mame@tsg.ne.jp

Issue #7134 has been updated by mame (Yusuke Endoh).

Status changed from Open to Assigned
Assignee set to kosaki (Motohiro KOSAKI)

Kosaki-san, do you know anything about this?

Yusuke Endoh mame@tsg.ne.jp

Bug #7134: Signal handling bug in Mac OS X
https://bugs.ruby-lang.org/issues/7134#change-32417

Author: auastro (Andy Kitchen)
Status: Assigned
Priority: Normal
Assignee: kosaki (Motohiro KOSAKI)
Category:
Target version: 1.9.3
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin12]

On Mac OS X, running the attached program causes the exception below
about 1/10 times it is run.

$ ruby hup.rb
hup.rb:2:in write': deadlock; recursive locking (ThreadError)
from hup.rb:2:in
puts'
from hup.rb:2:in puts'
from hup.rb:2:in
block in '
from hup.rb:6:in call'
from hup.rb:6:in
write'
from hup.rb:6:in puts'
from hup.rb:6:in
puts'
from hup.rb:6:in `'

The expected output is:

In Hup Handler
Finished...

or

Finished...
In Hup Handler

My ruby is compiled with clang:

$ clang --version
Apple clang version 4.1 (tags/Apple/clang-421.11.65) (based on LLVM
3.1svn)
Target: x86_64-apple-darwin12.2.0
Thread model: posix

http://bugs.ruby-lang.org/

» Gustavo Anatoly Fernandes Velásquez Solís
» Information Systems - Bachelor
» Gmail: gustavoanatoly@gmail.com
» Skype: gustavoanatoly
» Cell Phone: +55 (83) 9918-1763

--
» Gustavo Anatoly Fernandes Velásquez Solís
» Information Systems - Bachelor
» Gmail: gustavoanatoly@gmail.com
» Skype: gustavoanatoly
» Cell Phone: +55 (83) 9918-1763

#6 Updated by Motohiro KOSAKI over 2 years ago

  • Target version changed from 1.9.3 to next minor
  • Assignee changed from Motohiro KOSAKI to Nobuyoshi Nakada

Hi,

We are sorry, recursive lock is clearly unacceptable because recursive lock was not designed for trap handler. It lead to memory corruption.
When this issue was occur, C stack trace is below.

4 ruby-- 0x0000000100260061 rb_mutex_lock + 1217 (thread_pthread.c:197)
5 ruby-- 0x00000001002605c8 rb_mutex_synchronize + 24 (thread.c:4286)
6 ruby-- 0x000000010015e4f0 io_binwrite + 512 (io.c:1163)
7 ruby-- 0x000000010016156b io_write + 955 (io.c:1261)
8 ruby-- 0x0000000100254947 vm_call0_body + 743 (vm_eval.c:110)
9 ruby-- 0x000000010024a6c8 rb_funcall + 536 (vm_eval.c:49)
10 ruby-- 0x0000000100160eff rb_io_puts + 159 (io.c:6717)
11 ruby-- 0x0000000100254947 vm_call0_body + 743 (vm_eval.c:110)
12 ruby-- 0x0000000100249f20 rb_funcall2 + 256 (vm_eval.c:750)
13 ruby-- 0x0000000100256c55 vm_call_cfunc + 421 (vm_insnhelper.c:1442)
14 ruby-- 0x000000010025668f vm_call_method + 287 (vm_insnhelper.c:1689)
15 ruby-- 0x0000000100242f8c vm_exec_core + 10060 (insns.def:968)
16 ruby-- 0x000000010024f024 vm_exec + 100 (vm.c:1159)
17 ruby-- 0x000000010024e781 vm_invoke_proc + 161 (vm.c:680)
18 ruby-- 0x000000010024e6d0 rb_vm_invoke_proc + 32 (vm.c:696)
19 ruby-- 0x0000000100145695 proc_call + 117 (proc.c:567)
20 ruby-- 0x0000000100254947 vm_call0_body + 743 (vm_eval.c:110)
21 ruby-- 0x0000000100249f20 rb_funcall2 + 256 (vm_eval.c:750)
22 ruby-- 0x000000010024d486 rb_eval_cmd + 262 (vm_eval.c:1379)
23 ruby-- 0x000000010025c35a rb_threadptr_execute_interrupts_common + 330 (thread.c:1716)
24 ruby-- 0x000000010025cd44 rb_thread_io_blocking_region + 164 (thread_pthread.c:196)
25 ruby-- 0x000000010016fa3c io_binwrite_string + 60 (io.c:1116)
26 ruby-- 0x000000010013fd0d rb_ensure + 109 (eval.c:802)
27 ruby-- 0x000000010015e4f0 io_binwrite + 512 (io.c:1163)
28 ruby-- 0x000000010016156b io_write + 955 (io.c:1261)
29 ruby-- 0x0000000100254947 vm_call0_body + 743 (vm_eval.c:110)
30 ruby-- 0x000000010024a6c8 rb_funcall + 536 (vm_eval.c:49)
31 ruby-- 0x0000000100160eff rb_io_puts + 159 (io.c:6717)
32 ruby-- 0x0000000100254947 vm_call0_body + 743 (vm_eval.c:110)
33 ruby-- 0x0000000100249f20 rb_funcall2 + 256 (vm_eval.c:750)
34 ruby-- 0x0000000100256c55 vm_call_cfunc + 421 (vm_insnhelper.c:1442)
35 ruby-- 0x000000010025668f vm_call_method + 287 (vm_insnhelper.c:1689)
36 ruby-- 0x0000000100242f8c vm_exec_core + 10060 (insns.def:968)
37 ruby-- 0x000000010024f024 vm_exec + 100 (vm.c:1159)
38 ruby-- 0x000000010024f965 rb_iseq_eval_main + 405 (vm.c:1402)
39 ruby-- 0x000000010013efdf ruby_exec_internal + 111 (eval.c:251)
40 ruby-- 0x000000010013ef27 ruby_run_node + 71 (eval.c:303)
41 ruby-- 0x000000010010b17f main + 79 (main.c:36)
42 libdyld.dylib 0x00007fff8997e7e1 start + 1

Now, almost all IO hold a mutex durling io for serialization since r20144 (aka 1.9.2p0). Then, you can't
use any IOs in trap handler.

Ho hum. I think this spec is too user unfriendly and NEWS-1.9.2 seems not explain this incompatibility.
Nobu, Can you please explain current spec, your intention and your recommended trap handler writing style.

#7 Updated by Motohiro KOSAKI over 2 years ago

Note: current documentation of Process.kill() uses puts() in trap handler. So, I personally don't think
that IO in trap handler is invalid code.

/*
* call-seq:
* Process.kill(signal, pid, ...) -> fixnum
*
* Sends the given signal to the specified process id(s) if pid is positive.
* If pid is zero signal is sent to all processes whose group ID is equal
* to the group ID of the process. signal may be an integer signal number or
* a POSIX signal name (either with or without a +SIG+ prefix). If signal is
* negative (or starts with a minus sign), kills process groups instead of
* processes. Not all signals are available on all platforms.
*
* pid = fork do
* Signal.trap("HUP") { puts "Ouch!"; exit }
* # ... do some work ...
* end
* # ...
* Process.kill("HUP", pid)
* Process.wait

#8 Updated by Motohiro KOSAKI over 2 years ago

  • Target version changed from next minor to 2.0.0

sorry, fixed unintentional target version change.

#9 Updated by Gustavo Anatoly Fernandes Velasquez Solis over 2 years ago

Hi, Kosaki.

Firstly, thanks for explain the reasons, and really that memory error was
happening in my tests. I'm interested to understand how to will fix it by
the team, I want learn a little bit more.

Thanks.

2012/11/5 kosaki (Motohiro KOSAKI) kosaki.motohiro@gmail.com

Issue #7134 has been updated by kosaki (Motohiro KOSAKI).

Target version changed from next minor to 2.0.0

sorry, fixed unintentional target version change.


Bug #7134: Signal handling bug in Mac OS X
https://bugs.ruby-lang.org/issues/7134#change-32453

Author: auastro (Andy Kitchen)
Status: Assigned
Priority: Normal
Assignee: nobu (Nobuyoshi Nakada)
Category:
Target version: 2.0.0
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin12]

On Mac OS X, running the attached program causes the exception below about
1/10 times it is run.

$ ruby hup.rb
hup.rb:2:in write': deadlock; recursive locking (ThreadError)
from hup.rb:2:in
puts'
from hup.rb:2:in puts'
from hup.rb:2:in
block in '
from hup.rb:6:in call'
from hup.rb:6:in
write'
from hup.rb:6:in puts'
from hup.rb:6:in
puts'
from hup.rb:6:in `'

The expected output is:

In Hup Handler
Finished...

or

Finished...
In Hup Handler

My ruby is compiled with clang:

$ clang --version
Apple clang version 4.1 (tags/Apple/clang-421.11.65) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.2.0
Thread model: posix

http://bugs.ruby-lang.org/

--
» Gustavo Anatoly Fernandes Velásquez Solís
» Information Systems - Bachelor
» Gmail: gustavoanatoly@gmail.com
» Skype: gustavoanatoly
» Cell Phone: +55 (83) 9918-1763

#10 Updated by Motohiro KOSAKI over 2 years ago

Attached my fixing proposal. This patch masks trap handler invocation during IO lock held.
But, I guess ko1 might dislike this approach because it slow down CHECK_INTS performance....

#11 Updated by Motohiro KOSAKI over 2 years ago

  • Assignee changed from Nobuyoshi Nakada to Koichi Sasada

ko1 suggested to use rb_thread_call_without_gvl2(). but it doesn't have a way to avoid CHECK_INTS completely.
I'm waiting ko1's rb_thread_call_without_gvl2 fix.

#12 Updated by Koichi Sasada over 2 years ago

  • Assignee changed from Koichi Sasada to Motohiro KOSAKI

I fix gvl2!

#13 Updated by Motohiro KOSAKI over 2 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r38398.
Andy, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • io.c (internal_write_func2): new helper function for rb_write_internal2().
  • io.c (rb_write_internal2): new function. it uses rb_thread_call_without_gvl2() instaed of rb_thread_io_blocking_region().
  • io.c (rb_binwrite_string): uses rb_write_internal2 instead of rb_write_internal. [Bug #7134]

#14 Updated by Motohiro KOSAKI over 2 years ago

Fixed this by r38395 - r38401 series patches.

#15 Updated by Motohiro KOSAKI over 2 years ago

  • Target version deleted (2.0.0)
  • Assignee changed from Motohiro KOSAKI to Usaku NAKAMURA
  • Status changed from Closed to Assigned
  • Project changed from Ruby trunk to Backport193
  • Tracker changed from Bug to Backport

#16 Updated by Usaku NAKAMURA over 2 years ago

Is this list right?
r38395, r38396, r38397, r38398, r38399, r38400, r38401, r38402, r38422, r38447, r38448 and r38449
Are the fixes really completed?

And another problem.
I've not backported rb_thread_call_without_gvl2() from trunk yet.
I doubt the API is not stable.
Kosaki-san, do you think the orignal problem is very serious and necessary to backport many many changes from trunk?

#17 Updated by Usaku NAKAMURA over 2 years ago

  • Status changed from Assigned to Feedback
  • Assignee changed from Usaku NAKAMURA to Motohiro KOSAKI

#18 Updated by Motohiro KOSAKI over 2 years ago

  • Status changed from Feedback to Closed

Hmmm...
Ok, ok. I give up this claim. I hope nobody uses trap.

Also available in: Atom PDF