Backport #7134

Signal handling bug in Mac OS X

Added by Andy Kitchen over 1 year ago. Updated over 1 year ago.

[ruby-core:47880]
Status:Closed
Priority:Normal
Assignee:Motohiro KOSAKI

Description

On Mac OS X, running the attached program causes the exception below about 1/10 times it is run.

$ ruby hup.rb
hup.rb:2:in write': deadlock; recursive locking (ThreadError)
from hup.rb:2:in
puts'
from hup.rb:2:in puts'
from hup.rb:2:in
block in '
from hup.rb:6:in call'
from hup.rb:6:in
write'
from hup.rb:6:in puts'
from hup.rb:6:in
puts'
from hup.rb:6:in `'

The expected output is:

In Hup Handler
Finished...

or

Finished...
In Hup Handler

My ruby is compiled with clang:

$ clang --version
Apple clang version 4.1 (tags/Apple/clang-421.11.65) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.2.0
Thread model: posix

hup.rb Magnifier - execution of this script causes strange behaviour (94 Bytes) Andy Kitchen, 10/10/2012 05:36 PM

mutex_notrap.patch Magnifier (4.32 KB) Motohiro KOSAKI, 11/06/2012 04:40 AM

History

#1 Updated by Gustavo Anatoly Fernandes Velasquez Solis over 1 year ago

This error it happens when kill is called:

https://gist.github.com/3906937

Followed by:

https://gist.github.com/3906907

But I'm trying to understand the bug causes.

#2 Updated by Gustavo Anatoly Fernandes Velasquez Solis over 1 year ago

Hi,

Could somenone please, explain if is possible use recursive lock (PTHREADMUTEXRECURSIVE) attributes in pthread.h, to fix this bug?

#3 Updated by Yusuke Endoh over 1 year ago

  • Status changed from Open to Assigned
  • Assignee set to Motohiro KOSAKI

Kosaki-san, do you know anything about this?

Yusuke Endoh mame@tsg.ne.jp

#4 Updated by Gustavo Anatoly Fernandes Velasquez Solis over 1 year ago

Hi, Yusuke.

I was studying this bug and I did a little bit changes to test

git://gist.github.com/4017432.git

because self lock is not allowed on ruby, please correct me if I wrong.

Thanks.

2012/11/5 mame (Yusuke Endoh) mame@tsg.ne.jp

Issue #7134 has been updated by mame (Yusuke Endoh).

Status changed from Open to Assigned
Assignee set to kosaki (Motohiro KOSAKI)

Kosaki-san, do you know anything about this?

Yusuke Endoh mame@tsg.ne.jp

Bug #7134: Signal handling bug in Mac OS X
https://bugs.ruby-lang.org/issues/7134#change-32417

Author: auastro (Andy Kitchen)
Status: Assigned
Priority: Normal
Assignee: kosaki (Motohiro KOSAKI)
Category:
Target version: 1.9.3
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin12]

On Mac OS X, running the attached program causes the exception below about
1/10 times it is run.

$ ruby hup.rb
hup.rb:2:in write': deadlock; recursive locking (ThreadError)
from hup.rb:2:in
puts'
from hup.rb:2:in puts'
from hup.rb:2:in
block in '
from hup.rb:6:in call'
from hup.rb:6:in
write'
from hup.rb:6:in puts'
from hup.rb:6:in
puts'
from hup.rb:6:in `'

The expected output is:

In Hup Handler
Finished...

or

Finished...
In Hup Handler

My ruby is compiled with clang:

$ clang --version
Apple clang version 4.1 (tags/Apple/clang-421.11.65) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.2.0
Thread model: posix

http://bugs.ruby-lang.org/

--
» Gustavo Anatoly Fernandes Velásquez Solís
» Information Systems - Bachelor
» Gmail: gustavoanatoly@gmail.com
» Skype: gustavoanatoly
» Cell Phone: +55 (83) 9918-1763

#5 Updated by Gustavo Anatoly Fernandes Velasquez Solis over 1 year ago

Sorry, I sent wrong gist link link: https://gist.github.com/4017432

2012/11/5 Gustavo Anatoly gustavoanatoly@gmail.com

Hi, Yusuke.

I was studying this bug and I did a little bit changes to test

git://gist.github.com/4017432.git

because self lock is not allowed on ruby, please correct me if I wrong.

Thanks.

2012/11/5 mame (Yusuke Endoh) mame@tsg.ne.jp

Issue #7134 has been updated by mame (Yusuke Endoh).

Status changed from Open to Assigned
Assignee set to kosaki (Motohiro KOSAKI)

Kosaki-san, do you know anything about this?

Yusuke Endoh mame@tsg.ne.jp

Bug #7134: Signal handling bug in Mac OS X
https://bugs.ruby-lang.org/issues/7134#change-32417

Author: auastro (Andy Kitchen)
Status: Assigned
Priority: Normal
Assignee: kosaki (Motohiro KOSAKI)
Category:
Target version: 1.9.3
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin12]

On Mac OS X, running the attached program causes the exception below
about 1/10 times it is run.

$ ruby hup.rb
hup.rb:2:in write': deadlock; recursive locking (ThreadError)
from hup.rb:2:in
puts'
from hup.rb:2:in puts'
from hup.rb:2:in
block in '
from hup.rb:6:in call'
from hup.rb:6:in
write'
from hup.rb:6:in puts'
from hup.rb:6:in
puts'
from hup.rb:6:in `'

The expected output is:

In Hup Handler
Finished...

or

Finished...
In Hup Handler

My ruby is compiled with clang:

$ clang --version
Apple clang version 4.1 (tags/Apple/clang-421.11.65) (based on LLVM
3.1svn)
Target: x86_64-apple-darwin12.2.0
Thread model: posix

http://bugs.ruby-lang.org/

» Gustavo Anatoly Fernandes Velásquez Solís
» Information Systems - Bachelor
» Gmail: gustavoanatoly@gmail.com
» Skype: gustavoanatoly
» Cell Phone: +55 (83) 9918-1763

--
» Gustavo Anatoly Fernandes Velásquez Solís
» Information Systems - Bachelor
» Gmail: gustavoanatoly@gmail.com
» Skype: gustavoanatoly
» Cell Phone: +55 (83) 9918-1763

#6 Updated by Motohiro KOSAKI over 1 year ago

  • Assignee changed from Motohiro KOSAKI to Nobuyoshi Nakada
  • Target version changed from 1.9.3 to next minor

Hi,

We are sorry, recursive lock is clearly unacceptable because recursive lock was not designed for trap handler. It lead to memory corruption.
When this issue was occur, C stack trace is below.

4 ruby-- 0x0000000100260061 rbmutexlock + 1217 (threadpthread.c:197)
5 ruby-- 0x00000001002605c8 rb
mutexsynchronize + 24 (thread.c:4286)
6 ruby-- 0x000000010015e4f0 io
binwrite + 512 (io.c:1163)
7 ruby-- 0x000000010016156b iowrite + 955 (io.c:1261)
8 ruby-- 0x0000000100254947 vm
call0body + 743 (vmeval.c:110)
9 ruby-- 0x000000010024a6c8 rbfuncall + 536 (vmeval.c:49)
10 ruby-- 0x0000000100160eff rbioputs + 159 (io.c:6717)
11 ruby-- 0x0000000100254947 vmcall0body + 743 (vmeval.c:110)
12 ruby-- 0x0000000100249f20 rb
funcall2 + 256 (vmeval.c:750)
13 ruby-- 0x0000000100256c55 vm
callcfunc + 421 (vminsnhelper.c:1442)
14 ruby-- 0x000000010025668f vmcallmethod + 287 (vminsnhelper.c:1689)
15 ruby-- 0x0000000100242f8c vm
execcore + 10060 (insns.def:968)
16 ruby-- 0x000000010024f024 vm
exec + 100 (vm.c:1159)
17 ruby-- 0x000000010024e781 vminvokeproc + 161 (vm.c:680)
18 ruby-- 0x000000010024e6d0 rbvminvokeproc + 32 (vm.c:696)
19 ruby-- 0x0000000100145695 proc
call + 117 (proc.c:567)
20 ruby-- 0x0000000100254947 vmcall0body + 743 (vmeval.c:110)
21 ruby-- 0x0000000100249f20 rb
funcall2 + 256 (vmeval.c:750)
22 ruby-- 0x000000010024d486 rb
evalcmd + 262 (vmeval.c:1379)
23 ruby-- 0x000000010025c35a rbthreadptrexecuteinterruptscommon + 330 (thread.c:1716)
24 ruby-- 0x000000010025cd44 rbthreadioblockingregion + 164 (threadpthread.c:196)
25 ruby-- 0x000000010016fa3c io
binwritestring + 60 (io.c:1116)
26 ruby-- 0x000000010013fd0d rb
ensure + 109 (eval.c:802)
27 ruby-- 0x000000010015e4f0 iobinwrite + 512 (io.c:1163)
28 ruby-- 0x000000010016156b io
write + 955 (io.c:1261)
29 ruby-- 0x0000000100254947 vmcall0body + 743 (vmeval.c:110)
30 ruby-- 0x000000010024a6c8 rb
funcall + 536 (vmeval.c:49)
31 ruby-- 0x0000000100160eff rb
ioputs + 159 (io.c:6717)
32 ruby-- 0x0000000100254947 vm
call0body + 743 (vmeval.c:110)
33 ruby-- 0x0000000100249f20 rbfuncall2 + 256 (vmeval.c:750)
34 ruby-- 0x0000000100256c55 vmcallcfunc + 421 (vminsnhelper.c:1442)
35 ruby-- 0x000000010025668f vm
callmethod + 287 (vminsnhelper.c:1689)
36 ruby-- 0x0000000100242f8c vmexeccore + 10060 (insns.def:968)
37 ruby-- 0x000000010024f024 vmexec + 100 (vm.c:1159)
38 ruby-- 0x000000010024f965 rb
iseqevalmain + 405 (vm.c:1402)
39 ruby-- 0x000000010013efdf rubyexecinternal + 111 (eval.c:251)
40 ruby-- 0x000000010013ef27 rubyrunnode + 71 (eval.c:303)
41 ruby-- 0x000000010010b17f main + 79 (main.c:36)
42 libdyld.dylib 0x00007fff8997e7e1 start + 1

Now, almost all IO hold a mutex durling io for serialization since r20144 (aka 1.9.2p0). Then, you can't
use any IOs in trap handler.

Ho hum. I think this spec is too user unfriendly and NEWS-1.9.2 seems not explain this incompatibility.
Nobu, Can you please explain current spec, your intention and your recommended trap handler writing style.

#7 Updated by Motohiro KOSAKI over 1 year ago

Note: current documentation of Process.kill() uses puts() in trap handler. So, I personally don't think
that IO in trap handler is invalid code.

/*
* call-seq:
* Process.kill(signal, pid, ...) -> fixnum
*
* Sends the given signal to the specified process id(s) if pid is positive.
* If pid is zero signal is sent to all processes whose group ID is equal
* to the group ID of the process. signal may be an integer signal number or
* a POSIX signal name (either with or without a +SIG+ prefix). If signal is
* negative (or starts with a minus sign), kills process groups instead of
* processes. Not all signals are available on all platforms.
*
* pid = fork do
* Signal.trap("HUP") { puts "Ouch!"; exit }
* # ... do some work ...
* end
* # ...
* Process.kill("HUP", pid)
* Process.wait

#8 Updated by Motohiro KOSAKI over 1 year ago

  • Target version changed from next minor to 2.0.0

sorry, fixed unintentional target version change.

#9 Updated by Gustavo Anatoly Fernandes Velasquez Solis over 1 year ago

Hi, Kosaki.

Firstly, thanks for explain the reasons, and really that memory error was
happening in my tests. I'm interested to understand how to will fix it by
the team, I want learn a little bit more.

Thanks.

2012/11/5 kosaki (Motohiro KOSAKI) kosaki.motohiro@gmail.com

Issue #7134 has been updated by kosaki (Motohiro KOSAKI).

Target version changed from next minor to 2.0.0

sorry, fixed unintentional target version change.


Bug #7134: Signal handling bug in Mac OS X
https://bugs.ruby-lang.org/issues/7134#change-32453

Author: auastro (Andy Kitchen)
Status: Assigned
Priority: Normal
Assignee: nobu (Nobuyoshi Nakada)
Category:
Target version: 2.0.0
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin12]

On Mac OS X, running the attached program causes the exception below about
1/10 times it is run.

$ ruby hup.rb
hup.rb:2:in write': deadlock; recursive locking (ThreadError)
from hup.rb:2:in
puts'
from hup.rb:2:in puts'
from hup.rb:2:in
block in '
from hup.rb:6:in call'
from hup.rb:6:in
write'
from hup.rb:6:in puts'
from hup.rb:6:in
puts'
from hup.rb:6:in `'

The expected output is:

In Hup Handler
Finished...

or

Finished...
In Hup Handler

My ruby is compiled with clang:

$ clang --version
Apple clang version 4.1 (tags/Apple/clang-421.11.65) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.2.0
Thread model: posix

http://bugs.ruby-lang.org/

--
» Gustavo Anatoly Fernandes Velásquez Solís
» Information Systems - Bachelor
» Gmail: gustavoanatoly@gmail.com
» Skype: gustavoanatoly
» Cell Phone: +55 (83) 9918-1763

#10 Updated by Motohiro KOSAKI over 1 year ago

Attached my fixing proposal. This patch masks trap handler invocation during IO lock held.
But, I guess ko1 might dislike this approach because it slow down CHECK_INTS performance....

#11 Updated by Motohiro KOSAKI over 1 year ago

  • Assignee changed from Nobuyoshi Nakada to Koichi Sasada

ko1 suggested to use rbthreadcallwithoutgvl2(). but it doesn't have a way to avoid CHECKINTS completely.
I'm waiting ko1's rb
threadcallwithout_gvl2 fix.

#12 Updated by Koichi Sasada over 1 year ago

  • Assignee changed from Koichi Sasada to Motohiro KOSAKI

I fix gvl2!

#13 Updated by Motohiro KOSAKI over 1 year ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r38398.
Andy, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • io.c (internalwritefunc2): new helper function for rbwriteinternal2().
  • io.c (rbwriteinternal2): new function. it uses rbthreadcallwithoutgvl2() instaed of rbthreadioblockingregion().
  • io.c (rbbinwritestring): uses rbwriteinternal2 instead of rbwriteinternal. [Bug #7134]

#14 Updated by Motohiro KOSAKI over 1 year ago

Fixed this by r38395 - r38401 series patches.

#15 Updated by Motohiro KOSAKI over 1 year ago

  • Tracker changed from Bug to Backport
  • Project changed from ruby-trunk to Backport93
  • Status changed from Closed to Assigned
  • Assignee changed from Motohiro KOSAKI to Usaku NAKAMURA
  • Target version deleted (2.0.0)

#16 Updated by Usaku NAKAMURA over 1 year ago

Is this list right?
r38395, r38396, r38397, r38398, r38399, r38400, r38401, r38402, r38422, r38447, r38448 and r38449
Are the fixes really completed?

And another problem.
I've not backported rbthreadcallwithoutgvl2() from trunk yet.
I doubt the API is not stable.
Kosaki-san, do you think the orignal problem is very serious and necessary to backport many many changes from trunk?

#17 Updated by Usaku NAKAMURA over 1 year ago

  • Status changed from Assigned to Feedback
  • Assignee changed from Usaku NAKAMURA to Motohiro KOSAKI

#18 Updated by Motohiro KOSAKI over 1 year ago

  • Status changed from Feedback to Closed

Hmmm...
Ok, ok. I give up this claim. I hope nobody uses trap.

Also available in: Atom PDF