Backport #9168

Segfault with Sync_m, Queue, and garbage collection

Added by Patrick Hemmer 5 months ago. Updated 3 months ago.

[ruby-core:58652]
Status:Closed
Priority:High
Assignee:Tomoyuki Chikanaga

Description

=begin
I am able to reliably generate a segmentation fault when using Sync_m & Queue together. The issue appears garbage collection related as well, as if I (({GC.disable})) the issue goes away.

This is the bottom of the output generated by the script and the top of the debug output (the full thing is in the attachment)

...
...
...
Starting 1766
Starting 1767
Starting 1768
Starting 1769
Finalizing 1187
Finalizing 1188
Finalizing 1189
Finalizing 1190
Finalizing 1186
Finalizing 1191
Finalizing 1192
Finalizing 1193
Finalizing 1195
Finalizing 1196
Finalizing 1197
Finalizing 1198
Finalizing 1194
Finalizing 1199
Finalizing 1200
Finalizing 1201
/usr/lib64/ruby/2.0.0/thread.rb:187: [BUG] Segmentation fault
ruby 2.0.0p353 (2013-11-22 revision 43784) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:0028 s:0016 e:000015 METHOD /usr/lib64/ruby/2.0.0/thread.rb:187
c:0004 p:0083 s:0012 e:000011 BLOCK /tmp/test.rb:22 [FINISH]
c:0003 p:---- s:0008 e:000007 CFUNC :times
c:0002 p:0055 s:0005 E:001f80 EVAL /tmp/test.rb:16 [FINISH]
c:0001 p:0000 s:0002 E:001dc8 TOP [FINISH]

/tmp/test.rb:16:in <main>'
/tmp/test.rb:16:in
times'
/tmp/test.rb:22:in block in <main>'
/usr/lib64/ruby/2.0.0/thread.rb:187:in
pop'

-- C level backtrace information -------------------------------------------
/usr/lib64/libruby20.so.2.0(+0x1c1f3f) [0x7fc0ee2d9f3f] vmdump.c:647
/usr/lib64/libruby20.so.2.0(+0x66a1e) [0x7fc0ee17ea1e] error.c:283
/usr/lib64/libruby20.so.2.0(rb
bug+0xe7) [0x7fc0ee17eb3b] error.c:302
/usr/lib64/libruby20.so.2.0(+0x13a406) [0x7fc0ee252406] signal.c:672
/lib64/libc.so.6(+0x38270) [0x7fc0edda0270]
/usr/lib64/libruby20.so.2.0(+0x1bdf17) [0x7fc0ee2d5f17] vm.c:1234
/usr/lib64/libruby20.so.2.0(+0x1bc767) [0x7fc0ee2d4767] vm.c:648
/usr/lib64/libruby20.so.2.0(+0x1bc896) [0x7fc0ee2d4896] vm.c:679
/usr/lib64/libruby20.so.2.0(+0x1b8e67) [0x7fc0ee2d0e67] vmeval.c:930
/usr/lib64/libruby20.so.2.0(rb
yield+0x38) [0x7fc0ee2d0ea1] vmeval.c:940
/usr/lib64/libruby20.so.2.0(+0xc1d82) [0x7fc0ee1d9d82] numeric.c:3588
/usr/lib64/libruby20.so.2.0(+0x1a91a5) [0x7fc0ee2c11a5] vm
insnhelper.c:1336
/usr/lib64/libruby20.so.2.0(+0x1a9d2a) [0x7fc0ee2c1d2a] vminsnhelper.c:1474
/usr/lib64/libruby20.so.2.0(+0x1a9e37) [0x7fc0ee2c1e37] vm
insnhelper.c:1564
/usr/lib64/libruby20.so.2.0(+0x1aa898) [0x7fc0ee2c2898] vminsnhelper.c:1758
/usr/lib64/libruby20.so.2.0(+0x1ab0e5) [0x7fc0ee2c30e5] vm
insnhelper.c:1916
/usr/lib64/libruby20.so.2.0(+0x1ae8b9) [0x7fc0ee2c68b9] insns.def:1002
/usr/lib64/libruby20.so.2.0(+0x1bdcd9) [0x7fc0ee2d5cd9] vm.c:1201
/usr/lib64/libruby20.so.2.0(rbiseqevalmain+0x34) [0x7fc0ee2d6b22] vm.c:1449
/usr/lib64/libruby20.so.2.0(+0x6c4dc) [0x7fc0ee1844dc] eval.c:250
/usr/lib64/libruby20.so.2.0(ruby
execnode+0x24) [0x7fc0ee1845f5] eval.c:315
/usr/lib64/libruby20.so.2.0(ruby
runnode+0x3e) [0x7fc0ee1845c8] eval.c:307
ruby() [0x400ac9]
/lib64/libc.so.6(
libcstart_main+0xf5) [0x7fc0edd8c6f5]
ruby() [0x400969]

And this is the script which can reproduce the issue:

#!/usr/bin/ruby
require 'thread'
require 'sync'

def finalizer(queues, i)
proc do
puts "Finalizing #{i}"
queues.synchronize(:EX) do
queues.delete(i)
end
end
end

queues = {}
queues.extend(Syncm)
10000.times do |i|
puts "Starting #{i}"
queue = Queue.new
queues[i] = queue
ObjectSpace.define
finalizer(Object.new, finalizer(queues, i))
queue << nil
queue.pop
end

This was as simple a test case as I could create. In the finalizer, if you just call (({queues.delete()})) without the (({queues.synchronize()})), the issue does not occur.

The issue appears new to 2.0.0. I could not duplicate it with 1.9.3p448
=end

ruby-segfault (9.25 KB) Patrick Hemmer, 11/28/2013 11:17 AM

backport_r44260_to_ruby_2_0_0.patch Magnifier (2.22 KB) Tomoyuki Chikanaga, 12/21/2013 10:42 PM

Associated revisions

Revision 44576
Added by Tomoyuki Chikanaga 3 months ago

merge revision(s) 44260: [Backport #9168]

patch inspired from r44260 on trunk. [Bug #9168]

  • gc.c (run_finalizer): clear th->errinfo before invoke finalizer and
    restore afterward.

  • test/ruby/testgc.rb (testexceptioninfinalizer): add test for
    above.

  • vmtrace.c (rbthreadptrexeceventhooksorig): clear th->errinfo
    before invoke hook.

History

#1 Updated by Masaki Matsushita 5 months ago

  • Category set to core
  • Status changed from Open to Assigned
  • Target version set to 2.1.0
  • Backport changed from 1.9.3: UNKNOWN, 2.0.0: UNKNOWN to 1.9.3: DONTNEED, 2.0.0: REQUIRED

I wrote minimal code.

raise_proc = proc do
raise
end

10000.times do
ObjectSpace.definefinalizer(Object.new, raiseproc)
Thread.handleinterrupt(RuntimeError => :immediate) do # SEGV is caused in :onblocking and :never too.
break
end
end

#2 Updated by Masaya Tarui 5 months ago

  • Status changed from Assigned to Open
  • Priority changed from Normal to High

I think that this is serious.

i checked what happen at error case.
1. By break, throwobj is set to th->errinfo in throw RubyVM command.
2. In rbthreadshandleinterrupt, trap(TAG) throwobj, and call finalizer via RUBYVMCHECKINTS(th) before JUMPTAG.
3. In finalizer, th->errinfo is used by raise.

#3 Updated by Matthias Hecker 5 months ago

I'm writing because I think I encountered that bug in the ruby IRC bot rbot: http://paste.geekosphere.org/BZ
The stacktrace looks very similar, the crash only happens after a few hours.

rbot is using Queue (and Mutex to synchronize).

I can reproduce the bug with the script provided. Also using 2.0.0p353, the bot works fine in 1.9.3.

#4 Updated by Nobuyoshi Nakada 4 months ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r44260.
Patrick, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


vm_trace.c: isolate exceptions

  • vmtrace.c (rbpostponedjobflush): isolate exceptions in postponed jobs and restore outer ones. based on a patch by tarui. [Bug #9168]

#5 Updated by Tomoyuki Chikanaga 4 months ago

  • Tracker changed from Bug to Backport
  • Project changed from ruby-trunk to Backport200
  • Category deleted (core)
  • Status changed from Closed to Assigned
  • Assignee set to Tomoyuki Chikanaga
  • Target version deleted (2.1.0)

#6 Updated by Tomoyuki Chikanaga 4 months ago

Because ruby200 don't has postponedjob, r44260 cannot be backported to ruby200 cleanly.
I've made equivalent patch for ruby
200.
nobu, please review it.

Thanks in advance.

#7 Updated by Nobuyoshi Nakada 4 months ago

Why are rbseterrinfo and direct assignment mixed?

#8 Updated by Tomoyuki Chikanaga 4 months ago

Thank you for your reviewing.

Why are rbseterrinfo and direct assignment mixed?

Because rbseterrinfo() raise error when argument is Fixnum and state is TAGBREAD, TAGFATAL etc.

#9 Updated by Tomoyuki Chikanaga 3 months ago

  • Status changed from Assigned to Closed

I've committed my patch at r44576 with some modification with ChangeLog

Also available in: Atom PDF