Backport #9168

Segfault with Sync_m, Queue, and garbage collection

Added by Patrick Hemmer over 1 year ago. Updated about 1 year ago.

[ruby-core:58652]
Status:Closed
Priority:High
Assignee:Tomoyuki Chikanaga

Description

=begin
I am able to reliably generate a segmentation fault when using Sync_m & Queue together. The issue appears garbage collection related as well, as if I (({GC.disable})) the issue goes away.

This is the bottom of the output generated by the script and the top of the debug output (the full thing is in the attachment)

...
...
...
Starting 1766
Starting 1767
Starting 1768
Starting 1769
Finalizing 1187
Finalizing 1188
Finalizing 1189
Finalizing 1190
Finalizing 1186
Finalizing 1191
Finalizing 1192
Finalizing 1193
Finalizing 1195
Finalizing 1196
Finalizing 1197
Finalizing 1198
Finalizing 1194
Finalizing 1199
Finalizing 1200
Finalizing 1201
/usr/lib64/ruby/2.0.0/thread.rb:187: [BUG] Segmentation fault
ruby 2.0.0p353 (2013-11-22 revision 43784) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:0028 s:0016 e:000015 METHOD /usr/lib64/ruby/2.0.0/thread.rb:187
c:0004 p:0083 s:0012 e:000011 BLOCK /tmp/test.rb:22 [FINISH]
c:0003 p:---- s:0008 e:000007 CFUNC :times
c:0002 p:0055 s:0005 E:001f80 EVAL /tmp/test.rb:16 [FINISH]
c:0001 p:0000 s:0002 E:001dc8 TOP [FINISH]

/tmp/test.rb:16:in <main>'
/tmp/test.rb:16:in
times'
/tmp/test.rb:22:in block in <main>'
/usr/lib64/ruby/2.0.0/thread.rb:187:in
pop'

-- C level backtrace information -------------------------------------------
/usr/lib64/libruby20.so.2.0(+0x1c1f3f) [0x7fc0ee2d9f3f] vm_dump.c:647
/usr/lib64/libruby20.so.2.0(+0x66a1e) [0x7fc0ee17ea1e] error.c:283
/usr/lib64/libruby20.so.2.0(rb_bug+0xe7) [0x7fc0ee17eb3b] error.c:302
/usr/lib64/libruby20.so.2.0(+0x13a406) [0x7fc0ee252406] signal.c:672
/lib64/libc.so.6(+0x38270) [0x7fc0edda0270]
/usr/lib64/libruby20.so.2.0(+0x1bdf17) [0x7fc0ee2d5f17] vm.c:1234
/usr/lib64/libruby20.so.2.0(+0x1bc767) [0x7fc0ee2d4767] vm.c:648
/usr/lib64/libruby20.so.2.0(+0x1bc896) [0x7fc0ee2d4896] vm.c:679
/usr/lib64/libruby20.so.2.0(+0x1b8e67) [0x7fc0ee2d0e67] vm_eval.c:930
/usr/lib64/libruby20.so.2.0(rb_yield+0x38) [0x7fc0ee2d0ea1] vm_eval.c:940
/usr/lib64/libruby20.so.2.0(+0xc1d82) [0x7fc0ee1d9d82] numeric.c:3588
/usr/lib64/libruby20.so.2.0(+0x1a91a5) [0x7fc0ee2c11a5] vm_insnhelper.c:1336
/usr/lib64/libruby20.so.2.0(+0x1a9d2a) [0x7fc0ee2c1d2a] vm_insnhelper.c:1474
/usr/lib64/libruby20.so.2.0(+0x1a9e37) [0x7fc0ee2c1e37] vm_insnhelper.c:1564
/usr/lib64/libruby20.so.2.0(+0x1aa898) [0x7fc0ee2c2898] vm_insnhelper.c:1758
/usr/lib64/libruby20.so.2.0(+0x1ab0e5) [0x7fc0ee2c30e5] vm_insnhelper.c:1916
/usr/lib64/libruby20.so.2.0(+0x1ae8b9) [0x7fc0ee2c68b9] insns.def:1002
/usr/lib64/libruby20.so.2.0(+0x1bdcd9) [0x7fc0ee2d5cd9] vm.c:1201
/usr/lib64/libruby20.so.2.0(rb_iseq_eval_main+0x34) [0x7fc0ee2d6b22] vm.c:1449
/usr/lib64/libruby20.so.2.0(+0x6c4dc) [0x7fc0ee1844dc] eval.c:250
/usr/lib64/libruby20.so.2.0(ruby_exec_node+0x24) [0x7fc0ee1845f5] eval.c:315
/usr/lib64/libruby20.so.2.0(ruby_run_node+0x3e) [0x7fc0ee1845c8] eval.c:307
ruby() [0x400ac9]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fc0edd8c6f5]
ruby() [0x400969]

And this is the script which can reproduce the issue:

#!/usr/bin/ruby
require 'thread'
require 'sync'

def finalizer(queues, i)
proc do
puts "Finalizing #{i}"
queues.synchronize(:EX) do
queues.delete(i)
end
end
end

queues = {}
queues.extend(Sync_m)
10000.times do |i|
puts "Starting #{i}"
queue = Queue.new
queues[i] = queue
ObjectSpace.define_finalizer(Object.new, finalizer(queues, i))
queue << nil
queue.pop
end

This was as simple a test case as I could create. In the finalizer, if you just call (({queues.delete()})) without the (({queues.synchronize()})), the issue does not occur.

The issue appears new to 2.0.0. I could not duplicate it with 1.9.3p448
=end

ruby-segfault (9.25 KB) Patrick Hemmer, 11/28/2013 11:17 AM

backport_r44260_to_ruby_2_0_0.patch Magnifier (2.22 KB) Tomoyuki Chikanaga, 12/21/2013 10:42 PM

Associated revisions

Revision 44260
Added by Nobuyoshi Nakada over 1 year ago

vm_trace.c: isolate exceptions

  • vm_trace.c (rb_postponed_job_flush): isolate exceptions in postponed jobs and restore outer ones. based on a patch by tarui. [Bug #9168]

Revision 44576
Added by Tomoyuki Chikanaga about 1 year ago

merge revision(s) 44260: [Backport #9168]

patch inspired from r44260 on trunk. [Bug #9168]

  • gc.c (run_finalizer): clear th->errinfo before invoke finalizer and
    restore afterward.

  • test/ruby/test_gc.rb (test_exception_in_finalizer): add test for
    above.

  • vm_trace.c (rb_threadptr_exec_event_hooks_orig): clear th->errinfo
    before invoke hook.

History

#1 Updated by Masaki Matsushita over 1 year ago

  • Backport changed from 1.9.3: UNKNOWN, 2.0.0: UNKNOWN to 1.9.3: DONTNEED, 2.0.0: REQUIRED
  • Target version set to 2.1.0
  • Status changed from Open to Assigned
  • Category set to core

I wrote minimal code.

raise_proc = proc do
raise
end

10000.times do
ObjectSpace.define_finalizer(Object.new, raise_proc)
Thread.handle_interrupt(RuntimeError => :immediate) do # SEGV is caused in :on_blocking and :never too.
break
end
end

#2 Updated by Masaya Tarui over 1 year ago

  • Status changed from Assigned to Open
  • Priority changed from Normal to High

I think that this is serious.

i checked what happen at error case.
1. By break, throwobj is set to th->errinfo in throw RubyVM command.
2. In rb_thread_s_handle_interrupt, trap(TAG) throwobj, and call finalizer via RUBY_VM_CHECK_INTS(th) before JUMP_TAG.
3. In finalizer, th->errinfo is used by raise.

#3 Updated by Matthias Hecker over 1 year ago

I'm writing because I think I encountered that bug in the ruby IRC bot rbot: http://paste.geekosphere.org/BZ
The stacktrace looks very similar, the crash only happens after a few hours.

rbot is using Queue (and Mutex to synchronize).

I can reproduce the bug with the script provided. Also using 2.0.0p353, the bot works fine in 1.9.3.

#4 Updated by Nobuyoshi Nakada over 1 year ago

  • % Done changed from 0 to 100
  • Status changed from Open to Closed

This issue was solved with changeset r44260.
Patrick, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


vm_trace.c: isolate exceptions

  • vm_trace.c (rb_postponed_job_flush): isolate exceptions in postponed jobs and restore outer ones. based on a patch by tarui. [Bug #9168]

#5 Updated by Tomoyuki Chikanaga over 1 year ago

  • Target version deleted (2.1.0)
  • Assignee set to Tomoyuki Chikanaga
  • Status changed from Closed to Assigned
  • Category deleted (core)
  • Project changed from Ruby trunk to Backport200
  • Tracker changed from Bug to Backport

#6 Updated by Tomoyuki Chikanaga over 1 year ago

Because ruby_2_0_0 don't has postponed_job, r44260 cannot be backported to ruby_2_0_0 cleanly.
I've made equivalent patch for ruby_2_0_0.
nobu, please review it.

Thanks in advance.

#7 Updated by Nobuyoshi Nakada over 1 year ago

Why are rb_set_errinfo and direct assignment mixed?

#8 Updated by Tomoyuki Chikanaga over 1 year ago

Thank you for your reviewing.

Why are rb_set_errinfo and direct assignment mixed?

Because rb_set_errinfo() raise error when argument is Fixnum and state is TAG_BREAD, TAG_FATAL etc.

#9 Updated by Tomoyuki Chikanaga about 1 year ago

  • Status changed from Assigned to Closed

I've committed my patch at r44576 with some modification with ChangeLog

Also available in: Atom PDF