Project

General

Profile

Actions

Backport #9168

closed

Segfault with Sync_m, Queue, and garbage collection

Added by phemmer (Patrick Hemmer) over 10 years ago. Updated over 10 years ago.


Description

=begin
I am able to reliably generate a segmentation fault when using Sync_m & Queue together. The issue appears garbage collection related as well, as if I (({GC.disable})) the issue goes away.

This is the bottom of the output generated by the script and the top of the debug output (the full thing is in the attachment)

...
...
...
Starting 1766
Starting 1767
Starting 1768
Starting 1769
Finalizing 1187
Finalizing 1188
Finalizing 1189
Finalizing 1190
Finalizing 1186
Finalizing 1191
Finalizing 1192
Finalizing 1193
Finalizing 1195
Finalizing 1196
Finalizing 1197
Finalizing 1198
Finalizing 1194
Finalizing 1199
Finalizing 1200
Finalizing 1201
/usr/lib64/ruby/2.0.0/thread.rb:187: [BUG] Segmentation fault
ruby 2.0.0p353 (2013-11-22 revision 43784) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:0028 s:0016 e:000015 METHOD /usr/lib64/ruby/2.0.0/thread.rb:187
c:0004 p:0083 s:0012 e:000011 BLOCK /tmp/test.rb:22 [FINISH]
c:0003 p:---- s:0008 e:000007 CFUNC :times
c:0002 p:0055 s:0005 E:001f80 EVAL /tmp/test.rb:16 [FINISH]
c:0001 p:0000 s:0002 E:001dc8 TOP [FINISH]

/tmp/test.rb:16:in <main>' /tmp/test.rb:16:in times'
/tmp/test.rb:22:in block in <main>' /usr/lib64/ruby/2.0.0/thread.rb:187:in pop'

-- C level backtrace information -------------------------------------------
/usr/lib64/libruby20.so.2.0(+0x1c1f3f) [0x7fc0ee2d9f3f] vm_dump.c:647
/usr/lib64/libruby20.so.2.0(+0x66a1e) [0x7fc0ee17ea1e] error.c:283
/usr/lib64/libruby20.so.2.0(rb_bug+0xe7) [0x7fc0ee17eb3b] error.c:302
/usr/lib64/libruby20.so.2.0(+0x13a406) [0x7fc0ee252406] signal.c:672
/lib64/libc.so.6(+0x38270) [0x7fc0edda0270]
/usr/lib64/libruby20.so.2.0(+0x1bdf17) [0x7fc0ee2d5f17] vm.c:1234
/usr/lib64/libruby20.so.2.0(+0x1bc767) [0x7fc0ee2d4767] vm.c:648
/usr/lib64/libruby20.so.2.0(+0x1bc896) [0x7fc0ee2d4896] vm.c:679
/usr/lib64/libruby20.so.2.0(+0x1b8e67) [0x7fc0ee2d0e67] vm_eval.c:930
/usr/lib64/libruby20.so.2.0(rb_yield+0x38) [0x7fc0ee2d0ea1] vm_eval.c:940
/usr/lib64/libruby20.so.2.0(+0xc1d82) [0x7fc0ee1d9d82] numeric.c:3588
/usr/lib64/libruby20.so.2.0(+0x1a91a5) [0x7fc0ee2c11a5] vm_insnhelper.c:1336
/usr/lib64/libruby20.so.2.0(+0x1a9d2a) [0x7fc0ee2c1d2a] vm_insnhelper.c:1474
/usr/lib64/libruby20.so.2.0(+0x1a9e37) [0x7fc0ee2c1e37] vm_insnhelper.c:1564
/usr/lib64/libruby20.so.2.0(+0x1aa898) [0x7fc0ee2c2898] vm_insnhelper.c:1758
/usr/lib64/libruby20.so.2.0(+0x1ab0e5) [0x7fc0ee2c30e5] vm_insnhelper.c:1916
/usr/lib64/libruby20.so.2.0(+0x1ae8b9) [0x7fc0ee2c68b9] insns.def:1002
/usr/lib64/libruby20.so.2.0(+0x1bdcd9) [0x7fc0ee2d5cd9] vm.c:1201
/usr/lib64/libruby20.so.2.0(rb_iseq_eval_main+0x34) [0x7fc0ee2d6b22] vm.c:1449
/usr/lib64/libruby20.so.2.0(+0x6c4dc) [0x7fc0ee1844dc] eval.c:250
/usr/lib64/libruby20.so.2.0(ruby_exec_node+0x24) [0x7fc0ee1845f5] eval.c:315
/usr/lib64/libruby20.so.2.0(ruby_run_node+0x3e) [0x7fc0ee1845c8] eval.c:307
ruby() [0x400ac9]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fc0edd8c6f5]
ruby() [0x400969]

And this is the script which can reproduce the issue:

#!/usr/bin/ruby
require 'thread'
require 'sync'

def finalizer(queues, i)
proc do
puts "Finalizing #{i}"
queues.synchronize(:EX) do
queues.delete(i)
end
end
end

queues = {}
queues.extend(Sync_m)
10000.times do |i|
puts "Starting #{i}"
queue = Queue.new
queues[i] = queue
ObjectSpace.define_finalizer(Object.new, finalizer(queues, i))
queue << nil
queue.pop
end

This was as simple a test case as I could create. In the finalizer, if you just call (({queues.delete()})) without the (({queues.synchronize()})), the issue does not occur.

The issue appears new to 2.0.0. I could not duplicate it with 1.9.3p448
=end


Files

ruby-segfault (9.25 KB) ruby-segfault phemmer (Patrick Hemmer), 11/28/2013 11:17 AM
backport_r44260_to_ruby_2_0_0.patch (2.22 KB) backport_r44260_to_ruby_2_0_0.patch nagachika (Tomoyuki Chikanaga), 12/21/2013 10:42 PM

Updated by Glass_saga (Masaki Matsushita) over 10 years ago

  • Category set to core
  • Status changed from Open to Assigned
  • Target version set to 2.1.0
  • Backport changed from 1.9.3: UNKNOWN, 2.0.0: UNKNOWN to 1.9.3: DONTNEED, 2.0.0: REQUIRED

I wrote minimal code.

raise_proc = proc do
raise
end

10000.times do
ObjectSpace.define_finalizer(Object.new, raise_proc)
Thread.handle_interrupt(RuntimeError => :immediate) do # SEGV is caused in :on_blocking and :never too.
break
end
end

Updated by tarui (Masaya Tarui) over 10 years ago

  • Status changed from Assigned to Open
  • Priority changed from Normal to 5

I think that this is serious.

i checked what happen at error case.

  1. By break, throwobj is set to th->errinfo in throw RubyVM command.
  2. In rb_thread_s_handle_interrupt, trap(TAG) throwobj, and call finalizer via RUBY_VM_CHECK_INTS(th) before JUMP_TAG.
  3. In finalizer, th->errinfo is used by raise.

Updated by apoc (Matthias Hecker) over 10 years ago

I'm writing because I think I encountered that bug in the ruby IRC bot rbot: http://paste.geekosphere.org/BZ
The stacktrace looks very similar, the crash only happens after a few hours.

rbot is using Queue (and Mutex to synchronize).

I can reproduce the bug with the script provided. Also using 2.0.0p353, the bot works fine in 1.9.3.

Actions #4

Updated by nobu (Nobuyoshi Nakada) over 10 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r44260.
Patrick, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


vm_trace.c: isolate exceptions

  • vm_trace.c (rb_postponed_job_flush): isolate exceptions in
    postponed jobs and restore outer ones. based on a patch by
    tarui. [ruby-core:58652] [Bug #9168]
Actions #5

Updated by nagachika (Tomoyuki Chikanaga) over 10 years ago

  • Tracker changed from Bug to Backport
  • Project changed from Ruby master to Backport200
  • Category deleted (core)
  • Status changed from Closed to Assigned
  • Assignee set to nagachika (Tomoyuki Chikanaga)
  • Target version deleted (2.1.0)

Updated by nagachika (Tomoyuki Chikanaga) over 10 years ago

Because ruby_2_0_0 don't has postponed_job, r44260 cannot be backported to ruby_2_0_0 cleanly.
I've made equivalent patch for ruby_2_0_0.
nobu, please review it.

Thanks in advance.

Updated by nobu (Nobuyoshi Nakada) over 10 years ago

Why are rb_set_errinfo and direct assignment mixed?

Updated by nagachika (Tomoyuki Chikanaga) over 10 years ago

Thank you for your reviewing.

Why are rb_set_errinfo and direct assignment mixed?

Because rb_set_errinfo() raise error when argument is Fixnum and state is TAG_BREAD, TAG_FATAL etc.

Actions #9

Updated by nagachika (Tomoyuki Chikanaga) over 10 years ago

  • Status changed from Assigned to Closed

I've committed my patch at r44576 with some modification with ChangeLog

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0