Bug #17529

closed

Ractor Segfaults with GC enabled

Added by prajjwal (Prajjwal Singh) over 1 year ago. Updated 2 months ago.

Status: Closed
Priority: Normal
Target version: -
ruby -v: ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]
[ruby-core:102008]

Description

I've been benchmarking Ractor on my machine with the following naive prime number generator:

# frozen_string_literal: true

def prime?(n)
  2.upto(n - 1).none? { |i| n % i == 0 }
end

NUM_WORKERS = ARGV[0].to_i

producer = Ractor.new do
  i = 1000000

  loop { Ractor.yield i; i += 1 }
end

workers = (1..NUM_WORKERS).map do
  Ractor.new producer do |producer|
    while n = producer.take
      Ractor.yield [n, prime?(n)]
    end
  end
end

loop do
  _r, ( number, prime ) = Ractor.select(*workers)

  p number if prime
end

The code inevitably segfaults, and the garbage collector appears to be the cause.

If I stick GC.disable in there, the code happily chugs along for several minutes on end without a problem.
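
One way to narrow this down is to disable GC only around the suspect section rather than for the whole script. The sketch below assumes a helper named `without_gc`, which is hypothetical and not part of the original report. Note that `GC.disable` applies to the whole process, so memory use grows while the block runs.

```ruby
# Hypothetical helper (not from the original report): runs the block with
# GC disabled, then restores whatever state GC was in before.
def without_gc
  was_disabled = GC.disable # returns true only if GC was already disabled
  yield
ensure
  GC.enable unless was_disabled
end

# Usage: wrap the section under suspicion. If the crash disappears,
# that hints (but does not prove) that GC is involved.
count = without_gc { Array.new(10_000) { |i| i.to_s }.size }
puts count
```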


Files

ractor.crash (22.5 KB), added by prajjwal (Prajjwal Singh), 01/12/2021 01:12 AM

Related issues 1 (0 open, 1 closed)

Related to Ruby master - Bug #17489: Ractor segfaults (Closed; ko1 (Koichi Sasada))

Updated by marcandre (Marc-Andre Lafortune) over 1 year ago

Thanks for the report.

Probably the same bug as #17489

Updated by ko1 (Koichi Sasada) over 1 year ago

I couldn't reproduce it. Could you tell me ARGV[0]?

BTW, please fill in the "ruby -v:" field with your environment (even if it is in the crash log).

ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]

Updated by prajjwal (Prajjwal Singh) over 1 year ago

@ko1 (Koichi Sasada)

It crashes for any value of ARGV[0] between 1 and 25 (that I tested).

The fact that it's happening so consistently for me and not for you makes me wonder if the problem stems from my version of Linux or GCC. Some other compile-time option, perhaps?

Here's my GCC version:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --with-isl --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-install-libiberty --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-libunwind-exceptions --disable-werror gdc_include_dir=/usr/include/dlang/gdc
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.0 (GCC) 

And Linux:

Linux Wraith 5.9.14-arch1-1 #1 SMP PREEMPT Sat, 12 Dec 2020 14:37:12 +0000 x86_64 GNU/Linux

Ruby Configure Args

'--prefix=/home/prajjwal/.rbenv/versions/3.0.0' '--enable-shared' 'LDFLAGS=-L/home/prajjwal/.rbenv/versions/3.0.0/lib ' 'CPPFLAGS=-I/home/prajjwal/.rbenv/versions/3.0.0/include '

Updated by prajjwal (Prajjwal Singh) over 1 year ago

  • ruby -v changed from 3.0.0 to ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]

Updated by prajjwal (Prajjwal Singh) over 1 year ago

Just confirmed that it only segfaults when ruby is configured with the --enable-shared option (which rbenv does by default).

Even more info:

glibc 2.32-5
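
For anyone else trying to reproduce this, the build details discussed above can be read from a running interpreter. This is a minimal sketch: `build_report` is a hypothetical helper name, while `RUBY_DESCRIPTION` and the `RbConfig::CONFIG` keys are standard.

```ruby
require "rbconfig"

# Hypothetical helper: gathers the build details asked about in this thread.
def build_report
  {
    description: RUBY_DESCRIPTION,                      # the version string `ruby -v` prints
    enable_shared: RbConfig::CONFIG["ENABLE_SHARED"],   # "yes" when built with --enable-shared
    configure_args: RbConfig::CONFIG["configure_args"]  # arguments passed to ./configure
  }
end

build_report.each { |key, value| puts "#{key}: #{value}" }
```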

Updated by ko1 (Koichi Sasada) over 1 year ago

  • Assignee set to ko1 (Koichi Sasada)
  • Status changed from Open to Assigned

Updated by ko1 (Koichi Sasada) over 1 year ago

Hmm, I can't reproduce it yet. Can someone try it and get more information about it?

Updated by wanabe (_ wanabe) 2 months ago

I confirmed with 3.0.0 that the issue is reproducible.
According to git bisect, it seems to be fixed in fff1edf23ba28267bf57097c269f7fa87530e3fa and d0d6227a0da5925acf946a09191f172daf53baf2.

$ (git checkout origin/ruby_3_0 && git cherry-pick d0d6227a0da5925acf946a09191f172daf53baf2 fff1edf23ba28267bf57097c269f7fa87530e3fa && make miniruby -j8 ) >/dev/null 2>&1 && ./miniruby -v -W0 segv.rb
ruby 3.0.4p197 (2022-03-13 revision b04eb796e4) [x86_64-linux]

$ (git checkout origin/ruby_3_0 && make miniruby -j8 ) >/dev/null 2>&1 && ./miniruby -v -W0 segv.rb
ruby 3.0.4p197 (2022-03-13 revision f404b21f84) [x86_64-linux]
<internal:ractor>:627: [BUG] Segmentation fault at 0x0000000000000020
ruby 3.0.4p197 (2022-03-13 revision f404b21f84) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0003 p:0003 s:0015 e:000014 METHOD <internal:ractor>:627
c:0002 p:0019 s:0008 e:000007 BLOCK  segv.rb:13 [FINISH]
c:0001 p:---- s:0003 e:000002 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
segv.rb:13:in `block (4 levels) in <main>'
<internal:ractor>:627:in `yield'

-- Machine register context ------------------------------------------------
 RIP: 0x0000560d0246f8d8 RBP: 0x00007f1c80f28920 RSP: 0x00007f1c80f28800
 RAX: 0x0000000000000000 RBX: 0x00007f1c80f28810 RCX: 0x0000000000000000
 RDX: 0x0000000000000001 RDI: 0x0000560d04738b98 RSI: 0x0000000000000000
  R8: 0x0000560d04738e10  R9: 0x0000000000000000 R10: 0x0000000000000001
 R11: 0x0000000000000002 R12: 0x00007f1c80f28820 R13: 0x0000560d04738b70
 R14: 0x0000560d047416b8 R15: 0x00007f1c80f28810 EFL: 0x0000000000010246

-- C level backtrace information -------------------------------------------
./miniruby(rb_vm_bugreport+0x4a4) [0x560d02565b34]
./miniruby(rb_bug_for_fatal_signal+0xf4) [0x560d02369a54]
./miniruby(sigsegv+0x4d) [0x560d024bae1d]
[0x7f1c85894520]
./miniruby(ractor_select+0x478) [0x560d0246f8d8]
./miniruby(builtin_inline_class_627+0x3e) [0x560d0247019e]
./miniruby(vm_exec_core+0x32cd) [0x560d0254d31d]
./miniruby(rb_vm_exec+0x1a2) [0x560d0254f8d2]
./miniruby(thread_do_start_proc+0x339) [0x560d025057d9]
./miniruby(thread_start_func_2+0xc84) [0x560d02506554]
./miniruby(thread_start_func_1+0xde) [0x560d0250682e]
[0x7f1c858e6947]
[0x7f1c85976a44]

-- Other runtime information -----------------------------------------------
(snip)

Here is a modified version of the script that is easier to try.

1000.times do |q|
  producer = Ractor.new do
    1000.times do |i|
      Ractor.yield true
    end
  end

  workers = (1..10).map do
    Ractor.new producer do |producer|
      while n = producer.take
        Ractor.yield nil
      end
    rescue Ractor::ClosedError
    end
  end

  loop do
    _r, prime = Ractor.select(*workers)
  end
end

Updated by wanabe (_ wanabe) 2 months ago

I guess that the btest failure of ruby_3_0 branch on icc-x64 env may be fixed by git cherry-pick d0d6227a0da5925acf946a09191f172daf53baf2 fff1edf23ba28267bf57097c269f7fa87530e3fa.
(An example of this failure is http://rubyci.s3.amazonaws.com/icc-x64/ruby-3.0/log/20220321T004434Z.log.html.gz#test.rb)

Updated by nagachika (Tomoyuki Chikanaga) 2 months ago

  • Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED

Updated by nagachika (Tomoyuki Chikanaga) 2 months ago

  • Backport changed from 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: DONE

ruby_3_0 a72b7b898c69a116d754d599e8bb061761015255 merged revision(s) d0d6227a0da5925acf946a09191f172daf53baf2,fff1edf23ba28267bf57097c269f7fa87530e3fa.

Updated by nagachika (Tomoyuki Chikanaga) 2 months ago

  • Status changed from Assigned to Closed

Applied in changeset git|a72b7b898c69a116d754d599e8bb061761015255.


merge revision(s) d0d6227a0da5925acf946a09191f172daf53baf2,fff1edf23ba28267bf57097c269f7fa87530e3fa: [Backport #17529]

    alen should be actions number on ractor_select()

    alen was number of rs, but it should be actions number
    (taking ractors + receiving + yielding).
    ---
     ractor.c | 13 ++++++-------
     1 file changed, 6 insertions(+), 7 deletions(-)

    fix Ractor.yield(obj, move: true)

    Ractor.yield(obj, move: true) and
    Ractor.select(..., yield_value: obj, move: true) tried to yield a
    value with move semantics, but if the trial fails, the obj
    should not become a moved object.

    To keep this rule, `wait_moving` wait status is introduced.

    New yield/take process:
    (1) If a ractor tried to yield (move:true), make taking ractor's
        wait status `wait_moving` and make a moved object by
        `ractor_move(obj)` and wakeup taking ractor.
    (2) If a ractor tried to take a message from a ractor waiting for
        yielding (move:true), wakeup the ractor and wait for (1).
    ---
     bootstraptest/test_ractor.rb | 25 +++++++++++++++
     ractor.c                     | 73 +++++++++++++++++++++++++++++++++++---------
     ractor_core.h                |  1 +
     3 files changed, 84 insertions(+), 15 deletions(-)
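
The first commit message above says `alen` should count actions (taking ractors + receiving + yielding) rather than ractors. That counting rule can be illustrated in Ruby as follows. This is only a sketch: `select_action_count` and its keyword names are hypothetical, and the actual change in ractor.c may be structured differently.

```ruby
# Hypothetical illustration of the counting rule in the commit message:
# one action per ractor to take from, plus one if receiving, plus one if yielding.
def select_action_count(ractors, receive: false, yielding: false)
  ractors.size + (receive ? 1 : 0) + (yielding ? 1 : 0)
end

# Taking from three workers while also yielding a value.
puts select_action_count(%i[w1 w2 w3], yielding: true)

# A bare yield has no ractors to take from, yet still counts as one action.
puts select_action_count([], yielding: true)
```

Under this rule, a bare Ractor.yield has a ractor count of 0 but an action count of 1, which is the kind of mismatch the commit message describes.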