Bug #19395

closed

Process forking within non-main Ractor hits rb_bug()

Added by luke-gru (Luke Gruber) about 1 year ago. Updated 2 months ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:112155]

Description

def test_fork_in_ractor
  r2 = Ractor.new do
    pid = fork do
      exit Ractor.count
    end
    pid
  end
  pid = r2.take
  puts "Process #{Process.pid} waiting for #{pid}"
  _pid, status = Process.waitpid2(pid) # stuck forever
  if status.exitstatus != 1
    raise "status is #{status.exitstatus}"
  end
end
test_fork_in_ractor()

$ top # shows CPU usage is high for child process

Updated by luke-gru (Luke Gruber) about 1 year ago

  • Subject changed from Process forking within non-main Ractor creates child stuck in busy loop to Process forking within non-main Ractor causes segv

Sorry, my changes in my dev branch were causing some odd behavior. It just crashes on 3.2.0.

Updated by nobu (Nobuyoshi Nakada) about 1 year ago

luke-gru (Luke Gruber) wrote in #note-1:

It just crashes on 3.2.0.

I can't reproduce the SEGV on macOS 13.1.
What platform are you using?

Updated by nobu (Nobuyoshi Nakada) about 1 year ago

  • Status changed from Open to Feedback

Updated by luke-gru (Luke Gruber) about 1 year ago

  • ruby -v set to 3.2.0

Ubuntu 22.04 x86-64
Linux 5.15.0-58-generic
libpthread.so.0 (libc6,x86-64, OS ABI: Linux 3.2.0)

The issue seems to be calling rb_native_mutex_destroy on a locked mutex in ractor_free.

Relevant part of the backtrace:

/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(die+0x0) [0x7fc1374d0e5f] /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/error.c:798

/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(rb_bug) /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/error.c:800

/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(rb_bug_errno+0x43) [0x7fc137579223] /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/error.c:829

/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(rb_native_mutex_destroy+0x24) [0x7fc137719a24] /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/thread_pthread.c:603

/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(ractor_free+0x11) [0x7fc137679991] /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/ractor.c:235

/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(run_final+0xf) 

If you instead change the child's exit call (exit Ractor.count in the script above) to exec "date", it doesn't crash. Maybe the atfork hooks need to be changed to acquire the locks in the parent before forking and release them in the child.
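
The acquire-in-parent, release-in-child idea follows the prepare/parent/child handler pattern of pthread_atfork. Below is a minimal Ruby-level sketch of that ordering, assuming a plain Mutex stands in for the ractor's native lock. The helper fork_with_lock_held is hypothetical (it is not a Ruby API and not the interpreter's atfork code); it only illustrates the sequence of steps.

```ruby
# Hypothetical helper, not part of Ruby: fork while the calling thread holds
# `lock`, so the child never inherits it locked by a thread that no longer exists.
def fork_with_lock_held(lock, &child_body)
  lock.lock                       # "prepare" step: acquire in the parent before forking
  begin
    fork do
      lock.unlock                 # "child" step: release the inherited lock
      child_body.call
    end
  ensure
    lock.unlock if lock.owned?    # "parent" step: release once fork has returned
  end
end

lock = Mutex.new
# exit! skips any at_exit handlers the child inherited from the parent.
pid = fork_with_lock_held(lock) { exit!(7) }
_pid, status = Process.waitpid2(pid)
puts "child exited with #{status.exitstatus}, parent lock held: #{lock.locked?}"
```

The forking thread is the one thread that survives in the child, which is why it can release a lock it took before the fork.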

Updated by luke-gru (Luke Gruber) about 1 year ago

  • Subject changed from Process forking within non-main Ractor causes segv to Process forking within non-main Ractor hits rb_bug()

Updated by luke-gru (Luke Gruber) about 1 year ago

This fixes it:

https://github.com/luke-gru/ruby/commit/16d8e7575570c6b2d24505e3685d6f0147375286

The issue is that when there are multiple ractors and you call fork, the other ractor(s) in the child process that aren't the new main ractor need to be GC'd, and their mutexes could be in a weird state. So either skip destroying them or reinitialize them in the child process. Re-init works on my machine, but I don't know if it works across platforms.
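
As a rough analogy for the "reinitialize in the child" option, here is a Ruby-level sketch. It is not the code from the linked commit, which changes the interpreter's C sources. ForkSafeLock and owner_pid are hypothetical names, and the pid comparison is an assumed way of detecting that the process is a fork.

```ruby
# Hypothetical illustration only: a lock wrapper that discards the inherited
# Mutex and creates a fresh one the first time it is used in a forked child.
class ForkSafeLock
  attr_reader :owner_pid

  def initialize
    @owner_pid = Process.pid
    @mutex = Mutex.new
  end

  # Run the block while holding the lock, re-creating the lock first if this
  # process is not the one that created it.
  def synchronize(&block)
    reinit_after_fork
    @mutex.synchronize(&block)
  end

  private

  # Not thread-safe by itself; this sketch assumes the first use in the child
  # happens before the child starts any extra threads.
  def reinit_after_fork
    return if @owner_pid == Process.pid
    @mutex = Mutex.new
    @owner_pid = Process.pid
  end
end

lock = ForkSafeLock.new
pid = fork do
  lock.synchronize { }   # first use in the child triggers the re-init
  # exit! skips any at_exit handlers the child inherited from the parent.
  exit!(lock.owner_pid == Process.pid ? 0 : 1)
end
_pid, status = Process.waitpid2(pid)
puts "child re-initialized its lock: #{status.success?}"
```

The alternative mentioned in the post, skipping destruction, would instead leave the inherited lock untouched and never free it in the child.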

Updated by luke-gru (Luke Gruber) 2 months ago

I can no longer reproduce this issue; I probably had some changes in my tree that were causing it. Sorry! Please close.

Updated by byroot (Jean Boussier) 2 months ago

  • Status changed from Feedback to Closed