Bug #20456

Hash can get stuck marked as iterating through process forking

Added by blowfishpro (Talia Wong) about 1 month ago. Updated about 1 month ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:117711]

Description

Steps to Reproduce

  1. Iterate over a hash
  2. While that iteration is happening, fork the process
    a. This should be done in a way that causes the iteration to never finish from the child process's view, e.g. by forking with a block, or by iterating in a different thread from the one that forks
  3. Attempt to add a new key to the hash in the child process
    a. This can happen before or after the iteration finishes in the parent process; it doesn't matter

Observed Behavior

The child process can never add a new key to the hash; the attempt always fails with RuntimeError: can't add a new key into hash during iteration

Desired Behavior

The hash should no longer be marked as iterating in the child process, so it can be modified as needed

Examples

With threads:

h = { a: 1 }
t = Thread.new do
  sleep 0.05
  pid = fork do
    sleep 0.1
    puts 'child: before'
    h[:b] = 2
    puts 'child: after'
  end
  Process.wait2(pid)
end

puts 'parent: before'
h.each do
  sleep 0.1
end
puts 'parent: after'
puts t.join.value.inspect

produces:

parent: before
parent: after
child: before
can't add a new key into hash during iteration (RuntimeError)
[34450, #<Process::Status: pid 34450 exit 1>]

Without threads:

h = { a: 1 }
pid = nil
puts 'parent: before'
h.each do
  pid = fork do
    sleep 0.05
    puts 'child: before'
    h[:b] = 2
    puts 'child: after'
  end
end
puts 'parent: after'
puts Process.wait2(pid).inspect

produces:

parent: before
parent: after
child: before
can't add a new key into hash during iteration (RuntimeError)
[17809, #<Process::Status: pid 17809 exit 1>]

Platform information

This behavior has been observed in the following environments:

  • Ruby 3.3.0 on macOS 14.4.1 (Apple M1 Max) installed via asdf
  • Ruby 2.7.5 on macOS 14.4.1 (Apple M1 Max) installed via asdf
  • Ruby 3.2.3 on RockyLinux 8.4 (x86_64) installed from Fullstaq

Updated by Eregon (Benoit Daloze) about 1 month ago · Edited

I would think this is a general pitfall of fork(2): it can leave things in other threads in a random, inconsistent, and irrecoverable state.
IMO it's a mistake that fork(2) doesn't copy all threads to the new process, but it is what it is.
Maybe this specific issue can be worked around in CRuby, but it sounds to me like it can't be.

So the best approach is to ensure there is a single Ruby-level thread when calling Process.fork.
Maybe Process.fork should warn or raise an error if that's not the case.

Updated by byroot (Jean Boussier) about 1 month ago · Edited

but it sounds to me like it can't be.

It could, and there's precedent. For instance, Ruby unlocks mutexes owned by dead threads.
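A small sketch of the precedent mentioned here, as it can be observed in CRuby (this relies on behavior seen in practice, not a documented guarantee): after fork, mutexes held by threads that don't exist in the child are abandoned, so the child can lock them again instead of deadlocking.

```ruby
m = Mutex.new
t = Thread.new { m.lock; sleep }   # this thread holds the mutex forever
sleep 0.1 until m.locked?          # wait until the lock is actually held
pid = fork do
  # The owning thread does not exist in the child; CRuby abandons its mutexes.
  m.synchronize { }                # would deadlock here if the lock survived fork
  exit!(0)
end
_, status = Process.wait2(pid)
t.kill
```

The child exits cleanly, which is exactly the kind of cleanup-on-fork being proposed for iter_lev.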

This would require keeping a list of the threads that incremented a hash's iter_lev, so it's definitely not free, but the overhead is perhaps acceptable. It would be worth trying IMO.

One possible implementation could be for each thread (or fiber?) to have a stack of VALUEs in which to store each Hash whose iter_lev it bumped. The overhead would be an extra RArray push/pop when starting and finishing iteration on a Hash, which I think may not matter much in the grand scheme of things.
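A Ruby-level model of this idea, purely illustrative (the real change would live in CRuby's C internals; ITER_STACKS and tracked_each are made-up names for this sketch): each thread records the hashes whose iter_lev it bumped, so that after fork, entries recorded by threads that no longer exist could be unwound to reset iter_lev in the child.

```ruby
ITER_STACKS = Hash.new { |h, k| h[k] = [] } # thread -> stack of hashes it is iterating

def tracked_each(hash, &block)
  ITER_STACKS[Thread.current].push(hash)    # models bumping iter_lev
  hash.each(&block)
ensure
  ITER_STACKS[Thread.current].pop           # models dropping iter_lev
end
```

After fork, the child would walk the stacks of every vanished thread and decrement the corresponding iter_lev counters, which is the bookkeeping the comment estimates as one RArray push/pop per iteration.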

Updated by mame (Yusuke Endoh) about 1 month ago

I think this problem is not limited to Hash's iter_lev; it potentially affects all such states, like STR_TMPLOCK. Once you start supporting such things, it may never end. I would rather that applications using a combination of Thread and fork be careful.

Updated by byroot (Jean Boussier) about 1 month ago

like STR_TMPLOCK.

True. That said, STR_TMPLOCK is used to lock a string for mutation, so such strings are much less likely to be shared between threads.

But I guess sharing a mutable hash between threads isn't a great idea either, and frozen hashes don't increment iter_lev, so there is a relatively easy way out of that.

I also understand your point, and generally agree, but would like to state that as long as the GVL remains (and AFAIK there are no plans for it to go away), fork will remain an essential API for shared-nothing parallelism in Ruby (except for cases where Ractors are suitable). So while we should certainly accept that it's an API that will always have corner cases, I don't think we should shy away from making it easier to use and more robust when we can.

Updated by Eregon (Benoit Daloze) about 1 month ago

I don't think we should shy away from making it easier to use and more robust when we can.

The best way to do that IMO is to warn/error on fork when there are multiple Ruby threads alive.
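A minimal sketch of the suggested guard, assuming a wrapper at the Ruby level (safe_fork is a hypothetical name, not a real API): warn when other Ruby threads are alive at fork time, since their state is copied into the child in an unrecoverable form.

```ruby
def safe_fork(&block)
  live = Thread.list.count(&:alive?)
  # The forking thread itself is always alive, so > 1 means extra threads exist.
  warn "forking with #{live} live threads; copied state may be inconsistent" if live > 1
  Process.fork(&block)
end
```

The real proposal would put this check inside Process.fork itself, possibly raising instead of warning.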

It could, and there's precedent. For instance, Ruby unlocks mutexes owned by dead threads.

It is a measurable overhead on Mutex performance, actually (at least on TruffleRuby), and of course it's completely unsound: whatever was being protected under Mutex#synchronize is now in an inconsistent state, and the next operation on it could very well segfault, race, raise, etc.
I think we should remove that, not add more hacks for something that is intrinsically unsafe.

If people want to fork while there are multiple Ruby threads around, they should terminate those threads cleanly first.
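One way to follow this advice, sketched under the assumption that the worker threads can be joined (the worker setup here is illustrative): wait for all other threads before forking, so the child starts from a single, consistent Ruby thread.

```ruby
workers = 3.times.map { |i| Thread.new { i * 2 } }
results = workers.map(&:value)   # Thread#value joins each thread first
# All other Ruby threads are now dead; fork from a single live thread.
pid = fork { exit!(Thread.list.size == 1 ? 0 : 1) }
_, status = Process.wait2(pid)
```

This sidesteps the iter_lev problem entirely, since no other thread can be mid-iteration when fork runs.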

Updated by byroot (Jean Boussier) about 1 month ago

I think we should remove [auto-unlock of mutexes owned by dead threads]

I very strongly oppose that.
