Bug #20456
openHash can get stuck marked as iterating through process forking
Description
Steps to Reproduce¶
- Iterate over a hash
- While that iteration is happening, fork the process
a. This should be done in a way that causes the iteration to never finish from the child process's view, e.g. fork with a block, or iteration is happening in a different thread than fork - Attempt to add a new key to the hash in the child process
a. This can be before or after the iteration finishes in the parent process, doesn't matter
Observed Behavior¶
The child process can never add a new key to the hash, always fails with RuntimeError: can't add a new key into hash during iteration
Desired¶
The hash is no longer iterating in the child process, so it can be modified as needed
Examples¶
With threads:¶
h = { a: 1 }
t = Thread.new do
sleep 0.05
pid = fork do
sleep 0.1
puts 'child: before'
h[:b] = 2
puts 'child: after'
end
Process.wait2(pid)
end
puts 'parent: before'
h.each do
sleep 0.1
end
puts 'parent: after'
puts t.join.value.inspect
produces:
parent: before
parent: after
child: before
can't add a new key into hash during iteration (RuntimeError)
[34450, #<Process::Status: pid 34450 exit 1>]
Without threads:¶
h = { a: 1 }
pid = nil
puts 'parent: before'
h.each do
pid = fork do
sleep 0.05
puts 'child: before'
h[:b] = 2
puts 'child: after'
end
end
puts 'parent: after'
puts Process.wait2(pid).inspect
produces:
parent: before
parent: after
child: before
can't add a new key into hash during iteration (RuntimeError)
[17809, #<Process::Status: pid 17809 exit 1>]
Platform information¶
This behavior has been observed in the following environments
Updated by Eregon (Benoit Daloze) 5 months ago · Edited
I would think this is a general pitfall of fork(2)
, it can leave things in other threads in a random inconsistent and irrecoverable state.
IMO it's a mistake that fork(2)
doesn't copy all threads to the new process, but it is what it is.
Maybe this specific issue can be worked around in CRuby, but it sounds to me like it can't.
So the best is to ensure a single Ruby-level thread when calling Process.fork
.
Maybe Process.fork
should warn or error if that's not the case.
Updated by byroot (Jean Boussier) 5 months ago · Edited
but it sounds to me like it can't.
It could, and there's precedent. For instance Ruby unlock mutexes owned by dead threads.
This would require to keep a list of the threads that incremented a hash's iterlevel
, so definitely not free, but perhaps acceptable overhead. Would be worth trying IMO.
One possible implementation could be for each threads (or fiber?) to have a stack of VALUE
, in which to store the Hash for which it bumped the iterlevel. The overhead would be an extra RArray push/pop
when starting and exiting iterating on a Hash, which I think may not matter that much in the grand scheme of things.
Updated by mame (Yusuke Endoh) 5 months ago
I think this problem is not limited to Hash's iter_lev
, but potentially all states, like STR_TMPLOCK. Once you start supporting such things, it may never end. I would rather that applications that use a combination of Thread and fork be careful.
Updated by byroot (Jean Boussier) 5 months ago
like STR_TMPLOCK.
True. That said STR_TMPLOCK
is used to lock a string for mutation, so much less likely to be shared between threads.
But I guess sharing a mutable hash between threads isn't a great idea either, and frozen hashes don't increment iterlevel, so there is a relatively easy way out of that.
I also understand your point, and generally agree, but would like to state that as long as the GVL remain (and AFAIK there is no plans for it to go away) fork
will remain an essential API for shared-nothing parallelism in Ruby (except for cases where Ractors are suitable). So while we should certainly accept it's an API that will always have corner cases, I don't think we should shy away from making it easier to use and more robust when we can.
Updated by Eregon (Benoit Daloze) 5 months ago
I don't think we should shy away from making it easier to use and more robust when we can.
The best way to do that IMO is warn/error on fork when there are multiple Ruby Threads alive.
It could, and there's precedent. For instance Ruby unlock mutexes owned by dead threads.
It is a measurable overhead on Mutex performance actually (at least on TruffleRuby), and of course it's completely unsound, whatever was being protected under Mutex#synchronize
is now in an inconsistent state and the next operation on it could very well segfault, race, raise, etc.
I think we should remove that, not add more hacks for something which is intrinsically unsafe.
If people want to fork and there are multiple Ruby Threads around, they should terminate those threads cleanly first.
Updated by byroot (Jean Boussier) 5 months ago
I think we should remove [auto unlock of mutexes owned by dead thread]
I very strongly oppose that.