Bug #21840
openLocking a mutex can lead to starvation
Description
Continually locking a mutex m can lead to starvation if all other threads are on the waitq of m.
Let T be the thread that keeps on acquiring mutex m in a loop.
Iteration 1:
-
Tlocks mutexm - All other threads attempt to acquire
mand end up on its waitq -
Treleases the GVL, doesn't wake any other threads because they're all waiting onm -
Treturns from the blocking function, resets itsrunning_timeto 0, acquires the GVL and continues running -
Tunlocksm. It adds the head of the waitq (T2) to the thread readyq.Tcontinues running
Iteration 2:
-
Tlocks mutexm -
Tcalls a blocking function. This time, it dequeues the readyq (ex:T2) and sends it the wakeup signal.Truns its blocking function -
T2wakes up, acquires the GVL and attempts to lock mutexm. It fails and goes back asleep, putting itself back on the waitq ofm. -
Treturns from the blocking function, sets itsrunning_timeback to 0, acquires the GVL and keeps running -
Tunlocksm. It adds the head of the waitq (ex:T3) to the end of the thread readyq.Tcontinues running. Repeat Iteration 2 except the head of readyq is nowT3
The problem is that T can never be pre-empted.
Example script:
m = Mutex.new
def fib(n)
return n if n <= 1
fib(n - 1) + fib(n - 2)
end
t1_running = false
t1 = Thread.new do
t1_running = true
loop do
fib(20)
m.synchronize do
$stderr.puts "t1 iter"
end
end
end
loop until t1_running
5.times.map do
Thread.new do
m.synchronize do
end
end
end.each(&:join)
Updated by luke-gru (Luke Gruber) 28 days ago
- Target version changed from 4.0 to 4.1
- ruby -v set to 4.1.0dev
- Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN to 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: REQUIRED, 4.0: REQUIRED
Updated by Anonymous 28 days ago
- Status changed from Open to Closed
Applied in changeset git|994257ab06072df38de024e70a60aa9a87e36089.
Prevent starvation when acquiring mutex over and over (#15877)
Continually locking a mutex m can lead to starvation if all other threads are on the waitq of m.
See https://bugs.ruby-lang.org/issues/21840 for more details.
Solution:
When a thread T1 wakes up T2 during mutex unlock but T1 or any other thread successfully acquires it
before T2, then we record the running_time of the thread during mutex acquisition. Then during unlock, if
that thread's running_time is less than the saved running time, we set it back to the saved time.
Fixes [Bug #21840]
Updated by ivoanjo (Ivo Anjo) 25 days ago
Really cool! I wonder if this will end up solving https://bugs.ruby-lang.org/issues/19717 as well / https://github.com/ruby/ruby/pull/8331 is an old PR for it.
Updated by luke-gru (Luke Gruber) 25 days ago
ยท Edited
- Status changed from Closed to Open
I wasn't aware of those old issues. I'll take a look, thanks!
I reverted the commit because of issues with a Monitor test in CI. I thought it was related to this change, but the test kept failing even after the revert. It turns out it was a badly designed test and I've fixed it. I plan on recommitting this fix, but I may change the implementation.