Project

General

Profile

Actions

Bug #21840

open

Locking a mutex can lead to starvation

Bug #21840: Locking a mutex can lead to starvation

Added by luke-gru (Luke Gruber) about 1 month ago. Updated 25 days ago.


Description

Continually locking a mutex m can lead to starvation if all other threads are on the waitq of m.

Let T be the thread that keeps on acquiring mutex m in a loop.

Iteration 1:

  1. T locks mutex m
  2. All other threads attempt to acquire m and end up on its waitq
  3. T releases the GVL, doesn't wake any other threads because they're all waiting on m
  4. T returns from the blocking function, resets its running_time to 0, acquires the GVL and continues running
  5. T unlocks m. It adds the head of the waitq (T2) to the thread readyq. T continues running

Iteration 2:

  1. T locks mutex m
  2. T calls a blocking function. This time, it dequeues the readyq (ex: T2) and sends it the wakeup signal. T runs its blocking function
  3. T2 wakes up, acquires the GVL and attempts to lock mutex m. It fails and goes back asleep, putting itself back on the waitq of m.
  4. T returns from the blocking function, sets its running_time back to 0, acquires the GVL and keeps running
  5. T unlocks m. It adds the head of the waitq (ex: T3) to the end of the thread readyq. T continues running. Repeat Iteration 2 except the head of readyq is now T3

The problem is that T can never be pre-empted.

Example script:

m = Mutex.new

def fib(n)
  return n if n <= 1
  fib(n - 1) + fib(n - 2)
end

t1_running = false
t1 = Thread.new do
  t1_running = true
  loop do
    fib(20)
    m.synchronize do
      $stderr.puts "t1 iter"
    end
  end
end

loop until t1_running

5.times.map do
  Thread.new do
    m.synchronize do
    end
  end
end.each(&:join)

Updated by luke-gru (Luke Gruber) 28 days ago Actions #1

  • Target version changed from 4.0 to 4.1
  • ruby -v set to 4.1.0dev
  • Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN to 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: REQUIRED, 4.0: REQUIRED

Updated by Anonymous 28 days ago Actions #2

  • Status changed from Open to Closed

Applied in changeset git|994257ab06072df38de024e70a60aa9a87e36089.


Prevent starvation when acquiring mutex over and over (#15877)

Continually locking a mutex m can lead to starvation if all other threads are on the waitq of m.

See https://bugs.ruby-lang.org/issues/21840 for more details.

Solution:

When a thread T1 wakes up T2 during mutex unlock but T1 or any other thread successfully acquires it
before T2, then we record the running_time of the thread during mutex acquisition. Then during unlock, if
that thread's running_time is less than the saved running time, we set it back to the saved time.

Fixes [Bug #21840]

Updated by ivoanjo (Ivo Anjo) 25 days ago Actions #3 [ruby-core:124639]

Really cool! I wonder if this will end up solving https://bugs.ruby-lang.org/issues/19717 as well / https://github.com/ruby/ruby/pull/8331 is an old PR for it.

Updated by luke-gru (Luke Gruber) 25 days ago ยท Edited Actions #4 [ruby-core:124640]

  • Status changed from Closed to Open

I wasn't aware of those old issues. I'll take a look, thanks!

I reverted the commit because of issues with a Monitor test in CI. I thought it was related to this change, but the test kept failing even after the revert. It turns out it was a badly designed test and I've fixed it. I plan on recommitting this fix, but I may change the implementation.

Actions

Also available in: PDF Atom