Bug #17300

The Fiber scheduler does not work with ConditionVariable

Added by Eregon (Benoit Daloze) over 3 years ago. Updated over 3 years ago.

Status: Closed
Target version: -
ruby -v: ruby 3.0.0dev (2020-10-31T02:56:41Z master 4f8d9b0db8) [x86_64-linux]
[ruby-core:100680]

Description

When looking at replacing kernel_sleep by blocking, I found an independent bug.
ConditionVariable does not seem to work with the Fiber scheduler currently.
There is an existing test in https://github.com/ruby/ruby/blob/4f8d9b0db84c42c8d37f75de885de1c0a5cb542c/test/fiber/test_mutex.rb#L105-L140 on which I based this reproduction example.
The test should always end with signalled == 3, but it only checks signalled > 1.
The test is also racy, as ConditionVariable#signal has no effect if no other Thread/Fiber is in ConditionVariable#wait.
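
To illustrate that race independently of the scheduler, here is a minimal sketch. It is not taken from the existing test, and the ready flag and the other names are only illustrative assumptions. A signal sent while nobody is waiting is simply lost, so the waiter has to guard the wait with a predicate kept in shared state:

```ruby
# Illustrative sketch only: `ready` and the other names are assumptions,
# not code from test/fiber/test_mutex.rb.
mutex = Mutex.new
condition = ConditionVariable.new
ready = false

# The signaller runs first, while nobody is in ConditionVariable#wait.
mutex.synchronize do
  ready = true       # record the event in shared state
  condition.signal   # wakes nobody: the signal itself is not remembered
end

waiter = Thread.new do
  mutex.synchronize do
    # Re-check the predicate instead of waiting unconditionally;
    # a bare condition.wait(mutex) here would never be woken by the
    # earlier signal.
    condition.wait(mutex) until ready
    :done
  end
end

result = waiter.value
p result
```

The reproduction below avoids the same race differently: it uses a Queue so that the signalling side only takes the Mutex after the waiting side has announced that it is about to wait.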

Here is the reproduction. By default it runs without the scheduler; pass "scheduler" as an argument to use the test Scheduler.
I saved the script as condvar2.rb under test/fiber for convenience.

require_relative 'scheduler'

USE_SCHEDULER = ARGV.delete('scheduler')

mutex = Mutex.new
condition = ConditionVariable.new

signalled = 0

q = Queue.new

a = Thread.new do
  Thread.current.scheduler = Scheduler.new if USE_SCHEDULER
  
  body = -> do
    mutex.synchronize do
      3.times do |i|
        q << :ready
        p [:wait, i]
        condition.wait(mutex)
        raise unless mutex.owned?
        signalled += 1
      end
    end
  end
  
  USE_SCHEDULER ? Fiber.schedule(&body) : body.call
end

b = Thread.new do
  Thread.current.scheduler = Scheduler.new if USE_SCHEDULER
  
  body = -> do
    puts "Thread 2 starting"
    3.times do |i|
      q.pop # Only acquire Mutex once the other thread is in wait
      puts "Thread 2 locking Mutex"
      mutex.synchronize do
        p [:signal, i]
        condition.signal
      end

      sleep 1 # 0.1
    end
  end
  
  USE_SCHEDULER ? Fiber.schedule(&body) : body.call
end

a.join
b.join

p signalled

$ ruby condvar2.rb
Thread 2 starting
[:wait, 0]
Thread 2 locking Mutex
[:signal, 0]
[:wait, 1]
Thread 2 locking Mutex
[:signal, 1]
[:wait, 2]
Thread 2 locking Mutex
[:signal, 2]
3

$ ruby condvar2.rb scheduler
Thread 2 starting
Thread 2 locking Mutex
[:wait, 0]
[:signal, 0]
# hangs

Updated by ioquatix (Samuel Williams) over 3 years ago

@Eregon (Benoit Daloze) thanks for this report, I will investigate it.

Updated by ioquatix (Samuel Williams) over 3 years ago

It looks like memory corruption.

th_mutex = 0x55613ba19018
-> th_mutex = 0x8

Somehow the mutex linked list ends up with Qnil... not sure how yet.

Updated by ioquatix (Samuel Williams) over 3 years ago

Okay, I found an unrelated bug, and also the root cause. PR incoming.

Updated by Eregon (Benoit Daloze) over 3 years ago

  • Status changed from Open to Closed

Thanks for the quick fix.
