Bug #6174
closedFix collision of ConditionVariable#wait timeout and #signal (+ other cosmetic changes)
Description
- Currently, when Thread wakes on timeout, it could not remove itself from ConditionVariable waiters until it acquires lock.
So that, when ConditionVariable#signal is called, it will try to wakeup
such thread, instead of some one else.
https://github.com/funny-falcon/ruby/commit/24a9b6112477b2107ac9a19d0189a11fb97aa891
https://github.com/funny-falcon/ruby/commit/24a9b6112477b2107ac9a19d0189a11fb97aa891.patch
Simple way to avoid it, is to allow Mutex#sleep to recieve a block, which will be called right after Thread will awake, but before Mutex will be tried to lock
https://github.com/funny-falcon/ruby/commit/9e9157c5318926331dbe883416b69d38a58fea5d
https://github.com/funny-falcon/ruby/commit/9e9157c5318926331dbe883416b69d38a58fea5d.patch
-
Since MatzRuby use GVL for thread isolation, and native method could not be interrupted (unless it will), we could remove couple of calles to Mutex#synchronize
https://github.com/funny-falcon/ruby/commit/a9ad8d274b96f14519643fc63327394f72b83516
https://github.com/funny-falcon/ruby/commit/a9ad8d274b96f14519643fc63327394f72b83516.patch -
Usage of hash with
compare_by_identity
allows remove call to Array#include? in a Queue. Also it allows to remove other call Mutex#synchronize from ConditionVariable#wait in case when we rely on GVL.
https://github.com/funny-falcon/ruby/commit/0da1887a04f7a0e4f9289d2167c2a6d0073651e1
https://github.com/funny-falcon/ruby/commit/0da1887a04f7a0e4f9289d2167c2a6d0073651e1.patch -
And cosmetic changes to SizedQueue
https://github.com/funny-falcon/ruby/commit/60ed97557c8178bc78edf670f3d53d761e627bf0
https://github.com/funny-falcon/ruby/commit/60ed97557c8178bc78edf670f3d53d761e627bf0.patch
Pull request at once:
https://github.com/ruby/ruby/pull/104
https://github.com/ruby/ruby/pull/104.diff
https://github.com/ruby/ruby/pull/104.patch
Updated by funny_falcon (Yura Sokolov) almost 13 years ago
and native method could not be interrupted (unless it will)
Sorry, it should be read as "(unless it wish)"
Updated by mame (Yusuke Endoh) over 12 years ago
- Status changed from Open to Assigned
- Assignee set to ko1 (Koichi Sasada)
Updated by kosaki (Motohiro KOSAKI) over 12 years ago
Hi Yura,
If nobody reviews months, I'll reveiw this. The description is completely sane. :)
Updated by ko1 (Koichi Sasada) over 12 years ago
- Assignee changed from ko1 (Koichi Sasada) to kosaki (Motohiro KOSAKI)
Thanks Kosaki-san!
Updated by kosaki (Motohiro KOSAKI) over 12 years ago
- Status changed from Assigned to Rejected
I've reviewed your code. Unofrtunately your testcase is not correct.
- def test_condvar_wait_timeout
- serialize = Serializer.new
- mutex = Mutex.new
- condvar = ConditionVariable.new
- def condvar.waiters
-
@waiters
- end
- thread = Thread.new do
-
serialize.signal
-
mutex.synchronize do
-
condvar.wait(mutex, 0.001)
-
end
- end
- serialize.wait{ thread.run }
- mutex.synchronize do
-
sleep(0.01)
-
assert_not_includes(condvar.waiters, thread)
- end
- end
this code behave as following.
- main thread wait at serialize.wait
- sub thread wake main thread up by serialize signal
- sub thread wait at condvar.wait(mutex, 0.001)
- main thread wake sub thread up by thread.run
- main thread take 'mutex'
- main thread sleep 0.01 seconds. but sub thread can't wake up because it can't take a 'mutex'.
- main thread inspect condvar.waiters and it found sub thread.
But this is not a bug. main thread shouldn't take a mutex.
Moreover, your callback approach have a race. Think, if trap is happen when waiting in mutex.sleep.
your callback don't have any guarantee to be called.
I'm sorry. we can't take this patch.
Updated by funny_falcon (Yura Sokolov) over 12 years ago
- serialize.wait{ thread.run }
This could be
- serialize.wait{ }
Test is failing as well, so that, step 4 doesn't matter.
In fact, step 4 runs before step 2 in every run, you could easily check it by putting "puts", so that
thread.run wakes up sub thread at the beginning, not at the condvar.wait !!!!
Then steps are looks like:
- main thread wait at serialize.wait
- sub thread wake main thread up by serialize signal
- sub thread wait at condvar.wait(mutex, 0.001)
- main thread take 'mutex'
- main thread sleep 0.01 seconds (imitating some work).
- sub thread tries to wake up, so that it should not be present in waiters.
But he can't wake up because it can't take a 'mutex'. - main thread inspect condvar.waiters and it found sub thread.
I cann't understand, why "main thread shouldn't take a mutex"? In real application it could be very other thread, not main.
Holding mutex by main thread were intentionally to expose this race. Race could happen without it, but it would be hard to catch.
I see, that callback should be wrapped in ensure inside of mutex.sleep . Thank you for this point, I'll try to fix it.
Updated by funny_falcon (Yura Sokolov) over 12 years ago
Good day, Kosaki-san.
Ok, now callback is wrapped to rb_ensure inside of rb_mutex_sleep_forever and rb_mutex_wait_for.
Also, I replace serialize.wait{ thread.run } to serialize.wait inside of test to be less confusing.
Would you mind recheck patch and make consideration again, please?
(it is at the same place:
https://github.com/ruby/ruby/pull/104
https://github.com/ruby/ruby/pull/104.patch
)
With regards,
Sokolov Yura aka funny-falcon.
Updated by kosaki (Motohiro KOSAKI) over 12 years ago
I cann't understand, why "main thread shouldn't take a mutex"? In real application it could be very other thread, not main.
Holding mutex by main thread were intentionally to expose this race. Race could happen without it, but it would be hard to catch.
While the mutex is held by any thread, condvar.wait can't be return. It's one of basic condvar semantics. and sleep(0.01) don't release
mutex at all. Thus, sleep(0.01) of main thread can't help to wakeup and finish sub thread.
Updated by kosaki (Motohiro KOSAKI) over 12 years ago
Ok, now callback is wrapped to rb_ensure inside of rb_mutex_sleep_forever and rb_mutex_wait_for.
Unfortunately, this doesn't help us. Even if using rb_ensure, still your callback can be interruptible. and we have no guarantee to execute
a callback.
I believe we need to implement [Feature #6762].