Bug #21836
Closed
RUBY_MN_THREADS deadlock and sleep issues
Description
I created a benchmark to test a fix for Issue 21685.
The benchmark is inline below and saved as 100usec.rb. It should always take just a bit over 10 seconds, regardless of how many cores or threads are assigned. It should look like this:
> time taskset --cpu-list 1 ./ruby 100usec.rb 1
real 0m10.130s
user 0m0.094s
sys 0m0.218s
It works fine with MN_THREADS disabled. However, with RUBY_MN_THREADS=1, I see two separate issues.
Issue #1 is that when the number of cores is set to more than 1, the benchmark completes too fast.
These results are consistent across the ruby versions I tried. I did a little debugging and found that the call to sleep(0.001) (1ms) is returning after only 0.000005 seconds (5us).
> time RUBY_MN_THREADS=1 taskset --cpu-list 1,2 ./ruby 100usec.rb 1
real 0m0.359s
user 0m0.075s
sys 0m0.134s
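A minimal sketch of how the early return could be confirmed, assuming a monotonic-clock measurement around each sleep call. The helper name measure_sleep and its samples parameter are hypothetical and not part of the original report. The sleep runs inside a thread created with Thread.new because that is where the benchmark sleeps.

```ruby
# Hypothetical helper (not from the original report): measures how long
# Kernel#sleep actually blocks, using the monotonic clock.
def measure_sleep(duration, samples: 100)
  elapsed = Array.new(samples) do
    t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleep duration
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
  end
  { min: elapsed.min, avg: elapsed.sum / samples, max: elapsed.max }
end

# Sleep in a non-main thread, as the benchmark does.
stats = Thread.new { measure_sleep(0.0001) }.value

# Report in microseconds next to the requested 100us.
printf("requested 100.0us  min %.1fus  avg %.1fus  max %.1fus\n",
       stats[:min] * 1e6, stats[:avg] * 1e6, stats[:max] * 1e6)
```

Running this once with and once without RUBY_MN_THREADS=1 on an affected build should show whether the measured minimum falls well below the requested duration.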
Issue #2 shows up when the number of cores is limited to 1.
- On ruby 3.4.7 it runs very slowly, about 6x slower than expected.
> time RUBY_MN_THREADS=1 taskset --cpu-list 1 ./ruby 100usec.rb 1
real 1m2.277s
user 0m0.249s
sys 0m0.408s
- On ruby 4.0.0-preview2 it usually deadlocks, but sometimes it segfaults.
> time RUBY_MN_THREADS=1 taskset --cpu-list 1 ./ruby ruby/100usec.rb 1
[BUG] unreachable
ruby 4.0.0preview2 (2025-11-17 master 4fa6e9938c) +MN +PRISM [aarch64-linux]
-- Control frame information -----------------------------------------------
-- Threading information ---------------------------------------------------
Total ractor count: 1
Ruby thread count for this ractor: 1
-- C level backtrace information -------------------------------------------
[BUG] Illegal instruction at 0x0000ffff93777250
ruby 4.0.0preview2 (2025-11-17 master 4fa6e9938c) +MN +PRISM [aarch64-linux]
Crashed while printing bug report
Illegal instruction
or
> time RUBY_MN_THREADS=1 taskset --cpu-list 1 ./ruby ruby/100usec.rb 1
<<DEADLOCK>>
The benchmark code (100usec.rb)
ITRCOUNT = 10000

# One writer/reader pair over a pipe: the writer sleeps 100us before each byte.
def inner_test
  r, w = IO.pipe
  reader = Thread.new do
    ITRCOUNT.times.map {|i|
      r.getbyte
    }
  end
  ITRCOUNT.times.map {|i|
    sleep 0.0001
    w.write('0')
  }
  reader.join
end

# Run `count` independent pairs, each in its own thread.
def outer_test(count)
  count.times.map{|j|
    Thread.new do
      inner_test
    end
  }.each{|t| t.join}
end

outer_test(ARGV[0].to_i)
Updated by luke-gru (Luke Gruber) about 1 month ago
· Edited
I'm confused about what should happen. Shouldn't it return roughly after 1 second instead of 10 seconds? I'll look into the sleep issue with RUBY_MN_THREADS=1, but I can't reproduce the deadlock or segfault with a more recent commit (ad6b85450d).
Edit: It's this line: https://github.com/ruby/ruby/blob/master/thread_pthread.c#L2953. If the deadline for a timeout is less than 1ms away, we wake the thread anyway. That's probably too relaxed; maybe we should change it to a few microseconds or remove it altogether.
Updated by khasinski (Chris Hasiński) 28 days ago
- Status changed from Open to Closed
Applied in changeset git|5add7c3ea9a13e657fc7cba78b2633b9548a55aa.
Fix RUBY_MN_THREADS sleep returning prematurely (#15868)
timer_thread_check_exceed() was returning true when the remaining time
was less than 1ms, treating it as "too short time". This caused
sub-millisecond sleeps (like sleep(0.0001)) to return immediately
instead of actually sleeping.
The fix removes this optimization that was incorrectly short-circuiting
short sleep durations. Now the timeout is only considered exceeded when
the actual deadline has passed.
Note: There's still a separate performance issue where MN_THREADS mode
is slower for sub-millisecond sleeps due to the timer thread using
millisecond-resolution polling. This will require a separate fix to
use sub-millisecond timeouts in kqueue/epoll.
[Bug #21836]
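The change described in the commit message can be modelled roughly as follows. This is an illustrative Ruby sketch only: the real check is C code (timer_thread_check_exceed() in thread_pthread.c), and the method names, the nanosecond units, and the handling of a deadline exactly equal to the current time are assumptions made for the example.

```ruby
# Illustrative model only, not the actual C implementation.
ONE_MS_NS = 1_000_000 # one millisecond in nanoseconds (assumed unit)

# Behaviour described as the bug: a deadline that has passed OR is less
# than 1ms away is treated as exceeded, so the sleeper is woken early.
def exceeded_before_fix?(deadline_ns, now_ns)
  (deadline_ns - now_ns) < ONE_MS_NS
end

# Behaviour described by the fix: exceeded only once the deadline has
# actually passed. Treating deadline == now as passed is an assumption.
def exceeded_after_fix?(deadline_ns, now_ns)
  now_ns >= deadline_ns
end

now      = 0
deadline = 100_000 # sleep(0.0001) is 100us, i.e. 100_000ns from now

puts exceeded_before_fix?(deadline, now) # => true  (woken immediately)
puts exceeded_after_fix?(deadline, now)  # => false (keeps sleeping)
```

Under this model any sleep shorter than 1ms is considered already expired before the fix, which matches the report of sleep returning after only a few microseconds.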
Updated by jpl-coconut (Jacob Lacouture) 28 days ago
To close the loop here, yes, I was wrong: 1sec should be expected, not 10.
Without MN_THREADS, the 0.1msec sleep becomes a 1msec sleep. With MN_THREADS, the 0.1msec sleep is skipped completely. I see the second of these issues is now fixed.
Thanks for the discussion and fix!
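For reference, the expected run times discussed in this thread follow from simple arithmetic over the benchmark's 10000 iterations. The sketch below is a back-of-the-envelope calculation that ignores pipe and thread overhead; the helper name expected_runtime is hypothetical.

```ruby
ITRCOUNT = 10_000 # iteration count from 100usec.rb

# Hypothetical helper: total time spent sleeping if every iteration
# sleeps for per_sleep seconds (ignores pipe and thread overhead).
def expected_runtime(per_sleep, iterations = ITRCOUNT)
  iterations * per_sleep
end

cases = {
  "requested sleep (0.1 ms)"       => 0.0001,
  "sleep rounded up to 1 ms"       => 0.001,
  "sleep returning after ~5 us"    => 0.000005,
}

cases.each do |label, per_sleep|
  printf("%-30s %.2f s\n", label, expected_runtime(per_sleep))
end
```

This gives about 1 second for the requested sleep, about 10 seconds when each sleep is rounded up to 1 ms, and about 0.05 seconds when each sleep returns after roughly 5 us, consistent with the sub-second run reported for Issue #1.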