Bug #21836: RUBY_MN_THREADS deadlock and sleep issues

Added by jpl-coconut (Jacob Lacouture) about 1 month ago. Updated 28 days ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: ruby 3.4.7 (2025-11-08) +PRISM [aarch64-linux]
[ruby-core:124518]

Description

I created a benchmark to test a fix for Issue 21685.

The benchmark is inline below and saved as 100usec.rb. It should always take just a bit over 10 seconds, regardless of how many cores or threads are assigned. It should look like this:

> time taskset --cpu-list 1 ./ruby 100usec.rb 1
real	0m10.130s
user	0m0.094s
sys	0m0.218s

It works fine with MN_THREADS disabled. However, with RUBY_MN_THREADS=1, I see two separate issues.

Issue #1 is that when the number of cores is set to more than 1, the benchmark completes too fast.

These results are consistent across the ruby versions I tried. I did a little debugging and found that the call to sleep(0.0001) (100us) is returning after only 0.000005 seconds (5us).

> time RUBY_MN_THREADS=1 taskset --cpu-list 1,2 ./ruby 100usec.rb 1
real	0m0.359s
user	0m0.075s
sys	0m0.134s
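
For anyone wanting to check the premature return directly, here is a minimal sketch that times individual sleep calls with a monotonic clock. The helper name measure_sleep and the sample count are assumptions for illustration; this is not the instrumentation used in the report.

```ruby
# Hypothetical helper: time `iterations` calls to sleep(duration) using a
# monotonic clock and return the elapsed time of each call in seconds.
def measure_sleep(duration, iterations)
  Array.new(iterations) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleep(duration)
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end

requested = 0.0001
samples = measure_sleep(requested, 100)
printf("requested: %.6fs  min: %.6fs  avg: %.6fs  max: %.6fs\n",
       requested, samples.min, samples.sum / samples.size, samples.max)
```

Running it once normally and once with RUBY_MN_THREADS=1 on more than one core should make the difference visible: a minimum far below the requested duration would match the behaviour described above.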

Issue #2 shows up when the number of cores is limited to 1.

  • On ruby 3.4.7 it runs very slowly, about 6x slower than expected.
> time RUBY_MN_THREADS=1 taskset --cpu-list 1 ./ruby 100usec.rb 1
real	1m2.277s
user	0m0.249s
sys	0m0.408s
  • On ruby 4.0.0-preview2 it usually deadlocks, but sometimes it segfaults.
> time RUBY_MN_THREADS=1 taskset --cpu-list 1 ./ruby ruby/100usec.rb 1
[BUG] unreachable
ruby 4.0.0preview2 (2025-11-17 master 4fa6e9938c) +MN +PRISM [aarch64-linux]

-- Control frame information -----------------------------------------------


-- Threading information ---------------------------------------------------
Total ractor count: 1
Ruby thread count for this ractor: 1

-- C level backtrace information -------------------------------------------
[BUG] Illegal instruction at 0x0000ffff93777250
ruby 4.0.0preview2 (2025-11-17 master 4fa6e9938c) +MN +PRISM [aarch64-linux]

Crashed while printing bug report
Illegal instruction

or

> time RUBY_MN_THREADS=1 taskset --cpu-list 1 ./ruby ruby/100usec.rb 1
<<DEADLOCK>>

The benchmark code (100usec.rb)

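ITRCOUNT = 10000

# One writer/reader pair: the calling thread sleeps 0.1 ms and then writes a
# byte, ITRCOUNT times, while a reader thread consumes each byte from the pipe.
def inner_test
	r, w = IO.pipe
	reader = Thread.new do
		ITRCOUNT.times.map {|i|
			r.getbyte
		}
	end

	ITRCOUNT.times.map {|i|
		sleep 0.0001
		w.write('0')
	}

	reader.join
end

# Run `count` copies of inner_test concurrently and wait for all of them.
def outer_test(count)
	count.times.map{|j|
		Thread.new do
			inner_test
		end
	}.each{|t| t.join}
end

# The number of concurrent pairs comes from the first command-line argument.
outer_test(ARGV[0].to_i)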

Updated by luke-gru (Luke Gruber) about 1 month ago · Edited #1 [ruby-core:124611]

I'm confused about what should happen. Shouldn't it finish after roughly 1 second rather than 10 seconds? I'll look into the sleep issue with RUBY_MN_THREADS=1, but I can't reproduce the deadlock or segfault with a more recent commit (ad6b85450d).

Edit: It's this line: https://github.com/ruby/ruby/blob/master/thread_pthread.c#L2953. If the deadline for a timeout is less than 1ms away, we wake the thread anyway. That's probably too relaxed; maybe we should change it to a few microseconds or remove it altogether.

Updated by khasinski (Chris Hasiński) 28 days ago #2

  • Status changed from Open to Closed

Applied in changeset git|5add7c3ea9a13e657fc7cba78b2633b9548a55aa.


Fix RUBY_MN_THREADS sleep returning prematurely (#15868)

timer_thread_check_exceed() was returning true when the remaining time
was less than 1ms, treating it as "too short time". This caused
sub-millisecond sleeps (like sleep(0.0001)) to return immediately
instead of actually sleeping.

The fix removes this optimization that was incorrectly short-circuiting
short sleep durations. Now the timeout is only considered exceeded when
the actual deadline has passed.

Note: There's still a separate performance issue where MN_THREADS mode
is slower for sub-millisecond sleeps due to the timer thread using
millisecond-resolution polling. This will require a separate fix to
use sub-millisecond timeouts in kqueue/epoll.

[Bug #21836]
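
To illustrate the change described in the commit message, here is a rough Ruby model of the deadline check before and after the fix. It is a sketch only: the real timer_thread_check_exceed() is C code in thread_pthread.c, the method names exceeded_before_fix? and exceeded_after_fix? are hypothetical, and the exact comparison used upstream is assumed from the description above.

```ruby
# Model of the old check (assumption based on the commit message): the
# timeout counts as exceeded if the deadline has passed OR if less than
# 1 ms remains ("too short time").
def exceeded_before_fix?(deadline_ns, now_ns)
  threshold_ns = 1_000_000 # 1 ms expressed in nanoseconds
  return true if deadline_ns < now_ns
  deadline_ns - now_ns < threshold_ns
end

# Model of the new check: exceeded only once the deadline has passed.
def exceeded_after_fix?(deadline_ns, now_ns)
  deadline_ns < now_ns
end

now_ns = 5_000_000_000
deadline_ns = now_ns + 100_000 # a sleep(0.0001) that has only just started

puts exceeded_before_fix?(deadline_ns, now_ns) # prints true: the sleeper is woken at once
puts exceeded_after_fix?(deadline_ns, now_ns)  # prints false: the sleeper keeps waiting
```

In this model any sleep shorter than 1 ms is treated as already expired by the old check, which matches the near-instant returns seen in the benchmark.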

Updated by jpl-coconut (Jacob Lacouture) 28 days ago #3 [ruby-core:124630]

To close the loop here, yes, I was wrong: 1sec should be expected, not 10.

Without MN_THREADS, the 0.1msec sleep becomes a 1msec sleep. With MN_THREADS, the 0.1msec sleep is skipped completely. I see the second of these issues is now fixed.

Thanks for the discussion and fix!
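
The timings discussed in this thread follow from simple arithmetic on the benchmark's 10000 iterations. The sketch below works through it; the helper name total_sleep_seconds is hypothetical, and the per-call durations are the ones quoted in the thread (0.1 ms requested, about 1 ms observed without MN_THREADS, about 5us observed with the bug).

```ruby
# Hypothetical helper: total time spent sleeping if every one of
# `iterations` sleep calls lasts `per_call_seconds`.
def total_sleep_seconds(iterations, per_call_seconds)
  iterations * per_call_seconds
end

iterations = 10_000

printf("as requested (0.1 ms per call):       %.2f s\n", total_sleep_seconds(iterations, 0.0001))
printf("rounded up to 1 ms (no MN_THREADS):   %.2f s\n", total_sleep_seconds(iterations, 0.001))
printf("returning after ~5us (MN_THREADS bug): %.2f s\n", total_sleep_seconds(iterations, 0.000005))
# prints 1.00 s, 10.00 s and 0.05 s respectively
```

This lines up with the roughly 10 second baseline run and with the sub-second run under RUBY_MN_THREADS=1, where the remaining wall time is pipe and thread overhead rather than sleeping.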
