Bug #20076


M:N scheduler is stuck on macOS with RUBY_MN_THREADS=1

Added by hsbt (Hiroshi SHIBATA) 7 months ago. Updated 4 months ago.


This is a known issue. I have already shared it with ko1.

The version of is crashed with make exam.

This happened with the webrick test during make test-tool.

My environment is macOS Sonoma 14.3 beta1 and

$ pkgutil
volume: /
location: /
install-time: 1702331495

Updated by jpcamara (JP Camara) 7 months ago

@hsbt (Hiroshi SHIBATA) is it crashing, or hanging? For me, I am seeing the following tests hang:


If I comment those out, make exam and make test-tool succeed. Is that your experience as well?

Updated by jpcamara (JP Camara) 7 months ago

Originally I thought it was the kqueue MN PR, but it may have been the changes applied before it that are causing this. If I go back to the commit before kqueue was merged this issue is still occurring (using the original epoll code). If I go back here, right before the rb_thread_io_blocking_call commits, it starts working again. Maybe this is something you both already realized.

Updated by jpcamara (JP Camara) 7 months ago

Confirmed that if I change all of the rb_thread_io_blocking_call calls to hard-code 0 as the last argument (instead of RB_WAITFD_IN or RB_WAITFD_OUT) then make test-tool works again with macOS for me, using Sonoma 14.1.

Using docker + ubuntu to test it there, once I make those values 0 I start getting segfaults when running make test-tool.
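For readers unfamiliar with the last argument discussed above, here is a minimal sketch of what an event mask means at the OS level. It uses plain poll() on a pipe and assumes that RB_WAITFD_IN and RB_WAITFD_OUT correspond to POLLIN and POLLOUT on platforms that have poll(). The names wait_events and pipe_readiness_demo are hypothetical. The sketch does not touch Ruby's scheduler and does not reproduce the hang.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for any of `events` on fd.
 * Returns the reported revents bits, 0 on timeout, or -1 on error. */
int wait_events(int fd, short events, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0) return -1;
    return n == 0 ? 0 : pfd.revents;
}

/* Hypothetical demo: returns 0 if a fresh pipe behaves as described below,
 * or -1 if the pipe could not be set up. */
int pipe_readiness_demo(void)
{
    int fds[2];
    if (pipe(fds) != 0) return -1;

    /* Empty pipe: the read end is not readable, so the wait times out. */
    int before = wait_events(fds[0], POLLIN, 50);

    /* The write end has buffer space, so it is writable right away. */
    int writable = wait_events(fds[1], POLLOUT, 50);

    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* One byte is now buffered, so the read end reports readable. */
    int after = wait_events(fds[0], POLLIN, 50);

    close(fds[0]);
    close(fds[1]);

    assert(before == 0);
    assert(writable > 0 && (writable & POLLOUT));
    assert(after > 0 && (after & POLLIN));
    return 0;
}
```

The point is only that the mask selects which kind of readiness the caller waits for. How Ruby's M:N scheduler treats a mask of 0 or a combined mask is not shown here.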

Updated by hsbt (Hiroshi SHIBATA) 7 months ago

  • Subject changed from M:N scheduler crashes on macOS with RUBY_MN_THREADS=1 to M:N scheduler is stuck on macOS with RUBY_MN_THREADS=1

Updated by hsbt (Hiroshi SHIBATA) 7 months ago

is it crashing, or hanging?

Thanks. It's hanging (stuck?).

Updated by jpcamara (JP Camara) 7 months ago

A one-line change fixes it for me, and fixes almost every failure I was seeing when running test-all using RUBY_MN_THREADS=1. Here is the change:

static ssize_t
rb_io_read_memory(rb_io_t *fptr, void *buf, size_t count)
{
    VALUE scheduler = rb_fiber_scheduler_current();
    if (scheduler != Qnil) {
        VALUE result = rb_fiber_scheduler_io_read_memory(scheduler, fptr->self, buf, count, 0);

        if (!UNDEF_P(result)) {
            return rb_fiber_scheduler_io_result_apply(result);
        }
    }

    struct io_internal_read_struct iis = {
        .th = rb_thread_current(),
        .fptr = fptr,
        .nonblock = 0,
        .fd = fptr->fd,

        .buf = buf,
        .capa = count,
        .timeout = NULL,
    };

    struct timeval timeout_storage;

    if (fptr->timeout != Qnil) {
        timeout_storage = rb_time_interval(fptr->timeout);
        iis.timeout = &timeout_storage;
    }

    // previous line
    // VVVVVV
    // return (ssize_t)rb_thread_io_blocking_call(internal_read_func, &iis, fptr->fd, RB_WAITFD_IN);
    return (ssize_t)rb_thread_io_blocking_call(internal_read_func, &iis, fptr->fd, RB_WAITFD_IN | RB_WAITFD_OUT);
}

There's almost no way that it actually fixes the core issue. It has to be that it causes a fallback to some native 1:1 thread behavior. But it is extremely curious that this single change to rb_io_read_memory has such a far-reaching effect.

Updated by hsbt (Hiroshi SHIBATA) 4 months ago

  • Status changed from Open to Closed

This issue has not occurred for 3 months, so I'll close it.

