Bug #20076

open

M:N scheduler is stuck on macOS with RUBY_MN_THREADS=1

Added by hsbt (Hiroshi SHIBATA) 2 months ago. Updated 2 months ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:115831]

Description

This is a known issue. I have already shared it with ko1.

Ruby built from https://github.com/ruby/ruby/commit/28e3886689c71b22487dd5d0cb62f3b5ed0a77cc crashes with make exam.

This happens in the webrick tests during make test-tool.

My environment is macOS Sonoma 14.3 beta1 with the following Command Line Tools:

$ pkgutil --pkg-info=com.apple.pkg.CLTools_Executables
package-id: com.apple.pkg.CLTools_Executables
version: 15.1.0.0.1.1700200546
volume: /
location: /
install-time: 1702331495

Updated by jpcamara (JP Camara) 2 months ago

@hsbt (Hiroshi SHIBATA) is it crashing, or hanging? For me, I am seeing the following tests hang:

tool/test/webrick/test_server.rb#test_restart_after_stop
tool/test/webrick/test_server.rb#test_port_numbers

If I comment those out, make exam and make test-tool succeed. Is that your experience as well?

Updated by jpcamara (JP Camara) 2 months ago

Originally I thought the kqueue MN PR was responsible, but the cause may be the changes applied before it. If I go back to the commit before kqueue was merged, the issue still occurs (using the original epoll code). If I go back to https://github.com/ruby/ruby/commit/28a6e4ea9d9379a654a8f7c4b37fa33aa3ccd0b7, right before the rb_thread_io_blocking_call commits, it starts working again. Maybe this is something you both already realized.

Updated by jpcamara (JP Camara) 2 months ago

Confirmed: if I change all of the rb_thread_io_blocking_call calls to hard-code 0 as the last argument (instead of RB_WAITFD_IN or RB_WAITFD_OUT), then make test-tool works again on macOS for me, using Sonoma 14.1.

Testing the same change on Ubuntu in Docker, once I set those values to 0 I start getting segfaults when running make test-tool.
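
To make the role of that last argument concrete, here is a minimal sketch of a blocking-call wrapper that waits for readiness only when the events argument is non-zero. It is not Ruby's implementation: blocking_call_sketch, read_fn and struct read_args are hypothetical names, plain poll(2) stands in for whatever the M:N scheduler does while a thread waits, and the idea that 0 means "skip the readiness wait" is an assumption based on the behavior described above.

```c
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback type for the operation that may block. */
typedef ssize_t (*blocking_fn)(void *arg);

/* Hypothetical argument bundle for a plain read(2). */
struct read_args {
    int fd;
    void *buf;
    size_t capa;
};

/* The blocking operation itself: a single read(2). */
static ssize_t
read_fn(void *arg)
{
    struct read_args *a = arg;
    return read(a->fd, a->buf, a->capa);
}

/* Sketch of a wrapper with an "events" argument.
 * events != 0: wait until fd reports one of the requested events, then call fn.
 * events == 0: call fn directly, so any blocking happens inside fn.
 * ASSUMPTION: this mirrors the general shape of such wrappers, not Ruby's code. */
static ssize_t
blocking_call_sketch(blocking_fn fn, void *arg, int fd, short events)
{
    assert(fd >= 0);
    if (events != 0) {
        struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };
        if (poll(&pfd, 1, -1) < 0) {
            return -1; /* poll failed; errno is set by poll */
        }
    }
    return fn(arg);
}
```

With data already written to a pipe, calling blocking_call_sketch(read_fn, &args, fd, POLLIN) and blocking_call_sketch(read_fn, &args, fd, 0) both return the number of bytes read; the two paths only differ in where the waiting happens when no data is available yet.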

Updated by hsbt (Hiroshi SHIBATA) 2 months ago

  • Subject changed from M:N scheduler crashes on macOS with RUBY_MN_THREADS=1 to M:N scheduler is stuck on macOS with RUBY_MN_THREADS=1

Updated by hsbt (Hiroshi SHIBATA) 2 months ago

> is it crashing, or hanging?

Thanks, it's hanging (stuck?).

Updated by jpcamara (JP Camara) 2 months ago

A one-line change fixes it for me, and fixes almost every failure I was seeing when running test-all using RUBY_MN_THREADS=1. Here is the change: https://github.com/ruby/ruby/pull/9344/files

static ssize_t
rb_io_read_memory(rb_io_t *fptr, void *buf, size_t count)
{
    VALUE scheduler = rb_fiber_scheduler_current();
    if (scheduler != Qnil) {
        VALUE result = rb_fiber_scheduler_io_read_memory(scheduler, fptr->self, buf, count, 0);

        if (!UNDEF_P(result)) {
            return rb_fiber_scheduler_io_result_apply(result);
        }
    }

    struct io_internal_read_struct iis = {
        .th = rb_thread_current(),
        .fptr = fptr,
        .nonblock = 0,
        .fd = fptr->fd,

        .buf = buf,
        .capa = count,
        .timeout = NULL,
    };

    struct timeval timeout_storage;

    if (fptr->timeout != Qnil) {
        timeout_storage = rb_time_interval(fptr->timeout);
        iis.timeout = &timeout_storage;
    }

    // Previous line:
    // return (ssize_t)rb_thread_io_blocking_call(internal_read_func, &iis, fptr->fd, RB_WAITFD_IN);
    // Changed line:
    return (ssize_t)rb_thread_io_blocking_call(internal_read_func, &iis, fptr->fd, RB_WAITFD_IN | RB_WAITFD_OUT);
}

There's almost no way that it actually fixes the core issue. It has to be that it causes a fallback to some native 1:1 thread behavior. But it is extremely curious that this single change to rb_io_read_memory has such a far-reaching effect.
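
One general property of readiness APIs may be relevant here, although whether it applies inside the M:N scheduler is only an assumption: a wait for "readable or writable" on a socket is usually satisfied immediately, because an idle socket is almost always writable. The sketch below shows this with plain poll(2) on a socket pair. ready_events and make_idle_socketpair are hypothetical helper names, and POLLIN / POLLOUT stand in for RB_WAITFD_IN / RB_WAITFD_OUT.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Poll one fd for `events`, waiting at most timeout_ms milliseconds.
 * Returns the reported revents mask, 0 on timeout, or -1 on error. */
static int
ready_events(int fd, short events, int timeout_ms)
{
    assert(fd >= 0);
    struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0) {
        return -1;
    }
    return n == 0 ? 0 : pfd.revents;
}

/* Create a connected socket pair with nothing written to either end.
 * Returns 0 on success, -1 on failure. */
static int
make_idle_socketpair(int fds[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
}
```

On an idle pair, ready_events(fds[0], POLLIN, 50) times out because nothing is readable, while ready_events(fds[0], POLLIN | POLLOUT, 50) returns right away with POLLOUT set. If the scheduler's wait behaves similarly, the one-line change would effectively skip the wait rather than fix it, which would be consistent with the suspicion that the core issue remains.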
