Bug #20076
Closed
M:N scheduler is stuck on macOS with RUBY_MN_THREADS=1
Description
This is a known issue. I have already shared it with ko1.
Ruby at https://github.com/ruby/ruby/commit/28e3886689c71b22487dd5d0cb62f3b5ed0a77cc crashes with make exam.
This happens in the webrick tests during make test-tool.
My environment is macOS Sonoma 14.3 beta1 with the following Command Line Tools:
$ pkgutil --pkg-info=com.apple.pkg.CLTools_Executables
package-id: com.apple.pkg.CLTools_Executables
version: 15.1.0.0.1.1700200546
volume: /
location: /
install-time: 1702331495
Updated by jpcamara (JP Camara) about 1 year ago
@hsbt (Hiroshi SHIBATA) is it crashing, or hanging? For me, I am seeing the following tests hang:
tool/test/webrick/test_server.rb#test_restart_after_stop
tool/test/webrick/test_server.rb#test_port_numbers
If I comment those out, make exam and make test-tool succeed. Is that your experience as well?
Updated by jpcamara (JP Camara) about 1 year ago
Originally I thought it was the kqueue MN PR, but it may have been the changes applied before it that are causing this. If I go back to the commit before kqueue was merged, the issue still occurs (using the original epoll code). If I go back to https://github.com/ruby/ruby/commit/28a6e4ea9d9379a654a8f7c4b37fa33aa3ccd0b7, right before the rb_thread_io_blocking_call commits, it starts working again. Maybe this is something you both already realized.
Updated by jpcamara (JP Camara) about 1 year ago
Confirmed that if I change all of the rb_thread_io_blocking_call calls to hard-code 0 as the last argument (instead of RB_WAITFD_IN or RB_WAITFD_OUT), then make test-tool works again on macOS for me, using Sonoma 14.1.
Using docker + ubuntu to test it there, once I make those values 0 I start getting segfaults when running make test-tool.
Updated by hsbt (Hiroshi SHIBATA) about 1 year ago
- Subject changed from M:N scheduler crashes on macOS with RUBY_MN_THREADS=1 to M:N scheduler is stuck on macOS with RUBY_MN_THREADS=1
Updated by hsbt (Hiroshi SHIBATA) about 1 year ago
> is it crashing, or hanging?
Thanks, it's hanging (stuck).
Updated by jpcamara (JP Camara) about 1 year ago
A one-line change fixes it for me, and fixes almost every failure I was seeing when running test-all using RUBY_MN_THREADS=1. Here is the change: https://github.com/ruby/ruby/pull/9344/files
static ssize_t
rb_io_read_memory(rb_io_t *fptr, void *buf, size_t count)
{
    VALUE scheduler = rb_fiber_scheduler_current();
    if (scheduler != Qnil) {
        VALUE result = rb_fiber_scheduler_io_read_memory(scheduler, fptr->self, buf, count, 0);
        if (!UNDEF_P(result)) {
            return rb_fiber_scheduler_io_result_apply(result);
        }
    }

    struct io_internal_read_struct iis = {
        .th = rb_thread_current(),
        .fptr = fptr,
        .nonblock = 0,
        .fd = fptr->fd,
        .buf = buf,
        .capa = count,
        .timeout = NULL,
    };

    struct timeval timeout_storage;
    if (fptr->timeout != Qnil) {
        timeout_storage = rb_time_interval(fptr->timeout);
        iis.timeout = &timeout_storage;
    }

    // previous line
    // VVVVVV
    // return (ssize_t)rb_thread_io_blocking_call(internal_read_func, &iis, fptr->fd, RB_WAITFD_IN);
    return (ssize_t)rb_thread_io_blocking_call(internal_read_func, &iis, fptr->fd, RB_WAITFD_IN | RB_WAITFD_OUT);
}
There's almost no way that it actually fixes the core issue. It has to be that it causes a fallback to some native 1:1 thread behavior. But it is extremely curious that this single change to rb_io_read_memory has such a far-reaching effect.
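One possible reason a combined mask could hide the hang rather than fix it: on an idle socket, a wait for readability blocks, while a wait for readability or writability returns at once because the socket is nearly always writable. The sketch below shows that difference with plain poll(2), outside Ruby. It assumes RB_WAITFD_IN and RB_WAITFD_OUT behave like POLLIN and POLLOUT, and poll_idle_socket is a hypothetical helper written for this illustration, not code from Ruby. Whether this is what actually happens inside the M:N scheduler is not established in this thread.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of Ruby): create a fresh socketpair on which
 * nothing has been written, poll one end for `events`, and return poll()'s
 * result: 0 on timeout, 1 if the fd is ready, -1 on error. */
static int
poll_idle_socket(short events, int timeout_ms)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        return -1;
    }

    struct pollfd pfd = { .fd = fds[0], .events = events, .revents = 0 };
    int ready = poll(&pfd, 1, timeout_ms);

    close(fds[0]);
    close(fds[1]);
    return ready;
}
```

With this helper, poll_idle_socket(POLLIN, 100) waits out the 100 ms timeout and returns 0, while poll_idle_socket(POLLIN | POLLOUT, 100) returns 1 immediately because the socket is writable.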
Updated by hsbt (Hiroshi SHIBATA) 9 months ago
- Status changed from Open to Closed
This issue has not occurred for the past 3 months, so I'll close it.