Bug #17527

rb_io_wait_readable/writable with scheduler don't check errno

Added by ysbaddaden (Julien Portalier) about 2 months ago. Updated about 1 month ago.

Target version:
ruby -v:
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]



Playing with the new Fiber Scheduler, I noticed that TCPServer#accept would hung forever after closing the server from another Fiber. I expected it to be resumed and fail with IOError, as it happens with threads.


What happens is that the accept4 call in rsock_s_accept fails and sets errno to Errno::EBADF, it then checks a few memory/limit related errnos, then calls rb_io_wait_readable expecting it to handle the current errno for IO errors. But when a scheduler is set, it immediately delegates to Scheduler#io_wait and doesn't check the current errno! In my case (nio4r), the io_wait hook returns a ready state, which causes rsock_s_accept to loop forever.

I tried to manually check in the io_wait hook whether the IO is closed, but the fd is never updated (AFAIK never set to -1) so io.closed? is always false. I'm not sure schedulers should check whether the fd is closed, thought.

Proposed solution

A solution is to follow what happens for threads, and only check the scheduler when errno is EAGAIN or EWOULDBLOCK. I believe it's the only errors where we're expected to wait. This change also means that EINTR will be handled, too, and other errnos to raise an exception.

Instead of raising"closed stream") as it happens for threads, it raises Errno::EBADF when a Scheduler is set. I suppose in the thread branches, it updates the IO at some point and calls rb_io_check_closed with the updated fd —maybe with GetOpenFile (RB_IO_POINTER) — and we ought to do the same at some point?

Another solution it to not delegate to the scheduler inside rb_io_wait_readable because it will eventually call rb_wait_for_single_fd that will check for the scheduler, but we can avoid some function calls, as well as thread-related debug information that could be confusing. It also won't raise help to raise IOError.

I'm attaching a patch that implements the first solution. It fixes both rb_io_wait_readable and rb_io_wait_writable since the latter may exhibit the same kind of issue in another scenario. This is speculative, I didn't hit one, yet.



Updated by nobu (Nobuyoshi Nakada) about 2 months ago

  • Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN to 2.5: DONTNEED, 2.6: DONTNEED, 2.7: DONTNEED, 3.0: UNKNOWN

Updated by nobu (Nobuyoshi Nakada) about 2 months ago

Yet another solution is to check if in schedulers?

Updated by ioquatix (Samuel Williams) about 1 month ago

I checked the PR, I understand, thanks for the clear bug report.

I will review it in more detail, but it seems like a reasonable approach.

Updated by ioquatix (Samuel Williams) about 1 month ago

  • Assignee set to ioquatix (Samuel Williams)

Also available in: Atom PDF