Bug #17527
closedrb_io_wait_readable/writable with scheduler don't check errno
Description
Problem¶
Playing with the new Fiber Scheduler, I noticed that TCPServer#accept
would hung forever after closing the server from another Fiber. I expected it to be resumed and fail with IOError, as it happens with threads.
Analysis¶
What happens is that the accept4
call in rsock_s_accept
fails and sets errno to Errno::EBADF
, it then checks a few memory/limit related errnos, then calls rb_io_wait_readable
expecting it to handle the current errno for IO errors. But when a scheduler is set, it immediately delegates to Scheduler#io_wait
and doesn't check the current errno! In my case (nio4r), the io_wait
hook returns a ready state, which causes rsock_s_accept
to loop forever.
I tried to manually check in the io_wait
hook whether the IO is closed, but the fd is never updated (AFAIK never set to -1) so io.closed?
is always false. I'm not sure schedulers should check whether the fd is closed, thought.
Proposed solution¶
A solution is to follow what happens for threads, and only check the scheduler when errno is EAGAIN or EWOULDBLOCK. I believe it's the only errors where we're expected to wait. This change also means that EINTR will be handled, too, and other errnos to raise an exception.
Instead of raising IOError.new("closed stream")
as it happens for threads, it raises Errno::EBADF
when a Scheduler is set. I suppose in the thread branches, it updates the IO at some point and calls rb_io_check_closed
with the updated fd —maybe with GetOpenFile
(RB_IO_POINTER
) — and we ought to do the same at some point?
Another solution it to not delegate to the scheduler inside rb_io_wait_readable
because it will eventually call rb_wait_for_single_fd
that will check for the scheduler, but we can avoid some function calls, as well as thread-related debug information that could be confusing. It also won't raise help to raise IOError.
I'm attaching a patch that implements the first solution. It fixes both rb_io_wait_readable
and rb_io_wait_writable
since the latter may exhibit the same kind of issue in another scenario. This is speculative, I didn't hit one, yet.
Files