Bug #17527
closedrb_io_wait_readable/writable with scheduler don't check errno
Description
Problem¶
Playing with the new Fiber Scheduler, I noticed that TCPServer#accept
would hung forever after closing the server from another Fiber. I expected it to be resumed and fail with IOError, as it happens with threads.
Analysis¶
What happens is that the accept4
call in rsock_s_accept
fails and sets errno to Errno::EBADF
, it then checks a few memory/limit related errnos, then calls rb_io_wait_readable
expecting it to handle the current errno for IO errors. But when a scheduler is set, it immediately delegates to Scheduler#io_wait
and doesn't check the current errno! In my case (nio4r), the io_wait
hook returns a ready state, which causes rsock_s_accept
to loop forever.
I tried to manually check in the io_wait
hook whether the IO is closed, but the fd is never updated (AFAIK never set to -1) so io.closed?
is always false. I'm not sure schedulers should check whether the fd is closed, thought.
Proposed solution¶
A solution is to follow what happens for threads, and only check the scheduler when errno is EAGAIN or EWOULDBLOCK. I believe it's the only errors where we're expected to wait. This change also means that EINTR will be handled, too, and other errnos to raise an exception.
Instead of raising IOError.new("closed stream")
as it happens for threads, it raises Errno::EBADF
when a Scheduler is set. I suppose in the thread branches, it updates the IO at some point and calls rb_io_check_closed
with the updated fd —maybe with GetOpenFile
(RB_IO_POINTER
) — and we ought to do the same at some point?
Another solution it to not delegate to the scheduler inside rb_io_wait_readable
because it will eventually call rb_wait_for_single_fd
that will check for the scheduler, but we can avoid some function calls, as well as thread-related debug information that could be confusing. It also won't raise help to raise IOError.
I'm attaching a patch that implements the first solution. It fixes both rb_io_wait_readable
and rb_io_wait_writable
since the latter may exhibit the same kind of issue in another scenario. This is speculative, I didn't hit one, yet.
Files
Updated by nobu (Nobuyoshi Nakada) almost 4 years ago
- Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN to 2.5: DONTNEED, 2.6: DONTNEED, 2.7: DONTNEED, 3.0: UNKNOWN
Updated by nobu (Nobuyoshi Nakada) almost 4 years ago
Yet another solution is to check if in schedulers?
Updated by ioquatix (Samuel Williams) almost 4 years ago
I checked the PR, I understand, thanks for the clear bug report.
I will review it in more detail, but it seems like a reasonable approach.
Updated by ioquatix (Samuel Williams) almost 4 years ago
- Assignee set to ioquatix (Samuel Williams)
Updated by ioquatix (Samuel Williams) over 3 years ago
I think I ran into this bug on the write code path. I'm just going to confirm it.
Updated by ioquatix (Samuel Williams) over 3 years ago
Updated by ioquatix (Samuel Williams) over 3 years ago
Okay, I confirmed the fix is required for async-io
to use the native IO methods.
samuel@Fukurou ~/D/s/async-io (master) [1]> chruby ruby-3
samuel@Fukurou ~/D/s/async-io (master)> bundle exec rspec ./spec/async/io/stream_spec.rb:81
Run options: include {:locations=>{"./spec/async/io/stream_spec.rb"=>[81]}}
Async::IO::Stream
socket I/O
#drain_write_buffer
10.02s error: Async::Task [oid=0x943e50] [pid=1398] [2021-03-30 20:00:10 +1300]
| Async::TimeoutError: run time exceeded duration 10s:
| #<Async::Reactor:0x943e28 2 children (running)>
| #<Async::Task:0x943e3c timer task duration=10 (running)>
| → /Users/samuel/Documents/socketry/async/lib/async/task.rb:91:in `backtrace'
| /Users/samuel/Documents/socketry/async/lib/async/task.rb:91:in `backtrace'
| /Users/samuel/Documents/socketry/async/lib/async/node.rb:322:in `print_backtrace'
| /Users/samuel/Documents/socketry/async/lib/async/node.rb:315:in `block in print_hierarchy'
| /Users/samuel/Documents/socketry/async/lib/async/node.rb:298:in `traverse'
| /Users/samuel/Documents/socketry/async/lib/async/node.rb:301:in `block in traverse'
| /Users/samuel/Documents/socketry/async/lib/async/node.rb:88:in `each'
| /Users/samuel/Documents/socketry/async/lib/async/node.rb:300:in `traverse'
| /Users/samuel/Documents/socketry/async/lib/async/node.rb:310:in `print_hierarchy'
| /Users/samuel/.gem/ruby/3.0.0/gems/async-rspec-1.15.1/lib/async/rspec/reactor.rb:48:in `block in run_in_reactor'
| /Users/samuel/Documents/socketry/async/lib/async/task.rb:265:in `block in make_fiber'
| #<Async::Task:0x943e50 RSpec::ExampleGroups::AsyncIOStream::SocketIO::DrainWriteBuffer (running)>
| → /Users/samuel/Documents/socketry/async/lib/async/task.rb:62:in `yield'
| /Users/samuel/Documents/socketry/async/lib/async/task.rb:62:in `yield'
| /Users/samuel/Documents/socketry/async/lib/async/condition.rb:40:in `wait'
| /Users/samuel/Documents/socketry/async/lib/async/task.rb:150:in `wait'
| /Users/samuel/.gem/ruby/3.0.0/gems/async-rspec-1.15.1/lib/async/rspec/reactor.rb:69:in `block in run_in_reactor'
| /Users/samuel/Documents/socketry/async/lib/async/task.rb:265:in `block in make_fiber'
| #<Async::Task:0x943e64 running example (running)>
| → /Users/samuel/Documents/socketry/async/lib/async/task.rb:62:in `yield'
| /Users/samuel/Documents/socketry/async/lib/async/task.rb:62:in `yield'
| /Users/samuel/Documents/socketry/async/lib/async/condition.rb:40:in `wait'
| /Users/samuel/Documents/socketry/async/lib/async/task.rb:150:in `wait'
| /Users/samuel/Documents/socketry/async-io/spec/async/io/stream_spec.rb:90:in `block (5 levels) in <top (required)>'
| /Users/samuel/.gem/ruby/3.0.0/gems/rspec-expectations-3.10.1/lib/rspec/matchers/built_in/raise_error.rb:59:in `matches?'
| /Users/samuel/.gem/ruby/3.0.0/gems/rspec-expectations-3.10.1/lib/rspec/expectations/handler.rb:51:in `block in handle_matcher'
| /Users/samuel/.gem/ruby/3.0.0/gems/rspec-expectations-3.10.1/lib/rspec/expectations/handler.rb:27:in `with_matcher'
| /Users/samuel/.gem/ruby/3.0.0/gems/rspec-expectations-3.10.1/lib/rspec/expectations/handler.rb:48:in `handle_matcher'
| /Users/samuel/.gem/ruby/3.0.0/gems/rspec-expectations-3.10.1/lib/rspec/expectations/expectation_target.rb:65:in `to'
| /Users/samuel/.gem/ruby/3.0.0/gems/rspec-expectations-3.10.1/lib/rspec/expectations/expectation_target.rb:101:in `to'
| /Users/samuel/Documents/socketry/async-io/spec/async/io/stream_spec.rb:89:in `block (4 levels) in <top (required)>'
| /Users/samuel/.gem/ruby/3.0.0/gems/rspec-core-3.10.1/lib/rspec/core/example.rb:262:in `instance_exec'
| /Users/samuel/.gem/ruby/3.0.0/gems/rspec-core-3.10.1/lib/rspec/core/example.rb:262:in `block in run'
| /Users/samuel/.gem/ruby/3.0.0/gems/rspec-core-3.10.1/lib/rspec/core/example.rb:508:in `block in with_around_and_singleton_context_hooks'
| /Users/samuel/.gem/ruby/3.0.0/gems/rspec-core-3.10.1/lib/rspec/core/example.rb:465:in `block in with_around_example_hooks'
| /Users/samuel/.gem/ruby/3.0.0/gems/rspec-core-3.10.1/lib/rspec/core/hooks.rb:486:in `block in run'
| /Users/samuel/.gem/ruby/3.0.0/gems/rspec-core-3.10.1/lib/rspec/core/hooks.rb:626:in `block in run_around_example_hooks_for'
| /Users/samuel/.gem/ruby/3.0.0/gems/rspec-core-3.10.1/lib/rspec/core/example.rb:350:in `call'
| /Users/samuel/.gem/ruby/3.0.0/gems/async-rspec-1.15.1/lib/async/rspec/reactor.rb:92:in `block (3 levels) in <module:RSpec>'
| /Users/samuel/.gem/ruby/3.0.0/gems/async-rspec-1.15.1/lib/async/rspec/reactor.rb:61:in `block (2 levels) in run_in_reactor'
| /Users/samuel/Documents/socketry/async/lib/async/task.rb:265:in `block in make_fiber'
| #<Async::Task:0x943e78 (running)>
| → /Users/samuel/Documents/socketry/async/lib/async/task.rb:62:in `yield'
| /Users/samuel/Documents/socketry/async/lib/async/task.rb:62:in `yield'
| /Users/samuel/Documents/socketry/async/lib/async/wrapper.rb:233:in `wait_for'
| /Users/samuel/Documents/socketry/async/lib/async/wrapper.rb:139:in `wait_writable'
| /Users/samuel/Documents/socketry/async/lib/async/scheduler.rb:61:in `io_wait'
| /Users/samuel/.rubies/ruby-3.0.0/lib/ruby/3.0.0/forwardable.rb:238:in `write'
| /Users/samuel/.rubies/ruby-3.0.0/lib/ruby/3.0.0/forwardable.rb:238:in `write'
| /Users/samuel/Documents/socketry/async-io/lib/async/io/stream.rb:160:in `block in flush'
| /Users/samuel/Documents/socketry/async/lib/async/semaphore.rb:80:in `acquire'
| /Users/samuel/Documents/socketry/async-io/lib/async/io/stream.rb:155:in `flush'
| /Users/samuel/Documents/socketry/async-io/spec/async/io/stream_spec.rb:86:in `block (5 levels) in <top (required)>'
| /Users/samuel/Documents/socketry/async/lib/async/task.rb:265:in `block in make_fiber'
| → /Users/samuel/.gem/ruby/3.0.0/gems/async-rspec-1.15.1/lib/async/rspec/reactor.rb:51 in `block in run_in_reactor'
| /Users/samuel/Documents/socketry/async/lib/async/task.rb:265 in `block in make_fiber'
handles write failures (FAILED - 1)
Finished in 10.05 seconds (files took 0.30495 seconds to load)
1 example, 1 failure
Failed examples:
rspec ./spec/async/io/stream_spec.rb:81 # Async::IO::Stream socket I/O #drain_write_buffer handles write failures
samuel@Fukurou ~/D/s/async-io (master) [1]> chruby ruby-head
samuel@Fukurou ~/D/s/async-io (master)> bundle exec rspec ./spec/async/io/stream_spec.rb:81
warning: parser/current is loading parser/ruby30, which recognizes
warning: 3.0.x-compliant syntax, but you are running 3.1.0.
warning: please see https://github.com/whitequark/parser#compatibility-with-ruby-mri.
Run options: include {:locations=>{"./spec/async/io/stream_spec.rb"=>[81]}}
Async::IO::Stream
socket I/O
#drain_write_buffer
rb_io_wait_writable(13) -> errno=32
0.0s error: Async::Task [oid=0x11e4] [pid=1457] [2021-03-30 20:01:06 +1300]
| Errno::EPIPE: Broken pipe
| → /Users/samuel/.rubies/ruby-head/lib/ruby/3.1.0/forwardable.rb:238 in `write'
| /Users/samuel/.rubies/ruby-head/lib/ruby/3.1.0/forwardable.rb:238 in `write'
| /Users/samuel/Documents/socketry/async-io/lib/async/io/stream.rb:160 in `block in flush'
| /Users/samuel/Documents/socketry/async/lib/async/semaphore.rb:80 in `acquire'
| /Users/samuel/Documents/socketry/async-io/lib/async/io/stream.rb:155 in `flush'
| /Users/samuel/Documents/socketry/async-io/spec/async/io/stream_spec.rb:86 in `block (5 levels) in <top (required)>'
| /Users/samuel/Documents/socketry/async/lib/async/task.rb:265 in `block in make_fiber'
handles write failures
Finished in 0.04441 seconds (files took 0.46472 seconds to load)
1 example, 0 failures
This happens when using a socket pair, explicitly closing one end, and trying to write to the other end.
Updated by ioquatix (Samuel Williams) over 3 years ago
I added a failing spec https://github.com/ruby/ruby/pull/4338/commits/e111d99ae1b58e1ffe73c167c039b2fc0728cb08 which passes with the proposed fix.
Updated by ioquatix (Samuel Williams) over 3 years ago
- Backport changed from 2.5: DONTNEED, 2.6: DONTNEED, 2.7: DONTNEED, 3.0: UNKNOWN to 2.5: DONTNEED, 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED
I would advise we should back port this to 3.0.1 if possible. The original patch attached to this issue is sufficient.
Updated by ioquatix (Samuel Williams) over 3 years ago
- Status changed from Open to Closed
Updated by ioquatix (Samuel Williams) over 3 years ago
@nagachika (Tomoyuki Chikanaga) can we please backport this for 3.0.2?
Updated by nagachika (Tomoyuki Chikanaga) over 3 years ago
Added https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/a9c5c2d614f30a616970245fef3e7ffc151e2ecf as a related commit.
I will backport the changesets.
Updated by nagachika (Tomoyuki Chikanaga) over 3 years ago
- Backport changed from 2.5: DONTNEED, 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED to 2.5: DONTNEED, 2.6: DONTNEED, 2.7: DONTNEED, 3.0: DONE
ruby_3_0 13f93ad16d3d1ecf96ece229cd4bc5ea294e1a71 merged revision(s) 611e711085c7e3984555a79626d025c8b876eced,a9c5c2d614f30a616970245fef3e7ffc151e2ecf.