Bug #21166
closedFiber Scheduler is unable to be interrupted by `IO#close`.
Description
Background¶
Ruby's IO#close can cause IO#read, IO#write, IO#wait, IO#wait_readable and IO#wait_writable to be interrupted with an IOError: stream closed in another thread. For reference, IO#select cannot be interrupted in this way.
r, w = IO.pipe
thread = Thread.new do
  r.read(1)
end
Thread.pass until thread.status == "sleep"
r.close
thread.join
# ./test.rb:6:in 'IO#read': stream closed in another thread (IOError)
Problem¶
The fiber scheduler provides hooks for io_read, io_write and io_wait which are used by IO#read, IO#write, IO#wait, IO#wait_readable and IO#wait_writable, but those hooks are not interrupted when IO#close is invoked. That is because rb_notify_fd_close is not scheduler aware, and the fiber scheduler is unable to register itself into the "waiting file descriptor" list.
#!/usr/bin/env ruby
require 'async'
r, w = IO.pipe
thread = Thread.new do
  Async do
    r.wait_readable
  end
end
Thread.pass until thread.status == "sleep"
r.close
thread.join
In this test program, rb_notify_fd_close will incorrectly terminate the entire fiber scheduler thread:
#<Thread:0x00007faa5b161bf8 /home/samuel/Developer/socketry/io-event/test.rb:7 run> terminated with exception (report_on_exception is true):
/home/samuel/Developer/socketry/io-event/lib/io/event/selector/select.rb:470:in 'IO.select': closed stream (IOError)
  from /home/samuel/Developer/socketry/io-event/lib/io/event/selector/select.rb:470:in 'block in IO::Event::Selector::Select#select'
  from /home/samuel/Developer/socketry/io-event/lib/io/event/selector/select.rb:468:in 'Thread.handle_interrupt'
  from /home/samuel/Developer/socketry/io-event/lib/io/event/selector/select.rb:468:in 'IO::Event::Selector::Select#select'
  from /home/samuel/.gem/ruby/3.4.1/gems/async-2.23.0/lib/async/scheduler.rb:396:in 'Async::Scheduler#run_once!'
...
Solution¶
We need a mechanism to ensure fibers are treated the same as threads, and interrupted correctly. We do this by:
- Introducing VALUE rb_thread_io_interruptible_operation(VALUE self, VALUE(*function)(VALUE), VALUE argument)which allows us to execute a callback that may be interrupted. Internally, this registers the current execution context into the existingwaiting_fdslist.
- We update all the relevant fiber scheduler hooks to use rb_thread_io_interruptible_operation, e.g.io_wait,io_read,io_writeand so on.
- We introduce VALUE rb_fiber_scheduler_fiber_interrupt(VALUE scheduler, VALUE fiber, VALUE exception)which can be used to interrupt a fiber, e.g. with an IOError exception.
- 
rb_notify_fd_closeis modified to correctly interrupt fibers using the new rb_fiber_scheduler_fiber_interrupt` function.
See https://github.com/ruby/ruby/pull/12839 for the proposed implementation.
        
           Updated by ioquatix (Samuel Williams) 8 months ago
          Updated by ioquatix (Samuel Williams) 8 months ago
          
          
        
        
      
      - Description updated (diff)
        
           Updated by luke-gru (Luke Gruber) 8 months ago
          Updated by luke-gru (Luke Gruber) 8 months ago
          
          
        
        
      
      I'm not knowledgeable when it comes to the fiber scheduler, but why add this new method io.interruptible_operation when you can accomplish the same thing by allowing the io operation to be interrupted by a close in every case like the regular thread scheduler does? Are there cases where you don't want the IO operation to be interrupted be a close?
        
           Updated by ioquatix (Samuel Williams) 8 months ago
          Updated by ioquatix (Samuel Williams) 8 months ago
          
          
        
        
      
      when you can accomplish the same thing by allowing the io operation to be interrupted by a close in every case like the regular thread scheduler does
Because the point at which IO operations become interruptible is not the entire operation, but usually the point at where the fiber yields back to the event loop, e.g. https://github.com/socketry/io-event/pull/130/files#diff-3e7a68b220a9360ead1d2e7a5ec23e7d36de591eab138721efdc61b565fc5194R552 - adding interrupts around the entire operation is unlikely to be safe nor desirable.
Maybe you can propose in more detail how your suggestion would work. I suppose you are suggesting to wrap the scheduler code in rb_fiber_scheduler_io_wait and so on?
        
           Updated by ioquatix (Samuel Williams) 8 months ago
          
          · Edited
          Updated by ioquatix (Samuel Williams) 8 months ago
          
          · Edited
        
        
      
      The fiber scheduler hook rb_fiber_scheduler_io_wait could be implemented like this:
static VALUE
fiber_scheduler_io_wait(VALUE _argument) {
    VALUE *arguments = (VALUE*)_argument;
    return rb_funcallv(arguments[0], id_io_wait, 3, arguments + 1);
}
VALUE
rb_fiber_scheduler_io_wait(VALUE scheduler, VALUE io, VALUE events, VALUE timeout)
{
    VALUE arguments[] = {
        scheduler, io, events, timeout
    };
    return rb_io_interruptible_operation(io, fiber_scheduler_io_wait, (VALUE)&arguments);
}
I'll consider how to modify io_read and io_write hooks.
        
           Updated by luke-gru (Luke Gruber) 8 months ago
          Updated by luke-gru (Luke Gruber) 8 months ago
          
          
        
        
      
      Yeah, this makes more sense to me. The other way introduces a method on all IO objects that's only useful in the fiber scheduler context, and I could see users forgetting
to call it. I think having it be the default is the more sensible approach, and it avoids an API.
        
           Updated by ioquatix (Samuel Williams) 8 months ago
          Updated by ioquatix (Samuel Williams) 8 months ago
          
          
        
        
      
      - Description updated (diff)
- Assignee set to ioquatix (Samuel Williams)
        
           Updated by ioquatix (Samuel Williams) 8 months ago
          Updated by ioquatix (Samuel Williams) 8 months ago
          
          
        
        
      
      - Description updated (diff)
        
           Updated by ioquatix (Samuel Williams) 8 months ago
          Updated by ioquatix (Samuel Williams) 8 months ago
          
          
        
        
      
      I have updated the proposal based on the discussion.
        
           Updated by ko1 (Koichi Sasada) 8 months ago
          
          · Edited
          Updated by ko1 (Koichi Sasada) 8 months ago
          
          · Edited
        
        
      
      could you make clear
- 
rb_thread_io_interruptible_operationwhy notrb_ioprefix? is it public c-api?
- who/when/how unregister the hooks?
- could you clear how fiber scheduler use it? no io_closeevent (callback)?
- which context the callback function is called?
        
           Updated by ioquatix (Samuel Williams) 8 months ago
          Updated by ioquatix (Samuel Williams) 8 months ago
          
          
        
        
      
      rb_thread_io_interruptible_operation why not rb_io prefix? is it public c-api?
It's not public API, and we can change the name. It is defined in thread.c and thread.h since that is where waiting_fd is defined/used. waiting_fd is not exposed outside of thread.c.
who/when/how unregister the hooks?
rb_thread_io_interruptible_operation uses a callback, so before entry to the callback, the operation is registered in waiting_fds, and on exit from the callback, it is removed.
- Entry: https://github.com/ruby/ruby/pull/12839/files#diff-161b2a279f4c67a1ab075a7890ecf6f3f1d483d910910fa52b3715e25cfdcbd7R1753
- Exit: https://github.com/ruby/ruby/pull/12839/files#diff-161b2a279f4c67a1ab075a7890ecf6f3f1d483d910910fa52b3715e25cfdcbd7R1740
could you clear how fiber scheduler use it? no io_close event (callback)?
The fiber scheduler implementation does not need to be changed, since we added this to the scheduler.c implementation.
which context the callback function is called?
It's used to wrap the execution of io_read/io_write and io_wait scheduler hooks: https://github.com/ruby/ruby/pull/12839/files#diff-29a83910395dea702ac0c02b4cd9976ba252ff50f133d7cd8109beffbd2eab1d
        
           Updated by ioquatix (Samuel Williams) 7 months ago
          Updated by ioquatix (Samuel Williams) 7 months ago
          
          
        
        
      
      After discussing this PR with @ko1 (Koichi Sasada):
- waiting_fds -> ractor local
- document fiber_interrupt can be called in another thread
- 
rb_thread_io_interruptible_operation(func)->
 rb_thread_io_with_waiting_file_descriptor(func)
 (Internal function, not public API)
- Update the PR for per-IO waiting_fd (separate follow up).
        
           Updated by hsbt (Hiroshi SHIBATA) 7 months ago
          Updated by hsbt (Hiroshi SHIBATA) 7 months ago
          
          
        
        
      
      - Status changed from Open to Assigned
        
           Updated by ioquatix (Samuel Williams) 5 months ago
          Updated by ioquatix (Samuel Williams) 5 months ago
          
          
        
        
      
      https://github.com/ruby/ruby/pull/13127 has been merged, allowing us to move forward with this feature.
        
           Updated by ioquatix (Samuel Williams) 5 months ago
          Updated by ioquatix (Samuel Williams) 5 months ago
          
          
        
        
      
      I've renamed rb_thread_io_interruptible_operation to rb_thread_io_blocking_operation which better suits the naming convention internally.
        
           Updated by ioquatix (Samuel Williams) 5 months ago
          Updated by ioquatix (Samuel Williams) 5 months ago
          
          
        
        
      
      - Status changed from Assigned to Closed
Merged in 73c9d6ccaa2045a011ed991dc29633bd0443971a