Feature #13821

Allow fibers to be resumed across threads

Added by cremes (Chuck Remes) 3 months ago. Updated about 2 months ago.

Status: Assigned
Priority: Normal
Target version: -
[ruby-core:82402]

Description

Given a Fiber created in ThreadA, Ruby 2.4.1 (and earlier releases) raises a FiberError if the fiber is resumed in ThreadB or in any thread other than the one that created it.

Sample code demonstrating the problem is attached.
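The attachment itself is not reproduced here. A minimal reproduction sketch along the same lines (written for this description, not the attached file) resumes a fiber once on the thread that created it, then tries again from a second thread and captures the resulting FiberError:

```ruby
# Minimal reproduction sketch: a fiber is bound to the thread that created it.
fiber = Fiber.new do
  Fiber.yield :first
  :second
end

first = fiber.resume   # => :first (resumed on the creating thread)

# Try to resume the same fiber from a second thread and capture the error.
error = Thread.new do
  begin
    fiber.resume
    nil
  rescue FiberError => e
    e
  end
end.value

puts error.class       # FiberError
```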

If Fibers are truly encapsulating all of the data for the continuation, we should be allowed to move them between Threads and resume their operation.

Why?

One use case is supporting the async/await asynchronous programming model. In that model, a method marked async runs synchronously until it reaches an #await call. At that point the method is suspended and control returns to the caller. When the awaited operation completes (asynchronously), the suspended method can be resumed and continue. The only way to capture this program state so that it can be suspended and resumed is with a Fiber.

Example:

class Wait
  include AsyncAwait  # the author's async/await library (unpublished, not part of Ruby)

  def dofirst
    async do
      puts 'Synchronously print dofirst.'
      result = await { dosecond }
      puts 'dosecond is complete'
      result
    end
  end

  def dosecond
    async do
      puts 'Synchronously print dosecond from async task.'
      slept = await { sleep 3 }
      puts 'Sleep complete'
      slept
    end
  end

  def run
    task = dofirst
    puts 'Received task'
    p AsyncAwait::Task.await(task)
  end
end

Wait.new.run
# Expected output:
# Synchronously print dofirst.
# Received task
# Synchronously print dosecond from async task.
# Sleep complete
# dosecond is complete
# 3

Right now, the best way to suspend #dofirst and #dosecond and let them run asynchronously is to pass those blocks to another thread (not the caller's thread), where they can be wrapped in a new Fiber and then yielded. When it's time to resume after #await completes, that other thread must look up the fiber and resume it. This requires a lot of extra code and logic to make sure fibers are only resumed on the threads that created them. Allowing Fibers to migrate between threads would eliminate this problem.
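The workaround described above can be sketched as follows, assuming one dedicated thread that owns every fiber and a command queue through which other threads ask it to start or resume them. FiberHost, start, resume, and stop are hypothetical names chosen for illustration, and the sketch omits error handling (an exception raised inside a fiber would terminate the host thread).

```ruby
# Hypothetical sketch of the "fiber-owning thread" workaround.
# FiberHost and its methods are illustrative names, not a real library.
class FiberHost
  def initialize
    @inbox  = Queue.new          # commands sent from any thread
    @fibers = {}                 # id => Fiber, touched only by the host thread
    @thread = Thread.new { run }
  end

  # Ask the host thread to wrap the block in a new fiber and start it.
  def start(id, &block)
    @inbox << [:start, id, block]
  end

  # Ask the host thread to resume fiber +id+, passing +value+ to Fiber.yield.
  def resume(id, value = nil)
    @inbox << [:resume, id, value]
  end

  # Process all queued commands, then shut the host thread down.
  def stop
    @inbox << [:stop]
    @thread.join
  end

  private

  # Runs on the host thread, so every Fiber.new and Fiber#resume happens there.
  def run
    loop do
      command, id, payload = @inbox.pop
      case command
      when :start
        @fibers[id] = Fiber.new(&payload)
        step(id)
      when :resume
        step(id, payload)
      when :stop
        break
      end
    end
  end

  # Resume one fiber and forget it once it has finished.
  def step(id, value = nil)
    fiber = @fibers.fetch(id)
    fiber.resume(value)
    @fibers.delete(id) unless fiber.alive?
  end
end

results = Queue.new
host = FiberHost.new
host.start(:job) do
  slept = Fiber.yield            # suspend until some thread supplies a value
  results << slept
end
Thread.new { host.resume(:job, 3) }.join   # resume requested from another thread
host.stop
value = results.pop              # => 3
```

Even in this small version, every resume has to be routed through the owning thread; that bookkeeping is what fiber migration would remove.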

fiber_across_threads.rb (377 Bytes): reproduction code, cremes (Chuck Remes), 08/16/2017 05:01 PM
wait.rb (728 Bytes): use case, cremes (Chuck Remes), 08/16/2017 05:13 PM

History

#1 [ruby-core:82659] Updated by kernigh (George Koehler) 3 months ago

Fibers still can't move across threads in

ruby 2.5.0dev (2017-09-04 trunk 59742) [x86_64-openbsd6.1]

Because of this, I can't take an Enumerator across threads:

count = 1.step
puts count.next    #=> 1
puts count.next    #=> 2
Thread.new {
  puts count.next  #=> FiberError
}.join

If Ruby were to allow fibers to cross threads, it might be possible only on some platforms. I find that Ruby (in cont.c) has three different ways of implementing fibers.

  1. It uses CreateFiber/SwitchToFiber on Microsoft Windows.
  2. It uses makecontext/swapcontext on some POSIX systems (but not NetBSD, Solaris, or Hurd).
  3. It uses continuations on all other platforms.

Each fiber needs its own stack for C code. With continuations, each fiber runs on the stack of its thread. When Ruby switches fibers, it copies their stacks to and from the thread stack. C code can hold pointers into the stack, so the stack's address must never change. With continuations, if Ruby resumed a fiber on a different thread, it would copy the fiber's stack to a different thread's stack, the address would change, and C code would crash. Therefore, fibers can't cross threads on platforms that use continuations.

I don't know whether fibers can cross threads in platforms using CreateFiber or makecontext. I also don't know whether Ruby can garbage-collect a thread that created fibers that crossed to other threads.

#2 [ruby-core:82758] Updated by Eregon (Benoit Daloze) 2 months ago

I think this is first and foremost a semantic problem.

If we allow fibers to be resumed on another Thread, we allow multiple fibers originally from the same thread to execute concurrently
(so they no longer reliably see each other's effects, and they become exposed to race conditions just like threads).

It also means that the value of Thread.current can differ before and after a Fiber.yield, if the Fiber is resumed on another Thread.
This in turn breaks fiber-locals accessed through the current Thread.current[key] API.
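For context: in current Ruby, Thread#[] and Thread#[]= are already fiber-local, even though they are reached through Thread.current. The minimal sketch below (the :request_id key is an arbitrary example) shows that a new fiber starts with empty storage, keeps its own value across Fiber.yield, and does not disturb the value stored on the thread's root fiber. Because this storage is accessed via Thread.current, its meaning is tied to the thread the fiber runs on.

```ruby
Thread.current[:request_id] = "root"        # stored on the thread's root fiber

fiber = Fiber.new do
  seen = Thread.current[:request_id]        # nil: a new fiber has its own, empty storage
  Thread.current[:request_id] = "fiber"
  Fiber.yield seen
  Thread.current[:request_id]               # still "fiber" after resuming
end

before = fiber.resume                       # => nil
after  = fiber.resume                       # => "fiber"
outer  = Thread.current[:request_id]        # => "root"
```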

It is also problematic for locks and other per-thread resources (some of them are native, so they cannot be tricked into using the Fiber's original Thread):

lock = Mutex.new
Fiber.new { shared = Struct.new(:a, :b).new(0, 0); lock.synchronize { shared.a += 1; Fiber.yield; shared.b -= 1 } }

If the fiber is resumed on another thread, the unlock operation will fail because it runs on a different thread from the lock operation.
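The failure can be simulated without fibers at all: Mutex#unlock raises ThreadError when called from a thread that does not hold the lock, which is the situation described above. A minimal sketch:

```ruby
lock = Mutex.new
lock.lock                      # acquired on the main thread

# A different thread tries to release the lock and gets a ThreadError.
error = Thread.new do
  begin
    lock.unlock
    nil
  rescue ThreadError => e
    e
  end
end.value

puts error.class               # ThreadError
lock.unlock                    # the owning thread can still release it
```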

#3 [ruby-core:82761] Updated by cremes (Chuck Remes) 2 months ago

I understand how this request could allow for race conditions between Fibers. Right now we are relying on the fact that they can only run on a single thread to enforce this particular semantic. I also agree that this is useful in certain situations.

But it is still quite useful to allow for Fibers to migrate between threads. Perhaps we could allow for both possibilities with a minor change to the Fiber API.

class Fiber
  def initialize(migrate: false)
    ...
  end

  def [](index)
    ...
  end

  def []=(index, value)
    ...
  end
end

By default we would retain the existing behavior where the Fiber is "locked" to its originating Thread. But if you call Fiber.new(migrate: true) then the Fiber is free to float among multiple threads. When doing so, the programmer is explicitly agreeing to no longer rely upon the original semantics. If they yield a fiber inside of a synchronized section then they understand it will likely break if resumed on another thread. Likewise, they do not rely upon Thread#[] and related methods to set/get fiber locals.

That Thread API for fiber-locals is broken anyway: the #[] and #[]= methods on Thread should set/get thread-locals as they originally did, and there should be Fiber#[] and Fiber#[]= methods on the Fiber class. Conflating these two separate concepts in the Thread class is a mistake. With Ruby 3 on the way, this is the perfect time to fix problems like that. I'll open a separate ticket suggesting this as an improvement to the Thread and Fiber classes.
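The proposed Fiber#[] and Fiber#[]= can be approximated in plain Ruby for illustration. This is a hypothetical sketch: LocalsFiber is an invented subclass name, and values live in a Hash on the fiber object itself rather than being looked up through Thread.current.

```ruby
# Hypothetical sketch: LocalsFiber is an illustrative subclass, not a Ruby API.
class LocalsFiber < Fiber
  def initialize(&block)
    @locals = {}                 # per-fiber storage carried by the fiber object
    super(&block)
  end

  # Read a value stored on this particular fiber.
  def [](key)
    @locals[key]
  end

  # Store a value on this particular fiber.
  def []=(key, value)
    @locals[key] = value
  end
end

worker = LocalsFiber.new do
  Fiber.yield :suspended
  :done
end

worker[:trace_id] = 42
state        = worker.resume              # => :suspended
thread_value = Thread.current[:trace_id]  # => nil; Thread#[] storage is untouched
worker[:trace_id]                         # => 42
```

Because the storage is reached through the fiber object, its lookup would not depend on which thread resumes the fiber; the sketch does not, of course, make cross-thread resumption possible.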

#4 [ruby-core:82763] Updated by cremes (Chuck Remes) 2 months ago

Added ticket 13893 (https://bugs.ruby-lang.org/issues/13893) to track a feature request to clean up fiber-local and thread-local handling in the Fiber and Thread classes.

#5 [ruby-core:82765] Updated by Eregon (Benoit Daloze) 2 months ago

cremes (Chuck Remes) wrote:

By default we would retain the existing behavior where the Fiber is "locked" to its originating Thread. But if you call Fiber.new(migrate: true) then the Fiber is free to float among multiple threads. When doing so, the programmer is explicitly agreeing to no longer rely upon the original semantics. If they yield a fiber inside of a synchronized section then they understand it will likely break if resumed on another thread. Likewise, they do not rely upon Thread#[] and related methods to set/get fiber locals.

This would rule out using these fibers with any library that uses Mutex, Thread.current, or similar Thread primitives across a Fiber.yield.
That is a considerable cost for code reuse.

Could you share the code you have for the implementation of AsyncAwait?

#6 [ruby-core:82814] Updated by cremes (Chuck Remes) 2 months ago

Yes, the Fiber.new(migrate: true) would mean the programmer is taking responsibility for NOT wrapping that Fiber up in mutexes or relying on the default behavior. I think this is reasonable.

As for the async/await code I've written, it hasn't been published yet. I can shoot you a tarball if you want to look at it but it's still alpha quality (no tests). I'll ping you on the TruffleRuby project to get your email.

#7 [ruby-core:82818] Updated by cremes (Chuck Remes) 2 months ago

I took a look at the C++ Boost library boost::fiber documentation. It allows fibers to be detached/attached between threads. Perhaps an explicit API like this is a better approach? See here: http://www.boost.org/doc/libs/1_62_0/libs/fiber/doc/html/fiber/migration.html

This puts the responsibility onto the programmer to Fiber#detach from its current thread and Fiber#attach(thread) to a new thread. The limitation is that a Fiber cannot be moved if it is blocked or if it is currently running.

By making the detach/attach explicit, then the programmer is assuming 100% responsibility to make sure the fiber hasn't yielded while holding locks or other operations that assume the Fiber is locked to a thread.

#8 [ruby-core:82994] Updated by shyouhei (Shyouhei Urabe) about 2 months ago

  • Assignee set to ko1 (Koichi Sasada)
  • Status changed from Open to Assigned

In today's developer meeting, ko1 said that migrating fibers across threads is currently not possible. I think he would like to explain why, so let me assign this issue to him.
