Feature #16786

Light-weight scheduler for improved concurrency.

Added by ioquatix (Samuel Williams) 8 months ago. Updated 15 days ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:97878]

Description

Abstract

We propose to introduce a lightweight fiber scheduler, to improve the concurrency of Ruby code with minimal changes.

Background

We have been discussing and considering options to improve Ruby scalability for several years. More context can be provided by the following discussions:

The final Ruby Concurrency report provides some background on the various issues considered in the latest iteration: https://www.codeotaku.com/journal/2020-04/ruby-concurrency-final-report/index

Proposal

We propose to introduce the following concepts:

  • A Scheduler interface which provides hooks for user-supplied event loops.
  • Non-blocking Fiber which can invoke the scheduler when it would otherwise block.

Scheduler

The per-thread fiber scheduler interface is used to intercept blocking operations. A typical implementation would be a wrapper for a gem like EventMachine or Async. This design provides separation of concerns between the event loop implementation and application code. It also allows for layered schedulers which can perform instrumentation, enforce constraints (e.g. during testing) and provide additional logging. You can see a sample implementation here.

class Scheduler
  # Wait for the given file descriptor to become readable.
  def wait_readable(io)
  end

  # Wait for the given file descriptor to become writable.
  def wait_writable(io)
  end

  # Wait for the given file descriptor to match the specified events within
  # the specified timeout.
  # @param events [Integer] a bit mask of `IO::WAIT_READABLE`,
  #   `IO::WAIT_WRITABLE` and `IO::WAIT_PRIORITY`.
  # @param timeout [#to_f] the amount of time to wait for the event.
  def wait_any(io, events, timeout)
  end

  # Sleep the current task for the specified duration, or forever if not
  # specified.
  # @param duration [#to_f] the amount of time to sleep.
  def wait_sleep(duration = nil)
  end

  # The Ruby virtual machine is going to enter a system level blocking
  # operation.
  def enter_blocking_region
  end

  # The Ruby virtual machine has completed the system level blocking
  # operation.
  def exit_blocking_region
  end

  # Intercept the creation of a non-blocking fiber.
  def fiber(&block)
    Fiber.new(blocking: false, &block)
  end

  # Invoked when the thread exits.
  def run
    # Implement event loop here.
  end
end

A thread has its own non-blocking fiber scheduler. All blocking operations on non-blocking fibers are intercepted by the scheduler, which can then switch to another fiber. If a fiber has acquired any mutex, the scheduler is not invoked; this is the same behaviour as a blocking fiber.

Schedulers can be written in Ruby. This is a desirable property as it allows them to be used in different implementations of Ruby easily.
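As a rough illustration of the layered schedulers mentioned above, the following sketch wraps one scheduler in another that records every hook call before delegating. Only the hook names come from the proposal; LoggingScheduler and NullScheduler are hypothetical names, and the hooks are called by hand here rather than by the VM.

```ruby
# Hypothetical wrapper: records each hook call, then delegates to the
# wrapped scheduler. Only the hook names are taken from the proposal.
class LoggingScheduler
  HOOKS = %i[wait_readable wait_writable wait_any wait_sleep].freeze

  attr_reader :log

  def initialize(inner)
    @inner = inner
    @log = []
  end

  HOOKS.each do |hook|
    define_method(hook) do |*args|
      @log << [hook, *args]
      @inner.public_send(hook, *args)
    end
  end

  # Fiber creation and the event loop are delegated untouched.
  def fiber(&block)
    @inner.fiber(&block)
  end

  def run
    @inner.run
  end
end

# Stand-in inner scheduler (an assumption for this sketch) that returns
# immediately instead of waiting.
class NullScheduler
  def wait_readable(io); :readable; end
  def wait_writable(io); :writable; end
  def wait_any(io, events, timeout); events; end
  def wait_sleep(duration = nil); duration; end
  def fiber(&block); Fiber.new(&block); end
  def run; end
end

scheduler = LoggingScheduler.new(NullScheduler.new)
scheduler.wait_sleep(0.5)
scheduler.wait_readable($stdin)
hook_names = scheduler.log.map(&:first)
p hook_names # hook names in call order
```

Because the wrapper only relies on the hook names, the same layering should work around any scheduler that implements the interface.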

To enable non-blocking fiber switching on blocking operations:

  • Specify a scheduler: Thread.current.scheduler = Scheduler.new.
  • Create several non-blocking fibers: Fiber.new(blocking:false) {...}.
  • As the main fiber exits, Thread.current.scheduler.run is invoked which begins executing the event loop until all fibers are finished.
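The flow above can be approximated in plain Ruby without any VM support. This is a minimal sketch, assuming a toy SelectScheduler built on IO.select: the fibers call the hooks explicitly, and run is called by hand, whereas under the proposal the VM would invoke both. The internal bookkeeping (@readable, @sleeping) is hypothetical.

```ruby
# Toy scheduler: fibers call the hooks explicitly here, whereas under the
# proposal the VM would call them on behalf of blocking operations.
class SelectScheduler
  def initialize
    @readable = {} # IO => Fiber waiting for it to become readable
    @sleeping = [] # [wake_time, Fiber] pairs
  end

  # Create a fiber and start it immediately.
  def fiber(&block)
    fiber = Fiber.new(&block)
    fiber.resume
    fiber
  end

  def wait_readable(io)
    @readable[io] = Fiber.current
    Fiber.yield
  end

  def wait_sleep(duration)
    @sleeping << [now + duration, Fiber.current]
    Fiber.yield
  end

  # Event loop: runs until no fiber is waiting on anything.
  def run
    until @readable.empty? && @sleeping.empty?
      timeout = @sleeping.map(&:first).min
      timeout = [timeout - now, 0].max if timeout
      ready, = IO.select(@readable.keys, nil, nil, timeout)
      Array(ready).each { |io| @readable.delete(io).resume }
      due, @sleeping = @sleeping.partition { |time, _| time <= now }
      due.each { |_, fiber| fiber.resume }
    end
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

scheduler = SelectScheduler.new
reader, writer = IO.pipe
events = []

scheduler.fiber do
  scheduler.wait_readable(reader)
  events << reader.read_nonblock(5)
end

scheduler.fiber do
  scheduler.wait_sleep(0.01)
  writer.write("hello")
  events << :wrote
end

scheduler.run
p events # [:wrote, "hello"]
```

The reader fiber is parked first, yet the writer fiber finishes first: the event loop only resumes the reader once the pipe actually has data.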

Time/Duration Arguments

Tony Arcieri suggested against using floating point values for time/durations, because they can accumulate rounding errors and other issues. He has a wealth of experience in this area so his advice should be considered carefully. However, I have yet to see these issues happen in an event loop. That being said, round tripping between struct timeval and double/VALUE seems a bit inefficient. One option is to have an opaque argument that responds to to_f as well as potentially seconds and microseconds or some other such interface (could be opaque argument supported by IO.select for example).
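To make the rounding concern concrete, here is a small sketch comparing a Float accumulator with integer microseconds. The Duration struct is a hypothetical example of an opaque argument that responds to to_f; it is not part of the proposal.

```ruby
# Accumulating a 0.1 second interval ten times as a Float drifts.
elapsed = 0.0
10.times { elapsed += 0.1 }
float_exact = (elapsed == 1.0)
puts float_exact # false

# The same schedule tracked as integer microseconds stays exact.
elapsed_us = 0
10.times { elapsed_us += 100_000 }
integer_exact = (elapsed_us == 1_000_000)
puts integer_exact # true

# Hypothetical opaque duration: exact fields, with to_f for callers that
# only need an approximate value.
Duration = Struct.new(:seconds, :microseconds) do
  def to_f
    seconds + microseconds / 1_000_000.0
  end
end

duration = Duration.new(1, 500_000)
puts duration.to_f # 1.5
```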

File Descriptor Arguments

Because of the public C interface we may need to support a specific set of wrappers for CRuby.

int rb_io_wait_readable(int);
int rb_io_wait_writable(int);
int rb_wait_for_single_fd(int fd, int events, struct timeval *tv);

One option is to introduce hooks specific to CRuby:

class Scheduler
  # Wrapper for rb_io_wait_readable(int) C function.
  def wait_readable_fd(fd)
    wait_readable(::IO.for_fd(fd, autoclose: false))
  end

  # Wrapper for rb_io_wait_writable(int) C function.
  def wait_writable_fd(fd)
    wait_writable(::IO.for_fd(fd, autoclose: false))
  end

  # Wrapper for rb_wait_for_single_fd(int, int, struct timeval *) C function.
  def wait_for_single_fd(fd, events, duration)
    wait_any(::IO.for_fd(fd, autoclose: false), events, duration)
  end
end

Alternatively, in CRuby, it may be possible to map from fd -> IO instance. Most C schedulers only care about file descriptor, so such a mapping will introduce a small performance penalty. In addition, most C level schedulers will not care about IO instance.
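The fd -> IO mapping could look roughly like the following Ruby sketch. DescriptorTable is a hypothetical name; a real implementation would live inside CRuby and would need weak references and removal on close, which are omitted here.

```ruby
# Hypothetical registry mapping file descriptor numbers to IO instances.
class DescriptorTable
  def initialize
    @table = {}
  end

  # Remember the IO instance that owns this descriptor.
  def register(io)
    @table[io.fileno] = io
  end

  # Return the known IO for fd, or lazily wrap the raw descriptor.
  # autoclose: false keeps the wrapper from closing the fd when collected.
  def lookup(fd)
    @table[fd] ||= IO.for_fd(fd, autoclose: false)
  end
end

reader, writer = IO.pipe
table = DescriptorTable.new
table.register(reader)

same = table.lookup(reader.fileno)    # the registered instance
wrapped = table.lookup(writer.fileno) # never registered, wrapped lazily

puts same.equal?(reader) # true
puts wrapped.autoclose?  # false
```

The lazily wrapped object is a plain IO, so any subclass-specific behaviour of the original is lost on that path.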

Non-blocking Fiber

We propose to introduce a per-fiber flag blocking: true/false.

A fiber created by Fiber.new(blocking: true) (the default Fiber.new) becomes a "blocking Fiber" and has no changes from current Fiber implementation. This includes the root fiber.

A fiber created by Fiber.new(blocking: false) becomes a "non-blocking Fiber" and it will be scheduled by the per-thread scheduler when blocking operations (blocking I/O, sleep, and so on) occur.

Fiber.new(blocking: false) do
  puts Fiber.current.blocking? # false

  # May invoke `Thread.scheduler&.wait_readable`.
  io.read(...)

  # May invoke `Thread.scheduler&.wait_writable`.
  io.write(...)

  # Will invoke `Thread.scheduler&.wait_sleep`.
  sleep(n)
end.resume

Non-blocking fibers also support Fiber#resume, Fiber#transfer and Fiber.yield, which are necessary to create a scheduler.

Fiber Method

We also introduce a new method which simplifies the creation of these non-blocking fibers:

Fiber do
  puts Fiber.current.blocking? # false
end

This method invokes Scheduler#fiber(...). The purpose of this method is to allow the scheduler to internally decide the policy for when to start the fiber, and whether to use symmetric or asymmetric fibers.

If no scheduler is specified, it is an error: RuntimeError.new("No scheduler is available").

In the future we may expand this to support some kind of default scheduler.
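The dispatch described above can be sketched in plain Ruby. FiberEntry.spawn is a hypothetical stand-in for the proposed Fiber do ... end form, and EagerScheduler is an assumed minimal scheduler; the per-thread slot uses thread variables because Thread#[] is fiber-local.

```ruby
# Hypothetical stand-in for the proposed `Fiber do ... end` entry point.
module FiberEntry
  KEY = :sketch_scheduler

  def self.scheduler
    Thread.current.thread_variable_get(KEY)
  end

  def self.scheduler=(scheduler)
    Thread.current.thread_variable_set(KEY, scheduler)
  end

  # Delegate fiber creation to the scheduler, or fail loudly without one.
  def self.spawn(&block)
    current = scheduler
    raise RuntimeError, "No scheduler is available" unless current
    current.fiber(&block)
  end
end

# Assumed minimal scheduler whose policy is to start fibers immediately.
class EagerScheduler
  def fiber(&block)
    fiber = Fiber.new(&block)
    fiber.resume
    fiber
  end
end

message = begin
  FiberEntry.spawn { }
  nil
rescue RuntimeError => error
  error.message
end
puts message # No scheduler is available

FiberEntry.scheduler = EagerScheduler.new
ran = false
FiberEntry.spawn { ran = true }
puts ran # true
```

A different scheduler could defer the fiber or use transfer instead of resume without changing the calling code, which is the policy decision the proposal leaves to Scheduler#fiber.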

Non-blocking I/O

IO#nonblock is an existing interface to control whether I/O uses blocking or non-blocking system calls. We can take advantage of this:

  • IO#nonblock = false prevents that particular IO from utilising the scheduler. This should be the default for stderr.
  • IO#nonblock = true enables that particular IO to utilise the scheduler. We should enable this where possible.
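For reference, a short sketch of the existing IO#nonblock interface from the io/nonblock library, assuming a Unix-like platform. The would-block result on an empty pipe is the point at which a scheduler could be invoked instead of blocking the thread.

```ruby
require 'io/nonblock'

reader, writer = IO.pipe

reader.nonblock = false
blocking_mode = reader.nonblock?
puts blocking_mode # false

reader.nonblock = true
nonblocking_mode = reader.nonblock?
puts nonblocking_mode # true

# With nothing to read, a non-blocking read reports that it would block.
empty_result = reader.read_nonblock(16, exception: false)
p empty_result # :wait_readable

# Once data is available the same call returns it.
writer.write("ping")
IO.select([reader])
data = reader.read_nonblock(16)
p data # "ping"
```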

As proposed by Eric Wong, we believe that making I/O non-blocking by default is the right approach. We have expanded his work in the current implementation. By doing this, when the user writes Fiber do ... end they are guaranteed the best possible concurrency, without any further changes to code. As an example, one of the tests shows Net::HTTP.get being used in this way with no further modifications required.

To support this further, consider the counterpoint, that Net::HTTP.get(..., blocking: false) is required for concurrent requests. Library code may not expose the relevant options, severely limiting the user's ability to improve concurrency, even if that is what they desire.

Implementation

We have an evolving implementation here: https://github.com/ruby/ruby/pull/3032 which we will continue to update as the proposal changes.

Evaluation

This proposal provides the hooks for scheduling fibers. With regards to performance, there are several things to consider:

  • The impact of the scheduler design on non-concurrent workloads. We believe it's acceptable.
  • The impact of the scheduler design on concurrent workloads. Our results are promising.
  • The impact of different event loops on throughput and latency. We have independent tests which confirm the scalability of the approach.

We can control for the first two in this proposal, and depending on the design we may help or hinder the wrapper implementation.

In the tests, we provide a basic implementation using IO.select. As this proposal is finalised, we will introduce some basic benchmarks using this approach.

Discussion

The following points are good ones for discussion:

  • Handling of file descriptors vs IO instances.
  • Handling of time/duration arguments.
  • General design and naming conventions.
  • Potential platform issues (e.g. CRuby vs JRuby vs TruffleRuby, etc).

The following is planned to be described by Eregon (Benoit Daloze) in another design document:

  • Semantics of non-blocking mutex (e.g. Mutex.new(blocking: false) or some other approach).

In the future we hope to extend the scheduler to handle other blocking operations, including name resolution, file I/O (by io_uring) and others. We may need to introduce additional hooks. If these hooks are not defined on the scheduler implementation, we will revert back to the blocking implementation where possible.


Related issues

Related to Ruby master - Feature #16792: Make Mutex held per Fiber instead of per Thread - Closed
Related to Ruby master - Bug #16892: Reconsider the test directory name for scheduler - Closed - ioquatix (Samuel Williams)
Related to Ruby master - Feature #14736: Thread selector for flexible cooperative fiber based concurrency - Closed

Updated by shevegen (Robert A. Heiler) 8 months ago

One issue I see is that this adds another API (Scheduler) for people to have to
remember. They will have to know how/when to use Mutex, Thread, Fibers, perhaps
Guilds, and now Scheduler.

Is this really what we want to have in ruby? Aspects such as class Hash, String,
Array etc... are quite simple to use and understand. The whole parallelism part,
on the other hand, seems to spawn more and more complexity on its own.

Updated by ioquatix (Samuel Williams) 8 months ago

One issue I see is that this adds another API (Scheduler) for people to have to

In practice, users do not see this interface. If you check the Async implementation, it's completely hidden from the user, but allows Async to handle native Ruby I/O in its own reactor/event loop (on the scheduler branch, using the implementation proposed here).


Updated by ioquatix (Samuel Williams) 8 months ago

  • Description updated (diff)

Updated by ioquatix (Samuel Williams) 8 months ago

  • Description updated (diff)

Updated by headius (Charles Nutter) 8 months ago

Notes from recent discussions about this on Slack:

Scheduler API should pass IO objects, not file descriptors

The current design calls for the Scheduler methods like wait_readable to pass only a numeric file descriptor as an argument. While this might model how the C code works (every IO boils down to a file descriptor), it does not match how Ruby code works. If the goal is that this API can be implemented from Ruby code, the implementation must receive IO objects. If it does not, all sorts of problems result:

  • There's nothing you can do with a raw file descriptor, so it would have to be passed back out to C or wrapped in a new IO object.
  • The new IO object would have to be autoclose: false or else it would end up closing the original fd when collected.
  • The new IO object would not reflect the original type of IO that created the fd, which means no calling File or Socket-specific APIs, and any subclassed behavior would be lost.

There are no other places in Ruby where you work with raw file descriptors. They occasionally leak out into Ruby via IO#fileno and IO.for_fd but only in the process of turning them back into IO objects.

A final minor point is that not all implementations will support raw file descriptors. JRuby running in non-native mode uses only JVM channels, which do not expose file descriptors. Most of the time JRuby is run in native mode, but this should be considered; leaking the file descriptor into Ruby-land is very un-Ruby.

Mutex will have to be addressed

This proposal punts on enhancing Mutex and considers any fiber that has locked a mutex as now being blocking. I assume this means it goes back to cooperative scheduling until the lock has been released.

I think this is going to limit many/most uses of this API. Given that context switches between threads can now occur on any IO operation, the need for synchronizing access to mutable data is even more important. There will be more locking required to ensure scheduled fibers are not stepping on each others' work, which means more cases will be forced back into blocking mode.

I understand this omission is to keep the scope small, but I think it's a big risk to go forward with this feature before making it mutex-aware.

Don't introduce Fiber()

The new Fiber do form is confusing to me and I'm pretty sure it will be confusing to users. I guarantee people will be asking when to use Fiber.new do versus Fiber do and there's no indication why this special form has been added nor what it does differently. These two forms will also be easily mistaken and result in people calling the wrong one.

In addition, I think we need an API form that always produces a nonblocking fiber. In this proposal, Fiber() calls Scheduler#fiber, which as stated:

If no scheduler is specified, it creates a normal blocking fiber. An alternative is to make it an error.

So calling the same method will have different behaviors depending on whether there's a Scheduler installed, and depending on what that Scheduler chooses to do. As with the old proposal, where fibers would magically become nonblocking when a Scheduler is installed, now we have the reverse case: fibers intended to be nonblocking will not be nonblocking depending on the behavior of the Scheduler.

If Fiber.new(blocking: false) is too much to expect, perhaps Fiber.nonblock do or similar would be a good choice?

This intertwining of behaviors between Fiber and Scheduler seems problematic to me.

Updated by headius (Charles Nutter) 8 months ago

A thought occurs: if you have to create a "special" fiber to get nonblocking behavior anyway, why isn't it named something else? How about something like Worker?

  • Create a new nonblocking worker fiber with Worker.new do
  • Workers and only workers are schedulable

I believe AutoFiber has been suggested in the past, along with other names.

We've already decided we can't change behavior of existing Fiber code. It makes sense to me that there should be a new name for this concept.

Updated by enebo (Thomas Enebo) 8 months ago

I have not had much time to digest this from a single reading but a question immediately screams out at me in reading this: Why not differentiate what we think of as Fiber today with this new type of Fiber (e.g. ScheduledFiber)?

It feels like a boolean tacked on to fiber makes the notion that it is scheduled pretty opaque. nonblocking: true does not give me any sense of how that non-blocking fiber is scheduled (since we can change the scheduling behavior). I would assume Ruby just magically handles it but then I also expect a specific set behavior for that scheduling. The fact that we can change that scheduling makes me think the noun used to describe it could make that clearer.

Updated by ioquatix (Samuel Williams) 8 months ago

Why not differentiate what we think of as Fiber today with this new type of Fiber (e.g. ScheduledFiber)?

From the user's point of view, it's still a fiber, and can be scheduled like a fiber: resume/yield/transfer and so on. The scheduler also sees it as a fiber and uses fiber methods for scheduling.

Scheduler API should pass IO objects, not file descriptors

We really only have two options that I can think of:

  • Internally have a table of fd -> IO and use this, although there are C extensions where this still won't work because there was never an IO instance so we still need to construct it. The details of constructing an IO instance in this case are trivial but there is still a cost.
  • Expose this detail in the scheduler design and leave it up to the implementation. Most scheduler designs just need the file descriptor and don't care about IO so there is little value in reconstructing the full IO object when it's immediately discarded or unused.

Don't introduce Fiber()

I'm okay with this. In Async we already have constructs that users are familiar with. I'll have to defer to matz (Yukihiro Matsumoto) for specifically what kind of interface he wants to expose. This interface was based on our initial discussion at RWC2019.

There are two benefits from introducing such a name:

  • It hides the implementation of symmetric/asymmetric switching by the scheduler.
  • It provides a uniform interface which high level libraries like Async and EventMachine can hook into. They do not need to use framework-specific constructs/methods for task construction.

Mutex will have to be addressed

The entire Async stack including Falcon works without depending on Mutex working the way you suggest it needs to. So I respectfully disagree with your assertions. The only place it's used is in signal handling setup IIRC.

Semantically, the proposed implementation doesn't change the behaviour of Mutex. We want to avoid introducing changes that break user code.

Given that context switches between threads can now occur on any IO operation

That's either wrong (do you mean non-blocking fibers?) or irrelevant to this proposal (yes, threads can always context switch on I/O operation and that's not changed by this proposal).

Updated by ioquatix (Samuel Williams) 8 months ago

In addition, I thought about it more, and I think Fiber do ... end without a scheduler should be a hard error, otherwise, the default implementation needs to expose symmetric/asymmetric co-routine (resume/transfer). In addition, even if we choose hard error now, we can extend it in the future with default scheduler or some other similar idea (not in this proposal please).


Updated by ioquatix (Samuel Williams) 8 months ago

  • Description updated (diff)

Updated by ioquatix (Samuel Williams) 8 months ago

  • Description updated (diff)

Added autoclose: false to reflect discussion point from @headius.

Updated by ioquatix (Samuel Williams) 8 months ago

I looked at C interface.

We can introduce new interface, something like:

int rb_wait_readable(VALUE io);
int rb_wait_writable(VALUE io);

// Similar to wait_for_single_fd:
int rb_wait_events(VALUE io, int events, struct timeval * timeout);

We can make adaptors for existing C interface in the scheduler:

e.g.

def wait_readable(io)
end

def wait_readable_fd(fd)
  wait_readable(IO.for_fd(fd, autoclose: false))
end

An alternative implementation could wrap rb_io_wait_readable(fd) and use an internal lookup table fd -> IO. But an entry won't always exist, so it must lazily construct IO instances. It is a fact that kqueue/epoll/io_uring don't care about IO, only the file descriptor. So in 99.9% of code IO#fileno will be called immediately, and there is a small performance cost.

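// Sketch only: should be implemented by a weak map. Note that {Qnil} sets
// only the first slot; the remaining slots are zero-initialised (Qfalse).
VALUE descriptors[MAX] = {Qnil};

int rb_io_wait_readable(int fd) {
  VALUE io = descriptors[fd];
  if (!RTEST(io)) {
    // rb_io_from_fd is an assumed helper that wraps fd in an IO instance.
    descriptors[fd] = io = rb_io_from_fd(fd);
  }
  return rb_wait_readable(io);
}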

It's just idea, but I am happy to try it out. I want to hear some feedback first about which design makes most sense.


Updated by Eregon (Benoit Daloze) 8 months ago

  • Related to Feature #16792: Make Mutex held per Fiber instead of per Thread added

Updated by Eregon (Benoit Daloze) 8 months ago

I created #16792 to change Mutex to be held per Fiber instead of per Thread.
Based on that it should be easy to integrate with the Scheduler.
I agree that seems an important case to address, and I think we shouldn't have any builtin operation disabling the scheduler as that is both more complicated to understand and a large limitation for scalability of the model.

Updated by Eregon (Benoit Daloze) 8 months ago

I think this is a great proposal.

I think we need to try to support existing code as much as possible, because all the existing Ruby code will never be rewritten to use a different pattern.
So we need the proposal to compose really well with existing code which might use threads and Mutex for good reasons, and I think part of that is making Mutex#lock reschedule.

I'm also a bit concerned about Fiber() being rather unclear.
I would prefer to be explicit here like Thread.current.scheduler.fiber {} but that's indeed quite long.
Maybe we can make Fiber.new(blocking: false) call Thread.current.scheduler.fiber {} or Thread.current.scheduler.register(fiber_instance) ?

Updated by ioquatix (Samuel Williams) 8 months ago

Thanks Eregon (Benoit Daloze) for your feedback.

Maybe we can make Fiber.new(blocking: false) call Thread.current.scheduler.fiber {} or Thread.current.scheduler.register(fiber_instance) ?

Fiber.new is a constructor and is independent of the scheduler in every way.

The blocking or non-blocking state is simply stored into the fiber itself.

Because of that, I disagree with the constructor doing anything complicated or invoking any hook, at least at this time. Because then, the time at which you construct the fiber might impact its behaviour in the scheduler, which I think is unnecessary and maybe confusing to users.

Additionally, we should not expose user to Fiber.new(blocking: true/false) because it's detail of scheduler implementation and to avoid breaking existing code (where Fiber.new defaults to blocking fiber which preserves existing behaviour).

Users need a simple entry point for concurrency. This is proposed as Fiber {}. I cannot make it any simpler than that. Fiber as a name is already reserved by Ruby, so making a method of the same name is similar to class Integer/Integer(...).

I've had feedback from developer over several years who told me Async {} is so simple and easy. So the ergonomics are good for users and the feedback supports that.

I cannot see any value in making it longer, more explicit, tied to the scheduler, or adding arguments to make it blocking (which we want users to avoid). Most users don't understand blocking/nonblocking so we should avoid forcing them to deal with it.

Updated by ioquatix (Samuel Williams) 8 months ago

  • Description updated (diff)

Make wait_readable, wait_writable and wait_any take IO argument. Add explicit wrappers for CRuby.

Updated by ioquatix (Samuel Williams) 8 months ago

  • Description updated (diff)

Tidy up proposal.

Updated by ioquatix (Samuel Williams) 8 months ago

I asked for more feedback from community.

https://twitter.com/ioquatix/status/1251024336502190081

  • Fiber do ... end: ~49% like it.
  • AsyncFiber do ... end: ~27% like it.
  • Fiber.new(blocking:false) do ... end.resume: ~20% like it.
  • Thread.scheduler.fiber do ... end: ~5% like it.

Some options were truncated in the poll because Twitter limits the length of the option. The sample size was ~280 people. It's not super scientific, but my experience is that polls do not change significantly after the first 100 votes.

So, I'm confident that Fiber do ... end is the right approach and the community showed strong support for it.

Also a few more notes:

  • I'm against AsyncFiber as it's confusing naming w.r.t. Async the gem I maintain.
  • I checked Worker but there is a gem called that already.

Updated by Eregon (Benoit Daloze) 8 months ago

I think Fiber() is OK after your replies, if it raises if there is no scheduler so there is no silent error. It would just need good documentation.

Fiber.new(blocking: false) should be clearly marked as "should only be used by scheduler not directly by user code", as that would miss the registration.

Updated by sam.saffron (Sam Saffron) 7 months ago

My big concern here is that this does not cover why #13618 was deficient and this complete greenfield implementation solves the issues it had?

#13618 had kqueue and epoll implementations which this would leave unimplemented, as far as I recall we were simply stuck on naming with 13618, there was nothing fundamentally objectionable there to ko1 (Koichi Sasada) and matz (Yukihiro Matsumoto)

Updated by Eregon (Benoit Daloze) 7 months ago

sam.saffron (Sam Saffron) I'll let ioquatix (Samuel Williams) reply in more details but my point of view on that is:
#13618 is not flexible, and rather hardcodes everything including the scheduler, the IO selectors, etc, which would add a huge implementation cost to alternative Ruby implementations.
This proposal is far more flexible, and a much smaller change which is far easier to review, maintain and evolve.
Also the author of that proposal seems rather inactive recently (not blaming, just a fact), which would be an issue to maintain that code.

That said, #13618 is quite similar to this proposal and so in essence not so different: they both use rb_io_wait_readable/rb_io_wait_writable/rb_wait_for_single_fd as a way to reschedule on blocking IO method calls.

Efficient selectors with kqueue/epoll can be provided by nio4r which already works on CRuby, JRuby and TruffleRuby.

Updated by ioquatix (Samuel Williams) 7 months ago

My big concern here is that this does not cover why #13618 was deficient and this complete greenfield implementation solves the issues it had?

This proposal is really an evolution of #13618. The reasons why that proposal did not move forward have already been outlined.

Personally, I'd like to take the event loop implementations from #13618 and put them into a gem, so CRuby, TruffleRuby and JRuby can all benefit. The proposal here is for the interface which allows that to happen, and we already have a proof of concept using NIO4r which has been used in production for years.

Updated by chrisseaton (Chris Seaton) 7 months ago

I recently did a deep dive into this approach and how it would fit into the Ruby ecosystem as part of my work on TruffleRuby.

I think what is being proposed here looks like a very practical idea for improving concurrency on Ruby, for the common use-case of applications doing a lot of IO with many clients. The core idea is simple to explain and understand, which I think is the real strong point here.

I also appreciate how the proposal has been architectured to have a pluggable backend. As a researcher that means we're open to experiment with some more radical ideas but running the same user code.

If you didn't know, TruffleRuby implements fibres as threads, which is also what JRuby does. This is because the JVM doesn't have any lightweight threading mechanism at the moment. The JVM will get fibres through a project called Loom, and some experimental work we did to integrate this into TruffleRuby was promising. It should work the same on JRuby. I'm planning to implement this issue in TruffleRuby for experimentation even if we don't have the expected performance characteristics of fibres yet.

Updated by ko1 (Koichi Sasada) 7 months ago

Sorry for late response.
First of all, I agree to merge it and try it before the next release (please wait for Matz's comment before merging).

There are several considerations.

non-blocking fiber creation API

For me, the name Fiber() is not clear because there are traditional fibers. How about Fiber.schedule{ ... } or something which shows the fiber will be scheduled? It should raise if the scheduler is not set to the thread.

mixing with blocking fiber

  • I heard that the root fiber (a default fiber per thread) is blocking. It should be noted.
  • What happens when a non-blocking fiber creates a blocking fiber (Enumerator, etc) and it runs a blocking I/O operation? I think it should be blocking though. However, resuming a blocking fiber will be a blocking operation.

Scheduler class

Your example shows that the Scheduler class only inherits Object class. Do we need a Fiber::Scheduler class as base class?

At least it can provide Scheduler#fiber method.

Fiber#resume/.yield for non-blocking fiber

I understand they are needed to make scheduler in Ruby, but it is confusing. I think non-blocking fiber should not have an ability to context switch by users outside of scheduler.

For example, if a fiber F1 is waiting for a network IO and a fiber F2 resume F1, then it will be blocking until the IO is ready.

How about prohibiting context switches via Fiber class methods, but providing Fiber::Scheduler methods?

# like that
class Fiber::Scheduler
  def resume(fib) = native_impl
  # or transfer?
end

class SelectScheduler < Fiber::Scheduler
  def wait_readable io
    ready_io = select(...)
    ready_fiber = ...
    resume(ready_fiber)
  end
end

BTW, hooks are called on root fiber (if a fiber F1 calls blocking IO operations, back to the root fiber and call the hook?) sorry if I missed the explanation.

Scheduler hooks

  • wait_readable, wait_writable hooks are easy to understand. However, wait_any is not clear for me.
  • There is a wait_sleep, but I'm not sure the hooks are enough or not.
  • enter_blocking_region/exit_blocking_region are strongly connected to MRI, so I'm not sure we should provide them here. For example, def notice(type, ...), called by the interpreter with information, could hide the details from the method names (users would need to know the details to utilize the information).

Context switch predictability for atomic operations

I'm still negative about this proposal because we cannot predict context switches completely. Compared with threading, the predictability is very high, but we cannot predict context switch timing 100% because most non-blocking IO operations can be context switch points. This can violate the atomicity of operations such as mutating Hash/Array/Object/... two or more times in a row.

I know most people, including Samuel and Matz, are optimistic about this issue.
I also agree the danger of this kind of violation is very low compared with threading.

How to provide a safety

There are several ideas.

  • (1) Users understand the code deeply where are context-switching points.
    • Pros. we don't need to introduce any mechanism.
    • Cons. difficult to make it perfect (human-readable is not perfect)
  • (2) Use Mutex correctly and have non-blocking fibers take care of it.
    • Pros. it is highly compatible with threading. It means we can use the same code in multi-threaded and multi-non-blocking-fiber apps.
    • Cons. users need to use Mutex correctly. Schedulers should manage Mutexes.
  • (3) Introduce a new context-switch control mechanism such as Fiber.exclude{ ... } like Ruby 1.8 or Fiber.blocking{ ... } to prevent Fiber scheduling (context-switching) in a block.
    • Pros. easy to implement.
    • Cons. users need to use this method correctly.
  • (4) Introduce a non-context-switch assertion mechanism such as Fiber.should_not_switch{ ... } (the user asserts that there is no context-switching point). If there is an IO operation, it causes an assertion violation error even if there is only one (itself) non-blocking fiber.
    • Pros. easy to implement.
    • Cons. users need to use this method correctly.
  • (5) ((2) + (4)) Assume locking Mutex as an assertion.
    • Pros. compatible with Mutex code.
    • Cons. users need to use Mutex correctly.
  • (6) Restrict the non-blocking IOs more, for example, only net/http enables it.
    • Pros. make more predictable.
    • Cons. concurrency will be reduced.

mmm, (5) seems fine? (if any Mutex is locked by a fiber, then fiber context switch will be an error).
In general, holding Mutex's lock long time is not recommended.

((7) is using Ractor, a position talk ;p)

How to survey the existing program?

Implementing (5) and running some existing programs can show how much code needs atomic operations, and whether blocking IO operations are called inside such atomic operations.
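A rough sketch of what such a survey tool for idea (5) might look like. AssertingScheduler and its critical method are hypothetical: critical stands in for Mutex#synchronize, and the hook is called by hand rather than by the VM.

```ruby
# Raised when a scheduler hook fires while a lock is held.
class SwitchInCriticalSection < StandardError; end

# Hypothetical scheduler that treats holding a lock as an assertion that
# no context switch may happen.
class AssertingScheduler
  def initialize
    @held = 0
  end

  # Stand-in for Mutex#synchronize: counts how many locks are held.
  def critical
    @held += 1
    yield
  ensure
    @held -= 1
  end

  def wait_readable(io)
    check!(:wait_readable)
    # A real scheduler would switch to another fiber here.
    nil
  end

  private

  def check!(hook)
    return if @held.zero?
    raise SwitchInCriticalSection, "#{hook} called while a lock is held"
  end
end

scheduler = AssertingScheduler.new

# Outside a critical section the hook is allowed.
scheduler.wait_readable($stdin)

# Inside a critical section the same hook is reported as a violation.
outcome = begin
  scheduler.critical { scheduler.wait_readable($stdin) }
  :allowed
rescue SwitchInCriticalSection
  :rejected
end
p outcome # :rejected
```

Collecting the raised errors instead of aborting would give the kind of count the survey asks for.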

Updated by ioquatix (Samuel Williams) 7 months ago

non-blocking fiber creation API

It was voted by community, strongly in favour of Fiber do ... end. If you think your suggestion is better, we should confirm with community.

Scheduler class

We can introduce Fiber::Scheduler however I don't think it's necessary. The benefit would be adding default functionality to existing schedulers.

In fact, such a default implementation could be provided by a gem or some other code which can be shared between implementations.

I heard that the root fiber (a default fiber per thread) is blocking. It should be noted.

Yes, this is outlined in the proposal, the default fiber is blocking, including the root fiber.

What's happen when a non-blocking fiber creates a blocking fiber

Resuming a blocking fiber is a blocking operation. This is good behaviour and ensures things like Enumerator won't be broken by this proposal.

For example, if a fiber F1 is waiting for a network IO and a fiber F2 resume F1, then it will be blocking until the IO is ready.

Scheduler will require access to resume/transfer/yield operations, so removing them is not realistic. In addition, Async uses these operations in Async::Semaphore & Async::Queue implementations, as well as other places.

Scheduler should be robust against spurious wake-ups (example using Async given below). However, user who calls #resume without any care will suffer the consequences if the code is not robust.

require 'async'
require 'async/io'

Async do
  i, o = Async::IO.pipe

  f1 = Async do
    puts i.read(1024)
  end

  f2 = Async do
    10.times do
      puts "Resuming f1"
      f1.instance_variable_get(:@fiber).resume
      puts "f1 yielded"
    end
  end

  o.write("Hello World")
  o.close
end
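The same idea can be sketched without Async, using only plain fibers. This is an illustration of the "re-check the condition after every wake-up" pattern, not how Async implements it; spurious_wakeup_demo is a hypothetical helper name.

```ruby
# A waiter that tolerates spurious resumes by re-checking its condition
# every time it is woken up.
def spurious_wakeup_demo
  ready = false
  wakeups = 0

  waiter = Fiber.new do
    until ready
      Fiber.yield :waiting
      wakeups += 1
    end
    :done
  end

  results = []
  results << waiter.resume             # Start the fiber; it yields straight away.
  2.times { results << waiter.resume } # Spurious resumes: the condition is still false.
  ready = true
  results << waiter.resume             # Real wake-up: the loop exits.

  [results, wakeups]
end

p spurious_wakeup_demo
```

Code that assumes "I was resumed, so my IO must be ready" would break on the second and third resume; the loop makes the waiter safe against them.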

BTW, hooks are called on root fiber, if a fiber F1 calls blocking IO operations, back to the root fiber and call the hook?

My interpretation of this is you are asking if the root fiber (which is blocking) will enter a recursive loop when invoking non-blocking operations. The answer is no.

However, wait_any is not clear for me.

wait_any is modelled after https://ruby-doc.org/stdlib-2.7.1/libdoc/io/wait/rdoc/IO.html#method-i-wait

We can change name to something else, do you have better idea?

There is a wait_sleep, but I'm not sure the hooks are enough or not.

What do you mean it is enough or not? Do you mean there are other ways to sleep?

enter_blocking_region/leave_blocking_region are strongly connected to the MRI.

Yes, agreed. These hooks were added as a result of our meeting in Japan.

I'd be happy to remove it but I think it provides very valuable insight into blocking operations within MRI. Maybe other implementations can comment on whether it's useful or not. Even if it's not supported, not much functionality is lost, so I don't see the point in removing it - it's not critical, but it's very useful.

user should know the details to utilize the information

To get the method name, you can use caller, which is shown in Async::Scheduler implementation: https://github.com/socketry/async/blob/c173f5880c566724f104855941f9af12fbf4d7e7/lib/async/scheduler.rb#L100-L112

I think it's best to avoid preparing the arguments (e.g. method name) when it may not be used, to avoid the overhead on critical path.

I'm negative yet about this proposal because we can not predict the context switch completely.

To me, this is actually a major feature of the proposal. We provide the tools to make a concurrent context which enables us to progressively improve concurrency. e.g.

Fiber do
  # Any operation in here may be scheduling point.
  connect(resolve("ruby-lang.org"))
end

This proposal doesn't include concurrent DNS resolution. But with Ruby 3.1, we can introduce such a feature without impacting user code. That means, resolve("ruby-lang.org") can become switching point. The same change in Node.js requires rewriting the code, which we want to avoid. In the future I want to introduce non-blocking DNS, File, other system calls, etc.

So users should not rely on blocking operations for synchronisation.

To retain compatibility with Mutex, when a Mutex is locked on a thread, that entire thread becomes blocking w.r.t. non-blocking operations. This ensures existing code continues to work correctly, at the cost of reduced concurrency when holding a Mutex.

The next step, as proposed by Eregon (Benoit Daloze), is to make Mutex fiber aware. This improves the opportunity for concurrency but does not change the semantics of user code.

Regarding some of the other options you list, one you have not considered is this:

Fiber do
    # Non-blocking fiber.
    # io operations, etc.

    # This is effectively the same as `Fiber.exclusive`.
    Fiber.new do
        # Modify shared mutable state, any operation here is blocking so is guaranteed to be sequential.
    end.resume
end

So no new construct is required to force sequential execution.

So, in effect, this proposal already implements (2) + (4) / (5).

In general, holding Mutex's lock long time is not recommended.

Agreed.

Updated by ioquatix (Samuel Williams) 7 months ago

  • Description updated (diff)

Specify the root fiber is also blocking.

Updated by ioquatix (Samuel Williams) 7 months ago

  • Description updated (diff)

Add clarification about introducing new hooks.

Updated by ko1 (Koichi Sasada) 7 months ago

ioquatix (Samuel Williams) wrote in #note-26:

non-blocking fiber creation API

It was voted by community, strongly in favour of Fiber do ... end. If you think your suggestion is better, we should confirm with community.

I don't think we should rely on this kind of result, because the people who voted do not know the concerns.

Scheduler class

We can introduce Fiber::Scheduler however I don't think it's necessary. The benefit would be adding default functionality to existing schedulers.

In fact, such a default implementation could be provided by a gem or some other code which can be shared between implementations.

It can be. Maybe we should discuss it later, after more trials.

What's happen when a non-blocking fiber creates a blocking fiber

Resuming a blocking fiber is a blocking operation. This is good behaviour and ensures things like Enumerator won't be broken by this proposal.

OK.

For example, if a fiber F1 is waiting for a network IO and a fiber F2 resume F1, then it will be blocking until the IO is ready.

Scheduler will require access to resume/transfer/yield operations, so removing them is not realistic. In addition, Async uses these operations in Async::Semaphore & Async::Queue implementations, as well as other places.

Not sure it should be implemented independently (it should be a scheduler's duty IMO). But I agree it is more flexible.

Scheduler should be robust against spurious wake-ups (example using Async given below). However, user who calls #resume without any care will suffer the consequences if the code is not robust.

ok.

BTW, hooks are called on root fiber, if a fiber F1 calls blocking IO operations, back to the root fiber and call the hook?

My interpretation of this is you are asking if the root fiber (which is blocking) will enter a recursive loop when invoking non-blocking operations. The answer is no.

My question is: which fiber context is used for the wait_xxx methods? I asked Samuel and the answer is the fiber which called the blocking IO.

However, wait_any is not clear for me.

wait_any is modelled after https://ruby-doc.org/stdlib-2.7.1/libdoc/io/wait/rdoc/IO.html#method-i-wait

We can change name to something else, do you have better idea?

I have no knowledge about it...

There is a wait_sleep, but I'm not sure the hooks are enough or not.

What do you mean it is enough or not? Do you mean there are other ways to sleep?

Sorry, my question is: what happens if we realize other hooks are needed after releasing it?

the answer was: #28

enter_blocking_region/leave_blocking_region are strongly connected to the MRI.

Yes, agreed. These hooks were added as a result of our meeting in Japan.

I'd be happy to remove it but I think it provides very valuable insight into blocking operations within MRI. Maybe other implementations can comment on whether it's useful or not. Even if it's not supported, not much functionality is lost, so I don't see the point in removing it - it's not critical, but it's very useful.

user should know the details to utilize the information

To get the method name, you can use caller, which is shown in Async::Scheduler implementation: https://github.com/socketry/async/blob/c173f5880c566724f104855941f9af12fbf4d7e7/lib/async/scheduler.rb#L100-L112

I think it's best to avoid preparing the arguments (e.g. method name) when it may not be used, to avoid the overhead on critical path.

I doubt it is a performance issue.

I'm negative yet about this proposal because we can not predict the context switch completely.

To me, this is actually a major feature of the proposal. We provide the tools to make a concurrent context which enables us to progressively improve concurrency. e.g.

Fiber do
  # Any operation in here may be scheduling point.
  connect(resolve("ruby-lang.org"))
end

This proposal doesn't include concurrent DNS resolution. But with Ruby 3.1, we can introduce such a feature without impacting user code. That means, resolve("ruby-lang.org") can become switching point. The same change in Node.js requires rewriting the code, which we want to avoid. In the future I want to introduce non-blocking DNS, File, other system calls, etc.

So users should not rely on blocking operations for synchronisation.

To retain compatibility with Mutex, when a Mutex is locked on a thread, that entire thread becomes blocking w.r.t. non-blocking operations. This ensures existing code continues to work correctly, at the cost of reduced concurrency when holding a Mutex.

The next step, as proposed by Eregon (Benoit Daloze), is to make Mutex fiber aware. This improves the opportunity for concurrency but does not change the semantics of user code.

Regarding some of the other options you list, one you have not considered is this:

Fiber do
  # Non-blocking fiber.
  # io operations, etc.

  # This is effectively the same as `Fiber.exclusive`.
  Fiber.new do
      # Modify shared mutable state, any operation here is blocking so is guaranteed to be sequential.
  end.resume
end

So no new construct is required to force sequential execution.

Creating a blocking fiber is an interesting idea, but the backtrace is separated, so it shouldn't be used IMO.

Updated by ko1 (Koichi Sasada) 7 months ago

note that I missed:

If any mutex is acquired by a fiber, then a scheduler is not called; the same behaviour as blocking Fiber.

in description, so I agree there is no issue if Mutex is used correctly.

Checking for such situations (IO operations while a Mutex is locked) would be a good survey.

Updated by Dan0042 (Daniel DeLorme) 7 months ago

Really looking forward to this API, it's very promising.

What exactly are the implications of enter_blocking_region/exit_blocking_region? Does it mean the scheduler should not resume fibers even if IO is ready? Since the scheduler is meant to be written in ruby, it would be nice to provide some guidance for rubyists who may not be deeply knowledgeable about MRI internals.

I am also one of those against the Fiber method name because intuitively it sounds like it's equivalent to Fiber.new. I would actually prefer any of the alternatives since they indicate what is different about this fiber; the fact that it doesn't block on IO. So in that respect I think NonblockingFiber or Fiber.nonblocking would also be good alternatives imho. I understand that a Twitter poll is not meant to be scientific but, in addition to issues with selection bias (inherent to the internet), the questions were such that the "non-blocking name" votes (51%) were split among 3 options while the "generic name" votes all went to 1 option. The results should be taken with a pinch of salt or two.

Updated by ioquatix (Samuel Williams) 7 months ago

I don't think we should rely on this kind of result, because the people who voted do not know the concerns.

I think you are underestimating the collective knowledge of the community. That poll had almost 300 responses. I've also been working on this proposal for ~3 years and talked to many developers. So I believe Fiber do ... end is the right approach. Matz can make his decision, but my job is to present to him the proposal and the evidence.

That poll shows that community doesn't like the alternative names presented here. Can you list the concerns you have so we can see if changing the name is the right way to address them? Or maybe it's a matter of clear documentation, etc.

What exactly are the implications of enter_blocking_region/exit_blocking_region?

It's a mechanism for detecting blocking operations that release the GVL. It can allow us to report back to the user that they are performing blocking operations on the event loop thread, which will cause some pain depending on the duration of the operation.

You can see a sample implementation here: https://github.com/socketry/async/blob/scheduler/lib/async/scheduler.rb#L100-L112
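A rough sketch of how such hooks could feed back to the user, assuming the VM would call them around operations that release the GVL. Here they are called by hand, and BlockingMonitor, slow_operation and fast_operation are hypothetical names for this illustration. The backtrace lookup only happens when there is something to report, which keeps the common path cheap.

```ruby
# Hypothetical monitor; in the proposal the VM would invoke these hooks,
# here they are invoked manually around each operation.
class BlockingMonitor
  attr_reader :reports

  def initialize(threshold)
    @threshold = threshold
    @reports = []
    @started_at = nil
  end

  def enter_blocking_region
    @started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  def exit_blocking_region
    duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - @started_at
    return if duration < @threshold

    # Only pay for the backtrace lookup when the operation was slow.
    @reports << [caller(1, 1).first, duration]
  end
end

def slow_operation(monitor)
  monitor.enter_blocking_region
  sleep(0.02)
ensure
  monitor.exit_blocking_region
end

def fast_operation(monitor)
  monitor.enter_blocking_region
ensure
  monitor.exit_blocking_region
end

MONITOR = BlockingMonitor.new(0.01)
fast_operation(MONITOR)
slow_operation(MONITOR)

MONITOR.reports.each do |location, duration|
  puts format("%s blocked the event loop for %.3fs", location, duration)
end
```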

Updated by matz (Yukihiro Matsumoto) 7 months ago

Accepted for experimentation.

We still have some concerns, for example, mixture with blocking and non-blocking fibers. mame (Yusuke Endoh) will describe the concern.
In addition, I don't like the method name Fiber, since the fiber created by the method is not the original fiber at all. It is not a good idea to steal the role of existing concept in the language. We need a new name.

Matz.

Updated by ioquatix (Samuel Williams) 7 months ago

Thanks Matz.

since the fiber created by the method is not the original fiber at all.

Can you clarify "not the original fiber at all"? It's the same way Integer(...) creates instance of class Integer.

Updated by duerst (Martin Dürst) 7 months ago

ioquatix (Samuel Williams) wrote in #note-34:

Thanks Matz.

since the fiber created by the method is not the original fiber at all.

Can you clarify "not the original fiber at all"? It's the same way Integer(...) creates instance of class Integer.

I can't speak for Matz, but my guess is that he meant "not the original type of fiber", i.e. not the same as you'd get e.g. with Fiber.new.

Updated by ioquatix (Samuel Williams) 7 months ago

Using latest master:

class Scheduler
  def fiber(&block)
    fiber = Fiber.new(blocking: false, &block)

    fiber.resume

    return fiber
  end
end

Thread.current.scheduler = Scheduler.new

f1 = Fiber do
  puts "Hello World"
end

puts f1.class
# Fiber

f2 = Fiber.new do
  puts "Hello World"
end

f2.resume

puts f2.class
# Fiber

Updated by hsbt (Hiroshi SHIBATA) 7 months ago

  • Related to Bug #16892: Reconsider the test directory name for scheduler added

Updated by matz (Yukihiro Matsumoto) 7 months ago

ioquatix (Samuel Williams) I was well represented by Martin-sensei (duerst (Martin Dürst)).
The fiber created by Fiber() do ...end does context-switch on I/O operations. The traditional (or original) fibers don't.
So naming the function Fiber may indicate that all fibers can switch context upon I/O operations.
Am I worrying too much?

Matz.

Updated by midnight (Sarun R) 7 months ago

Hi, I am a Ruby user who would probably have voted for

Fiber do
end

if only I had seen the poll back then.

The result should be taken with a grain of salt.
I second ko1 (Koichi Sasada) on the point that the people who voted for that choice just don't know the concerns.

If everything were decided by voting, we would not have the sane language that we grew to love.
Reasons should be the first choice to decide, and if reasons just don't cut it, relying on the gut feeling of someone knowledgeable in the area should be better than using voting results.

Don't worry too much about the community because the language is already flexible enough for wrapping or sugar-coating the core API into the form we like at the expense of some performance cost. (For someone who is truly opinionated about it.)

Updated by mgomes (Mauricio Gomes) 6 months ago

What about borrowing a little from Crystal? The non-blocking API could be:

spawn do
  # non-blocking fiber
end

I like how it has a completely different interface without introducing a new term.

Updated by Eregon (Benoit Daloze) 5 months ago

  • Related to Feature #14736: Thread selector for flexible cooperative fiber based concurrency added

Updated by ciconia (Sharon Rosner) 5 months ago

I've been working a lot with fibers recently, and I would like to share my thoughts on this issue. Disclaimer: I'm currently working on a concurrency solution for Ruby using Fibers, but which takes a very different approach: https://github.com/digital-fabric/polyphony/

Here are my objections:

  1. This proposal introduces an interface (the Scheduler API) that will probably be implemented only for one or two different gems (ioquatix mentions nio4r and EventMachine). It's not like there's going to be a lot of different schedulers to choose from, so why introduce another public interface? In the case of EventMachine, there has already been an attempt (em-synchrony) at converting it from callbacks to fibers, which didn't gather much steam and is currently inactive. I don't think this solution will be relevant to existing code based on EventMachine. In the case of nio4r, which as far as I know is the only currently existing implementation of a Scheduler, there is already a thriving ecosystem around ioquatix's excellent Async, Falcon et al. Frankly, this proposal seems to me a bit like a tailor-made solution for integrating nio4r into Ruby.

  2. In order to cover all blocking behavior in Ruby, you'll need to touch a lot of core C code. This proposal introduces an additional layer of complexity on top of existing code that has a long history and is already complex. Based on my own experience debugging problems using stdlib APIs and other gems with C extensions in a multi-fiber context, I can say that this proposal has the potential to introduce new bugs and undefined behaviors, which will then have to be resolved, so more work for core Ruby devs. This is especially true if the intention is for all this to be "automatic", such that developers won't even be aware it's there. The end user experience might be less than optimal.

  3. The proposed interface assumes a reactor-based design. Note that reactor != scheduler. This solution would not help us integrate newer technologies like io_uring. I think this should also be taken into account, since any concurrency solution should also be future-proof.

  4. This proposal introduces two kinds of fibers with differing behaviors, which is analogous to the problem introduced by async/await mechanisms ("What color is your function?"). This creates ambiguity where previously there was none.

  5. The proposed Fiber do notation is very problematic in my opinion. First of all, it's a global method call but it breaks the Ruby convention of methods being all lower case. Secondly, it's too similar to Fiber.new, people reading a piece of code might mistake one for the other. Thirdly, it does not communicate what it does. Since method names are normally verbs, what does it mean "to Fiber" and why is it capitalized?

  6. The performance implications, if there are any, have not been shared here. On the impact on non-concurrent code, you state: "We believe it's acceptable," which implies there is some slowdown. Any performance improvement or degradation should be shared.

In conclusion, in my opinion this should not be made part of Ruby (yes, I know the pull request has already been merged into master). One can debate the technical merits and faults of this proposal, but at the end of the day it feels to me like a tailor-made solution that will most probably be implemented by a single library, namely nio4r. This proposal might also have side effects that we can't see at the present moment.

In my opinion, such a fiber scheduling solution should not be baked into core Ruby. At the very least, if the Ruby gods decide to favor this endeavor, it might be wiser to use a different class than Fiber (perhaps a subclass of Fiber), and change the Fiber method name such that the API be less misleading. Ruby developers are already a bit "spooked" by fibers, there's no need to add to the confusion.

Updated by Eregon (Benoit Daloze) 5 months ago

Replying to only one of your points for now:

like a tailor-made solution for integrating nio4r into Ruby.

I see it as a way to cleanly integrate non-blocking concurrency with Fibers in Ruby, without forcing a single implementation like #13618.

In fact, maybe Polyphony could use the hooks introduced in this PR instead of extensive monkey patching?

This is just my opinion, but I would feel uncomfortable to run large monkey patches like
https://github.com/digital-fabric/polyphony/tree/master/lib/polyphony/extensions in production.
Also, I think it is brittle and e.g. reimplemented logic is going to inevitably behave differently (also because behavior might differ between Ruby versions).
Is there a plan to address these monkey patches in Polyphony? I think this feature can potentially help there.

It is cool that Polyphony can hook existing classes and add concurrency kind of automatically (although one of course still needs to create new Fibers with spin).
I think this proposal is also aiming for something like that, mostly transparent non-blocking concurrency, but via hooks instead of monkey patching.

Updated by ioquatix (Samuel Williams) 4 months ago

There has been some discussion about the interface of the Scheduler.

C Interface Exposure

It was largely copied from the existing Ruby and C interfaces where it seemed to make sense. For example rb_wait_for_single_fd -> wait_for_single_fd, etc.

We discussed the current (public) C interface, which is:

int rb_io_wait_readable(int);
int rb_io_wait_writable(int);
int rb_wait_for_single_fd(int fd, int events, struct timeval *tv);

ko1 (Koichi Sasada) said he doesn't want to expose these methods to scheduler, and he would rather have an implicit (non-cached) IO.from_fd for every operation so that the scheduler only sees IO instances.

  • This may introduce a performance issue.
  • This may introduce a consistency problem.

Right now in the proof of concept scheduler, it will cache fd -> IO and IO -> Wrapper. A wrapper contains the cached state of epoll/kqueue registration which requires one system call to register and one system call to deregister. I agree, that ideally all IO is represented by a unique IO object, so that we can cache this correctly. However, by creating an IO instance for each read and write call, not only do we create a lot of garbage, we also introduce duplicate IOs for the same underlying fd.
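To make the caching idea concrete, here is a minimal sketch of an fd -> IO cache. FdCache is a hypothetical class for illustration, not the proof of concept's actual code. It uses IO.for_fd with autoclose: false so the wrapper never closes a descriptor it does not own.

```ruby
# Hypothetical cache that hands out one IO wrapper per file descriptor.
class FdCache
  def initialize
    @ios = {}
  end

  # Return the same IO object every time for a given descriptor.
  def fetch(fd)
    @ios[fd] ||= IO.for_fd(fd, autoclose: false)
  end

  # Must be called when the descriptor is closed; otherwise a recycled
  # descriptor number would map to a stale IO (the consistency problem).
  def forget(fd)
    @ios.delete(fd)
  end
end

reader, _writer = IO.pipe
cache = FdCache.new

first = cache.fetch(reader.fileno)
second = cache.fetch(reader.fileno)
puts first.equal?(second)

# Without the cache, every call allocates a fresh duplicate wrapper.
puts IO.for_fd(reader.fileno, autoclose: false).equal?(first)
```

The forget step is the hard part in practice: nothing in this sketch tells the cache when a descriptor is closed elsewhere.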

So, these issues need to be addressed somehow. Ultimately, I'm fine with removing the C _fd wrappers, provided that:

  • We deprecate the existing C functions which take file descriptors.
  • We introduce new C functions which take IO instances.
  • We update all code in CRuby to use these new functions.
  • In order to remove those C wrappers from the Scheduler, we need to implement some kind of fd -> IO cache.

Ultimately, it makes the interface of the scheduler simpler, so I'm happy with that. But it's a lot of work and the current proposal is working.

C Interface "Surface Area"

The proposed scheduler replicates methods from IO, including IO#wait_readable -> Scheduler#wait_readable, IO#wait_writable -> Scheduler#wait_writable and IO#wait -> Scheduler#wait_any.

It was brought to my attention that the surface area of this was too big.

I'm also okay with this point. However, my original design was to avoid making changes and to follow the existing interfaces.

That being said, if change is desired here, after discussion, this is what I would suggest.

  • We should introduce IO#wait_priority. It's a 3rd kind of event which is currently not handled.
  • We should change IO#wait to take some bitmask of flags, e.g. IO::READABLE, IO::WRITABLE, IO::PRIORITY. There are sometimes system specific flags too. The order of arguments is also cumbersome if you don't want to specify timeout.
  • We should rewrite IO#wait_$event(timeout) into IO#wait($EVENT, timeout) as a defined implementation.
  • IO#wait should be redirected to Scheduler#wait (or #wait_io).
  • As part of this, we should rename wait_sleep to sleep if we want to try and be consistent with wait naming (i.e. Kernel#sleep -> Scheduler#sleep and IO#wait -> Scheduler#wait).

This greatly reduces the surface area of the functions that get forwarded into the scheduler and should also reduce the size of the CRuby implementation. It provides a nice central funnel for the wait event and using an integer bitmask is much better for forwards compatibility (i.e. not just readable/writable but also priority, out of band data, other system specific events).
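As a sketch of the single-funnel idea, the following implements one wait_for function on top of IO.select and expresses the per-event helpers in terms of it. The constants WAIT_READABLE and WAIT_WRITABLE and the function names are hypothetical values chosen for this illustration; they are not the flags proposed above.

```ruby
# Hypothetical flag values for this sketch only.
WAIT_READABLE = 1
WAIT_WRITABLE = 4

# Single funnel: returns the subset of `events` that became ready,
# or 0 if the timeout expired first.
def wait_for(io, events, timeout = nil)
  readers = (events & WAIT_READABLE).zero? ? nil : [io]
  writers = (events & WAIT_WRITABLE).zero? ? nil : [io]

  ready = IO.select(readers, writers, nil, timeout)
  return 0 unless ready

  mask = 0
  mask |= WAIT_READABLE if ready[0].include?(io)
  mask |= WAIT_WRITABLE if ready[1].include?(io)
  mask
end

# The per-event helpers become thin wrappers around the funnel.
def wait_readable(io, timeout = nil)
  wait_for(io, WAIT_READABLE, timeout)
end

def wait_writable(io, timeout = nil)
  wait_for(io, WAIT_WRITABLE, timeout)
end

reader, writer = IO.pipe
p wait_readable(reader, 0) # Nothing written yet, so this times out.
writer.write("x")
p wait_readable(reader, 0)
p wait_writable(writer, 0)
```

Adding a new event kind then only needs a new flag and one more line in wait_for, rather than a new hook.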

Blocking Hooks

The proposed scheduler introduced some new hooks. This was a somewhat experimental feature to detect blocking operations. However, after testing it, we found it cannot detect every blocking operation (to be expected, I suppose). For example in SQLite3, the GVL is not released even for long blocking operations. So, this feature is dangerous.

However, that being said, fibers that tie up the event loop represent a significant issue for event driven code. This is a well known issue in any event driven system. Care must be taken to off-load work to threads (or Ractors!). Therefore, I'm happy to remove these hooks. But we should be aware that this doesn't remove the need for instrumentation around fiber context switches, and if we can't provide this feedback, users may have a bad experience with event driven Ruby.

My conclusion is that we need a better sampling profiler which also measures fiber context switches. But it is a lot of work.

"Fiber{}" Naming

I personally like "Fiber do...end". It's short, and we don't break any existing code by using it. Contrary to prior discussion, it does return a Fiber in every way, shape and form; the only difference is that it defers to the scheduler for creation. It feels consistent with how we use things like Integer, Float and so on.

In Async, it actually creates an Async::Task and returns the task's fiber. ko1 (Koichi Sasada) recently created his own Scheduler and implemented a pessimistic scheduler by returning the fiber and executing it later. Async is an optimistic scheduler and will execute fibers immediately, rather than adding them to a work queue. The flexibility of this design is enabled by the def fiber implementation provided by the scheduler.
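The difference between the two strategies can be sketched with plain fibers. OptimisticScheduler and PessimisticScheduler are hypothetical toy classes for this illustration and are not the Async or ko1 implementations; they use plain Fiber.new so the sketch does not depend on the experimental blocking: option.

```ruby
# Runs the new fiber immediately, before returning it to the caller.
class OptimisticScheduler
  def fiber(&block)
    fiber = Fiber.new(&block)
    fiber.resume
    fiber
  end
end

# Queues the new fiber and only runs it when the event loop runs.
class PessimisticScheduler
  def initialize
    @ready = []
  end

  def fiber(&block)
    fiber = Fiber.new(&block)
    @ready << fiber
    fiber
  end

  def run
    @ready.shift.resume until @ready.empty?
  end
end

def creation_order_demo
  log = []

  optimistic = OptimisticScheduler.new
  optimistic.fiber { log << :optimistic_body }
  log << :optimistic_created

  pessimistic = PessimisticScheduler.new
  pessimistic.fiber { log << :pessimistic_body }
  log << :pessimistic_created
  pessimistic.run

  log
end

p creation_order_demo
```

With the optimistic scheduler the body runs before the creating code continues; with the pessimistic one the body only runs once the loop is started.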

Since the initial proposal, we made it a failure if the scheduler is not defined, which means that the user can clearly indicate that a piece of code requires a scheduler.

Regarding the specific name, I am not convinced by any of the proposed alternatives.

  • spawn: Kernel#spawn is already defined: https://rubyapi.org/2.7/o/kernel#method-i-spawn
  • Scheduler#fiber: Calling this method directly cannot have the same convenient error checking as some top level method like Fiber{}.
  • Fiber.nonblocking{}: Well, maybe? It's longer (bad?), it's also not strictly speaking Fiber.new(nonblocking: true) because it goes via the scheduler... so... being more specific actually makes it easier to be wrong.

The logic of Fiber{} is simply: It creates a fiber, you can run code in it, and it requires a scheduler to exist. The scheduler might not even be a non-blocking scheduler. It's implementation dependent.

Let me speak frankly, I've seen so many discussions wasting so much time about naming things. It's probably best that before you suggest some alternative, that you actually spend a few hours trying out the current situation to see how it feels. If you have a strong opinion about it, let's discuss it in private so that I can compile a list of options for a final discussion. In the end, we need to choose something, and there is no perfect answer.

Updated by ioquatix (Samuel Williams) 4 months ago

Mutex Implementation

I wanted to give a brief update on this given that the work is happening on a separate ticket.

Eregon (Benoit Daloze) and I have been working on how to handle Mutex. We had this as a (kind of indirect) feature request from jeremyevans0 (Jeremy Evans) w.r.t. sequel, which now has Fiber awareness and should work with the fiber scheduler provided the drivers are event driven. However, the sequel connection pool (correct me if I'm wrong) still uses a global pool and thus a Mutex is required to insert and remove connections. Because of this, each fiber needs to lock the mutex.

In order to show how we can implement mutex with a general abstraction, I implemented it in Async here: https://github.com/socketry/async/tree/mutex-support/lib/async - It's totally experimental and I'm not planning on merging it, but it shows we just need a thread safe primitive to reschedule fibers + the work Eregon (Benoit Daloze) did on the fiber-aware mutex.

With a single thread-safe entry point to schedule deferred work, we can handle mutex between plain threads and the thread scheduler with minimal impact to performance. This equally applies to Queue/SizedQueue/ConditionVariable and other primitives - although we don't have a proof of concept for those yet.
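A minimal sketch of what a single thread-safe entry point could look like, assuming the event loop drains a thread-safe queue. QueueScheduler, run_once and cross_thread_demo are hypothetical names for this illustration; this is not the Async mutex-support branch. Another thread only enqueues the fiber; the fiber itself is always resumed on the thread that created it.

```ruby
# Hypothetical scheduler whose only cross-thread operation is `schedule`.
class QueueScheduler
  def initialize
    @ready = Thread::Queue.new
  end

  # Thread-safe: may be called from any thread, e.g. by an unlocking thread.
  def schedule(fiber)
    @ready << fiber
  end

  # Runs on the scheduler's own thread; blocks until a fiber is ready.
  def run_once
    @ready.pop.resume
  end
end

def cross_thread_demo
  scheduler = QueueScheduler.new
  events = []

  waiter = Fiber.new do
    events << :waiting # e.g. the mutex is held by another thread
    Fiber.yield
    events << :resumed # the lock can now be taken
  end
  waiter.resume

  # Another thread "unlocks" and asks the scheduler to wake the waiter.
  Thread.new { scheduler.schedule(waiter) }.join
  scheduler.run_once

  events
end

p cross_thread_demo
```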

Updated by ioquatix (Samuel Williams) 4 months ago

Proposed updated scheduler interface, including some experimental parts from Mutex proposal:

class Scheduler
  # Wait for the given file descriptor to match the specified events within
  # the specified timeout.
  # @parameter events [Integer] a bit mask of +IO::WAIT_READABLE+,
  #   `IO::WAIT_WRITABLE` and `IO::WAIT_PRIORITY`.
  # @parameter timeout [#to_f] the amount of time to wait for the event.
  # @returns [Boolean] If any of the events were triggered.
  def wait(io, events, timeout)
    # Hook for IO#wait
  end

  # Sleep the current task for the specified duration, or forever if not
  # specified.
  # @parameter duration [#to_f] the amount of time to sleep.
  def sleep(duration = nil)
    # Hook for Kernel#sleep
  end

  # Intercept the creation of a non-blocking fiber.
  # @returns [Fiber] The fiber that was created (or a duck type).
  def fiber(&block)
    Fiber.new(blocking: false, &block)
  end

  # Reschedule the specific fiber on the event loop.
  # @reentrant This method is thread-safe.
  # @parameter fiber [Fiber] The fiber to execute during the next iteration of the event loop.
  # @parameter urgent [Boolean] The event loop should be immediately interrupted in order to invoke the fiber as soon as possible.
  def schedule(fiber, urgent = false)
  end

  # Execute the event loop. Invoked when the thread exits.
  def run
    # Implement event loop here.
  end

  # Schedulers may choose to expose other methods, e.g.
  # run_one_iteration or run_for_duration
end
  • Added schedule method for handling cross-thread Mutex.
  • All _fd methods would be removed pending changes to CRuby implementation.
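To show how the fiber, sleep and run pieces of such an interface fit together, here is a toy scheduler built on plain fibers. It is a sketch under simplifying assumptions: ToyScheduler and toy_scheduler_demo are hypothetical names, sleep is called explicitly on the scheduler instead of being hooked from Kernel#sleep, a duration is always required, and there is no IO waiting.

```ruby
# Fiber.current needs this on older Rubies; newer ones have it built in.
require 'fiber' unless Fiber.respond_to?(:current)

# Toy scheduler for illustration only; not the proposed C-level hooks.
class ToyScheduler
  def initialize
    @sleeping = [] # pairs of [wake_time, fiber]
  end

  # Start the fiber straight away and hand it back.
  def fiber(&block)
    fiber = Fiber.new(&block)
    fiber.resume
    fiber
  end

  # Park the calling fiber until `duration` seconds have passed.
  def sleep(duration)
    @sleeping << [now + duration, Fiber.current]
    Fiber.yield
  end

  # Event loop: wake sleeping fibers in order of their deadlines.
  def run
    until @sleeping.empty?
      @sleeping.sort_by!(&:first)
      wake_at, fiber = @sleeping.shift
      delay = wake_at - now
      Kernel.sleep(delay) if delay > 0
      fiber.resume
    end
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

def toy_scheduler_demo
  scheduler = ToyScheduler.new
  order = []

  scheduler.fiber { scheduler.sleep(0.05); order << :slow }
  scheduler.fiber { scheduler.sleep(0.01); order << :fast }
  scheduler.run

  order
end

p toy_scheduler_demo
```

Both fibers sleep concurrently, so the whole run takes roughly as long as the longest sleep rather than the sum of the two.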

Updated by ko1 (Koichi Sasada) 4 months ago

Nobu and I reviewed spec and implementation.
There are several topics.

[NEED TO CHANGE] doc/fiber.rdoc

The file name is fiber, but it almost exclusively describes the fiber scheduler.
Either change the file name to fiber_scheduler or describe fibers more.
I recommend changing the file name.

[NEED TO CHANGE] enter/exit_blocking_region callback

This feature should not be implemented as a scheduler's callback.

  • (1) This feature is too MRI specific.
  • (2) It is difficult to write a log simply because opening a log file also calls this callback (recursively)
  • (3) This event can be useful beyond the fiber scheduler.

I think they should be implemented as an internal trace point, like the newobj and freeobj events.

[MAYBE NEED TO CHANGE] API names

Now the API mixes scheduler, blocking, thread, fiber for the names.

For example:

  • Thread together with Scheduler suggests a thread scheduler (a scheduler for threads).
    • Thread#scheduler
    • rb_thread_scheduler_get
  • Thread together with Blocking suggests the thread's status (blocking/waiting).
    • rb_thread_blocking_p

I suggest renaming scheduler to fiber_scheduler:

  • Thread#fiber_scheduler=
  • rb_thread_fiber_scheduler_get

I suggest replacing blocking with another name, but I don't have a good name to describe it.

  • rb_thread_blocking_p -> ??? difficult to make a name...

Maybe there are other function names I missed.

[NEED TO CHANGE] Fiber

Kernel::Fiber should be given a different name, because Matz said so.

In addition, I don't like the method name Fiber, since the fiber created by the method is not the original fiber at all. It is not a good idea to steal the role of an existing concept in the language. We need a new name.

IMO Thread.current.scheduler.schedule do ... end is better than introducing a short name.
Aliasing Fiber.scheduler to Thread.current.scheduler and writing Fiber.scheduler.add do ... end would also be better than Fiber, because it makes it very easy to understand that a new fiber will be scheduled with the scheduler.

I don't think the Fiber() idea (creating non-blocking or blocking fibers depending on whether a scheduler is set) works correctly for many people.
At least if I write a program with a scheduler, my program (fiber procedures) should only work with the scheduler.

[Suggestion] Callbacks

wrap IOs

In the fiber.rdoc document, wait_readable is described first. However, wait_readable_fd (and so on) must also be defined, which puzzled me.
I assumed that if wait_readable_fd is not defined, wait_readable(io) would be called with default wrapping code.

After thinking about it, how about defining a Fiber::Scheduler class which defines wait_readable_fd(fd) and so on, and wraps fds in IO objects?
Anyone who wants the raw fd can override the default wrapping methods.

At least, the document should describe the requirements (you need to write the _fd methods).

callback names

In comment #46, Samuel proposed sleep, wait and so on.
However, I think a good prefix helps readers recognize that these are callback methods, so I prefer wait_xxx.

[Suggestion] Scheduler#run

Scheduler#run is invoked at the end of thread execution. However main thread does not run it.

I guess running Scheduler#run by the programmer explicitly is better than implicit invocation on any threads.
(1 line at the end of thread is not a cost, I guess)

[Suggestion] nonblocking IO

fiber.rdoc does not describe the relationship between the fiber scheduler and the IO#nonblock attribute.
This should be documented.

Updated by ioquatix (Samuel Williams) 4 months ago

Thanks ko1 (Koichi Sasada), we will prepare an update taking into consideration your feedback and the ongoing discussions.

Updated by Eregon (Benoit Daloze) 4 months ago

Regarding naming, based on what we discussed I think these are good candidates:

  • Scheduler#wait_io, Scheduler#wait_fiber
  • Fiber.schedule { ... } (instead of just Fiber { } which confuses many about the semantics)

About raw fds, I don't think we necessarily need to deprecate C APIs, the conversion fd->IO could be done in C (when calling the scheduler) or Ruby code (in the scheduler).
I guess the motivation is because the conversion is expected to be slow?
Probably having wait_fd + wait_io provides most flexibility.
Then the scheduler can decide if it wants all IOs or all fd's.
Passing whatever IO or fd is available at the point of call seems simplest to me.

I don't think having private copies of IO objects is a problem as they stay inside the scheduler.
The native selector APIs will use raw fds anyway, so I expect the identity of IO objects doesn't matter.

Updated by ioquatix (Samuel Williams) 4 months ago

Here is new PR:

https://github.com/ruby/ruby/pull/3434

[NEED TO CHANGE] doc/fiber.rdoc

We will update it.

[NEED TO CHANGE] enter/exit_blocking_region callback

Removed. However, I would like to add, that if we do not have some kind of support for finding blocking behaviour in Ruby code, it will be trouble for developers. So, more work is required here.

[MAYBE NEED TO CHANGE] API names

I agree, some parts can be improved. So, I'll do review and continue to think about it during development.

[NEED TO CHANGE] Fiber

After discussion, we felt Fiber.schedule{} is a good compromise. matz (Yukihiro Matsumoto) can you comment if you are happy with this?

I'm running an informal poll to see community feedback: https://twitter.com/ioquatix/status/1295720461003882498

[Suggestion] Callbacks

It's good idea.

So, I discussed many points around this topic with nobu (Nobuyoshi Nakada), mame (Yusuke Endoh), ko1 (Koichi Sasada), Eregon (Benoit Daloze), and I got the general feedback:

After much discussion, I'm leaning towards NOT exposing file descriptors anywhere. It's an internal CRuby problem, and we have some options to solve it. So:

  • I added rb_io_wait(VALUE io, ...).
  • I reworked some existing methods to use it.
  • I made single entry point to scheduler: Scheduler#io_wait(...).
  • I use IO.for_fd internally to handle raw file descriptors.

I've also updated the naming convention for hooks. The name format is {class}_{method}.
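To make the single entry point concrete, here is a minimal sketch of an io_wait(io, events, timeout) hook built on IO.select. It is an assumption-laden illustration rather than the PR's code: SelectScheduler and its READABLE/WRITABLE constants are hypothetical stand-ins, the return value (a bit mask of ready events, 0 on timeout) is a choice made for this sketch, and the method blocks the calling thread instead of switching fibers as a real scheduler would.

```ruby
# Hypothetical, blocking stand-in for a scheduler's io_wait hook.
class SelectScheduler
  # Local stand-ins for the event bit mask; not taken from the PR.
  READABLE = 1
  WRITABLE = 4

  # Wait until `io` matches one of `events` or `timeout` seconds elapse.
  # Returns the bit mask of events that are ready (0 on timeout).
  def io_wait(io, events, timeout)
    readers = (events & READABLE).zero? ? [] : [io]
    writers = (events & WRITABLE).zero? ? [] : [io]

    ready = IO.select(readers, writers, nil, timeout)
    return 0 unless ready # IO.select returns nil on timeout

    result = 0
    result |= READABLE if ready[0].include?(io)
    result |= WRITABLE if ready[1].include?(io)
    result
  end
end

scheduler = SelectScheduler.new
reader, writer = IO.pipe

# Nothing written yet: the read end times out.
before = scheduler.io_wait(reader, SelectScheduler::READABLE, 0.01)
# An empty pipe can accept data.
writable = scheduler.io_wait(writer, SelectScheduler::WRITABLE, 0.01)

writer.write("x")
writer.flush
# Now the read end has data available.
after = scheduler.io_wait(reader, SelectScheduler::READABLE, 1)

reader.close
writer.close
```

An event-driven scheduler would register the IO with its selector and yield the current fiber at the point where this sketch calls IO.select.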

I guess running Scheduler#run by the programmer explicitly is better than implicit invocation on any threads (1 line at the end of thread is not a cost, I guess)

We discussed this. I'm following Crystal's approach, which I think is good. ko1 (Koichi Sasada) was concerned about how to disable the scheduler, but you can assign nil if you want to cancel the run at exit. We could also have a separate method, e.g. Scheduler#run_at_exit. I'll think about it more and I'll also make the current implementation more consistent.

[Suggestion] nonblocking IO

Completely agree.

As this is a work in progress, documentation is not completed yet.

Finally, I'd like to bring attention to a few more exciting points of the new PR.

We are working on support for Linux io_uring and Windows IOCP. It's going to be important to shape the scheduler interface based on these modern interfaces rather than "legacy" epoll/kqueue.

dsh0416 (Delton Ding) has been working on implementing a compatible scheduler implementation and we have discussed requirements for efficient io_uring. It turns out we need to add hooks for read and write, and this allows us to implement zero-copy file-system and network I/O using event loop. (There is some related discussion here: https://bugs.ruby-lang.org/issues/17059)

One exciting area of this is the multitude of system calls that io_uring supports, including fallocate, madvise, and so on. While we would not aim for that initially, it looks to be a bright future in terms of handling these operations in a non-blocking fashion.

Such interfaces would be introduced, e.g. Scheduler#io_fallocate or similar. It's hard to anticipate all design issues, but so far I feel confident we are in a good position to leverage these interfaces (and more) as they become available/are supported. Linux itself is introducing more io_uring system calls, and not all of them would be supported by the given system (depending on Kernel version for example), so the C extension would need to handle this.

Updated by ioquatix (Samuel Williams) 4 months ago

About raw fds, I don't think we necessarily need to deprecate C APIs, the conversion fd->IO could be done in C (when calling the scheduler) or Ruby code (in the scheduler).

The latest PR does it in C, and it's implementation detail which I think we can avoid most of the time, and if we do need to do it, we could cache it pretty easily.

I guess the motivation is because the conversion is expected to be slow?

It's actually not just that, it's because you can't track anything about the original FD from Ruby land - i.e. the class, etc, it's just a generic I/O.

Probably having wait_fd + wait_io provides most flexibility.

Yes, but it also makes the scheduler interface more complicated, so I've opted for simpler and hide the details in the implementation of CRuby. Based on my initial work, I feel like we can resolve 100% of the situations in CRuby where these public interfaces are used.

Then the scheduler can decide if it wants all IOs or all fd's.

After discussion, my feeling was that exposing FDs is implementation detail.

Passing whatever IO or fd is available at the point of call seems simplest to me.

Agreed, simpler implementation but more clunky interface.

I don't think having private copies of IO objects is a problem as they stay inside the scheduler.

Agreed, that is how the proposed scheduler interface in Async works.

The native selector APIs will use raw fds anyway, so I expect the identity of IO objects doesn't matter.

This was my original argument, however, after many discussions, I feel like there was a preference towards using IO instances for the public scheduler interface because it's simpler, more consistent, and easier for JRuby.

It's all really good feedback, and we could easily enough support both if it turns out the fd conversion is a performance issue. So far, I think I prefer the simpler interface.

Updated by matz (Yukihiro Matsumoto) 3 months ago

Since this thread became long and complex, it was hard for me to grasp the latest proposal. Correct me if I missed something.

How to create async I/O fiber

I tentatively call fibers with context switching on I/O operations async I/O fiber. To create an async I/O fiber, one must do:

xxxx do
  # the block will be evaluated immediately.
  # within this block and code called from the block,
  # every I/O operation (plus sleep etc.) switches context
end

And for ordinary Ruby users, there's no need to configure any scheduler, right?

Candidates for xxxx are:

Scheduler

I am strongly opposed to the name Scheduler in this proposal. Every concurrent entity (process, thread, fiber, ractor) has its scheduler at least within implementations. Using the mere scheduler can easily cause confusion. Is Thread#scheduler a scheduler for threads? Or a scheduler for fibers within a thread? Currently, we have no other scheduler disclosed to Ruby level, but maybe in the future.

In my opinion, it should be more specific, e.g. FiberScheduler. I know it's longer and more verbose. But I think it's OK since it's (kind of) internal configuration for advanced users.

Matz.

Updated by ioquatix (Samuel Williams) 3 months ago

Thanks matz (Yukihiro Matsumoto), I'll consider your feedback, and we can make some changes after the next PR is finished.

Updated by ioquatix (Samuel Williams) 3 months ago

I have introduced experimental support for event-driven scheduling for the following primitives:

  • Mutex
  • ConditionVariable
  • Queue (& SizedQueue)

In addition, I've reworked the method names to be more consistent taking into account the feedback given here and on Slack.

Regarding naming, I think it's important to understand that the future of the scheduler interface is not just about fiber.

matz (Yukihiro Matsumoto), on the one hand you say you oppose Thread#scheduler because it can easily cause confusion, and yet you also assert that it's "kind of internal". Well, I'm not sure that together it's a strong argument.

I could understand if it's Thread#fiber_scheduler because it's Fiber.schedule. However, I'm not sure we've established the final convention for that either. In addition, Fiber.schedule is really only one part of the interface, along with hooks for many other blocking operations in the Ruby VM. Basically, the proposed scheduler is not just about fibers.

Regarding Process, Thread, Fiber or Ractor scheduler. Do you want a future for Ruby where users need to deal with more than one kind of scheduler? I cannot imagine developers making sense of Thread#this_scheduler and Thread#that_scheduler. I wish we can find a shared vision for the future of Ruby concurrency & parallelism. Can you explain in more detail what you are thinking?

The reason why I started with Thread#scheduler is because it's short and represents what I'm implementing. As you said there is no conflict with current code. I feel that Thread#fiber_scheduler is too specific and I'm not sure how being more specific here is useful - especially given that almost all the hooks are not fiber specific. In theory you could use this interface to implement a green thread scheduler for example.

I also thought about what you said about using Fiber.async. Simply, it conflicts with the semantics of the async gem.

This is something that I'm happy to discuss further but: Scheduling a fiber for execution (e.g. Fiber.schedule) is quite different from Async/Sync from the async gem. Fiber.schedule is semantically very simple but Async/Sync is more complex:

In order to explain what I mean:

module Kernel
  attr :scheduler_class

  def async(&block)
    if scheduler = Thread.scheduler?
      scheduler.fiber(&block)
    else
      scheduler = scheduler_class.new
      scheduler.fiber(&block)
      scheduler.run
    end
  end

  def sync(&block)
    if scheduler = Thread.scheduler?
      yield
    else
      async(&block).wait
    end
  end
end

# Will create a reactor and run the user's code, waiting for it to finish:
async do
  # Will create a child task, and run it, parent task is not waiting for it:
  async do
  end
end

# Will guarantee a reactor exists and run code synchronously:
sync do
  # Does not introduce a new fiber because it's already running inside one:
  sync do
  end
end
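The semantics above can be tried out without any core changes. The following is a minimal sketch under stated assumptions: ToyReactor and AsyncSketch are hypothetical names, the "current scheduler" is kept in a thread variable instead of the proposed Thread.scheduler?, and nothing here performs real I/O. It only shows that async creates a reactor when none exists and queues a child task otherwise, while sync runs inline when a reactor is already active.

```ruby
# Hypothetical run-queue reactor; stands in for a real event loop.
class ToyReactor
  def initialize
    @ready = []
  end

  def fiber(&block)
    fiber = Fiber.new(&block)
    @ready << fiber
    fiber
  end

  def run
    while (fiber = @ready.shift)
      fiber.resume
    end
  end
end

module AsyncSketch
  KEY = :async_sketch_reactor

  # Thread variables are shared by all fibers of a thread
  # (unlike Thread.current[], which is fiber-local).
  def self.current
    Thread.current.thread_variable_get(KEY)
  end

  def self.async(&block)
    if (reactor = current)
      reactor.fiber(&block) # child task; the caller does not wait for it
    else
      reactor = ToyReactor.new
      Thread.current.thread_variable_set(KEY, reactor)
      begin
        reactor.fiber(&block)
        reactor.run         # top level: run until all tasks finish
      ensure
        Thread.current.thread_variable_set(KEY, nil)
      end
    end
  end

  def self.sync
    return yield if current # already inside a reactor: run inline

    result = nil
    async { result = yield }
    result
  end
end

log = []
AsyncSketch.async do
  log << :outer_start
  AsyncSketch.async { log << :child }
  log << :outer_end
end

answer = AsyncSketch.sync { AsyncSketch.sync { 42 } }
```

The child task runs only after the outer block returns control to the reactor, which matches the "parent task is not waiting for it" comment above.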

I'm happy to discuss this design further, but I do wonder if it's expanding the proposal too far. So my feeling is Fiber{} or Fiber.schedule{} is better because it's more direct and easier for library implementation to use. The reason to explore Kernel#async and Kernel#sync is because it introduces higher level interface for concurrency which could be great for users... but to me, it should be separate proposal. We should try to reserve Kernel#sync and Kernel#async for this proposal if possible and avoid introducing different meaning to Fiber#async.

Updated by ioquatix (Samuel Williams) 3 months ago

With the help of Eregon (Benoit Daloze), I have tidied up the implementation so that:

  • We found and fixed a few bugs.
  • Changing the current scheduler invokes #close on the current scheduler (before did nothing).
  • At thread & process exit, the current scheduler is set to nil (was not consistent previously).
  • #close is responsible for ensuring that any outstanding blocking operations are cancelled (either by force or by running them).
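The close behaviour described in this list can be illustrated with a small sketch. Everything in it is hypothetical (ClosingScheduler, SchedulerSlot and defer are names invented for the example, and SchedulerSlot merely imitates the per-thread setter); it only shows a setter that closes the previous scheduler, and a #close that finishes outstanding work by running it.

```ruby
# Hypothetical scheduler that records what happens on close.
class ClosingScheduler
  def initialize(name, log)
    @name = name
    @log = log
    @pending = []
  end

  # Register an outstanding operation (a stand-in for a blocked fiber).
  def defer(&operation)
    @pending << operation
  end

  def run
    @pending.shift.call until @pending.empty?
  end

  # Assumption: outstanding operations are completed rather than cancelled.
  def close
    run
    @log << "#{@name} closed"
  end
end

# Hypothetical stand-in for the per-thread scheduler attribute.
class SchedulerSlot
  attr_reader :scheduler

  def scheduler=(replacement)
    @scheduler.close if @scheduler # replacing a scheduler closes the old one
    @scheduler = replacement
  end
end

log = []
slot = SchedulerSlot.new

first = ClosingScheduler.new("first", log)
first.defer { log << "first finished pending work" }

slot.scheduler = first
slot.scheduler = ClosingScheduler.new("second", log) # closes "first"
slot.scheduler = nil                                 # closes "second", as at thread exit
```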

In addition:

  • Reworked mutex_lock and mutex_unlock into more general primitives for blocking operations: block(blocker, timeout) and unblock(blocker, fiber).
  • Added support for Thread#join to be non-blocking, so that threads can be easily used on the event loop.
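As a rough illustration of the block(blocker, timeout) and unblock(blocker, fiber) pair, the sketch below parks a fiber until another fiber wakes it. It is a simplification under stated assumptions: BlockingSketch is a hypothetical class, the timeout is ignored, the blocker is an arbitrary object, and everything runs on one thread, whereas the hooks in the PR also have to cope with unblock being called from other threads.

```ruby
# Hypothetical single-threaded scheduler with block/unblock hooks.
class BlockingSketch
  def initialize
    @ready = []
  end

  def fiber(&block)
    fiber = Fiber.new(&block)
    @ready << fiber
    fiber
  end

  # Park the current fiber until someone calls `unblock` for it.
  # Assumption: `timeout` is accepted but ignored here.
  def block(blocker, timeout = nil)
    Fiber.yield
  end

  # Make a parked fiber runnable again.
  def unblock(blocker, fiber)
    @ready << fiber
  end

  def run
    while (fiber = @ready.shift)
      fiber.resume
    end
  end
end

log = []
scheduler = BlockingSketch.new
signal = Object.new # plays the role of a Mutex, Queue, etc.

waiter = scheduler.fiber do
  log << :waiting
  scheduler.block(signal)
  log << :woken
end

scheduler.fiber do
  log << :signalling
  scheduler.unblock(signal, waiter)
end

scheduler.run
```

A primitive such as Mutex would call block when it cannot be acquired and unblock when it is released, which is why the same two hooks can serve Mutex, ConditionVariable and Queue.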

What still remains:

  • Investigate Process#wait and other similar methods to make them non-blocking.
  • Generally, evaluate if other methods are blocking and can be made non-blocking.
  • Introduce Fiber#annotate (new feature).
  • Documentation & more ruby/spec.
  • Implementation in JRuby/TruffleRuby.

I think this is a sufficient subset of functionality for preview1, but I'm working on updated documentation.

Updated by Eregon (Benoit Daloze) 3 months ago

What's Fiber#annotate?

Updated by ioquatix (Samuel Williams) 2 months ago

Eregon (Benoit Daloze) wrote in #note-56:

What's Fiber#annotate?

It's a way to add a description to what a fiber is doing, it's incredibly useful for debugging. Actually, I don't mind if we don't do this, but it has been discussed a long time ago IIRC. I don't know if there is an issue for it. Basically:

Fiber.current.annotate "Waiting for incoming connection..."

It probably needs a separate issue. Async already has this but we obviously want to do it in a generic way that everyone can take advantage of.

Updated by Eregon (Benoit Daloze) 2 months ago

That sounds very similar to Thread#name=.
I think it would be worth having its own feature request if you'd like it in core.

Updated by ioquatix (Samuel Williams) 2 months ago

matz (Yukihiro Matsumoto) has said that he is happy with the current implementation, which is great news. However, there is still some discussion around naming, mostly that ko1 (Koichi Sasada) said that Thread#scheduler is not meaningful enough.

So, I wanted to give some background of where this naming convention comes from so that the meaning can be established.

Thread#scheduler came from Java's Loom proposal, as well as Crystal's scheduler implementation. It's based on existing language design and terminology. Here is the current interface:

# Returns scheduler if execution context is non-blocking:
Thread.scheduler

# Getter and setter:
Thread.current.scheduler
Thread.current.scheduler=

Eregon (Benoit Daloze) made one suggestion which was to change Thread.scheduler to Thread.active_scheduler. I'm not sure active is the right word here, but I understand the intention is to clarify under what circumstances that it returns a non-nil value (depending on non-blocking execution context). Internally, at one point we had rb_thread_scheduler_if_nonblocking but it feels so long. However, that's literally what that is doing, so maybe it should be Thread.current.scheduler_if_nonblocking. Frankly speaking, we don't expect many people to use this interface, it's mostly for tests (to check if the operation is blocking or not), so I'd be okay with that (matching the C interface). We could also check if it's possible to remove this interface completely (maybe okay).

Regarding the specific feedback:

no clue for scheduler for what? could be Process / Thread / Fiber or whatever

To me, it's the "Thread's scheduler instance" for handling scheduling operations (like switching between execution contexts). Regarding Process or Thread scheduler, that job is up to the operating system so I'm not sure it's a relevant distinction. But there is no reason why you can't make a multi-thread scheduler with the proposed interface, by using Thread.stop and Thread#wakeup:

class Thread::Scheduler
    def run
        # event loop
        # ...
        ready.wakeup # resume thread
    end

    def io_wait(io, events, timeout)
        # schedule in event loop ...
        Thread.stop # park the calling thread until it is woken up
    end
end

scheduler = Thread::Scheduler.new

# Scheduler event loop (could be one or more)
Thread.new do
    # Run event loop in one thread (or more)
    scheduler.run
end

# User level thread scheduling with event loop:
Thread.new(scheduler: scheduler) do
    io.wait_readable # Parks this thread via the scheduler's `io_wait`
end
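The example above depends on parking and resuming native threads. The following minimal sketch shows only that mechanism, using Thread.stop and Thread#wakeup, which exist in Ruby today; the Thread.new(scheduler: ...) keyword and Thread::Scheduler from the example are not part of it. The polling loop is an assumption of this sketch to make sure the worker is really asleep before it is woken.

```ruby
events = Queue.new # thread-safe log shared between the two threads

worker = Thread.new do
  events << :parked
  Thread.stop        # sleep until another thread calls #wakeup
  events << :resumed
end

# Wait until the worker has logged and is actually sleeping.
Thread.pass until events.size == 1 && worker.status == "sleep"

worker.wakeup        # what the event loop would do when the IO is ready
worker.join

log = []
log << events.pop until events.empty?
```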

So to me, this interface is not just about fiber scheduling; that is too specific. It also means Fiber.schedule might be the wrong name. Or we might want to introduce Thread.schedule too, for green threads using the scheduler. These are all ideas. But the point is that making the name scheduler more specific may be confusing as the future of this interface expands to fill the needs of users.

For more references regarding the use of this name and the meaning behind it, please see the following references:

Updated by Eregon (Benoit Daloze) 2 months ago

Since Thread.scheduler is removed, I think Thread#scheduler is unambiguous: it's a scheduler per-Thread, so a scheduler of Fibers.

Anyway, I don't imagine a useful thread scheduler or process scheduler written in Ruby anytime soon, that's the job of the operating system (as long as we have native threads). Using a multi-threads scheduler like the example above just seems like a less efficient (in footprint & switching overhead) way than using non-blocking fibers, so it seems of very limited usefulness in practice.

In any case, IMHO Thread#scheduler is fine and clear.
If we ever come up with other ideas for schedulers, those can have a qualified name.
And it sounds rather unlikely to me we'll ever have other scheduler interfaces in Ruby.

Updated by matz (Yukihiro Matsumoto) 2 months ago

I am still strongly against Thread#scheduler. As Eregon (Benoit Daloze) stated, we are not going to have other schedulers any time soon. But the concept of scheduler without scheduling target could confuse users. It should be qualified from the start.

Matz.

Updated by ioquatix (Samuel Williams) 2 months ago

Okay, considering that this proposal is about concurrency (a subset of asynchronous execution), we could use the following taxonomy:

Thread#concurrency_scheduler=
Thread#concurrency_scheduler

Fiber.concurrently do
end

To me, this is a very clear expression of "what it does" rather than "how it's implemented" (which could change).

Updated by Eregon (Benoit Daloze) about 2 months ago

The term "concurrency" seems way too abstract to me for this scheduler.
I think the more obvious the better, this schedules Fibers, so I think Thread#fiber_scheduler{,=} and Fiber.schedule {} fit nicely.

Updated by matz (Yukihiro Matsumoto) 28 days ago

After discussion with ioquatix (Samuel Williams), we have decided the method to specify the fiber scheduler should be Fiber#use_scheduler(sch).
It should not be an attribute setter because it is a per-thread value. The word use may be replaced with another word, e.g. set. ioquatix (Samuel Williams) is consulting his dictionary.

I proposed Fiber.default_scheduler_class = c as well. But it is postponed until after Ruby 3.0 because of the potential race condition. We need more time to discuss (about implementation).

Matz.

Updated by ioquatix (Samuel Williams) 28 days ago

Here is the PR for the changes as discussed: https://github.com/ruby/ruby/pull/3742

Updated by Eregon (Benoit Daloze) 28 days ago

In the PR above, Fiber.set_scheduler(value) is used.
Fiber.set_scheduler(value) feels weird and inconsistent for Ruby. It should be Fiber.scheduler = value per Ruby naming conventions.
I think it is better to use another term like Fiber.use_scheduler or so.

It is very confusing that this new method doesn't clarify the scheduler is only set for the current Thread, that's IMHO a major issue of this new method. I would call it a flaw even.
Fiber.foo should either be global, or affect only the current Fiber (e.g. Fiber.current).
Not affect all Fibers of the current Thread, that's just confusing and unexpected.

Also, since there seems to be Thread.fiber_scheduler, then Thread.fiber_scheduler = value would be the most obvious API, and it's very clear, isn't it? It's also what we discussed at the meeting I attended, and IMHO that was a very clear and nice API.

With a method on Fiber, we would need something ridiculously long to actually make it clear:
Fiber.use_scheduler_for_all_new_fibers_on_current_thread(scheduler).

I think there is no point in trying to hide that the level of concurrency above Fibers is Threads, that is unlikely to change, and if it did we would need to redesign many APIs anyway.

Updated by Eregon (Benoit Daloze) 28 days ago

I added this issue to the next dev meeting (#17299).

BTW, matz mentioned Fiber#use_scheduler(sch) (instance method) above but the PR is Fiber.use_scheduler(sch) (class method).
An instance method would already make more sense to me, but might be rather inconvenient:

fiber = Fiber.new do
  ...
end
fiber.use_scheduler(scheduler)
# or the obvious
fiber.scheduler = scheduler

Setting it on the Thread is much more convenient though.

Updated by matz (Yukihiro Matsumoto) 27 days ago

I am sorry. I meant Fiber.use_scheduler(sch) rather than Fiber#use_scheduler(sch). It was just a typo.

Matz.

Updated by Eregon (Benoit Daloze) 27 days ago

matz (Yukihiro Matsumoto) wrote in #note-68:

I meant Fiber.use_scheduler(sch) rather than Fiber#use_scheduler(sch)

Right, that's what I guessed.
So the problem I mentioned in https://bugs.ruby-lang.org/issues/16786#note-66 remains, it's very confusing for Fiber.foo to affect "all new fibers of the current thread" and not all fibers globally or only the current fiber.
Thread.current.fiber_scheduler = scheduler seems 100x clearer. Is there any downside to it?

Updated by Eregon (Benoit Daloze) 23 days ago

I had a call with ioquatix (Samuel Williams) .
He doesn't like Thread.current.fiber_scheduler= (mixing too many things).
He says Fiber.set_scheduler or Thread.current.scheduler= are the best options in his view.

The method names Fiber.set_scheduler or Fiber.scheduler= mean they set state on the receiver.
The receiver is the Fiber class, which is global. So it must set global state, and I would think almost every Rubyist would think like that when reading code doing Fiber.set_scheduler(SomeScheduler.new).
That's highly misleading, because it actually sets per-thread state, and that's the only meaningful place to store the scheduler.
The scheduler is per-thread. Even if we could move Fibers between threads, we'd still want per-thread schedulers (maybe connected to a global scheduler).
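The mismatch described here can be shown with a small sketch. SketchFiber is a hypothetical module that imitates a class-level setter backed by per-thread storage; it uses thread variables purely for illustration and is not how CRuby stores the scheduler.

```ruby
# Hypothetical imitation of a Fiber.set_scheduler-style API.
module SketchFiber
  KEY = :sketch_fiber_scheduler

  # Looks global (called on a module), but only affects the current thread.
  def self.set_scheduler(scheduler)
    Thread.current.thread_variable_set(KEY, scheduler)
  end

  def self.scheduler
    Thread.current.thread_variable_get(KEY)
  end
end

SketchFiber.set_scheduler(:main_thread_scheduler)

seen_in_main = SketchFiber.scheduler
seen_in_other_thread = Thread.new { SketchFiber.scheduler }.value
```

A reader who assumes the setter is global would expect the second thread to see the same scheduler, but it sees nil.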

Given that, I think Thread.current.scheduler= would be infinitely better.
I think Thread.current.fiber_scheduler= would be fine too (length doesn't matter, it's an API that would most often be wrapped by a gem), but Samuel really doesn't like it, and this scheduler is his project so I respect that.
If the idea is to avoid Thread because it might be deprecated one day, I think that's an illusion (and even if it happened, there would be many more methods to move, so this one method is really not a big deal).
There are many cases where Threads are useful, and deprecating them would just be a huge mistake. There is plenty of code out there using Threads (including Rails, Sidekiq, etc) which either can't be replaced with Ractor or would need to be far more complicated with Ractor. Ractor is also great, but isn't always the answer, as everyone says, use the right tool for the job.

If it must absolutely be on Fiber, then I'd suggest Fiber.setup_scheduler, at least the name doesn't confusingly imply "sets state on the receiver".

Updated by Dan0042 (Daniel DeLorme) 22 days ago

Would this be too crazy?

Fiber.schedulers #=> Hash
Fiber.schedulers.default = sch
Fiber.schedulers[Thread.current] = sch

Updated by Eregon (Benoit Daloze) 22 days ago

Dan0042 (Daniel DeLorme) wrote in #note-71:

Fiber.schedulers[Thread.current] = sch

It seems weird to me to use a Thread as a key for some state on "Fiber".

I think there is no point to hide the fact this state is per-thread. So, it should be an instance method of Thread.

Updated by ioquatix (Samuel Williams) 19 days ago

I am happy with Fiber.set_scheduler. I'm also happy with Fiber.scheduler= (canonical Ruby style) but I accept and agree with Matz's reasoning behind the former.

I originally preferred Thread.current.scheduler= because it seemed logical, but there is no case where you can do this for some other thread, i.e.

thread = Thread.new{...}
thread.scheduler = ... # invalid

In addition, I got a strong feeling from the discussion that this feature should relate to "fiber scheduler" and I agree.

Therefore, I'm happy to keep this new interface centered around Fiber. In the future, it might be process or ractor global (if Thread is removed for example). Or some alternative implementation of Ruby that doesn't support Threads. We don't know the future, and this interface hides implementation details which might change, which is one of its biggest advantages.

Updated by matz (Yukihiro Matsumoto) 15 days ago

I am happy with Fiber#set_scheduler().

Matz.

Updated by matz (Yukihiro Matsumoto) 15 days ago

Once the implementation is finalized, could you update the document? Also, I'd like you to rename scheduler.md to something else (e.g. asyncfiber.md).

Matz.
