Feature #17285

Less strict `Ractor.select`

Added by marcandre (Marc-Andre Lafortune) about 1 month ago. Updated about 1 month ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
[ruby-core:100543]

Description

Summary: could we have a way for Ractor.select to skip ractors with closed queues and raise only if no ractor with an open queue remains?

Detail:

I backported Ractor for earlier Ruby versions, as I'd like to use it in some gems that would work great in 3.0 and work ok in older Rubies without rewriting. That was a lot of fun :-)

One surprise for me was that Ractor.select enforces that no given ractor is terminated(*).

This means that one must remove terminated ractors from a pool of ractors before calling select again:

pool = 20.times.map { Ractor.new{ do_processing } }

20.times do
  ractor, result = Ractor.select(*pool)
  handle(result)
  pool.delete(ractor) # necessary!
end

0) This can be tedious, but I know I'm very lazy

1) It is not convenient to share a pool between different ractors. Try writing code that starts 5 ractors that would consume the results from pool above.

2) It might require special synchronization if the ractors may yield a variable number of values:

def do_processing
  rand(10).times do
    Ractor.yield :a_result
  end
  :finish
end

pool = 20.times.map { Ractor.new{ do_processing } }

until pool.empty? do
  ractor, result = Ractor.select(*pool)
  if result == :finish
    pool.delete(ractor)
  else
    do_something_with(result)
  end
end

I would like to propose allowing (by default, or at least via a keyword parameter) select to be called on terminated ractors, as long as at least one open ractor remains.

This would make it very easy to resolve 1 and 2 above. Here's an example that combines both:

def do_processing
  rand(10).times do
    Ractor.yield :a_result
  end
  Ractor.current.close # avoid yielding a value at the end
end

pool = 20.times.map { Ractor.new{ do_processing } }.freeze

5.times do # divide processing into 5 ractors
  Ractor.new(pool) do |pool|
    loop do
      _ractor, result = Ractor.select(*pool) # with my proposed lax select
      do_something_with(result)
    end
  end
end

The loop above terminates when Ractor.select raises an error once the whole pool is terminated.
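
As a rough illustration of the proposed semantics only, here is a sketch that models each ractor's outgoing queue with a plain Queue. `lax_select` is a hypothetical helper, not an existing Ractor method, and it polls where a real select would block. It skips queues that are closed and drained, and raises only once no open queue remains. Since ClosedQueueError is a StopIteration, Kernel#loop ends cleanly when that happens.

```ruby
# Model only: a Queue stands in for a ractor's outgoing queue.
# lax_select is a hypothetical helper, not part of Ruby's Ractor API.
def lax_select(queues)
  # A queue is finished once it is closed and fully drained.
  until queues.all? { |q| q.closed? && q.empty? }
    queues.each do |q|
      begin
        return [q, q.pop(true)] # non-blocking pop
      rescue ThreadError
        next # nothing available here right now; try the next queue
      end
    end
    sleep 0.001 # crude polling; a real select would block instead
  end
  raise ClosedQueueError, "every queue is closed and drained"
end

# Producers push a variable number of results, then close their queue.
counts = [0, 2, 3]
queues = counts.map { Queue.new }
producers = counts.zip(queues).map do |n, q|
  Thread.new do
    n.times { |i| q << [:a_result, i] }
    q.close
  end
end

results = []
# ClosedQueueError is a StopIteration, so Kernel#loop ends cleanly.
loop do
  _queue, result = lax_select(queues)
  results << result
end
producers.each(&:join)
```

The consumer never has to prune the pool itself or agree on a sentinel such as :finish, which is the point of the proposal.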

I'm new to actors but my intuition currently is that I will never want to take care of a pool of Ractors myself and would always prefer if Ractor.select did it for me. Are there use-cases where Ractor.select raising an error if it encounters a closed queue is helpful?

Notes:

  • (*) Ractor.select doesn't really require the ractors to be open, of course; it works if the ractors happen to be consumed in the right order, as happens by chance in this example:
10.times.map do
  r = 2.times.map { Ractor.new{ sleep(0.05); :ok } }
  Ractor.select(*r) # Get first available result
  # Don't remove the ractor from `r`
  Ractor.select(*r).last rescue :error  # Get second result
end
 # => [:ok, :error, :error, :error, :error, :error, :error, :ok, :ok, :ok]
  • I think Ractor.select(*pool, yield_value: 42) would raise only if the current outgoing queue is closed, even if the whole pool was terminated
  • Similarly, Ractor.select(*pool, Ractor.current) would raise only if the current incoming queue is also closed.

Updated by ko1 (Koichi Sasada) about 1 month ago

For (0), I'm thinking of introducing Ractor::Selector (or similar) to manage the ractors.

sel = Ractor::Selector.new
sel.receive(r1) do |msg| ... end
sel.receive(r2) do |msg| ... end
sel.wait # Ractor.select(r1, r2)

And we can specify what happens when r1 or r2 is closed. Maybe remove closed ractors by default? This needs more consideration.

This interface can be implemented with the current Ractor.select, so I guess we can try it in the ractor gem as a trial for the future.
(Ractor.select has a performance issue, so I think Ractor::Selector should be introduced in core, but we can discuss the API in the ractor gem for Ruby 3.0.)
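
A minimal sketch of how such an interface might be layered on Ractor.select, under loud assumptions: `PoolSelector` and its methods are hypothetical names chosen for illustration (not the eventual Ractor::Selector API), and each ractor is assumed to produce exactly one message, its return value, so it is dropped from the set right after being selected.

```ruby
# Hypothetical sketch; PoolSelector is not a real Ruby class.
# Assumes every registered ractor delivers a single message (its return value).
class PoolSelector
  def initialize
    @handlers = {} # ractor => block to call with its message
  end

  # Register a handler for messages coming from +ractor+.
  def receive(ractor, &handler)
    @handlers[ractor] = handler
    self
  end

  def empty?
    @handlers.empty?
  end

  # Wait for one message, dispatch it, and forget the finished ractor.
  def wait
    ractor, msg = Ractor.select(*@handlers.keys)
    handler = @handlers.delete(ractor)
    handler.call(msg)
  end
end

got = []
sel = PoolSelector.new
sel.receive(Ractor.new { :one }) { |msg| got << msg }
sel.receive(Ractor.new { :two }) { |msg| got << msg }
sel.wait until sel.empty?
```

Removing the ractor inside `wait` corresponds to the "remove closed ractors by default" option; a different policy could be plugged in at that point.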


1) It is not convenient to share a pool between different ractors. Try writing code that starts 5 ractors that would consume the results from pool above.

I didn't consider this case. Is it needed?

2) It might require special synchronization if the ractors may yield a variable number of values:

I'm thinking of catching the Closed exception so that Ractor::Selector can hide it. Is that enough?

Are there use-cases where Ractor.select raising an error if it encounters a closed queue is helpful?

If you want to recover (restart) the worker ractors, it will help.
