Feature #21557

Ractor.shareable_proc to make sharable Proc objects, safely and flexibly

Added by Eregon (Benoit Daloze) 1 day ago. Updated 1 day ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:123136]

Description

Following #21039 and #21550, this is a complete proposal which does not require reading these previous proposals (since that caused some confusion).
That way, it is hopefully as clear as possible.
It also explains how it solves everything we discussed in the previous tickets.
It solves all real-world examples from https://bugs.ruby-lang.org/issues/21550#note-7.

To use Ractor effectively, one needs to create Procs which are shareable between Ractors.
Of course, such Procs must not refer to any unshareable object (otherwise the Ractor invariant is broken and segfaults follow).

One key feature of blocks/Procs is to be able to capture outer variables, e.g.:

data = ...
task = -> { do_work(data) }

Ractor shareable procs should be able to use captured variables, because this is one of the most elegant ways to pass data/input in Ruby.

But there is a fundamental conflict: shareable procs cannot honor reassignments of captured variables, otherwise the Ractor invariant is broken.
So creating a shareable proc internally makes a shallow copy of the environment, to not break the Ractor invariant.
We cannot prevent assigning local variables (i.e. raise an exception on foo = value), that would be way too weird.
But we can raise an error when trying to create a shareable proc in an incompatible situation, which makes it safe by preventing the unsafe cases.

Reassigning a captured variable inside the block

Concretely, it seems we all already agree that this should be a Ractor::IsolationError:

def example
  a = 1
  b = proc { v = a; a += 1; v }
  r = Ractor.shareable_proc(&b) # Ractor::IsolationError: cannot isolate a block because it accesses outer variables (a) which are reassigned inside the block
  [b, r]
end
example.map(&:call)

That's because, without the error, the result would be [1, 1], which is unexpected (it should be [1, 2]): r.call should have updated a to 2, but it only updated a in its environment copy.
That basically breaks the lexical scoping of variables captured by blocks.
We can check this by static analysis, in fact we already use static analysis for Ractor.new: a = 0; Ractor.new { a = 2 } which gives can not isolate a Proc because it accesses outer variables (a). (ArgumentError).
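The [1, 2] versus [1, 1] difference can be reproduced in plain Ruby without any Ractor API. This is only a sketch: copy_env is a hypothetical helper that imitates the environment copy by passing a through a method parameter, so the new proc gets its own private a.

```ruby
# Hypothetical stand-in for the environment copy (not a Ractor API):
# the method parameter `a` is a separate variable owned by the returned proc.
def copy_env(a)
  proc { v = a; a += 1; v }
end

# Ordinary block semantics: both calls read and update the same `a`.
def shared_environment
  a = 1
  b = proc { v = a; a += 1; v }
  [b, b].map(&:call)
end

# Simulated copy: `r` increments its own `a`, the outer `a` is untouched by it.
def copied_environment
  a = 1
  b = proc { v = a; a += 1; v }
  r = copy_env(a)
  [b, r].map(&:call)
end

p shared_environment # => [1, 2]
p copied_environment # => [1, 1]
```

The second result is the silent divergence that the proposed Ractor::IsolationError is meant to rule out.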

Reassigning a captured variable outside the block

The second problematic case is:

# error: the code clearly assumes it can reassign `a` but the `shareable_proc` would not respect it, i.e. `shareable_proc` would break Ruby block semantics
# Also note the Ractor.shareable_proc call might be far away from the block, so one can't tell when looking at the block that it would be broken by `shareable_proc` (if no error for this case)
def example
  a = 1
  b = proc { a }
  Ractor.shareable_proc(&b) # Ractor::IsolationError: cannot isolate a block because it accesses outer variables (a) which are reassigned outside the block
  a = 2
end

This is very similar (it is the symmetric case): the shareable_proc cannot honor the a = 2 assignment, so creating a shareable_proc in that context should not be allowed and should raise Ractor::IsolationError.

If you don't see the issue in that small example, let's use this example:

page_views = 0

background_jobs.schedule_every(5.seconds) {
  puts "#{page_views} page views so far"
}

threaded_webserver.on("/") do
  page_views += 1
  "Hello"
end

If background_jobs uses Thread, everything is fine.
If it uses Ractor, it needs to make that schedule_every block shareable, and if we don't add this safety check then it will always incorrectly print 0 page views so far.
This is what I mean by breaking Ruby block semantics.
In this proposal, we prevent this broken-semantics situation by raising Ractor::IsolationError when trying to make that schedule_every block shareable.
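To make the stale counter concrete, here is a minimal plain-Ruby simulation. It assumes nothing about background_jobs or threaded_webserver: snapshot_reporter is a hypothetical helper that stands in for a proc whose environment was copied at creation time.

```ruby
# Hypothetical helper (not a Ractor API): the returned proc only sees the
# value `count` had when the helper was called.
def snapshot_reporter(count)
  proc { "#{count} page views so far" }
end

def simulate
  page_views = 0
  live  = proc { "#{page_views} page views so far" } # ordinary block, live variable
  stale = snapshot_reporter(page_views)              # simulated environment copy

  3.times { page_views += 1 }                        # requests arrive afterwards

  [live.call, stale.call]
end

p simulate # => ["3 page views so far", "0 page views so far"]
```

The ordinary block reports 3, while the copy keeps reporting 0 no matter how many requests arrive.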

One more reason to forbid this case is that a block that is made shareable is never executed immediately on the current Ractor, because there would be no need to make it shareable in that case. So the block will be executed later, by some other Ractor.
And the author of a block that is meant to be executed later definitely expects it to see up-to-date captured variables.

We would check this situation by static analysis.
There are multiple ways to go about it with trade-offs between precision and implementation complexity.
I think we could simplify to: disallow Ractor.shareable_proc for any block which captures a variable which is potentially reassigned.
In other words, only allow Ractor.shareable_proc if all the variables it captures are assigned (exactly) once.
More on that later in section Edge Cases.
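As a rough illustration of the "assigned exactly once" rule, the following sketch counts assignment targets with Ripper from the standard library. It is not how the VM would implement the check: assignment_counts and reassigned_locals are hypothetical names, scopes are ignored (same-named variables in different methods are conflated), and it does not check which variables a block actually captures.

```ruby
require "ripper"

# Count how often each local variable appears as an assignment target.
# Ripper wraps assignment targets in :var_field nodes; reads are :var_ref
# and block parameters live under :params, so neither is counted.
def assignment_counts(source)
  counts = Hash.new(0)
  walk = lambda do |node|
    next unless node.is_a?(Array)
    if node[0] == :var_field && node[1].is_a?(Array) && node[1][0] == :@ident
      counts[node[1][1]] += 1
    end
    node.each { |child| walk.call(child) }
  end
  walk.call(Ripper.sexp(source))
  counts
end

# Variables the simplified rule would refuse to capture: assigned more than once.
def reassigned_locals(source)
  assignment_counts(source).select { |_name, count| count > 1 }.keys
end

p reassigned_locals("a = 1\nb = proc { a }\na = 2") # => ["a"]
p reassigned_locals("a = 1\nb = proc { a }")        # => []
```

A real implementation would work on the compiler's own representation and respect scope barriers; the point here is only that a purely syntactic count is enough for the simplified rule.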

Ractor.new

Note that everything about Ractor.shareable_proc should also apply to Ractor.new, so that it's convenient to pass data via captured variables for Ractor.new too, for example:

x = ...
y = ...
Ractor.new { compute(x, y) }

Currently Ractor.new does not allow capturing outer variables at all and needs workarounds such as:

x = ...
y = ...
Ractor.new(x, y) { |x, y| compute(x, y) }

define_method

define_method (and of course define_singleton_method too) have been an issue since the beginning of Ractors,
because methods defined by define_method just couldn't be called from a Ractor (because the block/Proc wouldn't be shareable and so can't be called from other Ractors).
A workaround is to make the block/Proc shareable, but this is inconvenient, verbose and shouldn't be necessary:

def new_ostruct_member!(name) # :nodoc:
  unless @table.key?(name) || is_method_protected!(name)
    if defined?(::Ractor)
      getter_proc = nil.instance_eval{ Proc.new { @table[name] } }
      setter_proc = nil.instance_eval{ Proc.new {|x| @table[name] = x} }
      ::Ractor.make_shareable(getter_proc)
      ::Ractor.make_shareable(setter_proc)
    else
      getter_proc = Proc.new { @table[name] }
      setter_proc = Proc.new {|x| @table[name] = x}
    end
    define_singleton_method!(name, &getter_proc)
    define_singleton_method!("#{name}=", &setter_proc)
  end
end

Instead, this proposal suggests that define_method automatically calls Ractor.shareable_proc on the given block/Proc (and falls back to the original Proc if that would raise), as if it were defined like:

def define_method(name, &body)
  body = Ractor.shareable_proc(self: nil, &body) rescue body
  Primitive.define_method(name, &body)
end

(note that define_method knows the body Proc's self won't be the original self anyway, so it's fine to change it to nil)

This way workarounds like above are no longer needed and the code can be as simple as it used to be:

def new_ostruct_member!(name) # :nodoc:
  unless @table.key?(name) || is_method_protected!(name)
    define_singleton_method!(name) { @table[name] }
    define_singleton_method!("#{name}=") { |x| @table[name] = x }
  end
end

Much nicer, and solves a longstanding issue with Ractor.

There should be no compatibility issue since the block is only made shareable when it's safe to do so.
This is another argument for making Ractor.shareable_proc safe.

Ractor.shareable_proc and Ractor.shareable_lambda

I believe we don't need Ractor.shareable_lambda (mentioned in other tickets).
Ractor.shareable_proc should always preserve the lambda-ness (Proc#lambda?) of the given Proc.
The role of Ractor.shareable_proc is to make the Proc shareable, not change arguments handling.
If one wants a shareable lambda they can just use Ractor.shareable_proc(&-> { ... }).
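Passing a Proc with & already preserves its lambda-ness in today's Ruby, which is the behavior suggested here for Ractor.shareable_proc. In this small sketch, keep_kind is a hypothetical stand-in for any method that receives a block and returns it as a Proc.

```ruby
# Hypothetical stand-in for a method that takes a block and hands back the Proc.
def keep_kind(&block)
  block
end

p keep_kind(&-> {}).lambda?   # => true
p keep_kind(&proc {}).lambda? # => false

# Argument handling follows the original Proc's kind.
lenient = keep_kind(&proc { |x| x })
strict  = keep_kind(&->(x) { x })

p lenient.call(1, 2) # => 1 (extra argument is dropped)
begin
  strict.call(1, 2)
rescue ArgumentError => e
  p e.class # => ArgumentError (lambdas check arity)
end
```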

BTW, the added value of Ractor.shareable_proc(self: nil, &proc) vs just Ractor.make_shareable(proc, copy: true) is that it enables changing the receiver of the Proc without needing nil.instance_eval { ... } around, and it is much clearer.

Ractor.make_shareable(proc) should be an error as mentioned here, because it would mutate the proc in place and that's too surprising and unsafe (e.g. it would break Proc#binding on that Proc instance).
Ractor.make_shareable(proc, copy: true) can be the same as Ractor.shareable_proc(self: self, &proc) (only works if self is shareable then), or an error.

Edge Cases

For these examples I'll use enqueue, which defines a block to execute later, either in a Thread or Ractor.
For the Ractor case, enqueue would make the block shareable and send it to a Ractor to execute it.
This is a bit more realistic than using plain Ractor.shareable_proc instead of enqueue, since it makes it clearer the block won't be executed right away on the main Ractor but later on some other Ractor.

Nested Block Cases

If the assignment is in a nested block, it's an error (this case is already detected for Ractor.new BTW):

a = 1
enqueue { proc { a = 1 } } # Ractor::IsolationError: cannot isolate a block because it accesses outer variables (a) which are reassigned inside the block

Similarly, if the assignment is in some block outside, it's the same as if it was assigned directly outside:

a = 1
p = proc { a = 2 }
enqueue { a } # Ractor::IsolationError: cannot isolate a block because it accesses outer variables (a) which are reassigned outside the block

Loop Cases

This would be a Ractor::IsolationError, because a is reassigned.
It would read a stale value and silently ignore reassignments if there was no Ractor::IsolationError.

a = 0
while condition
  enqueue { p a } # Ractor::IsolationError: cannot isolate a block because it accesses outer variables (a) which are reassigned outside the block
  a += 1
end

This is the same case, using a rescue-retry loop:

a = 0
begin
  enqueue { p a } # Ractor::IsolationError: cannot isolate a block because it accesses outer variables (a) which are reassigned outside the block
  a += 1
  raise
rescue
  retry if condition
end

A for loop is like a while loop because the LHS variable (a) and all variables in the loop body are actually declared outside (weird, but that's how it is).

for a in enum
  b = rand
  enqueue { p a } # Ractor::IsolationError: cannot isolate a block because it accesses outer variables (a) which are reassigned outside the block
end
binding.local_variables # => [:a, :b]

Any assignment inside one of these loops can potentially happen multiple times, so any variable assigned inside one of these loops cannot be captured by a shareable block (i.e., Ractor::IsolationError when trying to make a shareable block in such a case).
We will need the static analysis to detect such loops. That probably doesn't need a full Control Flow Graph, we just need to determine if an assignment is "inside a while/for/retry" loop (up to a scope barrier like def/class/module).
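The "inside a while/for/retry loop" test can be approximated syntactically. The sketch below again uses Ripper and is only an illustration under simplifying assumptions: assigned_in_loop and contains_node? are hypothetical names, retry is only recognized inside an explicit begin ... end, and assignments in blocks nested inside a loop are flagged conservatively.

```ruby
require "ripper"

LOOP_NODES     = %i[while until while_mod until_mod for].freeze
SCOPE_BARRIERS = %i[def defs class sclass module].freeze

# True if the subtree contains a node of the given type (e.g. :retry).
def contains_node?(node, type)
  node.is_a?(Array) && (node[0] == type || node.any? { |child| contains_node?(child, type) })
end

# Collect local variables assigned somewhere inside a loop construct.
def assigned_in_loop(source)
  found = []
  walk = lambda do |node, in_loop|
    next unless node.is_a?(Array)
    in_loop = false if SCOPE_BARRIERS.include?(node[0]) # new scope, start over
    in_loop = true if LOOP_NODES.include?(node[0])
    in_loop = true if node[0] == :begin && contains_node?(node, :retry)
    if in_loop && node[0] == :var_field && node[1].is_a?(Array) && node[1][0] == :@ident
      found << node[1][1]
    end
    node.each { |child| walk.call(child, in_loop) }
  end
  walk.call(Ripper.sexp(source), false)
  found.uniq
end

p assigned_in_loop("a = 0\nwhile cond\n  a += 1\nend")                            # => ["a"]
p assigned_in_loop("for a in enum\n  b = rand\nend")                              # => ["a", "b"]
p assigned_in_loop("a = 0\nbegin\n  a += 1\n  raise\nrescue\n  retry if cond\nend") # => ["a"]
p assigned_in_loop("a = 0\n[1, 2].each { |e| p [a, e] }")                         # => []
```

No control flow graph is built: the walk only tracks whether it is currently below a loop node, and resets that flag at def/class/module.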

Regular "loops" using blocks are fine though, because they create a new environment/frame for each iteration.
These 2 blocks will always see [0, 1] and [0, 2], whether shareable or not:

a = 0
[1, 2].each do |e|
  enqueue { p [a, e] } # OK, each of these variables is assigned only once
end
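The difference between block-based iteration and for can be observed with ordinary procs, independently of Ractor. A small sketch:

```ruby
def with_each
  procs = []
  [1, 2].each { |e| procs << proc { e } } # a fresh `e` for every iteration
  procs.map(&:call)
end

def with_for
  procs = []
  for e in [1, 2]                         # `e` belongs to the enclosing scope
    procs << proc { e }
  end
  procs.map(&:call)
end

p with_each # => [1, 2]
p with_for  # => [2, 2]
```

With each, every proc captures its own e, so each captured variable is assigned once. With for, both procs share one e that is reassigned on every iteration, which is why the for case has to be rejected.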

eval and binding

Static analysis cannot detect eval or binding.
In such an extreme and very rare case the fact that shareable_proc makes a copy of the environment is visible:

a = 1
b = proc { a }
s = Ractor.shareable_proc(&b)
eval("a = 2") # or binding.local_variable_set(:a, 2), or b.binding.local_variable_set(:a, 2)
b.call # => 2
s.call # => 1

This seems unavoidable, unless we prevent shareable procs from using captured variables at all (quite restrictive).
BTW, Proc#binding is already not supported for a shareable_proc:

$ ruby -e 'nil.instance_exec { a = 1; b = proc { a }; b2 = Ractor.make_shareable(b); p b2.binding }'
-e:1:in `binding': Can't create Binding from isolated Proc (ArgumentError)

So binding/eval is in general already not fully respected with Ractor anyway (and cannot be).

Multiple Assignments Before

This simple example assigns a twice.
It would be safe because a is always assigned before the block instance/Proc instance is created (in execution order, not necessarily in source order), but this is not so easy to detect. Depending on how precise the static analysis is, it might allow this case. We can always start with something simple and allow more later.

a = 1
a = 2
Ractor.shareable_proc { a } # Ractor::IsolationError if using the single-assignment-only static analysis, seems OK because not so common

Error Message

Differentiating ... which are reassigned inside/outside the block might be needlessly complicated. In that case I think it's fine to simplify the error message and omit the part after ... which are reassigned. The important part is that the outer variable is reassigned, not whether it's inside or outside.

Alternatives

Relaxing the checks for literal blocks

Kernel#lambda for example has behavior which depends on whether it's given a literal block or not:

lambda(&proc {}) # => the lambda method requires a literal block (ArgumentError)

We could have such a difference, but I don't think it's very useful: if a variable is reassigned, it seems a bad idea to capture and shallow-copy it with a shareable proc (unclear, unsafe, etc.).
The semantics are also simpler if they are the same whether the block is literal or not.

Removing the checks for reassigning a captured variable outside the block

That's basically option 1 of #21550.
This would allow known unsafe behavior and break the examples shown above (i.e. it would break code in a nasty way: some assignments are silently ignored, good luck debugging that).
It would then be hard to forbid such cases later, as that could be considered incompatible.
In my opinion we would commit a big language design mistake if we just gave up and allowed known unsafe cases like that: people wouldn't be able to trust that local variable assignments are respected (pretty fundamental, isn't it?) and that Ruby blocks behave as they always have (with lexical scoping for local variables).
It would also do nothing to help with define_method.
Ractor.shareable_proc(Proc) is currently unsafe (the main point of #21039), let's address it, not ignore known problems, especially after a lot of discussion and thoughts on how to solve it properly.


Related issues 2 (1 open, 1 closed)

Related to Ruby - Feature #21039: Ractor.make_shareable breaks block semantics (seeing updated captured variables) of existing blocks. Status: Closed. Assignee: ko1 (Koichi Sasada)
Related to Ruby - Feature #21550: Ractor.shareable_proc/shareable_lambda to make sharable Proc object. Status: Open. Assignee: ko1 (Koichi Sasada)
Actions #1

Updated by Eregon (Benoit Daloze) 1 day ago

  • Description updated (diff)
Actions #2

Updated by Eregon (Benoit Daloze) 1 day ago

  • Description updated (diff)
Actions #3

Updated by Eregon (Benoit Daloze) 1 day ago

  • Related to Feature #21039: Ractor.make_shareable breaks block semantics (seeing updated captured variables) of existing blocks added
  • Related to Feature #21550: Ractor.shareable_proc/shareable_lambda to make sharable Proc object added