Project

General

Profile

Actions

Feature #17592

closed

Ractor should allowing reading shareable class instance variables

Added by marcandre (Marc-Andre Lafortune) 9 months ago. Updated 5 days ago.

Status:
Closed
Priority:
Normal
Target version:
-
[ruby-core:102305]

Description

It would be very helpful if Ractor was allowing reading class instance variables from non-main Ractor.

Currently is raises an IsolationError:

module Foo
  singleton_class.attr_accessor :config
  Foo.config = {example: 42}.freeze
end

Ractor.new { p Foo.config } # => IsolationError

This limitation makes it challenging to have an efficient way to store general configs, i.e. global data that mutated a few times when resources get loaded but it immutable afterwards, and needs to be read all the time.

Currently the only way to do this is to use a constant and use remove_const + const_set (which can not be made atomic easily).

I think that allowing reading only may be the best solution to avoid any race condition, e.g. two different Ractors that call @counter += 1.

The only 3 scenarios I see here are:
0) declare the constant hack the official way to store config-style data
1) allow reading of instance variables for shareable objects (as long as the data is shareable)
2) allow read-write

I prefer 1)


Related issues

Has duplicate Ruby master - Bug #18193: Accessing global configuration from RactorsClosedActions

Updated by marcandre (Marc-Andre Lafortune) 9 months ago

Forgot to mention an example use-case: the URI global register for schemes see https://github.com/ruby/uri/pull/15

Updated by Eregon (Benoit Daloze) 9 months ago

I think 1) or 2) is much better than 0).
And also this change will make it significantly easier to run existing code on Ractor, or to change the code to make it run with Ractor.

I think read-write would also be fine. But read-only is definitely a good step.
Concurrent @counter += 1 would already be an issue with threads, so it doesn't seem a new issue, callers need to care if they use the previous value.
For the frequent case of assigning a new shareable object, there is no problematic race there.

I suspect internally, reads and writes for module ivars will need synchronization anyway, so I guess implementation-wise it's easy to allow writes.

Something mentioned not so explicitly above is, of course, accessing @ivars of a Module in a non-main Ractor should only be allowed if the value is shareable.
If the value is not shareable, then it must be IsolationError.

Updated by marcandre (Marc-Andre Lafortune) 9 months ago

From a discussion with ko1 (Koichi Sasada), config could be (or should be?) using TVar.

We need a good solution that is builtin.

If TVar becomes builtin, then that is a possible solution. It seems like overkill for a mostly constant config (e.g. URI scheme list) and is not backwards compatible though.

Allowing reading from instance variables has the advantage of being simple and backward compatible. Just to be clear: TVar seems like interesting addition, but not particularly for config-style global state.

Updated by blowfishpro (Joseph Wong) 9 months ago

Would it be reasonable to force a class/module to be frozen (and its instance variables deep frozen) before being able to access class instance variables from a non-main Ractor?

There would probably need to be a new interface to do this as Ractor.shareable? already returns true for a module and Ractor.make_shareable does nothing to it.

Updated by marcandre (Marc-Andre Lafortune) 9 months ago

blowfishpro (Joseph Wong) wrote in #note-4:

Would it be reasonable to force a class/module to be frozen (and its instance variables deep frozen) before being able to access class instance variables from a non-main Ractor?

Not really, no. Not only is it quite restrictive, but this gains nothing over status quo; you might as well use a constant, and deep-freeze it before you access it from non-main Ractor.

Updated by Dan0042 (Daniel DeLorme) 9 months ago

If a class instance variable refers to a shareable object it would make sense to be able to read it in ractors. But what about reassigning the instance variable? How does the proposal work then?

## in main ractor:
@a = Ractor.make_shareable(data)
# @a is readable in ractors
@a = data
# @a is no longer readable in ractors? assignment disallowed?

## in non-main ractor:
@a = Ractor.make_shareable(data)
# ???

Updated by Eregon (Benoit Daloze) 9 months ago

For simplicity, I think it's probably best to disallow reassigning in non-main Ractors.
It could be allowed though, as long as the value is shareable, then all other Ractors would notice the new value.

Reassigning in main Ractor should be allowed, and then raise when trying to read a non-shareable values from other Ractors.
i.e., the semantics of the main Ractor should not change whether there is only the main Ractor or multiple Ractors.

Updated by marcandre (Marc-Andre Lafortune) 9 months ago

Right. My proposal disallows reassigning from non-main ractors.

Reassigning from main Ractor to a non-shareable data is no problem, but future attempts to read it from non-main Ractor will result in IsolationError (as they do currently).

# Main ractor:
String.singleton_class.attr_accessor :a
String.a = [].freeze
Ractor.new { p Ractor.a }.take # => []
Ractor.new { Ractor.a = nil }.take # => IsolationError (can't reassign from non-main Ractor)
String.a = []
Ractor.new { p Ractor.a } # => IsolationError (can't read non-shareable from non-main Ractor)

Updated by Dan0042 (Daniel DeLorme) 9 months ago

That makes sense, but I'm curious about the implementation. I believe this requires synchronization of every access to a class instance variable?

Updated by marcandre (Marc-Andre Lafortune) 9 months ago

Dan0042 (Daniel DeLorme) wrote in #note-9:

That makes sense, but I'm curious about the implementation. I believe this requires synchronization of every access to a class instance variable?

I'm not sure, but I think not. From non-main Ractor, grab the current value of the class instance variable. I think that can be made safe, e.g. if the ivar storage data uses immutable data structures.

Before returning the value, verify that it is shareable.

Objects have a "shareable" bit, but that might not be set for some shareable objects, so testing if an object is shareable might "modify" it by setting this "shareable" bit (see rb_ractor_shareable_p_continue) but I believe that this can be done simultaneously by different Ractors without issue as it is idempotent and happens only on immutable objects. Of course, ko1 will know if I'm missing something 😅

Updated by Eregon (Benoit Daloze) 9 months ago

Dan0042 (Daniel DeLorme) wrote in #note-9:

That makes sense, but I'm curious about the implementation. I believe this requires synchronization of every access to a class instance variable?

I think essentially all accesses to mutable data on a module must have some form of synchronization, because they can be mutated from the main thread in parallel.
So I'd think it's already synchronized.

This is BTW one thing I dislike about Ractor: @ivar accesses depend on the receiver, if it's a module it's completely different than for all other objects, and requires synchronization.
But, that's already there, so might as well allow reading from other Ractors than being both inconsistent and inconvenient.

Updated by Eregon (Benoit Daloze) 9 months ago

To be more precise, and looking at the source, currently there is no synchronization for module ivars, only a check that only the main Ractor can access them:
https://github.com/ruby/ruby/blob/5803ac1c734568837d2010bd38f122ba24cbae2b/variable.c#L906-L913

The fast path for the "ivar set" bytecode already doesn't handle T_MODULE/T_CLASS:
https://github.com/ruby/ruby/blob/5803ac1c734568837d2010bd38f122ba24cbae2b/vm_insnhelper.c#L1250-L1253

And it ends up here which has the Module-specific logic:
https://github.com/ruby/ruby/blob/5803ac1c734568837d2010bd38f122ba24cbae2b/variable.c#L1480-L1484

Other state like constants and methods already have synchronization, e.g.,
https://github.com/ruby/ruby/blob/5803ac1c734568837d2010bd38f122ba24cbae2b/variable.c#L3621-L3625

Updated by ko1 (Koichi Sasada) 8 months ago

This is a summary for tomorrow's dev-meeting by my understanding.


  • ko1: This ticket proposal: To maintain global configuration (mutable information), class/module instance variables should be readable from other ractors.
module Foo
  singleton_class.attr_accessor :config
  Foo.config = {example: 42}.freeze
end

# Current
Ractor.new{ p Foo.config } # => IsolationError

# Proposal
Ractor.new do
  p Foo.config #=> {example: 42}
  p Foo.config = {example: 43}.freeze
    #=> IsolationError, beacuse it is read-only from non-main ractors
end

Foo.config = {example: 44}.freeze # allowed updating from the main ractor
  • ko1: There are two concerns: (1) atomicity concern and (2) performance concern.

(1) Atomicity concern

If two or more ivars (named @a and @b) should be update atomic, but threre is no way to synchronize them.

class C
  @a = @b = 0
  def self.update
    # assertion: @a, @b should be equal
    @a += 1
    @b += 1
  end
  def self.vars
    [@a, @b]
  end
end

Main ractor can calls C.update and update ivars. A ractor can call C.vars and it can returns inconsist values (@a != @b).

The danger of this concern is relatively low because this example is very artificial. Maybe most of usecase is initialization at loading time and no other ractors read while mutating. Also there is no coupled variables (like @a, @b), there is no problem. For example, there is no problem with only one @config ivar which manages all configrations. In other words, two or more configurations @configA, @configB, ... can have an atomicity issue.

The following code is also artifitial example.

class Fib
  @a = @b = 1
  # @a and @b are successive parts of the Fibonacci sequence.
  def self.next
    @a, @b = @b, @a + @b
  end

  def values
    # it should return successive parts of the Fib seq.
    # == "Fib seq constraint"
    [@a, @b]
  end

  def self.eventloop
    loop{
      Ractor.receive; Fib.next
    }
  end
end

gen = Ractor.new(Ractor.current){|main| loop{ m << true } }
con = Ractor.new{ 
  p Fib.value #=> return values can violate "Fib seq constraint"
}

If a user misused as an above example (using class/module ivars for the mutable state repository and updating them with multiple ractors (via the main-ractor)), it is danger.

For this concern, we have several options.

  • (a) there is no problem to introduce this feature because it is almost safe.
  • (b) Ractor is designed to avoid such consistency issues even if it can be avoided by careful programming. So this feature should not be introduced.

(b) is my position, but I agree it is very conservative.

This proposal has advantage for compatibility because many existing code can use ivars for sharing the global configurations and there is no need to rewrite them (if they only refer to sharable objects).

For example, pp library has one global configuration: sharing_detection which is stored in a class instance variable.

https://github.com/ruby/ruby/blob/master/lib/pp.rb#L109

For Ractor, it was rewrote by using Ractor-local configuration, but it was not ideal modification but ad-hoc modification with existing tools.

If this proposal is introduced, we can revert the ad-hoc modification.

(2) Performance concern

To allow accessing ivars from other ractors, every ivar accesses to class/module should be synchronized. Current implementation doesn't need to synchronize this accesses.

For constants and method search, we implemented const cache and method cache mechanisms. Such cache mechanisms are reasonable because they are not frequently rewriting. On the other hands, ivars can be mutated more frequently and not sure the it is reasonable to have a cache mechanism for it.


ko1: Alternative proposal is using TVar.

Rewriting configuration example with TVar:

module Foo
  Config = Ractor::TVar.new{ {example: 42}.freeze }
end

Ractor.new do
  Ractor.atomically do
    p Foo::Config.value #=> {example: 42}
  end
  ...
  Ractor.atomically do
    Foo::Config.value = {example: 43}.freeze
    # modification is allowed within atomically block
  end
end

Ractor.atomically do
  Foo::Config.value = {example: 44}.freeze
end

The advantage of this example is it is clear that manipulating sharable state between Ractors because Ractor.atomically method is needed. Another way of saying this is that we can become more aware of creating a shared state. With instance variables, it is hard to figure out which instance variables are shared with multiple Ractors.

Disadvantages:

  • Writing Ractor.atomically is long to type.
  • Incompatible with older versions.

Configuration should not be changed frequently, so the performance should not be a problem.

module Foo
  Config = Ractor::TVar.new{ {example: 42}.freeze }
end

Ractor.new do
  p Foo::Config.value #=> {example: 42}
end

Ractor.atomically do
  Foo::Config.value = {example: 44}.freeze
end

Rewriting other examples with TVar.

class C
  A = Ractor::TVar.new 0
  B = Ractor::TVar.new 0

  def self.update
    # assertion: A.value and B.value should be equal
    Ractor.atomically do
      A.value += 1
      B.value += 1
    end
  end
  def self.vars
    Ractor.atomically do
      [A.value, B.value]
    end
  end
end
class Fib
  A = Ractor::TVar.new 1
  B = Ractor::TVar.new 1

  def self.next
    Ractor.atomically do
      A.value, B.value = B.value, A.value + B.value
    end
  end

  def values
    # it should return successive parts of the Fib seq.
    # == "Fib seq constraint"
    Ractor.atomically do
      [A.value, B.value]
    end
  end

  def self.eventloop
    loop{
      Ractor.receive; Fib.next
    }
  end
end

gen = Ractor.new(Ractor.current){|main| loop{ m << true } }
con = Ractor.new{ 
  p Fib.value #=> TVar allows us to keep constraint
}
Fib.eventloop

# NOTE: Using TVars, there is no reason to maintain states
# by a main-ractor, so gen ractor can access Fib.next directly.

gen = Ractor.new{loop{ Fib.next } }
con = Ractor.new{ 
  p Fib.value #=> TVar allows us to keep constraint
}

Updated by ko1 (Koichi Sasada) 8 months ago

Eregon (Benoit Daloze) wrote in #note-11:

This is BTW one thing I dislike about Ractor: @ivar accesses depend on the receiver, if it's a module it's completely different than for all other objects, and requires synchronization.
But, that's already there, so might as well allow reading from other Ractors than being both inconsistent and inconvenient.

Sorry, I couldn't understand this point.
There is no special for class/module as a receiver objects (implementation is special, but no Ruby-level difference).
Do I miss something?

Updated by Eregon (Benoit Daloze) 8 months ago

ko1 (Koichi Sasada) wrote in #note-14:

Sorry, I couldn't understand this point.
There is no special for class/module as a receiver objects (implementation is special, but no Ruby-level difference).
Do I miss something?

There is a behavior difference, before Ractor @foo would always work and never raise.
And @foo = would only raise if the receiver is frozen.

Inside a Ractor, currently @foo works unless self.is_a?(Module), same for @foo = (the inconsistency I mention: the same syntax has widely different semantics based on the receiver).
If we accept this proposal, then at least @foo on a Module works if the value can be safely read (= the value is shareable).

Updated by ko1 (Koichi Sasada) 8 months ago

Eregon (Benoit Daloze) wrote in #note-15:

There is a behavior difference, before Ractor @foo would always work and never raise.
And @foo = would only raise if the receiver is frozen.

OK, you meant sharable or not, but not about class/module (class/module are typical cases, though).

Updated by Eregon (Benoit Daloze) 8 months ago

ko1 (Koichi Sasada) wrote in #note-16:

OK, you meant sharable or not, but not about class/module (class/module are typical cases, though).

Isn't it actually only about class/module though?
For a Fixnum or a Ractor.make_shareable(Object.new), it's allowed to read @ivars. And it's a regular FrozenError to try to write an @ivar since they are frozen.

But for class/module, and I think only for those, it's currently not allowed to even read @ivars, even though class/module are considered shareable.

Updated by marcandre (Marc-Andre Lafortune) 8 months ago

ko1 (Koichi Sasada) wrote in #note-13:

(2) Performance concern

To allow accessing ivars from other ractors, every ivar accesses to class/module should be synchronized.

I am surprised this is the case. Would it not be possible to have overwriting an existing instance variable be a single memory write operation? Adding a new instance variable would similarly involve creating a new ivar table and setting it with a single memory write operation?

Current implementation doesn't need to synchronize this accesses.

That is not completely accurate. Current implementation does not but it is not 100% safe. The following completely artificial code crashes (sometimes):

class String
  def self.test
    4000.times.map { |i| eval(<<~RUBY) }.each(&:join)
      Thread.new do
        10.times do
          Thread.pass
          @x#{i} = 42
        end
      end
    RUBY
  end
end

String.test
p String.instance_variables.size # => 4000
15:19][~/temp(master)]$ ruby temp.rb
4000
[15:20][~/temp(master)]$ ruby temp.rb
4000
[15:20][~/temp(master)]$ ruby temp.rb
4000
SEGV received in SEGV handler
[BUG] Segmentation fault at 0x0000000000000380
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin18]

-- Crash Report log information --------------------------------------------
   See Crash Report log file under the one of following:
     * ~/Library/Logs/DiagnosticReports
     * /Library/Logs/DiagnosticReports
   for more details.
Don't forget to include the above Crash Report log file in bug reports.

-- Machine register context ------------------------------------------------
 rax: 0x0000000000000000 rbx: 0x0000000000000000 rcx: 0x0000000000000000
 rdx: 0x0000000000000000 rdi: 0x00007f89c3b7cc00 rsi: 0x00007000302b5ea4
 rbp: 0x00007000302b5e60 rsp: 0x00007000302b5e40  r8: 0x0000000000198bf6
  r9: 0xffffffff00000000 r10: 0x0000000106636d48 r11: 0x00007000302b5ea4
 r12: 0x00007000302b5ed0 r13: 0x00007000302b5e88 r14: 0x00007f89c3b7cc00
 r15: 0x00007f89c3b7cc00 rip: 0x00000001063a2f5c rfl: 0x0000000000010206

-- C level backtrace information -------------------------------------------
Abort trap: 6

Updated by Eregon (Benoit Daloze) 8 months ago

marcandre (Marc-Andre Lafortune) wrote in #note-18:

I am surprised this is the case. Would it not be possible to have overwriting an existing instance variable be a single memory write operation? Adding a new instance variable would similarly involve creating a new ivar table and setting it with a single memory write operation?

That would lose writes if there are concurrent writes for both a new variable and an existing one (illustration).
It always needs some form of synchronization, or more indirections (e.g., chaining).

Updated by Eregon (Benoit Daloze) 8 months ago

I should add, instance variable writes on modules on TruffleRuby are already synchronized (for modules assigned to a constant, and effectively reachable by multiple threads), and that doesn't seem to be a performance issue at all.
Reading can be done without synchronization for existing variables (just need to check if the ivar storage is large enough if it's only growing).

Updated by marcandre (Marc-Andre Lafortune) 8 months ago

Sorry, I should have been more clear. Synchronization for writing: yes (as it should probably already be the case, see my crashing example). I meant that the access for reading wouldn't have to be synchronized I believe.
In short: allowing reads from Ractors doesn't add any need for synchronization

Updated by marcandre (Marc-Andre Lafortune) 8 months ago

Additionally, my suggestion would make synchronization not necessary across threads in the sense that multiple writes from different threads would simply produce "bad" results but wouldn't crash. It would be up to developpers to use Mutex in those cases. Still, probably best to synchronize writes on the main Ractor threads.

Actions #23

Updated by Dan0042 (Daniel DeLorme) 8 months ago

marcandre (Marc-Andre Lafortune) wrote in #note-18:

Adding a new instance variable would similarly involve creating a new ivar table and setting it with a single memory write operation?

That seems like it would work. Then only writes would need to be synchronized.
But my understanding is that ivar tables are not ruby objects that can be automatically garbage-collected. You would need to deallocate the old table at some point when it is safe to do so (no readers currently using it).

Updated by marcandre (Marc-Andre Lafortune) 8 months ago

  • Status changed from Open to Assigned

This has been accepted 🎉

Follow-up question: could we introduce this change in 3.0.x line? Otherwise we have to wait until 3.1 and many developers might resort to the const_set hack in the meantime...

Actions #25

Updated by Eregon (Benoit Daloze) about 1 month ago

  • Has duplicate Bug #18193: Accessing global configuration from Ractors added

Updated by ko1 (Koichi Sasada) 5 days ago

  • Status changed from Assigned to Closed
Actions

Also available in: Atom PDF