Feature #17100

Updated by sawa (Tsuyoshi Sawada) over 4 years ago

# Ractor: a proposal for a new concurrent abstraction without thread-safety issues 

 ## Abstract 

 This ticket proposes a new concurrent abstraction named "Ractor", Ruby's  
 Actor-like feature (not an exact Actor-model). 

 Ractor achieves the following goals: 

 * Parallel execution in a Ruby interpreter process 
 * Avoidance of thread-safety issues (especially race issues) by limiting object sharing
 * Communication via copying and moving 

 I have been working on this proposal for a few years. The project name has been "Guild",
 but I renamed it from Guild to Ractor following Matz's preference.

 Resources: 
 * Proposed specification: https://github.com/ko1/ruby/blob/ractor_parallel/doc/ractor.md 
 * My talks:
   * (latest, but written in Japanese) http://atdot.net/~ko1/activities/2020_ruby3summit.pdf
   * (old, API was changed) http://atdot.net/~ko1/activities/2018_rubykaigi2018.pdf
   * (old, API was changed) http://atdot.net/~ko1/activities/2018_rubyconf2018.pdf

 The current implementation is not complete (it contains many bugs), but it passes the current CI.
 I propose to merge it soon, try the API, and continue developing the implementation on the master branch.

 ## Background 

 MRI doesn't provide an in-process parallel computation feature because
 parallel "Threads" have many issues:

 * Ruby programmers need to consider thread-safety more.
 * Interpreter developers need to consider thread-safety more.
 * The interpreter will slow down in single-thread execution because of fine-grained synchronization, unless clever optimizations are applied.

 The reason for these issues is the "shared-everything" thread model.

 ## Proposal 

 To overcome the issues with multiple threads, the Ractor abstraction is proposed.
 This proposal consists of two layers: a memory model and a communication model.

 Basics:
 * Introduce "Ractor" as a new concurrent entity. 
 * Ractors run in parallel. 

 Memory-model: 
 * Separate "shareable" objects and "un-shareable" objects among ractors running in parallel.
    * Shareable objects:
      * Immutable objects (frozen objects that only refer to shareable objects)
      * Class/module objects
      * Special shareable objects (Ractor objects, and so on)
    * Un-shareable objects:
      * Other objects
 * Most objects are "un-shareable", which means we (Ruby programmers and interpreter developers) don't need to care about thread-safety in most cases.
 * We only concentrate on synchronizing "shareable" objects.
 * Compared with a completely separated memory model (like the MVM proposal), programming will be easier.
 * This model is similar to Racket's `Place` abstraction. 
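
The classification above can be probed from Ruby code. The following is a minimal sketch, assuming a released Ruby (3.0 or later) that provides `Ractor.shareable?` and `Ractor.make_shareable`; neither method is named in this ticket, so treat both as assumptions about the eventual API.

```ruby
# Assumption: Ruby 3.0+ with Ractor.shareable? and Ractor.make_shareable.
# These helpers are not part of the ticket text.

frozen_list      = [1, 2, 3].freeze      # frozen, refers only to Integers
mutable_string   = +"hello"              # ordinary mutable object
frozen_but_leaky = [+"inner"].freeze     # frozen, but refers to a mutable String

checks = {
  frozen_list:      Ractor.shareable?(frozen_list),      # immutable object
  mutable_string:   Ractor.shareable?(mutable_string),   # "other" object
  frozen_but_leaky: Ractor.shareable?(frozen_but_leaky), # refers to an un-shareable object
  class_object:     Ractor.shareable?(String),           # class/module object
  ractor_object:    Ractor.shareable?(Ractor.current)    # special shareable object
}

# Deep-freezing turns the leaky array into an immutable (shareable) object.
Ractor.make_shareable(frozen_but_leaky)
checks[:after_make_shareable] = Ractor.shareable?(frozen_but_leaky)

checks.each { |name, shareable| puts "#{name}: #{shareable}" }
```

Only the mutable string and the frozen array that still points at a mutable string are reported as un-shareable, which matches the rule that an immutable object must be frozen and refer only to shareable objects.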

 Communication-model: 
 * Actor-like (not the same) message passing using `Ractor#send(obj)` and `Ractor.recv`
 * Pull-type communication using `Ractor.yield(obj)` and `Ractor#take`
 * Support for multiple waiting using `Ractor.select(...)`
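
As an illustration of the push-type half of this model, here is a minimal sketch. It assumes a released Ruby (3.0 or later) in which the ticket's `Ractor.recv` is spelled `Ractor.receive`; the pull-type `Ractor.yield`/`Ractor#take` pair and `Ractor.select` are not shown. The variable names are made up for the example.

```ruby
# Assumption: Ruby 3.0+, where Ractor.recv from this ticket is Ractor.receive.
Warning[:experimental] = false   # silence the "Ractor is experimental" warning

main = Ractor.current            # Ractor objects are shareable, so this can be passed

# A ractor that receives one number and pushes the doubled value back.
doubler = Ractor.new(main) do |reply_to|
  number = Ractor.receive        # blocks until a message arrives
  reply_to.send(number * 2)
end

doubler.send(21)
answer = Ractor.receive

# Un-shareable objects are copied on send, so the receiver mutates its own copy.
appender = Ractor.new(main) do |reply_to|
  list = Ractor.receive
  list << :added_in_worker
  reply_to.send(list)
end

original = [:a]
appender.send(original)
returned = Ractor.receive

puts answer
puts original.inspect
puts returned.inspect
```

The second exchange shows the "communication via copying" goal: the array mutated inside `appender` is a copy, so `original` in the main ractor is left untouched.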

 The Actor-like model is the origin of the name of this proposal, "Ractor" (Ruby's actor). However, currently, it is not an Actor model because we can't select messages (with pattern matching as in Erlang, Elixir, ...). This means that we can't have multiple communication channels. Instead of adopting an incomplete actor model, this proposal provides a `yield`/`take` pair to handle multiple channels. We discuss this topic later.

 I strongly believe the memory model is promising. 
 However, I'm not sure if the communication model is the best. 
 This is why I introduced the "experimental" warning.

 Proposed specification: https://github.com/ko1/ruby/blob/ractor_parallel/doc/ractor.md 

 ## Implementation 

 https://github.com/ruby/ruby/pull/3365 
 All GH actions pass.

 I describe the implementation briefly. 

 ### `rb_ractor_t` 

 Without Ractor, the VM-Thread-Fiber hierarchy is like this:

 * The VM `rb_vm_t` manages running threads (`rb_thread_t`). 
 * A thread (`rb_thread_t`) points to a running fiber (`rb_fiber_t`). 

 With Ractor, we introduce a new layer `rb_ractor_t`:

 * The VM `rb_vm_t` manages running ractors (`rb_ractor_t`). 
 * A Ractor manages running threads (`rb_thread_t`). 
 * A thread (`rb_thread_t`) points to a running fiber (`rb_fiber_t`). 

 `rb_ractor_t` has a GVL to manage threads (only one of a Ractor's threads can run at a time).

 Ractor implementation is located in `ractor.h`, `ractor.c` and `ractor.rb`. 

 ### VM-wide lock 

 A VM-wide lock is introduced to protect VM global resources such as the object space.
 It should allow recursive locking, so the implementation is a monitor. We may need to call it a VM-wide monitor.
 For now, `RB_VM_LOCK_ENTER()` and `RB_VM_LOCK_LEAVE()` are provided to acquire/release the lock.

 Note that it is different from the (current) GVL. 
 The GVL is acquired any time you want to run a Ruby thread.
 VM-wide lock is acquired only when accessing the VM-wide resources. 
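
The difference between a plain lock and a monitor (a lock that allows recursive entry) can be sketched at the Ruby level. The class below is a hypothetical stand-in built on the standard library's `Monitor`; it is not the C implementation behind `RB_VM_LOCK_ENTER()` / `RB_VM_LOCK_LEAVE()`, only a model of the same enter/leave pairing.

```ruby
require "monitor"

# Hypothetical Ruby stand-in for the VM-wide lock (the real one lives in C).
class VMLockSketch
  def initialize
    @monitor = Monitor.new
  end

  # Mirrors an RB_VM_LOCK_ENTER() ... RB_VM_LOCK_LEAVE() pair around a block.
  def locked
    @monitor.enter
    begin
      yield
    ensure
      @monitor.exit   # always leave, even if the block raises
    end
  end
end

vm_lock = VMLockSketch.new

# A monitor can be re-entered by the thread that already holds it.
nested_result = vm_lock.locked { vm_lock.locked { :ok } }

# A plain Mutex cannot: re-locking from the same thread raises ThreadError.
plain_mutex = Mutex.new
mutex_error = begin
  plain_mutex.synchronize { plain_mutex.synchronize { :unreachable } }
rescue ThreadError => e
  e.class
end

puts nested_result
puts mutex_error
```

Code that protects a VM-wide resource may call other code that protects the same resource, which is why recursive entry matters here.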

 On single ractor mode (all Ruby scripts except my tests)  

 ### Object management and GC 

 * (1) All ractors share the object space. 
 * (2) Each GC event will stop all ractors, and a ractor does the GC work with barrier synchronization.
   * Barrier at `gc_enter()` 
   * marking, (lazy) sweeping, ... 
 * (3) Because all of the object space is shared by ractors, object creation is protected with VM-wide lock. 

 (2) and (3) have a huge impact on performance.
 The plan is:

 * For (2), introduce a (semi-)separated object space. It would require a long time, and Ruby 3.0 can't employ this technique.
 * For (3), introduce a free slot cache for every ractor; then most object creation can be done without synchronization. It will be employed soon.
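
The idea behind the free slot cache in (3) can be modelled in a few lines. This is only a sketch under stated assumptions: the class names, the batch size, and the use of integers as "slots" are all hypothetical, and the real allocator is part of the C-level GC.

```ruby
# Hypothetical model: integers stand in for heap slots.
class SharedSlotPool
  attr_reader :lock_acquisitions

  def initialize(slot_count)
    @free_slots = (0...slot_count).to_a
    @lock = Mutex.new            # stands in for the VM-wide lock
    @lock_acquisitions = 0
  end

  # Hand out a batch of free slots while holding the shared lock.
  def take_batch(size)
    @lock.synchronize do
      @lock_acquisitions += 1
      @free_slots.shift(size)
    end
  end
end

# One cache per ractor; only refills touch the shared lock.
class RactorLocalSlotCache
  def initialize(pool, batch_size)
    @pool = pool
    @batch_size = batch_size
    @cached = []
  end

  # Fast path: take from the local cache. Slow path: refill from the pool.
  def allocate
    @cached = @pool.take_batch(@batch_size) if @cached.empty?
    @cached.shift                # nil once the pool is exhausted
  end
end

pool  = SharedSlotPool.new(64)
cache = RactorLocalSlotCache.new(pool, 8)
slots = Array.new(16) { cache.allocate }

puts "allocations: #{slots.size}, lock acquisitions: #{pool.lock_acquisitions}"
```

In this model sixteen allocations take the shared lock only twice, once per refill, instead of once per object.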

 ### Experimental warning 

 Currently, the Ractor implementation and specification are not stable. So the first usage of `Ractor.new` will show a warning:

 `warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.` 

 ## Discussion 

 ### Actor-based and channel-based 

 I think there are two message passing approaches: Actor-based (as in Erlang, ...) and channel-based (as in Go, ...).

 With the channel-based approach, it is easy to manipulate multiple channels because it manages them explicitly. The Actor-based approach manipulates multiple channels with message patterns. The receiver can ignore unexpected structured messages or put them on hold, and can handle them after the behavior has changed (the role of the actor has changed).

 Ractor has `send/recv` like the Actor model, but there is no pattern matching feature. This is because we can't introduce new syntax, and I can't design a good API.

 With the channel-based approach, it is easy to design the API (for example, do `ch = Ractor::Channel.new` and share the `ch` among ractors). However, I can't design a good API to handle exceptions among Ractors.

 Regarding error handling, we propose a hybrid model using `send/recv` and `yield/take` pairs. `Ractor#take` can receive the source ractor's exception (like `Thread#join`). With the Actor approach, we can detect when the destination Ractor is not working (killed) upon `Ractor#send(obj)`. A receiver ractor (waiting for `Ractor.recv`) cannot detect the sender's trouble, but maybe the priority is not high. `Ractor#take` also detects the sender's (`Ractor.yield(obj)`) error, so the error can be propagated.

 To handle multiple communication channels on Ractor, instead of using multiple channels, we use *pipe* ractors.

 ``` 
 # worker-pool (receive by send) 

 main # pipe.send(obj) 
 -> pipe # Ractor.yield(Ractor.recv)
   ->
     worker1 # Ractor.yield(some_task(pipe.take))
     worker2 # Ractor.yield(some_task(pipe.take))
     worker3 # Ractor.yield(some_task(pipe.take))
 -> main # Ractor.select(worker1, worker2, worker3) 

 # if worker* causes an error, main can detect the error. 
 ``` 

 *pipe* ractors may look like channels. However, we don't need to introduce new classes with this technique (the implementation can omit Ractor creation for pipe ractors).

 Maybe there are other possibilities. For example, if we can propagate errors with channels, we can also consider a channel model (we would need to change the Ractor name then :p).

 ### Name of Ractor (and Guild) 

 When I proposed Guild in 2016, I regarded "move" message passing (see specification) to be characteristic of it, and I called this feature "moving membership". This is why the name "Guild" was chosen. However, Matz pointed out that this move semantics is not used frequently, and he asked me to change the name. Also, someone has already been using the class name "Guild".

 "Ractor" is short and is not an existing class; this is why I chose "Ractor".

 I understand people may confuse it with "Reactor".

 ## TODO 

 There are many remaining tasks. 

 ### Protection 

 Many VM-wide (process-wide) resources are not protected correctly, so using Ractor on a complicated program can cause critical bugs (`[BUG]`). Most global resources are managed by global variables, so we need to check them carefully.

 ### C-methods 

 Currently, C-methods (methods written in C and defined with
 `rb_define_method()`) are run in parallel. It means that thread-unsafe code
 can run in parallel. To solve this issue, I plan the following:

 (1) Introduce thread-unsafe label for methods 

 It is impossible to make all C-methods thread-safe, especially for C-methods in third party C-extensions. To protect them, label these (possibly) thread-unsafe C-methods as "thread-unsafe".

 When "unsafe"-labeled C-methods are invoked, a VM-wide lock is acquired. This VM-wide lock should care about recursiveness (so this lock should be a monitor) and escaping (exceptions). Currently, the VM-wide lock doesn't care about escaping, but it should be implemented soon.
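
The labeling idea can be modelled in plain Ruby. The sketch below is hypothetical: `ThreadUnsafeLabel`, `thread_unsafe`, and `LegacyCounter` are invented for illustration, and the real label would apply to C-methods inside the VM. It shows the two requirements named above, recursive entry and release on escaping exceptions.

```ruby
require "monitor"

# Hypothetical Ruby-level model of the "thread-unsafe" label.
module ThreadUnsafeLabel
  GLOBAL_LOCK = Monitor.new   # stands in for the VM-wide lock

  # Wrap an already-defined method so that it runs under the global lock.
  def thread_unsafe(method_name)
    original = instance_method(method_name)
    define_method(method_name) do |*args, &block|
      # synchronize releases the lock even when the method raises.
      GLOBAL_LOCK.synchronize { original.bind(self).call(*args, &block) }
    end
    method_name
  end
end

class LegacyCounter
  extend ThreadUnsafeLabel
  attr_reader :count

  def initialize
    @count = 0
  end

  def increment
    @count += 1
  end
  thread_unsafe :increment

  # Calls another labeled method, so the lock is entered recursively.
  def increment_twice
    increment
    increment
  end
  thread_unsafe :increment_twice

  def fail_loudly
    raise ArgumentError, "escaping through the lock"
  end
  thread_unsafe :fail_loudly
end

counter = LegacyCounter.new
counter.increment_twice

escaped = begin
  counter.fail_loudly
rescue ArgumentError => e
  e.class
end

# Another thread can take the lock only if it was released during the escape.
released = Thread.new do
  lock = ThreadUnsafeLabel::GLOBAL_LOCK
  acquired = lock.try_enter
  lock.exit if acquired
  acquired
end.value

puts counter.count
puts released
```

A plain mutex would deadlock in `increment_twice`, and a lock without exception handling would stay held after `fail_loudly`.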

 (2) Built-in C-methods 

 I'll fix most of the builtin C-methods (String, Array, ...) so that they will become thread-safe. 
 If it is not easy, I'll use thread-unsafe label. 

 ### Copying and moving 

 Currently, the Marshal protocol is used to make a deep copy on message communication. However, the Marshal protocol doesn't support some objects like `Ractor` objects, so we need to modify it.
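
Both points can be seen with the public `Marshal` API. This is a minimal sketch that assumes a `Marshal.dump`/`Marshal.load` round trip approximates the copy step; the ticket does not show the internal call. A `Proc` is used as the unsupported object because its behavior with `Marshal.dump` is well known, while the ticket itself names `Ractor` objects.

```ruby
# Deep copy through a Marshal round trip (an approximation of the copy step).
message = { name: +"job-1", tags: [+"urgent"] }
copy = Marshal.load(Marshal.dump(message))

copy[:tags].first << "!"        # mutate a nested object in the copy only

# Some objects cannot go through the Marshal protocol at all.
unsupported = begin
  Marshal.dump(proc { 1 })
  nil
rescue TypeError => e
  e.class
end

puts message[:tags].first
puts copy[:tags].first
puts unsupported.inspect
```

Because nested objects are duplicated as well, the sender and receiver never share mutable state, but any object the protocol cannot serialize needs separate handling.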

 Only a few types are supported for moving, so we need to write more. 

 ### "GVL" naming 

 Currently, the source code contains the name "GVL", but they are Ractor local locks.
 Maybe they should be renamed in the source code.

 ### Performance 

 To introduce fine-grained locks, performance tuning is needed.

 ### Bug fixes 

 many many .... 

 ## Conclusion 

 This ticket proposes a new concurrent abstraction "Ractor".
 I think Ruby 3 can ship with Ractor with "experimental" status. 
