Feature #17100

Updated by ko1 (Koichi Sasada) almost 3 years ago

# Ractor: a proposal for a new concurrent abstraction without thread-safety issues

 ## Abstract 

 This ticket proposes a new concurrent abstraction named "Ractor", Ruby's  
 Actor-like feature (not an exact Actor-model). 

 Ractor achieves the following goals: 

 * Parallel execution in a Ruby interpreter process 
 * Avoid thread-safety issues (especially race issues) by limiting object sharing
 * Communication via copying and moving 

 I have been working on this proposal for a few years, and the project name was
 "Guild". I renamed it from Guild to Ractor because of Matz's preference.

 * Proposed specification: 
 * my talk 
   * (latest, but written in Japanese) 
   * (old, API was changed) 
   * (old, API was changed) 

 The current implementation is not complete (many bugs remain), but it passes the current CI.
 I propose merging it soon so that we can try the APIs and continue developing the implementation on the master branch.

 ## Background 

 MRI doesn't provide an in-process parallel computation feature because
 parallel "Threads" have many issues.

 * Ruby programmers need to think about thread-safety more.
 * Interpreter developers need to think about thread-safety more.
 * The interpreter will slow down in single-threaded execution because of fine-grained synchronization, unless clever optimizations are applied.

 The cause of these issues is the "shared-everything" thread model.

 ## Proposal 

 To overcome the issues with multiple threads, the Ractor abstraction is proposed.
 This proposal consists of two layers: a memory model and a communication model.

 * Introduce "Ractor" as a new concurrent entity.
 * Ractors run in parallel. 

 * Separate "shareable" objects and "unshareable" objects between parallel running ractors. 
    * Shareable-objects: 
      * Immutable objects (frozen objects that refer only to shareable objects)
      * Class/module objects 
      * Special shareable objects (Ractor objects, and so on) 
    * Unshareable-objects:  
      * Other objects 
 * Most objects are "unshareable", which means we (Ruby programmers and interpreter developers) don't need to care about thread-safety in many cases.
 * We only need to concentrate on synchronizing "shareable" objects.
 * Compared with a completely separated memory model (like the MVM proposal), programming will be easier.
 * This model is similar to Racket's `Place` abstraction. 
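
 The shareable/unshareable split described above can be probed from Ruby code. The following is a minimal sketch, assuming the `Ractor.shareable?` predicate found in released Ruby 3.x (it is not named in this ticket); the sample objects are only illustrative.

```ruby
# Assumption: Ractor.shareable? is available (released Ruby 3.x). It is not
# part of the API described in this ticket.

mutable_string = String.new("hello")          # unfrozen, so unshareable
frozen_string  = String.new("hello").freeze   # frozen and refers to nothing else

# A frozen container is shareable only if everything it refers to is
# shareable as well.
shallow_frozen = [String.new("a")].freeze
deeply_frozen  = [String.new("a").freeze].freeze

p Ractor.shareable?(mutable_string)   # false
p Ractor.shareable?(frozen_string)    # true
p Ractor.shareable?(shallow_frozen)   # false
p Ractor.shareable?(deeply_frozen)    # true
p Ractor.shareable?(String)           # true (class/module objects)
p Ractor.shareable?(Ractor.current)   # true (Ractor objects are special shareable objects)
```

 Freezing only the outer array is not enough, which matches the "frozen and only refers to shareable objects" rule in the list above.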

 * Actor-like (not same) message passing with `Ractor#send(obj)` and `Ractor.recv` 
 * Pull-type communication with `Ractor.yield(obj)` and `Ractor#take` 
 * Support multiple waiting with `` 

 The Actor-like model is why we named this proposal "Ractor" (Ruby's actor). However, it is currently not an Actor model because we can't select messages (as with pattern matching in Erlang, Elixir, ...). This means we can't have multiple communication channels. Instead of an incomplete actor model, this proposal has the `yield`/`take` pair to provide multiple channels. This topic is discussed later.
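
 To make the push-type half of the communication model concrete, here is a minimal sketch rather than a definitive usage guide. It assumes `Ractor.receive`, which as far as I know is the released spelling of what this ticket calls `Ractor.recv`. The `ractor_result` helper is hypothetical: it uses `Ractor#take` as described in this ticket and falls back to `Ractor#value` on Ruby versions that no longer provide `#take`.

```ruby
Warning[:experimental] = false # silence the "Ractor is experimental" warning

# Hypothetical helper: wait for a ractor to finish and return its result.
# This ticket calls the operation Ractor#take; the fallback to Ractor#value
# is an assumption for Ruby versions without #take.
def ractor_result(ractor)
  ractor.respond_to?(:take) ? ractor.take : ractor.value
end

# Push-type communication: send an object into a ractor's incoming queue.
def double_in_ractor(number)
  doubler = Ractor.new do
    received = Ractor.receive # written Ractor.recv in this ticket
    received * 2              # the block's value becomes the ractor's result
  end
  doubler.send(number)
  ractor_result(doubler)
end

# An unshareable object is copied when it is sent, so the sender's object
# is not affected by what the receiver does with its copy.
def append_in_ractor(string)
  appender = Ractor.new do
    copy = Ractor.receive
    copy << " world"
    copy
  end
  appender.send(string)
  ractor_result(appender)
end

original = String.new("hello")
puts double_in_ractor(21)       # 42
puts append_in_ractor(original) # hello world
puts original                   # hello
```

 The blocks passed to `Ractor.new` do not touch any outer local variables; everything they need arrives as a message, which is what keeps the two sides isolated.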

 I strongly believe the memory model is promising.
 However, I'm not sure the communication model is the best one.
 This is why I introduced the "experimental" warning.

 Proposed specification: 

 ## Implementation 
 All GH Actions checks pass.

 I describe the implementation briefly.

 ### `rb_ractor_t` 

 Without Ractor, the VM-Thread-Fiber hierarchy is as follows:

 * The VM `rb_vm_t` manages running threads (`rb_thread_t`).
 * A thread (`rb_thread_t`) points to a running fiber (`rb_fiber_t`).

 With Ractor, we introduce a new layer, `rb_ractor_t`:

 * The VM `rb_vm_t` manages running ractors (`rb_ractor_t`).
 * A Ractor manages running threads (`rb_thread_t`).
 * A thread (`rb_thread_t`) points to a running fiber (`rb_fiber_t`).

 Each `rb_ractor_t` has a GVL to manage its threads (only one of a Ractor's threads can run at a time).

 The Ractor implementation is located in `ractor.h`, `ractor.c` and `ractor.rb`.

 ### VM-wide lock 

 A VM-wide lock is introduced to protect VM-global resources such as the object space.
 It must allow recursive locking, so it is implemented as a monitor. We may therefore need to call it the VM-wide monitor.
 `RB_VM_LOCK_ENTER()` and `RB_VM_LOCK_LEAVE()` are now provided to acquire and release the lock.

 Note that it is different from the (current) GVL.
 The GVL is acquired whenever a Ruby thread is going to run.
 The VM-wide lock is acquired only when accessing VM-wide resources.
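
 The difference between a plain lock and a monitor can be illustrated at the Ruby level. This is only an analogy using Ruby's standard `Mutex` and `Monitor` classes, not the C implementation behind `RB_VM_LOCK_ENTER()`; the method names are made up for the example.

```ruby
require "monitor"

# A plain Mutex cannot be re-acquired by the thread that already holds it.
def nested_mutex_lock
  lock = Mutex.new
  lock.synchronize { lock.synchronize { :ok } }
rescue ThreadError
  :recursive_lock_rejected
end

# A Monitor remembers its owner and a nesting count, so re-entry is allowed.
def nested_monitor_lock
  lock = Monitor.new
  lock.synchronize { lock.synchronize { :ok } }
end

p nested_mutex_lock   # :recursive_lock_rejected
p nested_monitor_lock # :ok
```

 Code that protects a VM-wide resource may call other code that protects the same resource, which is why re-entry matters here.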

 On single ractor mode (all Ruby scripts except my tests)  

 ### Object management and GC 

 * (1) All ractors share the object space. 
 * (2) Every GC event stops all ractors, and one ractor does the GC work with barrier synchronization.
   * Barrier at `gc_enter()`
   * marking, (lazy) sweeping, ...
 * (3) Because the whole object space is shared by ractors, object creation is protected by the VM-wide lock.

 (2) and (3) have a huge impact on performance.
 The plans are:

 * For (2), introduce a (semi-)separated object space. This will take a long time, so Ruby 3.0 can't employ this technique.
 * For (3), introduce a free slot cache for every ractor so that most object creation can be done without synchronization. This will be employed soon.
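
 The idea behind the free slot cache in plan (3) can be sketched with a toy model. This is not MRI code: `SharedHeap`, `SlotCache`, and the batch size of 32 are all hypothetical, chosen only to show how batching reduces the number of lock acquisitions.

```ruby
# Toy model (not MRI code): a shared heap guarded by a lock, plus a
# per-ractor cache that refills itself in batches.
class SharedHeap
  attr_reader :lock_acquisitions

  def initialize(slot_count)
    @free = Array.new(slot_count) { |i| i }
    @lock = Mutex.new
    @lock_acquisitions = 0
  end

  # Hand out up to `count` free slots while holding the lock.
  def take_batch(count)
    @lock.synchronize do
      @lock_acquisitions += 1
      @free.shift(count)
    end
  end
end

class SlotCache
  def initialize(heap, batch_size)
    @heap = heap
    @batch_size = batch_size
    @slots = []
  end

  # Allocation touches the shared heap only when the local cache is empty.
  def allocate
    @slots = @heap.take_batch(@batch_size) if @slots.empty?
    @slots.shift || raise("heap exhausted")
  end
end

heap = SharedHeap.new(1_000)
cache = SlotCache.new(heap, 32)
100.times { cache.allocate }
puts heap.lock_acquisitions # 4 (refills at allocations 1, 33, 65 and 97)
```

 Without the cache, the same 100 allocations would take the lock 100 times.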

 ### Experimental warning 

 The Ractor implementation and specification are not stable yet, so the first usage of `` shows a warning:

 `warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.` 

 ## Discussion 

 ### Actor-based and channel-based 

 I think there are two message passing approaches: Actor-based (Erlang, ...) and channel-based (Go, ...).

 With the channel-based approach, it is easy to manipulate multiple channels because they are managed explicitly. With the Actor-based approach, multiple channels are handled through message patterns: a receiver can ignore or defer an unexpectedly structured message and handle it after its behavior changes (that is, after the role of the actor changes).

 Ractor has `send/recv` like the Actor model, but there is no pattern matching feature. This is because we can't introduce new syntax and I couldn't design a good API.

 With the channel-based approach, it is easy to design the API (for example, `ch =` and sharing `ch` among ractors can provide it). However, I couldn't design a good API for handling exceptions between Ractors.

 To take error handling into account, we propose a hybrid model using the `send/recv` and `yield/take` pairs. `Ractor#take` can receive the source ractor's exception (like `Thread#join`). With the Actor approach, `Ractor#send(obj)` can detect that the destination Ractor is not working (killed). A receiver ractor (waiting in `Ractor.recv`) cannot detect the sender's trouble, but that is probably not a high priority. `Ractor#take` also detects an error on the sender's (`Ractor.yield(obj)`) side, so error propagation is possible.
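
 The error propagation described above can be sketched as follows. This is a minimal example under two assumptions that go beyond this ticket: that the failure surfaces as `Ractor::RemoteError` wrapping the original exception (as in released Ruby 3.x), and that Ruby versions without `Ractor#take` offer `Ractor#value` instead. The `ractor_result` helper is hypothetical.

```ruby
Warning[:experimental] = false # silence the "Ractor is experimental" warning

# Hypothetical helper: Ractor#take as described in this ticket, with a
# fallback to Ractor#value (an assumption for Ruby versions without #take).
def ractor_result(ractor)
  ractor.respond_to?(:take) ? ractor.take : ractor.value
end

# Returns the exception raised inside the ractor, or nil if it succeeded.
def error_from_failed_ractor
  failing = Ractor.new { raise ArgumentError, "boom" }
  ractor_result(failing)
  nil
rescue Ractor::RemoteError => e
  e.cause # the original exception from inside the ractor
end

error = error_from_failed_ractor
puts "#{error.class}: #{error.message}" # ArgumentError: boom
```

 The failing ractor may also print a "terminated with exception" report to stderr; that report is separate from the exception delivered to the taker.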

 To handle multiple communication channels with Ractor, use *pipe* ractors instead of multiple channels.

     # worker-pool (receive by send)

     main # pipe.send(obj)
     -> pipe # Ractor.yield Ractor.recv
         worker1 # Ractor.yield(some_task(pipe.take))
         worker2 # Ractor.yield(some_task(pipe.take))
         worker3 # Ractor.yield(some_task(pipe.take))
     -> main #, worker2, worker3)

     # if worker* causes an error, main can detect the error.

 *pipe* ractors look like channels. However, with this technique we don't need to introduce a new class (an implementation can omit Ractor creation for pipe ractors).

 Maybe there are other possibilities. For example, if we can propagate errors through channels, we can also consider a channel model (then we would need to change the name "Ractor" :p).

 ### Name of Ractor (and Guild) 

 When I proposed Guild in 2016, I assumed that "move" message passing (see the specification) was its characteristic feature, and I explained this feature as "moving membership". This is why the name "Guild" was chosen. However, Matz pointed out that move semantics are not used frequently, and he asked me to change the name. Also, someone already uses the class name "Guild".

 "Ractor" is short and there is no existing class with that name, which is why I chose it.

 I understand that people may confuse it with "Reactor".

 ## TODO 

 There are many remaining tasks. 

 ### Protection 

 Many VM-wide (process-wide) resources are not protected correctly, so using Ractor in a complicated program can cause a critical bug (`[BUG]`). Most global resources are managed through global variables, so we need to check them carefully.

 ### C-methods 

 Currently, C-methods (methods written in C and defined with
 `rb_define_method()`) run in parallel. This means thread-unsafe code
 can run in parallel. To solve this issue, I plan the following:

 (1) Introduce thread-unsafe label for methods

 It is impossible to make all C-methods thread-safe, especially C-methods in 3rd party C-extensions. To protect them, we label these (possibly) thread-unsafe C-methods as "thread-unsafe".

 When a C-method labeled "unsafe" is invoked, the VM-wide lock is acquired. This VM-wide lock must handle recursion (so it should be a monitor) and escaping (exceptions). The VM-wide lock doesn't handle escaping yet, but this should be implemented soon.

 (2) Built-in C-methods

 I'll make most built-in C-methods (String, Array, ...) thread-safe.
 Where that is not easy, I'll use the thread-unsafe label.
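
 The labeling plan in (1) can be modeled in plain Ruby to show why the lock has to handle both recursion and escaping. Everything here is hypothetical (`VM_WIDE_LOCK`, `ThreadUnsafeLabel`, `thread_unsafe`, `LegacyExtension`); the real mechanism would live in C, and this sketch only mirrors its intended behavior.

```ruby
require "monitor"

VM_WIDE_LOCK = Monitor.new # hypothetical stand-in for the VM-wide lock

module ThreadUnsafeLabel
  # Hypothetical label: re-define `name` so that it runs under VM_WIDE_LOCK.
  def thread_unsafe(name)
    unlocked = instance_method(name)
    define_method(name) do |*args, &block|
      VM_WIDE_LOCK.enter
      begin
        unlocked.bind(self).call(*args, &block)
      ensure
        VM_WIDE_LOCK.exit # released even when an exception escapes
      end
    end
    name
  end
end

class LegacyExtension
  extend ThreadUnsafeLabel

  thread_unsafe def inner
    41
  end

  # Calls another labeled method, so the lock is entered recursively.
  thread_unsafe def outer
    inner + 1
  end

  thread_unsafe def explode
    raise "boom"
  end
end

ext = LegacyExtension.new
p ext.outer # 42
begin
  ext.explode
rescue RuntimeError
  p VM_WIDE_LOCK.mon_locked? # false
end
```

 With a non-reentrant lock `outer` would fail on the nested call, and without the `ensure` the lock would stay held after `explode` raises.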

 ### Copying and moving 

 Currently, the Marshal protocol is used to make a deep copy for message communication. However, the Marshal protocol doesn't support some objects, such as `Ractor` objects, so we need to modify it.

 Only a few types are supported for moving, so we need to implement support for more.
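
 What a Marshal-based deep copy means, and where it stops working, can be seen with the standard `Marshal` module alone. This is a minimal sketch: `deep_copy` is a hypothetical helper, and a `Proc` stands in here for any object that Marshal cannot serialize.

```ruby
# Hypothetical helper: deep copy through a Marshal round trip.
def deep_copy(object)
  Marshal.load(Marshal.dump(object))
end

message = { tags: [String.new("a"), String.new("b")] }
copy = deep_copy(message)
copy[:tags].first << "!" # mutate a nested object in the copy only

p message[:tags].first # "a"
p copy[:tags].first    # "a!"

# Objects that Marshal cannot serialize cannot be copied this way.
begin
  deep_copy(proc { 1 })
rescue TypeError => e
  puts "cannot copy: #{e.class}" # cannot copy: TypeError
end
```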

 ### "GVL" naming 

 The source code still contains the name "GVL", but these locks are now Ractor-local.
 Maybe they should be renamed in the source code.

 ### Performance 

 Introducing fine-grained locks will require performance tuning.

 ### Bug fixes 

 many many .... 

 ## Conclusion 

 This ticket proposes a new concurrent abstraction "Ractor". 
 I think Ruby 3 can ship Ractor with "experimental" status.