Feature #19842

Updated by ko1 (Koichi Sasada) over 1 year ago

This ticket proposes introducing M:N threads to improve the performance of Threads and Ractors. 

 ## Background 

 Ruby threads (RT for short) have existed since early Ruby versions and have the following features: 

 * Can be created with the simple notation `Thread.new{}`.
 * Can be switched to another ready Ruby thread on:
   * Time-slice expiry.
   * I/O blocking.
   * Synchronization, such as Mutex operations.
   * Other blocking reasons.
 * Can be interrupted by:
   * OS-delivered signals (main thread only).
   * `Thread#kill`.
 * Can be terminated by:
   * The end of the Ruby thread itself.
   * The end of the main thread (all other Ruby threads are killed).
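
Here is a minimal Ruby sketch of some of these features, using only the standard `Thread` and `Queue` API. A thread blocked on `Queue#pop` lets another thread run, and a thread sleeping forever is interrupted with `Thread#kill`. The values 21 and 42 are arbitrary.

```ruby
# Queue#pop blocks, so the scheduler switches to another ready thread.
queue = Queue.new
consumer = Thread.new { queue.pop * 2 }
queue << 21

# A thread sleeping forever can still be interrupted with Thread#kill.
sleeper = Thread.new { sleep }
Thread.pass until sleeper.status == "sleep"
sleeper.kill
sleeper.join

result = consumer.value   # 42
p result
```
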

 Ruby 1.8 and earlier versions use M:1 threads (also called green threads or user-level threads; the term "1:N threads" is more common, but I use "M:1" here for consistency), which manage multiple Ruby threads on one native thread. 

 (Native threads are provided by C interfaces such as Pthreads. In most cases native threads are OS threads, but user-level implementations, such as user-level pthread libraries, are possible in theory. Therefore, this article calls them native threads, or NT for short.) 

 If a Ruby thread T1 blocks on an I/O operation, the Ruby interpreter switches to the next ready Ruby thread T2. The I/O operation is monitored with `select()` (or a similar facility), and when the I/O becomes ready, T1 is marked as ready and will be resumed soon. However, when a Ruby thread issues some other blocking operation, such as `gethostbyname()`, the interpreter cannot switch to any other Ruby thread until `gethostbyname()` finishes. 
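
The following toy scheduler illustrates the M:1 model. It is a sketch, not Ruby 1.8's actual implementation. It runs several "green threads" as Fibers on one native thread and multiplexes their I/O waits with `IO.select`. `GreenScheduler` and its methods are hypothetical names introduced only for this example.

```ruby
# Toy M:1 scheduler: several "green threads" (Fibers) on one native thread.
# A green thread that would block on a read parks itself instead, and the
# scheduler multiplexes all parked fds with IO.select.
class GreenScheduler
  def initialize
    @ready = []    # fibers ready to run
    @waiting = {}  # io => fiber parked until io is readable (one waiter per io)
  end

  def spawn(&blk)
    # The wrapper returns :done so the loop can tell a finished fiber from a parked one.
    @ready << Fiber.new { blk.call; :done }
  end

  # Called from inside a green thread instead of a blocking read.
  def wait_readable(io)
    Fiber.yield([:wait_readable, io])
  end

  def run
    until @ready.empty? && @waiting.empty?
      while (fiber = @ready.shift)
        request = fiber.resume
        @waiting[request[1]] = fiber if request.is_a?(Array) && request[0] == :wait_readable
      end
      next if @waiting.empty?
      readable, = IO.select(@waiting.keys)   # like select(): wait for any parked fd
      readable.each { |io| @ready << @waiting.delete(io) }
    end
  end
end

sched = GreenScheduler.new
r, w = IO.pipe
log = []
sched.spawn do
  sched.wait_readable(r)                 # T1 would block here; the scheduler switches to T2
  log << "T1 read #{r.read_nonblock(5)}"
end
sched.spawn do
  log << "T2 runs while T1 waits"
  w.write("hello")
end
sched.run
p log   # ["T2 runs while T1 waits", "T1 read hello"]
```

Suppose T1 called `r.read(5)` directly instead of `wait_readable`. The single native thread would then block inside the read, and T2 would never run. This is the same situation as the `gethostbyname()` case above.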

 We distinguish two types of blocking operations: 

 * Managed blocking operations 
   * I/O (most reads and writes) 
     * managed by I/O multiplexing APIs (select, poll, epoll, kqueue, IOCP, io_uring, ...) 
   * Sleeping 
   * Synchronization (Mutex, Queue, ...) 
 * Unmanaged blocking operations 
   * All other blocking operations not listed above, written in C, for example: 
     * Calculations on huge numbers, such as `Bignum#*` 
     * DNS lookups 
     * I/O whose blocking cannot be detected by multiplexing APIs 
       * `open` on a FIFO, `close` on NFS, ... 
     * `flock` and other locking mechanisms 
     * Library calls that use blocking operations 
       * For example, `libfoo` provides `foo_func()`, which waits for a DNS lookup, and a Ruby extension `foo-ruby` calls `foo_func()`. 

 With these terms, M:1 threads can support managed blocking operations but cannot support unmanaged ones (other Ruby threads cannot make progress) without further tricks. 

 Note that even if a `select()`-like system call reports that an `fd` is ready, the I/O operation on that `fd` can still block because of contention (for example, another thread or process reads the data first). 
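
Here is a small demonstration with a pipe. The first `read_nonblock` stands in for a competing reader that drains the data after `IO.select` has reported the pipe as ready. Readiness can therefore be stale, and a read should be prepared to find no data rather than block.

```ruby
r, w = IO.pipe
w.write("x")

ready, = IO.select([r], nil, nil, 0)   # reports r as readable
r.read_nonblock(1)                     # a competing reader consumes the byte first

# A plain blocking r.read(1) would now hang. read_nonblock reports the
# condition instead, so the caller can go back to waiting.
second = r.read_nonblock(1, exception: false)
p second   # :wait_readable
```
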

 M:1 threads have another disadvantage: they cannot run in parallel because only one native thread is used. 

 Since Ruby 1.9 we have implemented 1:1 threads, which means each Ruby thread has a corresponding native thread. To keep the implementation simple, we also introduced a GVL: only the Ruby thread that holds the GVL can run. With the 1:1 model, we can support both managed and unmanaged blocking operations by releasing the GVL. When a Ruby thread is about to issue a blocking operation, it releases the GVL and other ready Ruby threads continue to run. It does not matter whether the blocking operation is managed or unmanaged. 
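
The following sketch shows the 1:1 behavior as seen from Ruby. One thread blocks reading an empty pipe. Because the GVL is released during that wait, the main thread keeps running and eventually supplies the data.

```ruby
r, w = IO.pipe

# This thread blocks in read; the GVL is released while it waits.
reader = Thread.new { r.read(5) }
Thread.pass until reader.stop?   # wait until the reader is actually blocked

progress = []
3.times { |i| progress << i }    # the main thread makes progress meanwhile
w.write("hello")

p reader.value   # "hello"
p progress       # [0, 1, 2]
```
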

 (However, some unmanaged blocking operations cannot be made interruptible; for example, they cannot be stopped with Ctrl-C.) 

 Advantages of 1:1 threads over M:1 threads are: 

 * Blocking operations are easy to handle by releasing the GVL. 
 * We can exploit parallelism with multiple native threads by releasing the GVL. 

 Disadvantages of 1:1 threads compared to M:1 threads are: 

 * Overhead of creating many native threads for many Ruby threads 
   * We cannot create a huge number of Ruby threads and Ractors on 1:1 threads. 
 * Thread switching overhead from the GVL, because inter-core communication is needed. 

 Ruby 3.0 introduced the fiber scheduler mechanism to manage multiple fibers. 

 Differences from Ruby 1.8 M:1 threads are: 

 * No time-slice (fibers switch only on managed blocking operations) 
 * Ruby users can write their own schedulers for their apps, using whatever underlying mechanism they prefer 

 The disadvantages are similar to those of M:1 threads. Another disadvantage is that we need to take Fiber's behavior into account. 

 Ruby 3.0 also introduced Ractors. Ractors can run in parallel because most objects are separated between them. Each Ractor creates a Ruby thread, so Ractors have the same disadvantages as 1:1 threads. For example, we cannot create a huge number of Ractors. 


 ## Goal 

 Our goal is to provide lightweight Ractors on lightweight Ruby threads. To achieve this, we propose implementing M:N threads in MRI. 

 M:N threads manage M Ruby threads on N native threads, where N is limited (for example, to roughly the number of CPU cores). 
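
Here is a toy model of the M:N idea in plain Ruby. It is an illustration under simplifying assumptions, not the MaNy design. Each of the M "Ruby threads" is modeled as a list of steps. N worker `Thread`s take them from a shared run queue one step (time slice) at a time, so a task can resume on a different worker from the one that ran its previous step. Real Ruby Fibers cannot be resumed across threads, which is why tasks are step lists here. `run_mn` is a hypothetical helper.

```ruby
# Toy M:N model: M "Ruby threads" (lists of steps, each task needs at least one
# step) run by N worker Threads that stand in for native threads.
def run_mn(tasks, n)
  ready = Queue.new                                  # shared run queue
  tasks.each_with_index { |steps, id| ready << [id, steps.dup] }
  remaining = tasks.size
  lock = Mutex.new
  trace = []                                         # [task id, worker id] per executed step
  return trace if remaining.zero?

  workers = Array.new(n) do |wid|
    Thread.new do
      while (item = ready.pop) != :stop
        id, steps = item
        steps.shift.call                             # run one time slice
        lock.synchronize { trace << [id, wid] }
        if steps.any?
          ready << [id, steps]                       # re-enqueue; may resume on another worker
        elsif lock.synchronize { (remaining -= 1).zero? }
          n.times { ready << :stop }                 # last task finished: stop all workers
        end
      end
    end
  end
  workers.each(&:join)
  trace
end

tasks = Array.new(10) { |i| Array.new(3) { -> { i * i } } }
trace = run_mn(tasks, 4)
p trace.size                    # 30
p trace.map(&:last).uniq.sort   # worker ids that actually ran steps
```
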

 Advantages of M:N threads are: 

 1. We can run M Ractors on N native threads simultaneously if the machine has N cores. 
 2. We can create a huge number of Ruby threads or Ractors because we don't need a huge number of native threads. 
 3. We can support unmanaged blocking operations by locking a native thread to the Ruby thread that issues an unmanaged blocking operation. 
 4. We can use our own scheduler for Ruby threads and Ractors instead of the native thread (OS) scheduler. 

 Disadvantages of M:N threads are: 

 1. The implementation is complex and can be hard to get right. 
 2. It can introduce incompatibilities, especially around TLS (thread-local storage). 
 3. We need to maintain our own scheduler. 

 Without multiple Ractors, it is similar to Ruby 1.8 M:1 threads. The difference from M:1 threads is the NT-locking mechanism that supports unmanaged blocking operations. Another advantage is that it is easy to fall back to 1:1 threads by locking every Ruby thread to its corresponding native thread. 

 ## Proposed design 

 ### User facing changes 

 If a program only has the main Ractor (i.e., most Ruby programs), users will not see any change by default. 
 On the main Ractor, all threads are 1:1 threads by default, so there is no compatibility issue. 

 If the `RUBY_MN_THREADS=1` environment variable is given, the main Ractor enables M:N threads. 
 Note that the main thread locks its NT by default because the initial NT is special in some cases. I'm not sure whether we can relax this limitation. 

 With multiple Ractors, N (+ alpha) native threads run M Ractors. Currently there is no way to disable M:N threads with multiple Ractors, because there are only a few multi-Ractor programs and therefore no compatibility issues. 

 The maximum value of N can be specified with `RUBY_MAX_PROC=N`. It is 8 by default, but it should be set to the number of CPU processors (cores). 
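
For example, a program could be launched with these variables from Ruby itself. This sketch assumes an interpreter built with this proposal. `mn_env` is a hypothetical helper, and `Etc.nprocessors` supplies the core count, following the recommendation above.

```ruby
require "etc"
require "rbconfig"

# Environment described in this proposal: enable M:N threads on the main Ractor
# and cap the number of shared native threads (hypothetical helper).
def mn_env(max_proc: Etc.nprocessors)
  { "RUBY_MN_THREADS" => "1", "RUBY_MAX_PROC" => max_proc.to_s }
end

# Launch a child interpreter with those settings. An interpreter that does not
# implement them simply ignores unknown environment variables.
ok = system(mn_env, RbConfig.ruby, "-e", "p Thread.new { :ok }.value")
p ok   # true if the child exited successfully
```
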

 ### TLS issue 

 With M:N threads, a Ruby thread (RT1) can migrate from one native thread (NT1) to another (NT2, ...), so TLS used by native code can be a problem. 
 For example, RT1 calls a library function `foo()`, which sets TLS1 on NT1. After RT1 migrates to NT2, RT1 calls `foo()` again, but there is no TLS1 record because TLS1 was recorded only on NT1. 

 In this case, RT1 should run on NT1 while it uses the native library foo. To avoid such problems, we need the following features: 

 * 1:1 threads on the main Ractor by default 
 * Functionality to lock an NT to an RT; perhaps `Thread#lock_native_thread` and `Thread#unlock_native_thread` APIs are needed. For example, the Go language has `runtime.LockOSThread()` and `runtime.UnlockOSThread()` for this purpose. 
 * Or a C API only for this purpose? (not decided yet) 
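
Both the TLS problem and the locking remedy can be simulated in Ruby. In this sketch (all names hypothetical), worker `Thread`s stand in for native threads. `Thread#thread_variable_get`/`thread_variable_set`, which are per-thread rather than per-fiber, stand in for the C TLS used by a library function `foo()`.

```ruby
# Hypothetical library function foo(): keeps a call counter in thread-local
# storage. thread_variable_* is truly per-thread, so it models C TLS here.
def foo
  count = (Thread.current.thread_variable_get(:foo_tls) || 0) + 1
  Thread.current.thread_variable_set(:foo_tls, count)
  count
end

# A worker Thread standing in for a native thread (NT) that runs whatever
# Ruby thread the scheduler places on it.
class ToyNativeThread
  def initialize
    @jobs = Queue.new
    @thread = Thread.new do
      while (job = @jobs.pop)
        job.call
      end
    end
  end

  # Run the block on this NT and return its value.
  def run(&blk)
    result = Queue.new
    @jobs << -> { result << blk.call }
    result.pop
  end

  def stop
    @jobs << nil
    @thread.join
  end
end

nt1 = ToyNativeThread.new
nt2 = ToyNativeThread.new
first    = nt1.run { foo }   # RT1 on NT1: TLS1 is created, counter is 1
migrated = nt2.run { foo }   # RT1 migrated to NT2: TLS1 is not visible, counter restarts at 1
locked   = nt1.run { foo }   # RT1 kept on (locked to) NT1: sees TLS1, counter is 2
[nt1, nt2].each(&:stop)
p [first, migrated, locked]  # [1, 1, 2]
```
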

 The same problem can occur with the Fiber scheduler (and, of course, with Ruby 1.8 M:1 threads), but I have not heard of it being much of a problem, so I expect that TLS will not be a serious issue. 

 ### Unmanaged blocking operations 

 Since Ruby 1.9 (1:1 threads), the `nogvl(func)` API has been used for most blocking operations to keep the threading system healthy. In other words, `nogvl(func)` marks the given function as a blocking operation. To support unmanaged blocking operations, we lock a native thread to the Ruby thread that issues the blocking operation. 

 If the blocking operation doesn't finish soon, other Ruby threads cannot run because the RT has locked the NT. In this case, a system monitoring thread called the "Timer thread" (a historical name; TT for short) creates another NT to run the other ready Ruby threads. 

 This behavior of the TT is the same as that of "sysmon" in the Go language. 

 We call locked NTs dedicated native threads (DNT) and the other NTs shared native threads (SNT). The upper bound set by `RUBY_MAX_PROC` applies to the number of SNTs. In other words, the number of DNTs is not limited (just as the number of NTs is not limited with 1:1 threads). 
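
Here is a toy model of this sysmon-like duty. It assumes that worker `Thread`s stand in for shared native threads, and that a `blocking { ... }` region stands in for a `nogvl(func)` call performing an unmanaged blocking operation, simulated here with `sleep`. `ToyPool` and its threshold are hypothetical, and the real TT logic may differ. When a worker stays inside a blocking region longer than the threshold while other work is queued, the monitor starts one extra worker for that region.

```ruby
# Toy model of the Timer thread's sysmon-like duty (hypothetical ToyPool).
class ToyPool
  def initialize(n, threshold: 0.05)
    @ready = Queue.new
    @threshold = threshold
    @lock = Mutex.new
    @blocked = {}   # worker => [time it entered the blocking region, compensated?]
    @workers = []
    n.times { spawn_worker }
    @monitor = Thread.new { monitor_loop }
  end

  def submit(&job)
    @ready << job
  end

  # Wrap an unmanaged blocking operation (e.g. a DNS lookup).
  def blocking
    @lock.synchronize { @blocked[Thread.current] = [now, false] }
    yield
  ensure
    @lock.synchronize { @blocked.delete(Thread.current) }
  end

  def worker_count
    @lock.synchronize { @workers.size }
  end

  def shutdown
    @monitor.kill
    @monitor.join
    workers = @lock.synchronize { @workers.dup }
    workers.each { @ready << :stop }
    workers.each(&:join)
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  def spawn_worker
    worker = Thread.new do
      while (job = @ready.pop) != :stop
        job.call
      end
    end
    @lock.synchronize { @workers << worker }
  end

  # Look for a worker stuck in a blocking region while work is queued, and
  # start one extra worker per stuck region.
  def monitor_loop
    loop do
      sleep @threshold / 2
      stuck = @lock.synchronize do
        entry = @blocked.values.find { |since, done| !done && now - since > @threshold }
        entry[1] = true if entry && !@ready.empty?
        entry && entry[1]
      end
      spawn_worker if stuck
    end
  end
end

pool = ToyPool.new(1)                                        # one shared worker
order = Queue.new
pool.submit { pool.blocking { sleep 0.5 }; order << :slow }  # stands in for an unmanaged call
pool.submit { order << :fast }
results = [order.pop, order.pop]
workers = pool.worker_count
pool.shutdown
p results   # [:fast, :slow]: the monitor added a worker for the queued job
p workers   # 2
```
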

 ### Managed blocking operations 

 Managed blocking operations are multiplexed by `select()`-like functions on the Timer thread. Currently only `epoll()` is supported. 

 I/O operation flow (read on fd1) on Ruby thread RT1: 

 1. Check the readiness of fd1 with `poll(timeout = 0)`; if it is ready, go to step 4. 
 2. Otherwise, register fd1 with the epoll instance of the Timer thread (TT) and resume another ready Ruby thread. 
 3. When TT detects that fd1 is ready, it marks RT1 as a ready thread. 
 4. When RT1 is resumed, it performs `read()` while locking its corresponding NT1. 
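
The I/O steps can be sketched in Ruby as a simplified model. `IO.select` stands in for `epoll`, a `Queue` stands in for parking and readying a Ruby thread, and `ToyTimerThread`/`toy_read` are hypothetical names. A self-pipe lets the TT notice newly registered fds while it is blocked in `select`.

```ruby
# Toy Timer thread (TT) that multiplexes readiness for parked Ruby threads.
class ToyTimerThread
  def initialize
    @lock = Mutex.new
    @waiters = {}                  # io => Queue the parked Ruby thread waits on
    @wake_r, @wake_w = IO.pipe     # self-pipe to interrupt select on registration
    @thread = Thread.new { loop { poll_once } }
  end

  # Step 2: register io with TT, then park the calling thread.
  def wait_readable(io)
    parked = Queue.new
    @lock.synchronize { @waiters[io] = parked }
    @wake_w.write_nonblock("!", exception: false)
    parked.pop
  end

  def stop
    @thread.kill
    @thread.join
  end

  private

  def poll_once
    ios = @lock.synchronize { @waiters.keys }
    readable, = IO.select([@wake_r, *ios])
    readable.each do |io|
      if io.equal?(@wake_r)
        @wake_r.read_nonblock(64, exception: false)   # drain wakeup bytes
      elsif (parked = @lock.synchronize { @waiters.delete(io) })
        parked << :ready                              # Step 3: make the Ruby thread ready
      end
    end
  end
end

# Read following steps 1-4 of the flow above.
def toy_read(tt, io, len)
  # Step 1: check readiness with a zero timeout, like poll(timeout = 0).
  tt.wait_readable(io) unless IO.select([io], nil, nil, 0)
  # Step 4: read. A contended fd could still return :wait_readable here.
  io.read_nonblock(len, exception: false)
end

tt = ToyTimerThread.new
r, w = IO.pipe
reader = Thread.new { toy_read(tt, r, 5) }   # not ready yet: parks on TT
Thread.pass until reader.stop?
w.write("hello")
hello = reader.value

w.write("abc")
abc = toy_read(tt, r, 3)                     # already ready: step 1 goes straight to step 4
tt.stop
p [hello, abc]   # ["hello", "abc"]
```
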

 `sleep(n)` operation flow on Ruby thread RT1: 

 1. Register RT1's timeout with the TT's epoll. 
 2. When TT detects RT1's timeout (after n seconds), it marks RT1 as a ready Ruby thread. 
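
Timeout handling can be modeled the same way. In this sketch (hypothetical `ToySleepTimer`), `ConditionVariable#wait` with a timeout stands in for the epoll timeout. The TT wakes each parked thread when its deadline passes.

```ruby
# Toy TT timeout handling: each parked Ruby thread waits on its own Queue.
class ToySleepTimer
  def initialize
    @lock = Mutex.new
    @cond = ConditionVariable.new
    @timers = []                   # [deadline, Queue], sorted by deadline
    @thread = Thread.new { run }
  end

  # Step 1: register the timeout with TT, then park the calling thread.
  def sleep_for(seconds)
    parked = Queue.new
    @lock.synchronize do
      @timers << [now + seconds, parked]
      @timers.sort_by!(&:first)
      @cond.signal                 # the earliest deadline may have changed
    end
    parked.pop
  end

  def stop
    @thread.kill
    @thread.join
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  def run
    @lock.synchronize do
      loop do
        if @timers.empty?
          @cond.wait(@lock)
        elsif (delay = @timers.first[0] - now) > 0
          @cond.wait(@lock, delay)
        else
          @timers.shift[1] << :wakeup   # Step 2: deadline passed; make the thread ready
        end
      end
    end
  end
end

timer = ToySleepTimer.new
order = Queue.new
long  = Thread.new { timer.sleep_for(0.2);  order << :long }
short = Thread.new { timer.sleep_for(0.05); order << :short }
[long, short].each(&:join)
results = [order.pop, order.pop]
timer.stop
p results   # [:short, :long]
```
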

 ### Internal design 

 * Two-level scheduling 
   * Ruby threads of a Ractor are managed by M:1 (1:N) threads 
   * Ruby threads of different Ractors are managed by M:N threads 
 * The Timer thread has several duties: 
   1. Monitoring I/O (or other event) readiness 
   2. Monitoring timeouts 
   3. Producing time-slice signals 
   4. Helping deliver OS signals 

 (On pthread environments) recent Ruby versions do not create a timer thread, but the MaNy implementation always creates the TT. This can be improved. 


 ## Implementation 

 The code name is the MaNy project, which comes from "MN threads". 

 https://github.com/ko1/ruby/tree/many2 

 The implementation is not mature yet (it is being debugged now). 

 ## Measurements 

 See RubyKaigi 2023 slides: https://atdot.net/~ko1/activities/2023_rubykaigi2023.pdf 

 ## Discussion 

 * Enable/disable 
   * default behavior 
   * how to switch the behavior 
 * Should we always lock the NT for the main thread? 
 * Ruby/C API to lock native threads 

 ## Misc 

 This description will be improved later. 
