Feature #19842


Introduce M:N threads

Added by ko1 (Koichi Sasada) 8 months ago. Updated 5 months ago.

Status:
Closed
Target version:
-
[ruby-core:114422]

Description

This ticket proposes to introduce M:N threads to improve Threads/Ractors performance.

Background

Ruby threads (RT in short) have been part of Ruby since early versions and have the following features:

  • Can be created with the simple notation Thread.new{}
  • Can be switched to another ready Ruby thread by:
    • Time-slice.
    • I/O blocking.
    • Synchronization such as Mutex features.
    • And other blocking reasons.
  • Can be interruptible by:
    • OS-delivered signals (only for the main thread).
    • Thread#kill.
    • Thread#raise.
  • Can be terminated by:
    • the end of each Ruby thread.
    • the end of the main thread (and other Ruby threads are killed).
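The features above can be sketched in a few lines of Ruby (a minimal illustration, not part of the proposal):

```ruby
# Minimal sketch of the Ruby thread features listed above: creation with
# Thread.new{}, switching on Mutex synchronization, and normal termination.

counter = 0
mutex = Mutex.new

threads = 4.times.map do
  Thread.new do                           # simple Thread.new{} notation
    100.times do
      mutex.synchronize { counter += 1 }  # may switch threads on contention
    end
  end
end

threads.each(&:join)                      # each Ruby thread ends normally
puts counter                              # => 400
```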

Ruby 1.8 and earlier versions used M:1 threads (green threads, user-level threads, ...; the term "1:N threads" is more popular, but to keep this explanation consistent I use "M:1" here), which manage multiple Ruby threads on one native thread.

(Native threads are provided by C interfaces such as Pthreads. In many cases, native threads are OS threads, but there are also user-level implementations, such as user-level pthread libraries in theory. Therefore, they are referred to as native threads in this article and NT in short)

If a Ruby thread RT1 blocks because of an I/O operation, the Ruby interpreter switches to the next ready Ruby thread RT2. The I/O operation is monitored by select() (or similar) functionality, and when the I/O is ready, RT1 is marked as a ready thread and will be resumed soon. However, when a Ruby thread issues some other blocking operation such as gethostbyname(), the Ruby interpreter cannot switch to any other Ruby thread until gethostbyname() finishes.

We name two types of blocking operations:

  • Managed blocking operations
    • I/O (most of read/write)
      • managed by I/O multiplexing APIs (select, poll, epoll, kqueue, IOCP, io_uring, ...)
    • Sleeping
    • Synchronization (Mutex, Queue, ...)
  • Unmanaged operations
    • All other blocking operations not listed above, written in C
      • Large-number calculations like Bignum#*
      • DNS lookup
      • I/O (whether it will block cannot be detected by a multiplexing API)
        • open on FIFO, close on NFS, ...
      • flock and other locking mechanism
      • library call which uses blocking operations
        • libfoo has foo_func(), and foo_func() waits on a DNS lookup. A Ruby extension foo-ruby can call foo_func().

With these terms, we can say that M:1 threads can support managed blocking operations, but cannot support unmanaged operations (other Ruby threads cannot make progress) without further tricks.

Note that even if a select()-like system call says an fd is ready, the I/O operation on that fd can still block because of contention (a read by another thread or process, for example).

M:1 threads have another disadvantage: they cannot run in parallel, because only one native thread is used.

From Ruby 1.9 we implemented 1:1 threads, which means each Ruby thread has a corresponding native thread. To make the implementation easy we also introduced a GVL: only the Ruby thread that acquires the GVL can run. With the 1:1 model, we can support both managed and unmanaged blocking operations by releasing the GVL. When a Ruby thread wants to issue a blocking operation, it releases the GVL, and other ready Ruby threads continue to run. We don't care whether the blocking operation is managed or unmanaged.
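This GVL-release behavior can be observed from plain Ruby code (a small sketch; sleep stands in for any managed blocking operation that releases the GVL):

```ruby
# Three threads "sleep at the same time": each releases the GVL while
# blocked, so the sleeps overlap instead of serializing.

t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
threads = 3.times.map { Thread.new { sleep 0.3 } }
threads.each(&:join)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0

puts elapsed.round(2)   # roughly 0.3, not 0.9
```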

(We cannot make some unmanaged blocking operations interruptible (stopping with Ctrl-C, for example).)

Advantages of 1:1 threads over M:1 threads are:

  • Easy to handle blocking operations by releasing GVL.
  • We can utilize parallelism with multiple native threads by releasing GVL.

Disadvantages of 1:1 threads compared to M:1 threads are:

  • Overhead of creating many native threads for many Ruby threads
    • We cannot create a huge number of Ruby threads and Ractors with 1:1 threads.
  • Thread-switching overhead from the GVL, because inter-core communication is needed.

From Ruby 3.0 we introduced the fiber scheduler mechanism to manage multiple fibers.

Differences from Ruby 1.8 M:1 threads are:

  • No timeslice (fibers are only switched by managed blocking operations)
  • Ruby users can write their own schedulers for their apps, with their favorite underlying mechanism

The disadvantages are similar to M:1 threads. Another disadvantage is that we need to consider Fiber's behavior.

From Ruby 3.0 we also introduced Ractors. Ractors can run in parallel because most objects are separated between them. One Ractor creates one Ruby thread, so Ractors have the same disadvantages as 1:1 threads. For example, we cannot create a huge number of Ractors.

Goal

Our goal is to make lightweight Ractors on lightweight Ruby threads. To achieve this, we propose to implement M:N threads in MRI.

M:N threads manage M Ruby threads on N native threads, with a limited N (~= the number of CPU cores, for example).
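For illustration, a reasonable N could be derived from the core count exposed by the stdlib (an assumption about configuration, not a rule fixed by the proposal):

```ruby
require 'etc'

# Candidate upper bound for the number of native threads (N):
# the number of CPU processors visible to the process.
n = Etc.nprocessors
puts n
```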

Advantages of M:N threads are:

  1. We can run M ractors on N native threads simultaneously if the machine has N cores.
  2. We can create a huge number of Ruby threads or Ractors because we don't need a huge number of native threads.
  3. We can support unmanaged blocking operations by locking a native thread to the Ruby thread that issues the unmanaged blocking operation.
  4. We can make our own Ruby threads or Ractors scheduler instead of the native thread (OS) scheduler.

Disadvantages of M:N threads are:

  1. The implementation is complicated and can be hard to maintain.
  2. It can introduce incompatibility, especially around TLS (thread-local storage).
  3. We need to maintain our own scheduler.

Without multiple Ractors, this is similar to Ruby 1.8 M:1 threads. The difference from M:1 threads is the NT-locking mechanism to support unmanaged blocking operations. Another advantage is that it is easy to fall back to 1:1 threads by locking every Ruby thread to its corresponding native thread.

Proposed design

User facing changes

If a program only has a main Ractor (i.e., most Ruby programs), the user will not face any changes by default.
On the main Ractor, all threads are 1:1 threads by default, so there is no compatibility issue.

If the RUBY_MN_THREADS=1 environment variable is given, the main Ractor enables M:N threads.
Note that the main thread locks its NT by default, because the initial NT is special in some cases. I'm not sure we can relax this limitation.

With multiple Ractors, N (+ alpha) native threads run M Ractors. For now there is no way to disable M:N threads when using multiple Ractors, because there are only a few multi-Ractor programs and no compatibility issues.

The maximum number N can be specified with RUBY_MAX_PROC=N. It is 8 by default, but this value should be set according to the number of CPU processors (cores).

TLS issue

On M:N threads, a Ruby thread (RT1) can migrate from a native thread (NT1) to NT2, and so on, so TLS in native code can be a problem.
For example, RT1 calls a library function foo() which sets TLS1 on NT1. After RT1 migrates to NT2, RT1 calls foo() again, but there is no TLS1 record because TLS1 was recorded only on NT1.

In this case, RT1 should run on NT1 while using the native library foo. To avoid such problems, we need the following features:

  • 1:1 threads on the main Ractor by default
  • functionality to lock the NT for an RT; maybe Thread#lock_native_thread and Thread#unlock_native_thread APIs are needed. For example, the Go language has runtime.LockOSThread() and runtime.UnlockOSThread() for this purpose.
  • Or a C API only for this purpose? (not decided yet)

Thankfully, although the same problem can occur with the fiber scheduler (and of course with Ruby 1.8 M:1 threads), I have not heard of it being much of a problem, so I expect that TLS will not be much of an issue.
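The RT-to-NT mapping in question can be observed from Ruby 3.1+ via Thread#native_thread_id (a sketch under the current 1:1 model; under M:N the id could change between blocking operations, which is exactly the TLS hazard described above):

```ruby
# Each Ruby thread reports the id of the native thread it currently runs on.
main_nt  = Thread.current.native_thread_id
other_nt = Thread.new { Thread.current.native_thread_id }.value

puts main_nt.class        # Integer
puts other_nt != main_nt  # true under 1:1 threads: a distinct native thread
```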

Unmanaged blocking operations

From Ruby 1.9 (1:1 threads), the nogvl(func) API has been used for most blocking operations to keep the threading system healthy. In other words, nogvl(func) indicates that the given function is a blocking operation. To support unmanaged blocking operations, we lock a native thread to the Ruby thread that issues the blocking operation.

If the blocking operation doesn't finish soon, other Ruby threads cannot run, because the RT locks the NT. In this case, a system monitoring thread named the "Timer thread" (a historical name; TT in short) creates another NT to run the other ready Ruby threads.

This TT's behavior is the same as the behavior of "sysmon" in the Go language.

We call a locked NT a dedicated native thread (DNT) and the other NTs shared native threads (SNTs). The upper bound set by RUBY_MAX_PROC affects the number of SNTs. In other words, the number of DNTs is not limited (just as the number of NTs on 1:1 threads is not limited).

Managed blocking operations

Managed blocking operations are multiplexed by select()-like functions on the Timer thread. Currently only epoll is supported.

I/O operation flow (read on fd1) on Ruby thread RT1:

  1. Check the readiness of fd1 with poll(timeout = 0); if it is ready, go to step 4.
  2. Register fd1 with the Timer thread (TT) epoll and resume another ready Ruby thread.
  3. If TT detects that fd1 is ready, it marks RT1 as a ready thread.
  4. When RT1 is resumed, it performs read() while locking the corresponding NT1.
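The flow above mirrors the user-level nonblocking read pattern already available in Ruby (a sketch using a pipe; IO.select plays the role of the TT's epoll):

```ruby
r, w = IO.pipe

begin
  data = r.read_nonblock(10)   # step 1: optimistic read; fd1 not ready yet
rescue IO::WaitReadable
  w.write("hello")             # another party makes the fd ready
  IO.select([r])               # steps 2-3: wait until fd1 is readable
  data = r.read_nonblock(10)   # step 4: the actual read
end

puts data   # prints "hello"
```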

sleep(n) operation flow on Ruby thread RT1:

  1. Register the timeout of RT1 with the TT epoll.
  2. If TT detects RT1's timeout (n seconds), TT marks RT1 as a ready Ruby thread.
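The timeout flow can be emulated at user level with a select timeout (a sketch: the nil return of IO.select corresponds to the TT detecting RT1's timeout and marking it ready):

```ruby
r, _w = IO.pipe
t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)

ready = IO.select([r], nil, nil, 0.2)  # nothing becomes readable: times out
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0

puts ready.nil?   # true: the "scheduler" would now mark RT1 as ready
```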

Internal design

  • 2 level scheduling
    • Ruby threads of a Ractor are managed as M:1 threads
    • Ruby threads of different Ractors are managed as M:N threads
  • Timer thread has several duties
    1. Monitoring I/O (or other event) readiness
    2. Monitoring timeouts
    3. Producing timeslice signals
    4. Helping OS signal delivery

(On pthread environments) recent Ruby does not create a timer thread, but the MaNy implementation always creates a TT. This can be improved.

Implementation

The code name is the MaNy project, from M:N threads.

https://github.com/ko1/ruby/tree/many2

The implementation is not mature yet (currently being debugged).

Measurements

See RubyKaigi 2023 slides: https://atdot.net/~ko1/activities/2023_rubykaigi2023.pdf

Discussion

  • Enable/disable
    • default behavior
    • how to switch the behavior
  • Should we always lock the NT for the main thread?
  • Ruby/C API to lock the native threads

Misc

This description will be improved more later.

Actions #1

Updated by ko1 (Koichi Sasada) 8 months ago

  • Subject changed from Intorduce M:N threads to Introduce M:N threads

Updated by tenderlovemaking (Aaron Patterson) 8 months ago

ko1 (Koichi Sasada) wrote:

This ticket proposes to introduce M:N threads to improve Threads/Ractors performance.

Advantages of M:N threads are:

  1. We can run N ractors on N native threads simultaneously if the machine has N cores.

Should this be "M ractors on N native threads"? It sounds like Ractors are not 1:1 with threads in this design.

  1. We can make huge number of Ruby threads or Ractors because we don't need huge number of native threads
  2. We can support unmanaged blocking operations by locking a native thread to a Ruby thread which issues an unmanaged blocking operation.
  3. We can make our own Ruby threads or Ractors scheduler instead of the native thread (OS) scheduler.

Internal design

  • 2 level scheduling
    • Ruby threads of a Ractor is managed by 1:N threads
    • Ruby threads of different Ractors are managed by M:N threads

I am still trying to understand. If I understand correctly, when M:N is enabled, threads become green threads, and Ractors are truly parallel (but CPU time is divided among N threads). Is that right?

Updated by k0kubun (Takashi Kokubun) 8 months ago

when M:N is enabled, threads become green threads, and Ractors are truly parallel (but CPU time is divided among N threads)

I thought he used [num_ractors]:[num_native_threads] for "1:N threads" as well (neither 1 nor N is the number of Ruby threads). If that's the case, Ruby threads under the "1:N threads" model are not green threads, because N means multiple native threads. Well, I might be wrong, but I at least hope so, since existing code using multiple threads would be less efficient otherwise. He declared he uses "M:1" (not 1:N) for green threads, so maybe it's true.

Hoping my guess was correct, I'm not quite sure what "2 level scheduling" means though. Is it just that the single-ractor mode applies some optimization (what is it?) instead of reusing the scheduler for the multi-ractor mode? To me, it feels like you could do the same thing as "M:N threads" in "1:N threads" situations since M=1. (Or does this mean my guess was just wrong?)

Updated by tenderlovemaking (Aaron Patterson) 8 months ago

k0kubun (Takashi Kokubun) wrote in #note-3:

when M:N is enabled, threads become green threads, and Ractors are truly parallel (but CPU time is divided among N threads)

I don't think threads under the "1:N threads" model are green threads because N here means multiple native threads (neither 1 nor N is the number of Ruby threads). I guess "1:N threads" just means that Ruby threads share a set of native threads instead of using one native thread for each Ruby thread.

Ok. So Ruby Threads and Ractors will both share time on native threads. The difference being that Ractors can run in parallel and Ruby Threads that belong to the same Ractor cannot. I guess we kind of have that with cached pthreads right now, though they're only reused after a thread dies, so there is no TLS problem.

I'm not quite sure what "2 level scheduling" means though. Is it just that the single-ractor mode applies some optimization (what is it?) instead of reusing the scheduler for the multi-ractor mode? To me, it feels like you could do the same thing as "M:N threads" in "1:N threads" situations since M=1.

I don't understand either. Will Ruby threads map to native threads as well? If we have to write our own scheduler, I'm not sure what the reason is to do that.

Updated by ko1 (Koichi Sasada) 8 months ago

tenderlovemaking (Aaron Patterson) wrote in #note-2:

ko1 (Koichi Sasada) wrote:

This ticket proposes to introduce M:N threads to improve Threads/Ractors performance.

Advantages of M:N threads are:

  1. We can run N ractors on N native threads simultaneously if the machine has N cores.

Should this be "M ractors on N native threads"? It sounds like Ractors are not 1:1 with threads in this design.

You are right.

The reason is:

On the multiple Ractors, N (+ alpha) native threads run M ractors. Now there is no way to disable M:N threads on multiple Ractors because there are only a few multi-Ractor programs and no compatibility issues.

Internal design

  • 2 level scheduling
    • Ruby threads of a Ractor is managed by 1:N threads
    • Ruby threads of different Ractors are managed by M:N threads

I am still trying to understand. If I understand correctly, when M:N is enabled, threads become green threads, and Ractors are truly parallel (but CPU time is divided among N threads). Is that right?

You are right.

N Ractors on Ruby 3.0 are already parallel execution with N native threads.
M:N enables that N Ractors run in parallel with M native threads (N >> M).

Actions #6

Updated by ko1 (Koichi Sasada) 8 months ago

  • Description updated (diff)

Updated by ko1 (Koichi Sasada) 8 months ago

We can run N ractors on N native threads simultaneously if the machine has N cores.

Sorry it was typo (updated).

"We can run M ractors on N native threads simultaneously if the machine has N cores."

Actions #8

Updated by ko1 (Koichi Sasada) 8 months ago

  • Description updated (diff)

Updated by ko1 (Koichi Sasada) 8 months ago

Ruby threads of a Ractor is managed by 1:N threads

is also typo (updated)

"Ruby threads of a Ractor is managed by M:1 threads"

Updated by ko1 (Koichi Sasada) 8 months ago

Hoping my guess was correct, I'm not quite sure what "2 level scheduling" means though. Is it just that the single-ractor mode applies some optimization (what is it?) instead of reusing the scheduler for the multi-ractor mode? To me, it feels like you could do the same thing as "M:N threads" in "1:N threads" situations since M=1. (Or does this mean my guess was just wrong?)

This means that there are two schedulers in different layers.

  • Thread scheduler (which ready thread should be run in a Ractor)
  • Ractor scheduler (which Ractor's thread should be run)

This design is based on the GVL. We cannot run two threads belonging to the same Ractor simultaneously (on different NTs).

Updated by ko1 (Koichi Sasada) 8 months ago

Ok. So Ruby Threads and Ractors will both share time on native threads. The difference being that Ractors can run in parallel and Ruby Threads that belong to the same Ractor cannot. I guess we kind of have that with cached pthreads right now, though they're only reused after a thread dies, so there is no TLS problem.

BTW, the current strategy of reusing native threads can cause a TLS issue in theory, because:

  • RT1 on NT1 sets TLS :tv1 = 1
  • RT1 dies and NT1 can be reused
  • RT2 reuses NT1 and :tv1 = 1 is already set (because RT1 didn't clean up its TLS).

If a library relies on the default value of tv1 (== 0), it can be a problem.

__thread int tv1 = 0;

void func(void) {
  if (tv1) do_without_init();
  else     do_with_init();    // RT2 needs to come here, but tv1 is already set.
}

But I haven't heard of any issues about it.

Actions #12

Updated by ko1 (Koichi Sasada) 8 months ago

  • Description updated (diff)
Actions #13

Updated by ko1 (Koichi Sasada) 8 months ago

  • Description updated (diff)

Updated by byroot (Jean Boussier) 8 months ago

Thread Local Storage issue

Is there any plan to expose a C API to store data on the Ruby thread, to replace the native APIs?

Locking to a native thread is fine for making old code compatible, but if we wish to upgrade that code to work better with MaNy, we'd need such API.

Updated by ko1 (Koichi Sasada) 8 months ago

byroot (Jean Boussier) wrote in #note-14:

Thread Local Storage issue

Is there any plan to expose a C API to store data on the Ruby thread, to replace the native APIs?

Locking to a native thread is fine for making old code compatible, but if we wish to upgrade that code to work better with MaNy, we'd need such API.

VALUE rb_thread_local_aref(VALUE thread, ID key);
VALUE rb_thread_local_aset(VALUE thread, ID key, VALUE val);

are exposed.

Anyway, I believe only a few extensions rely on (native-thread) TLS.
I think it is problematic if libfoo uses TLS and there is no way to modify it.
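For reference, the Ruby-level counterparts are Thread#[] / Thread#[]= (fiber-local storage, which is what rb_thread_local_aref/aset correspond to, as I understand it) and Thread#thread_variable_get/set (truly thread-local). A small sketch, not part of the proposal:

```ruby
t = Thread.new do
  Thread.current[:fiber_local] = 1             # fiber-local storage
  Thread.current.thread_variable_set(:tv, 2)   # thread-local storage
  Thread.current.thread_variable_get(:tv)
end

v = t.value
puts v   # => 2
```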

Updated by byroot (Jean Boussier) 8 months ago

rb_thread_local_aref

Ah, I totally missed that API, thank you.

Another question I have is around being able to opt out of M:N for a specific thread.

e.g if you have something like:

$background_check = Thread.new do
  loop do
    SomeGem.native_method_that_release_the_gvl_most_of_the_time
  end
end

I think it could make sense to let it have its own native thread?

how to switch the behavior

So, if we don't plan to enable this for the main ractor, IMHO the value is quite limited.

The worry with enabling MaNy on the main ractor is C extensions; if so, should we have a way to mark extensions as MaNy-compatible? Like how they can declare themselves as Ractor-safe?

This would potentially allow enabling MaNy on the main Ractor unless you load a non-compatible extension?

Alternatively, it could be a runtime flag of some sort, disabled by default for a few versions until common extensions are made compatible?

Updated by ko1 (Koichi Sasada) 8 months ago

byroot (Jean Boussier) wrote in #note-16:

Another question I have is around being able to opt-out of N:M for a specific thread.

e.g if you have something like:

$background_check = Thread.new do
  loop do
    SomeGem.native_method_that_release_the_gvl_most_of_the_time
  end
end

I think it could make sense to let it have its own native thread?

functionality to lock the NT for RT, maybe Thread#lock_native_thread and Thread#unlock_native_thread API is needed. For example, Go language has runtime.LockOSThread() and runtime.UnlockOSThread() for this purpose.

may be what you want.

how to switch the behavior

So, if we don't plan to enable this for the main ractor, IMHO the value is quite limited.

You can try your application with:

If the RUBY_MN_THREADS=1 environment variable is given, the main Ractor enables M:N threads.


The worry for enabling MaNy on the main ractor is C extensions, if so, should we have a way to mark extensions as MaNy compatible? Like how they can declare themselves as Ractor safe?

This would potentially allow to enable MaNy on the main Ractor unless you load a non-compatible extension?

Alternatively, it could be being a runtime flag of some sort and disable by default for a few versions until common extensions are made compatible?

I think only a few libraries rely on native-thread TLS, so I think marking something is too much.
Hopefully the C-extension author knows whether it has trouble or not.

If we have many troubles, then we will need to consider it.

Updated by byroot (Jean Boussier) 8 months ago

Thread#lock_native_thread may be what you want.

Well, it depends on whether locked threads are counted as part of RUBY_MAX_PROC or not. If not, then yes, that would be what I want.

You can try your application with RUBY_MN_THREADS=1

Ah, I indeed missed that. Thank you. The idea would be to make it the default at some point?

Updated by ko1 (Koichi Sasada) 8 months ago

byroot (Jean Boussier) wrote in #note-18:

Thread#lock_native_thread may be what you want.

Well, it depends on whether locked threads are counted as part of RUBY_MAX_PROC or not. If not, then yes, that would be what I want.

A locked NT (DNT) is not counted.

We named locked NT as dedicated native threads (DNT) and other NT as shared native threads (SNT). The upper bound by RUBY_MAX_PROC affects the number of SNT. In other words, the number of DNT is not limited (it is same that the number of NT on 1:1 threads are not limited).

You can try your application with RUBY_MN_THREADS=1

Ah, I indeed missed that. Thank you. The idea would be to make it the default at some point?

If there are no issues :p

Updated by tenderlovemaking (Aaron Patterson) 8 months ago

Since Ractors will time share on one native thread, I think that means multiple live Ruby threads could also possibly share the same native thread.

We currently have some thread related event hooks like RUBY_INTERNAL_THREAD_EVENT_READY here.
Right now, we can kind of tell what thread is "ready" in the event by checking pthread_self() inside the internal hook callback. For example (translated from C to Ruby):

def callback event, event_data, data
  if event == RUBY_INTERNAL_THREAD_EVENT_READY
    # Get native thread ID
    thread_id = pthread_threadid_np(pthread_self())

    thread = Thread.list.find { |t| t.native_thread_id == thread_id }
    puts "#{thread} is ready!!"
  end
end


rb_internal_thread_add_event_hook(callback, RUBY_INTERNAL_THREAD_EVENT_MASK, self);

Since the native thread id won't be unique per live Ruby thread, can we maybe create a unique ID for each thread and pass it to the event callbacks? I can create a separate ticket, but since MaNy isn't merged yet, it isn't a problem yet.

Updated by byroot (Jean Boussier) 8 months ago

@tenderlovemaking (Aaron Patterson) this issue was envisioned initially; I had a PR to pass such a thread id, but I guess it got forgotten about: https://github.com/ruby/ruby/pull/6189.

We could just merge that PR (which I think has value even aside from MaNy).

Updated by tenderlovemaking (Aaron Patterson) 8 months ago

byroot (Jean Boussier) wrote in #note-21:

@tenderlovemaking (Aaron Patterson) this issue was envisioned initially, I had a PR to pass such thread idea, but I guess it got forgotten about https://github.com/ruby/ruby/pull/6189.

We could just merge that PR (which I think has value even aside from MaNy).

Great! Yes, I would like if we merge that PR 😄

Updated by ioquatix (Samuel Williams) 8 months ago

This is an interesting proposal, thanks for writing it up. I have some thoughts and questions, in no particular order.

Can we use MaNy without Ractor? Last time I tried Ractor, I ran into significant problems. So, if this depends on creating Ractors for application code, I'm concerned it may be difficult to use in practice. When I first started working on fiber scheduling about 6 years ago, multi-process was the only logical way to achieve true parallelism. So, the fiber scheduler always felt like the simplest model was M processes : N fibers. The main advantage of Ractor is memory usage in this case. But what is the advantage of MaNy, since we already have the fiber scheduler?

When Ractor was merged, due to the TLS of Ruby state, there was a performance hit in some cases, especially to Fiber context switching. This was quickly fixed, and I hope that MaNy does not introduce other performance regressions to existing code. It sounds like you've thought about compatibility w.r.t. making it a feature of Ractor. However, in my experience, there are other significant blockers to high performance concurrency, namely, garbage collection. Are you confident you can introduce this feature without performance regressions? Are there any areas where we can improve performance?

Why not use the fiber scheduler interface for "managed blocking" operations? This would bring you several mature schedulers without essentially replicating all the work of adding hooks in the right places. Among your managed blocking operations, there are actually a lot of operations which the fiber scheduler handles in addition to your list: Process#wait, Addrinfo.getaddrinfo (and all related methods which do name resolution). Also, it's worth noting that io_uring provides asynchronous open/read/write, so things like "I/O (can not detect block-able or not by multiplexing API), open on FIFO, close on NFS, ..." are not a problem going forward. Regarding scheduling threads, the io-event gem takes a "fiber" argument, but in fact, you can pass any object that implements #transfer for re-scheduling.

Regarding "flock and other locking mechanism" and similar (e.g. fallocate) - I think going forward, we will see many of these operations supported by io_uring. However, it can be tricky in practice - an uncontested operation can proceed faster than an asynchronous call, so it's often better to do something like: read -> EAGAIN -> io_uring, rather than directly schedule the read into the io_uring. I think we can continue to expand the lexicon of "managed blocking" operations as need arises. I've been referring to this as progressive concurrency - depending on the platform and supported features, more or fewer operations may be executed concurrently - but it doesn't affect user code.

I'm also a little concerned about the messaging/public image. Threads are primarily considered a tool for parallelism. But this proposal introduces significant changes to how they work. When Matz talked about Threads in the past, he was not positive: "I regret introducing Thread". TruffleRuby and JRuby have both implemented threads that run with true parallelism. It also looks like there was a proposal for Python regarding removing the "GVL" and allowing free threading. Since the Fiber Scheduler already provides a similar "green threading" implementation, and is used in production today, what are the main advantages of MaNy?

More specifically, do you want to encourage people to write e.g. Thread.new{Net::HTTP.get(...)}.join? How will we deal with concurrency vs parallel execution and thread safety? The fiber scheduler deliberately chooses to deal with concurrency to avoid issues relating to parallelism. e.g. Async{} was designed to be safe while allowing practical levels of concurrency. I do appreciate there is an overlap between concurrency problems and parallel problems, but parallel execution is far more tricky in real world programs. I don't think we can say Thread is safe, in general, especially if you consider all major implementations of Ruby. I have personally experimented with JRuby and TruffleRuby and at least at the time, neither of them were totally thread safe (it was possible to corrupt internal data). One of the reasons Ractor is appealing is because it isolates parallel execution and presents a safe interface. Can we do the same for Thread?

Updated by byroot (Jean Boussier) 8 months ago

Can we use MaNy without Ractor?

@ko1 (Koichi Sasada) answered above, you can enable it on the main ractor threads with RUBY_MN_THREADS=1.

But what is the advantage of MaNy, since we already have the fiber scheduler?

Can't answer for @ko1 (Koichi Sasada), but to me preemption is a major advantage.

Updated by ioquatix (Samuel Williams) 8 months ago

Can't answer for @ko1 (Koichi Sasada), but to me preemption is a major advantage.

Please correct me if I'm wrong but IIUC: because CRuby doesn't have true parallelism within Threads, pre-emption has been limited to context switching only when the GVL is released (i.e. "managed blocking" operations). This can be demonstrated by the following program:

#!/usr/bin/env ruby

require 'sqlite3'

Thread.new do
  while true
    sleep 0.1
    $stdout.write '.'
  end
end

s = "SELECT 1;"*5000000 # Slow.
db = SQLite3::Database.new(':memory:')

sleep 1

$stdout.write '>'
db.execute_batch2(s) # Does not release GVL.
$stdout.write '<'

sleep 1

No matter what way we cut it, there are some major limitations to thread scheduling in Ruby, by design. Does MaNy change this? I assume the only way we can change this, according to the current design, is to use Ractor, which allows threads to execute independently with separate GVLs. @ko1 (Koichi Sasada) you mention other tricks, do you mind sharing any other ideas you have?

In summary, a similar style of pre-emption could be introduced for the fiber scheduler if we want it. In fact, some users have already experimented with it, using a timer to interrupt the scheduler and force re-scheduling. I see it as more of a latency/throughput trade-off. I didn't introduce it because I see it as low priority, but it's been at the back of my mind.

Since we do mark GVL regions, we could also lift those operations into a thread pool to avoid managed blocking operations stalling the event loop. But as far as pre-emption goes, I would argue that because the fiber scheduler does actively manage blocking operations and allows what feels like a similar level of concurrency, the two are similar in actual throughput/latency. Whether that's true in practice is something that could be evaluated.

If nogvl regions are a big source of latency, we could add a hook for redirecting them to a thread pool. The current advice I give to users is to (1) use a background job processor or (2) explicitly write Thread.new{...}.join, both of which typically work pretty well in practice. A nogvl hook would look like this:

class MyScheduler
  def blocking_region(&block)
    Thread.new(&block).value
  end
end

Updated by byroot (Jean Boussier) 8 months ago

Please correct me if I'm wrong but IIUC: because CRuby doesn't have true parallelism within Threads, pre-emption has been limited to context switching only when the GVL is released

No, that's incorrect, a thread that would never release the GVL is preempted after ~100ms. See the TIME_QUANTUM constant: https://github.com/ruby/ruby/blob/acedbcb1b4eb6b362f11e783bff53c237d05afc6/thread_pthread.c#L398-L400

In your example, it doesn't happen because the thread is in a C method. That preemption only happens when executing Ruby code.
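That Ruby-level time-slice preemption is easy to confirm (a sketch: the watcher thread still gets scheduled even though the main thread busy-loops in pure Ruby without ever blocking):

```ruby
ticks = 0
watcher = Thread.new { 5.times { sleep 0.05; ticks += 1 } }

x = 0
5_000_000.times { x += 1 }   # pure Ruby busy work holding the GVL

watcher.join                 # the watcher was scheduled via time slices
puts ticks   # => 5
```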

Updated by ioquatix (Samuel Williams) 8 months ago

In your example, it doesn't happen because the thread is in a C method. That preemption only happens when executing Ruby code.

Ah yes, thanks for this. Actually, I'm not convinced you are correct. For some definition of "pure Ruby code", e.g.:

#!/usr/bin/env ruby

# Original discussion: https://bugs.ruby-lang.org/issues/18258

require 'benchmark'

class Borked
  def freeze
  end
end

class Nested
  def initialize(count, top = true)
    if count > 0
      @nested = count.times.map{Nested.new(count - 1, false).freeze}.freeze
    end
    
    if top
      @borked = Borked.new
    end
  end
  
  attr :nested
  attr :borked
end

def test(n)
  puts "Creating nested object of size N=#{n}"
  nested = Nested.new(n).freeze
  shareable = false
  
  result = Benchmark.measure do
    $stdout.write '>'
    shareable = Ractor.shareable?(nested)
    $stdout.write '<'
  end

  pp result: result, shareable: shareable
end

Thread.new do
  while true
    sleep 0.1
    $stdout.write '.'
  end
end

test(10)

Gives me the following output:

> ./shareable.rb
Creating nested object of size N=10
....................>.<{:result=>
  #<Benchmark::Tms:0x00007f34d70ee170
   @cstime=0.0,
   @cutime=0.0,
   @label="",
   @real=4.046605570008978,
   @stime=0.25974500000000006,
   @total=4.043000999999999,
   @utime=3.7832559999999997>,
 :shareable=>false}

I think one of the main advantages of pre-emptive scheduling is in cases like this, where user level code is being highly unfair - as I'm sure you know. It seems like some "unmanaged blocking operations" are simply unavoidable with the current design.

To expand on the above example, maybe Ractor.shareable?(nested) should release the GVL? But in what cases? Releasing the GVL would be a significant cost in probably 99% of cases. So it's not clear to me what the solution is: either we have proper parallelism and pre-emptive scheduling, or we have some kind of cooperative scheduling with a timer thread that optimistically tries to interrupt some subset of blocking operations (but not all). I hesitate to call that true pre-emptive multi-tasking, as the system can still stall in some cases, and to me that's the entire point of pre-emption: to avoid those cases and force a context switch.

Updated by luke-gru (Luke Gruber) 7 months ago

Ractor.shareable? is a C function that doesn't call back into the interpreter under usual circumstances, so it won't be pre-empted. It could be re-written in Ruby to avoid this sort of issue, i.e. where large object graphs are traversed without being able to be pre-empted. I guess this is another gain of rewriting things in Ruby, besides the JIT gains.
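To illustrate what such a rewrite could look like, here is a hypothetical pure-Ruby sketch of a shareability walk (this is NOT the real Ractor.shareable?, which is implemented in C and also consults VM-internal flags and caches). Because every step is ordinary Ruby bytecode, the VM's timer-based preemption can interrupt it between iterations, unlike a long-running C traversal:

```ruby
# Hypothetical, simplified shareability check written in plain Ruby.
# Immediate values are always shareable; other objects must be frozen
# and have only (recursively) shareable children.
def shareable_sketch?(obj, seen = {}.compare_by_identity)
  return true if seen.key?(obj)

  case obj
  when Integer, Symbol, Float, NilClass, TrueClass, FalseClass
    return true
  end

  return false unless obj.frozen?
  seen[obj] = true

  children =
    case obj
    when Array then obj.dup
    when Hash  then obj.keys + obj.values
    else []
    end
  children.concat(obj.instance_variables.map { |iv| obj.instance_variable_get(iv) })

  children.all? { |child| shareable_sketch?(child, seen) }
end
```

For example, `shareable_sketch?([1, 2].freeze)` is true while `shareable_sketch?([1, [2]].freeze)` is false because the inner array is not frozen, which mirrors the frozen-graph requirement discussed above.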

Updated by tenderlovemaking (Aaron Patterson) 7 months ago

Any idea when / if we will merge this? Or, is there anything preventing us to merge?

Updated by hsbt (Hiroshi SHIBATA) 7 months ago

@tenderlovemaking (Aaron Patterson) We can merge this at ko1's convenience. We needed to skip it for preview2 because ko1 said this branch is a bit unstable.

Actions #31

Updated by Eregon (Benoit Daloze) 7 months ago

  • Description updated (diff)

Updated by byroot (Jean Boussier) 7 months ago

this branch is a bit unstable.

If so, the sooner it's merged the sooner we can help stabilize it, and the more time we have to do so before the release.

Updated by ko1 (Koichi Sasada) 6 months ago

byroot (Jean Boussier) wrote in #note-32:

this branch is a bit unstable.

If so, the sooner it's merge the sooner we can help stabilize it, and the more time we have to do so before release.

I have now finished a ready-to-merge version (sorry for being so slow).
Naruse-san (the release manager) allows me to merge it as long as it is easy to revert, so I want to try merging it, and revert it if we hit hard-to-fix bugs (I hope there are none).

Actions #35

Updated by ko1 (Koichi Sasada) 6 months ago

  • Status changed from Open to Closed

Applied in changeset git|be1bbd5b7d40ad863ab35097765d3754726bbd54.


M:N thread scheduler for Ractors

This patch introduces an M:N thread scheduler for the Ractor system.

In general, an M:N thread scheduler employs N native threads (OS threads)
to manage M user-level threads (Ruby threads in this case).
In the Ruby interpreter, 1 native thread is provided per Ractor
and all of its Ruby threads are managed by that native thread.

Since Ruby 1.9, the interpreter has used a 1:1 thread scheduler, which means
1 Ruby thread has 1 native thread. The M:N scheduler changes this strategy.

Because of compatibility issues (and the stability of the implementation),
the main Ractor doesn't use the M:N scheduler by default. In other words,
threads on the main Ractor will be managed with the 1:1 thread scheduler.

There are additional settings via environment variables:

RUBY_MN_THREADS=1 enables the M:N thread scheduler on the main Ractor.
Note that non-main Ractors use the M:N scheduler without this
configuration. With this configuration, single-Ractor applications
run threads on an M:1 thread scheduler (green threads, user-level threads).

RUBY_MAX_CPU=n specifies the maximum number of native threads for
the M:N scheduler (default: 8).

This patch will be reverted soon if issues that are not easy to fix are found.

[Bug #19842]
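For reference, running an application with these settings looks like the following (illustrative command line; `app.rb` is a placeholder script name, the variables are as documented in the commit message above):

```shell
# Enable the M:N scheduler on the main Ractor and cap the number of
# native threads used by the scheduler at 4 (instead of the default 8).
RUBY_MN_THREADS=1 RUBY_MAX_CPU=4 ruby app.rb
```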

Updated by ko1 (Koichi Sasada) 5 months ago

Sorry, I forgot to summarize the specification.
The following design was accepted by matz.

  • Enable/disable and default behavior
    • On the main Ractor, the M:N scheduler is not enabled by default. The RUBY_MN_THREADS=1 environment variable enables it.
    • On non-main Ractors, the M:N scheduler is enabled by default and there is no way to disable it.
    • The RUBY_MAX_CPU=n environment variable sets N (the maximum number of native threads); the default is currently 8.
  • How to switch the behavior
    • There is no way to change the behavior now (future work).
  • Should we lock the NT for the main thread at all times?
    • Yes. The main thread (== the initial thread on launching the interpreter) locks the initial native thread.
  • Ruby/C API to lock the native threads
    • There is currently no way to lock the native threads (future work).

I hope there are no behavior changes.


