Project

General

Profile

Feature #17307

A way to mark C extensions as thread-safe, Ractor-safe, or unsafe

Added by Eregon (Benoit Daloze) 2 months ago. Updated 28 days ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:100720]

Description

I would like to design a way to mark C extensions as thread-safe, Ractor-safe, or unsafe (= needs process-global lock).
By default, if not marked, C extensions would be treated as unsafe for compatibility.

Specifically, TruffleRuby supports C extensions, but for scalability it is important to run at least some of them in parallel (e.g., HTTP parsing in Puma).
This was notably mentioned in my RubyKaigi talk.
TruffleRuby defaults to acquire a global lock when executing C extension code for maximum compatibility (Ruby code OTOH can always run in parallel).
There is a command-line option for that lock and it can be disabled, but then it is disabled for all C extensions.
The important property for TruffleRuby is that the C extension does not need a global lock, i.e., that it synchronizes any mutable state in C that could be accessed by multiple threads, such as global C variables.
I believe many C extensions are already thread-safe, or can easily become thread-safe, because they do not rely on global state and do not share the RData objects between threads.

Ractor also needs a way to mark C extensions, to know if it's OK to use the C extension in multiple Ractors in parallel, and that the C extension will not leak non-shareable objects from one Ractor to another, which would lead to bugs & segfaults.
Otherwise, C extensions could only be used on the main/initial Ractor (or need to acquire a process-global lock whenever executing C extension code and ensure no non-shareable objects leak between Ractors), which would be a very big limitation (almost every non-trivial application depends on a C extension transitively).

In both cases, global state in the C extension needs synchronization.
In the thread-safe case, mutable state in C that could be accessed by multiple Ruby threads needs to be synchronized too (there might be no such state, e.g., if C extension objects are created per Thread).
In the Ractor case, the C extension must never pass an object from a Ractor to another, unless it is a shareable object.

What do you think would be a good way to "mark" C extensions?
Maybe defining a symbol in the C extension, similar to the Init_foo we have, like say foo_is_thread_safe/foo_is_ractor_safe?
A symbol including the C extension name seems best, to avoid any possible confusion when looking it up.

Maybe there are other ways to mark C extensions than defining symbols, that could still be read by the Ruby implementation reliably?

I used the term C extensions but of course it would apply to native extensions too (including C++/Rust/...).

cc ko1 (Koichi Sasada)

Updated by Eregon (Benoit Daloze) 2 months ago

Above I suggested to mark a whole C extension as thread-safe/ractor-safe/unsafe, but we could potentially also mark on a per function/method (e.g. rb_define_function) basis, at least for thread-safe/unsafe.
For instance with rb_define_function_thread_safe or with some attribute on the function, but it seems inconvenient.
Then TruffleRuby would acquire the global C extensions lock only for unsafe functions, and not acquire it for thread-safe functions.

Potentially this granularity could work for Ractor too, and then unsafe functions could only be called on the main Ractor.
The whole C extension would still need to not leak non-shareable objects between Ractors though.

I think in practice, marking per C extension is easier to understand and more convenient.

#2

Updated by Eregon (Benoit Daloze) 2 months ago

  • Description updated (diff)

Updated by Eregon (Benoit Daloze) 2 months ago

From discussing with ko1 (Koichi Sasada),
another way to mark would be to call a function in Init_foo, like:

static VALUE foo(VALUE self) {
  ...
}

void Init_foo(void) {
  rb_mark_extension_as_thread_safe();
  rb_mark_extension_as_ractor_safe();

  VALUE cMyClass = rb_define_class("MyClass", rb_cObject);
  rb_define_method(cMyClass, "foo", foo, 0); // foo() can be executed in parallel safely
}

Concretely, rb_mark_extension_as* would set some thread-local state so the following rb_define* would know they are thread/ractor-safe.
That state would be reset after executing Init_foo.
We need to use the save/restore pattern in case Init_foo loads another C extension.
That assumes loading a C extension happens on a single-thread, but that seems very much the case.

It's an imperative vs declarative approach.
Declarative seems slightly better to me because the semantics are a bit simpler (there can be no code before rb_mark_extension_as*, it is possible to know before starting to execute Init_foo).

Updated by Eregon (Benoit Daloze) 2 months ago

An advantage of the symbol approach is it's automatically backward compatible.
Otherwise, we need to design a way to check if the rb_mark_extension_as* functions are available (could be via a macro or so, so no need for a slow have_func check for that).

ko1 (Koichi Sasada) also thought about a single function like rb_concurrency_safety( RACTOR_SAFE | THREAD_SAFE ) (we'd need some extra prefixes there).

Updated by Eregon (Benoit Daloze) 2 months ago

ko1 (Koichi Sasada) discussed with matz (Yukihiro Matsumoto) and they concluded with this interface:

Init_foo()
{
  RB_EXT_CONFIG({
    .ractor_safe = true,
    .thread_safe = true,
  });
  rb_define_method(mod, "foo", func, arity);
}

Sounds good to me.
Here is a sketch of an implementation, also with a "3rd variant" to check it's extensible if we need it in the future:

#include <stdbool.h>
#include <stdio.h>
struct rb_ext_config {
  bool ractor_safe;
  bool thread_safe;
  bool third_variant_safe;
};
#define RB_EXT_CONFIG(...) \
  struct rb_ext_config config = __VA_ARGS__; \
  rb_ext_config(&config);
void rb_ext_config(struct rb_ext_config *config) {
  printf("ractor_safe: %d\n", config->ractor_safe);
  printf("thread_safe: %d\n", config->thread_safe);
  printf("third_variant_safe: %d\n", config->third_variant_safe);
}
int main(int argc, char const *argv[]) {
  RB_EXT_CONFIG({
    .ractor_safe = true,
    .thread_safe = true,
  });
  return 0;
}

outputs:

ractor_safe: 1
thread_safe: 1
third_variant_safe: 0

For backward compatibility, it will also be needed to wrap if in #ifdef RB_EXT_CONFIG, which seems fine.

However, this struct design means all implementations will have to define both the ractor_safe and thread_safe members,
otherwise it will fail to compile.

And if/when we would add a third member, then we would need some way to check if that member is declared, otherwise it wouldn't be backward compatible.
This seems to work, although I'm not sure if such usages of the preprocessor are well defined:

  RB_EXT_CONFIG({
    .ractor_safe = true,
    .thread_safe = true,
#ifdef RB_EXT_CONFIG_VARIANT3
    .third_variant_safe = true,
#endif
  });

Updated by nobu (Nobuyoshi Nakada) 2 months ago

Eregon (Benoit Daloze) wrote in #note-5:

However, this struct design means all implementations will have to define both the ractor_safe and thread_safe members,
otherwise it will fail to compile.

Not all members need to be initialized, otherwise “zero”ed.

And if/when we would add a third member, then we would need some way to check if that member is declared, otherwise it wouldn't be backward compatible.

Indeed, it can be a problem.
Another idea which was discussed together, checking particular symbols may be possible, except for portabilities.

Updated by Eregon (Benoit Daloze) 2 months ago

nobu (Nobuyoshi Nakada) wrote in #note-6:

Not all members need to be initialized, otherwise “zero”ed.

Right, I meant CRuby will have to declare both members for the struct, even though thread_safe might not be useful for CRuby yet. I think it's fine though.

Indeed, it can be a problem.

The #ifdef RB_EXT_CONFIG_VARIANT3 above seems a decent way if we need a third member.

Another idea which was discussed together, checking particular symbols may be possible, except for portabilities.

Are there some platforms where dlsym() or equivalent is not available?
We need to call Init_foo, so I guess we should be able to look if there is a foo_is_(thread|ractor)_safe symbol too on all platforms.

Updated by ko1 (Koichi Sasada) 2 months ago

Are there some platforms where dlsym() or equivalent is not available?

I didn't think it is a problem.
But it should be a problem for static link version (make static link ruby binary with all extensions).

Nobu told me that statically linked ruby can find the global symbols with dlopen(NULL, 0) (use NULL for file path).
However I'm not sure this technique is feasible on all of platform.
Does anyone know?

Updated by ko1 (Koichi Sasada) 2 months ago

I agree only a few people using statically linked ruby with extensions.
So another option is ignore unsupported platforms.

Updated by aardvark179 (Duncan MacGregor) 2 months ago

I talked this over with Eregon yesterday in the context of doing this work in TruffleRuby, and we came to the conclusion that there is a real advantage in being able to selectively mark methods and procedures as thread safe or not. For example a database gem might not be able to declare itself fully thread safe as a whole, but still be able to get significant benefit from marking all the methods used to access individual fields in a record as thread safe so that they could be run while another thread is in the process of executing a different query. My gut feeling would be that very few C extensions could be marked as entirely thread safe, and it would take significant engineering effort to make get them to that that state. This would mean we'd want calls in the init function that look something like

static VALUE foo(VALUE self) {
  ...
}

static VALUE bar(VALUE self) {
  ...
}

void Init_foo(void) {
  rb_mark_declarations_as_thread_safe(true);
  rb_mark_declarations_as_ractor_safe(true);

  VALUE cMyClass = rb_define_class("MyClass", rb_cObject);
  rb_define_method(cMyClass, "foo", foo, 0); // foo() can be executed in parallel safely

  rb_mark_declarations_as_thread_safe(false);
  rb_mark_declarations_as_ractor_safe(false);

  rb_define_method(cMyClass, "bar", bar, 0); // bar() cannot be executed in parallel safely
}

There are some places where we would have to make decisions about how things behave. If you are defining a proc or method from a C extension does its thread safety depend on the method defining it, should the rb_mark_declaration... functions be called explicitly, or should we provide new APIs for defining procs and methods?

Updated by matz (Yukihiro Matsumoto) about 2 months ago

I am OK with having API to specify Ractor/Thread-safety. But thread-safety here is a bit ambiguous. For example, object allocation is not thread-safe in CRuby but probably safe in TruffleRuby. We need to clarify the definition of thread-safety before introducing the thread-safe mark. Or it should be TruffleRuby specific thread-safe mark.

Matz.

Updated by Eregon (Benoit Daloze) about 2 months ago

By marking as thread-safe, I mean that the C extension would manipulate its own state in a thread-safe manner. I think the various APIs mentioned above all make that clear.

The thread safety of the interpreter is not considered here. It's up to the interpreter to make use or not of these flags on the C extension.

We should indeed define better what thread-safe for a C extension means in practice, i.e. some docs on how making a C extension thread-safe.
I or Duncan will prototype something in TruffleRuby and with a couple C exts, and write some documentation about that.

It would be useful if ko1 (Koichi Sasada) can write some documentation about how to make a C extension Ractor-safe, as I think he knows best.


In short, all rb_* functions are expected to be thread-safe on TruffleRuby.

I think CRuby never really had the concept of thread safety for the interpreter, because it never had shared memory parallelism. The GVL/GIL basically avoids any thread-safety issue by not allowing Ruby Threads to run in parallel (inside a Ractor, different Ractors have almost isolated heaps, so there is no actual shared memory at the Ruby level).
(there is rb_thread_call_without_gvl() but basically calling any rb_*() function inside is unsafe).

So, "thread-safe C extensions" would probably not be directly useful to CRuby as long as the GVL exists. So CRuby can ignore the thread-safe flag basically, and instead the Ractor-safe flag is the interesting part for CRuby.
I think we'd want to avoid having "truffleruby" in the name for the thread-safe flag, because there could be other Ruby implementations in the future with C extension support and parallel threads (Rubinius is an example, although it's inactive recently).
If a Ruby implementation supports parallel threads, it needs most things in the interpreter to be thread-safe. C extension APIs too.

Updated by ko1 (Koichi Sasada) about 2 months ago

It would be useful if ko1 (Koichi Sasada) can write some documentation about how to make a C extension Ractor-safe, as I think he knows best.

Yes. I'll write it before Ruby 3.0 release.


The thread safety of the interpreter is not considered here. It's up to the interpreter to make use or not of these flags on the C extension.

Yes, it is one idea to define what is "thread-safe".
However, I'm not sure we can use this definition because rb_ APIs are thread-unsafe on MRI.

Updated by shyouhei (Shyouhei Urabe) about 2 months ago

Eregon (Benoit Daloze) wrote in #note-12:

By marking as thread-safe, I mean that the C extension would manipulate its own state in a thread-safe manner. I think the various APIs mentioned above all make that clear. The thread safety of the interpreter is not considered here.

Am I missing something? This sounds eccentric to me. Whether a C function is thread safe or not is rather a common concept. "This C function is otherwise thread-safe unless any functions it (maybe recursively) calls break thread-safety" is NOT a thread-safe C function itself. Given rb_* families differ in their thread-safety among implementations, it is nearly impossible for thread-safe C functions to use them, which effectively means almost everything an extension library concerns cannot be thread-safe.

Updated by Eregon (Benoit Daloze) about 2 months ago

shyouhei (Shyouhei Urabe) wrote in #note-14:

Am I missing something? This sounds eccentric to me. Whether a C function is thread safe or not is rather a common concept. "This C function is otherwise thread-safe unless any functions it (maybe recursively) calls break thread-safety" is NOT a thread-safe C function itself. Given rb_* families differ in their thread-safety among implementations, it is nearly impossible for thread-safe C functions to use them, which effectively means almost everything an extension library concerns cannot be thread-safe.

Marking as thread-safe would mean nothing if rb_* functions are not thread-safe.
So on CRuby it would do nothing.
So the condition is "if this C extension function is thread-safe, assuming rb_* functions are thread-safe.".
If rb_* functions are not thread-safe, then the marking has no effect/is not used by anything.
shyouhei (Shyouhei Urabe) Does that make sense?

Updated by shyouhei (Shyouhei Urabe) about 2 months ago

Eregon (Benoit Daloze) wrote in #note-15:

shyouhei (Shyouhei Urabe) wrote in #note-14:

Am I missing something? This sounds eccentric to me. Whether a C function is thread safe or not is rather a common concept. "This C function is otherwise thread-safe unless any functions it (maybe recursively) calls break thread-safety" is NOT a thread-safe C function itself. Given rb_* families differ in their thread-safety among implementations, it is nearly impossible for thread-safe C functions to use them, which effectively means almost everything an extension library concerns cannot be thread-safe.

Marking as thread-safe would mean nothing if rb_* functions are not thread-safe.
So on CRuby it would do nothing.
So the condition is "if this C extension function is thread-safe, assuming rb_* functions are thread-safe.".
If rb_* functions are not thread-safe, then the marking has no effect/is not used by anything.
shyouhei (Shyouhei Urabe) Does that make sense?

This is not what I know is a thread-safety. I understand what you need, but you should name the property differently than thread-safe, like for instance Truffle safe.

Updated by Eregon (Benoit Daloze) about 2 months ago

shyouhei (Shyouhei Urabe) wrote in #note-16:

This is not what I know is a thread-safety. I understand what you need, but you should name the property differently than thread-safe, like for instance Truffle safe.

Could you explain what differs or what is your definition of thread safety?

I guess it's something like "sufficient synchronization around shared mutable state + every function called is thread-safe or synchronized by all callers"?
When calling, e.g., libc functions, one needs to consult the man page for thread-safety.
Similarly, when looking at rb_* functions, one needs to look the documentation of the Ruby implementation for whether these functions are thread-safe.
That's how I see it.

And again, such a definition is not specific to TruffleRuby, it could apply to any Ruby implementation with parallel threads and C extension support, such as Rubinius and maybe others in the future.

Maybe we can use another term, something like "parallel safe", but in the end the necessary condition is that calling the C extensions functions in parallel is correct, i.e., that these functions are thread-safe, assuming the rb_*, libc, etc, functions behave thread-safe as documented in their respective documentation.
I get that it's a bit weird to mark a C extension as thread-safe, given that on CRuby they can't be executed in parallel just with that condition.

But, on CRuby rb_* functions must be called under the GIL so rb_* are thread-safe on CRuby when used correctly (under the GIL).
I think that makes thread-safe the most appropriate term for it.

Updated by shyouhei (Shyouhei Urabe) about 2 months ago

Eregon (Benoit Daloze) wrote in #note-17:

shyouhei (Shyouhei Urabe) wrote in #note-16:

This is not what I know is a thread-safety. I understand what you need, but you should name the property differently than thread-safe, like for instance Truffle safe.

Could you explain what differs or what is your definition of thread safety?

For instance:

#include <string.h>

const char *
foo(int bar)
{
    return strerror(bar);
}

This is not a thread-safe function. strerror(3) is one of those functions which POSIX explicitly specifies to be thread-unsafe (IEEE Std 1003.1 Chapter 2). Now, depending on operating systems this could in fact run in a thread-safe manner. I guess musl libc could be one of such implementation that has reentrancy. That is of course a very god property per-se, but doesn't mean the foo above can be considered thread-safe. In fact it isn't on, for instance, OpenBSD.

Same discussion must apply when we replace that strerror into NUM2CHR.

I guess it's something like "sufficient synchronization around shared mutable state + every function called is thread-safe or synchronized by all callers"?

A C function must be threa-safe as a whole. If a C function calls something thread-unsafe it must not be thread-safe itself. POSIX defines Thread-Safe to be "a thread-safe function can be safely invoked concurrently with other calls to the same function, or with calls to any other thread-safe functions, by multiple threads" (IEEE Std 1003.1 Chapter 3). It doesn't allow a part of C function be unsafe due to it is call(s) to external function(s).

When calling, e.g., libc functions, one needs to consult the man page for thread-safety.
Similarly, when looking at rb_* functions, one needs to look the documentation of the Ruby implementation for whether these functions are thread-safe.
That's how I see it.

I'm concerning about annotating a 3rd party C extension to be thread safe. When a manual or document or something states that a function is thread safe, I guess nobody would look into its implementation to see if it contains call to problematic libc routines.

And again, such a definition is not specific to TruffleRuby, it could apply to any Ruby implementation with parallel threads and C extension support, such as Rubinius and maybe others in the future.

OK, I was wrong about calling this TruffleRuby specific. My concern is about portability.

Maybe we can use another term, something like "parallel safe", but in the end the necessary condition is that calling the C extensions functions in parallel is correct, i.e., that these functions are thread-safe, assuming the rb_*, libc, etc, functions behave thread-safe as documented in their respective documentation.
I get that it's a bit weird to mark a C extension as thread-safe, given that on CRuby they can't be executed in parallel just with that condition.

But, on CRuby rb_* functions must be called under the GIL so rb_* are thread-safe on CRuby when used correctly (under the GIL).
I think that makes thread-safe the most appropriate term for it.

This means everything that an extension library could call to be thread-safe at once (hence no lock needed), or to be thread-unsafe at all (hence need lock everywhere). I guess it prevents gradual evolution of the implementation: when a part of our CAPI were made thread safe, but something remains unsafe, There is no way to express that info because everything are already marked safe. Effectively results in no thread safety at all is the easiest solution for us. I don't think that is a good future.

Updated by ko1 (Koichi Sasada) about 2 months ago

Here is implementation: https://github.com/ruby/ruby/pull/3824/files

Documents will be filled soon...

#20

Updated by ko1 (Koichi Sasada) about 2 months ago

  • Status changed from Open to Closed

Applied in changeset git|182fb73c40351f917bf44626c44c1adb6cb1aa5a.


rb_ext_ractor_safe() to declare ractor-safe ext

C extensions can violate the ractor-safety, so only ractor-safe
C extensions (C methods) can run on non-main ractors.
rb_ext_ractor_safe(true) declares that the successive
defined methods are ractor-safe. Otherwiwze, defined methods
checked they are invoked in main ractor and raise an error
if invoked at non-main ractors.

[Feature #17307]

Updated by Eregon (Benoit Daloze) 29 days ago

I've missed this was committed (https://github.com/ruby/ruby/pull/3824), the API seems fine.

Example usage:

#ifdef HAVE_RB_EXT_RACTOR_SAFE
  rb_ext_ractor_safe(true);
#endif

https://github.com/ruby/ruby/commit/3d31944129180

I guess

#ifdef RB_EXT_RACTOR_SAFE
  RB_EXT_RACTOR_SAFE(true);
#endif

works too and avoids the extra have_macro in extconf.rb.

shyouhei (Shyouhei Urabe) I see "thread-safe" as not necessarily the POSIX definition but more like the computer science definition.
I can see it's a bit ambiguous, but it also seems the most fitting term.

In any case, I think any C extension needs to assume calling rb_* functions from a rb_define_method-C-function is safe.
So it's really about the thread-safety of the state of the C extension itself, not about rb_* functions, which need to work if calling rb_define_method-C-function in parallel is possible at all.

Updated by nobu (Nobuyoshi Nakada) 28 days ago

Eregon (Benoit Daloze) wrote in #note-21:

I guess

#ifdef RB_EXT_RACTOR_SAFE
  RB_EXT_RACTOR_SAFE(true);
#endif

works too and avoids the extra have_macro in extconf.rb.

HAVE_RB_EXT_RACTOR_SAFE is defined in a header, you don't need have_func("rb_ext_ractor_safe").

Updated by Eregon (Benoit Daloze) 28 days ago

nobu (Nobuyoshi Nakada) wrote in #note-22:

HAVE_RB_EXT_RACTOR_SAFE is defined in a header, you don't need have_func("rb_ext_ractor_safe").

I see, thanks. I somehow missed that because it wasn't in ko1's original PR, and I forgot to git pull.

Also available in: Atom PDF