Feature #21140
Open
Add a method to get the address of certain JIT related functions
Description
Feature #21116 extracted RJIT as a gem. But RJIT accesses certain internal functions which it cannot access as a gem. For example, it used the rb_str_bytesize function, but this symbol is not exported, so we cannot access it (even from a C extension).
Instead of exporting these symbols, I would like to propose an API for getting access to their addresses in Ruby.
For example
RubyVM::RJIT.address_of(:rb_str_bytesize) # => 123456
I would like to limit the addresses to this list which are the ones required by RJIT.
Updated by k0kubun (Takashi Kokubun) 7 days ago
· Edited
+1
This seems like the right approach to me too.
Updated by nobu (Nobuyoshi Nakada) 7 days ago
I think the functions that belong to built-in classes can simply be exported, i.e., those not related to the VM.
I want to manage that list as little as possible.
Updated by tenderlovemaking (Aaron Patterson) 6 days ago
- Related to Feature #21116: Extract RJIT as a third-party gem added
Updated by tenderlovemaking (Aaron Patterson) 6 days ago
nobu (Nobuyoshi Nakada) wrote in #note-2:
I think the functions that belong to built-in classes can simply be exported, i.e., those not related to the VM.
I want to manage that list as little as possible.
I understand. I think though, if YJIT uses some function, then I think we should make that function available for 3rd party JITs. Can we use YJIT's bindgen code to generate the list? Then we don't have to specifically maintain a separate list.
I made a list of symbols that RJIT uses but are not available:
"rb_ary_entry_internal"
"rb_ary_tmp_new_from_values"
"rb_ary_unshift_m"
"rb_ec_ary_new_from_values"
"rb_ec_str_resurrect"
"rb_ensure_iv_list_size"
"rb_fix_aref"
"rb_fix_div_fix"
"rb_fix_mod_fix"
"rb_fix_mul_fix"
"rb_get_symbol_id"
"rb_gvar_get"
"rb_hash_new_with_size"
"rb_hash_resurrect"
"rb_obj_as_string_result"
"rb_reg_new_ary"
"rb_str_bytesize"
"rb_str_concat_literals"
"rb_str_eql_internal"
"rb_str_getbyte"
"rb_sym_to_proc"
"rb_vm_bh_to_procval"
"rb_vm_concat_array"
"rb_vm_defined"
"rb_vm_get_ev_const"
"rb_vm_getclassvariable"
"rb_vm_ic_hit_p"
"rb_vm_opt_newarray_hash"
"rb_vm_opt_newarray_max"
"rb_vm_opt_newarray_min"
"rb_vm_opt_newarray_pack"
"rb_vm_set_ivar_id"
"rb_vm_setclassvariable"
"rb_vm_setinstancevariable"
"rb_vm_splat_array"
"rb_vm_throw"
"rb_vm_yield_with_cfunc"
We could probably export many of these functions, but I guess there are a significant number of rb_vm_* functions. If we could reuse YJIT's bindgen code, that might make maintenance easier.
Updated by tenderlovemaking (Aaron Patterson) 6 days ago
As an example, both RJIT and YJIT use rb_ary_entry_internal. YJIT solves this by wrapping the function here (as rb_yarv_ary_entry_internal), then it adds the function to bindgen here. We could change RJIT to use rb_yarv_ary_entry_internal, and also use YJIT's bindgen code to generate RubyVM::RJIT.address_of(:rb_yarv_ary_entry_internal). Then we don't have to maintain a specific list of symbols since YJIT's bindgen code must be updated when YJIT needs to update.
Updated by Eregon (Benoit Daloze) 5 days ago
IMO it's better to have them through RubyVM::RJIT.address_of than exporting them, because that way it's very clear they are not part of the public Ruby C API.
And so e.g. it's expected that TruffleRuby does not expose these internal functions and that they should only be used for RJIT purposes.
Updated by tenderlovemaking (Aaron Patterson) 5 days ago
I made a patch here.
It generates a function based on YJIT's bindgen file. Whatever functions YJIT exposes in bindgen are also available via the API.
I named the method:
RubyVM::Internals.address_of(:rb_vm_ci_argc)
I don't have any particular opinion on what the name should be, but I think address_of makes sense. Also I chose RubyVM::Internals because I don't think it should be specific to RJIT (I would like to use this in my own JIT compilers).
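The generation step described in this comment can be pictured with a small sketch. It is not the actual patch: it assumes the bindgen file names each function in an `.allowlist_function("...")` call, the sample text is invented for illustration (`rb_commented_out` is a made-up name), and the emitted C table layout is a guess at what a name-to-address lookup could be built from.

```ruby
# Hypothetical excerpt in the style of a bindgen configuration file.
# Only the two rb_* function names and rb_control_frame_t appear in this thread.
BINDGEN_SAMPLE = <<~RUST
  .allowlist_function("rb_vm_ci_argc")
  .allowlist_function("rb_yarv_ary_entry_internal")
  // .allowlist_function("rb_commented_out")
  .allowlist_type("rb_control_frame_t")
RUST

# Collect the allowlisted function names, skipping commented-out lines
# and entries that are not functions (types, for example).
def allowlisted_functions(source)
  source.each_line.filter_map do |line|
    next if line.strip.start_with?("//")
    line[/\.allowlist_function\("([^"]+)"\)/, 1]
  end
end

# Emit one C table row per function. A real generator would write these rows
# into a C source file that the address lookup is compiled from.
def c_table_entries(names)
  names.map { |n| "    { \"#{n}\", (void *)#{n} }," }
end

NAMES = allowlisted_functions(BINDGEN_SAMPLE)
puts c_table_entries(NAMES)
```

Deriving the table from a single list is what removes the separate list to maintain: adding a function for YJIT would make it available through the lookup as well.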
Updated by maximecb (Maxime Chevalier-Boisvert) 3 days ago
I'm skeptical of the idea of having third-party JITs as gems. This is exposing a ton of internal APIs that were not previously exposed, which could be potentially problematic if people start to rely on them. You have to think that random gems that are not actually JITs could begin to use these APIs.
I can't stop you from making this change, but Ruby has a history of merging new features too fast without carefully considering the full implications. This is going to sound cynical, but Ruby is not your personal side-project, it's a piece of software that millions of people rely on. If you want a playground to build a JIT and have fun, why not build your own implementation of Lox from Crafting Interpreters or fork an existing one? I'm sorry if this sounds harsh, but I think we all need to ponder merging big changes really carefully. You too should at least try to play the devil's advocate here. What are the downsides?
My two biggest concerns:
- The additional maintenance burden of random gems relying on internal APIs they shouldn't rely on. Think of the JIT challenges we run into with people abusing binding now. The Ruby public API surface is already too big imo.
- What does this mean for security? If you have access to these APIs from Rubyland you can potentially take control of the Ruby VM. Is access to these internal APIs restricted somehow?
Updated by tenderlovemaking (Aaron Patterson) 3 days ago
maximecb (Maxime Chevalier-Boisvert) wrote in #note-8:
I'm skeptical of the idea of having third-party JITs as gems. This is exposing a ton of internal APIs that were not previously exposed, which could be potentially problematic if people start to rely on them. You have to think that random gems that are not actually JITs could begin to use these APIs.
These APIs are already exposed via RJIT in current releases. Since we've extracted RJIT as a gem, I don't think RJIT can work without access to these.
I can't stop you from making this change, but Ruby has a history of merging new features too fast without carefully considering the full implications. This is going to sound cynical, but Ruby is not your personal side-project, it's a piece of software that millions of people rely on.
I think it can be both a side project as well as a piece of software that millions of people rely on. That's been at the core of the culture of the global Ruby community since its inception. I think for many of us on the Ruby-core team, it is a side-project, and I don't think it's right to take that aspect away. What Kokubun and I are trying to achieve is to give people a way to experiment with the language in ways you may not be able to imagine right now.
If you want a playground to build a JIT and have fun, why not build your own implementation of Lox from Crafting Interpreters or fork an existing one? I'm sorry if this sounds harsh, but I think we all need to ponder merging big changes really carefully. You too should at least try to play the devil's advocate here. What are the downsides?
My two biggest concerns:
- The additional maintenance burden of random gems relying on internal APIs they shouldn't rely on. Think of the JIT challenges we run into with people abusing binding now. The Ruby public API surface is already too big imo.
I think it's important we document that this API is unstable / unreliable. That is why I called it RubyVM::Internals to try to indicate how private it is. Additionally it's an API that just returns an integer, so using this API is particularly hard.
- What does this mean for security? If you have access to these APIs from Rubyland you can potentially take control of the Ruby VM. Is access to these internal APIs restricted somehow?
I don't think this change has any impact with regard to security. This information can be recovered via dlsym or parsing ELF / DWARF. This change just makes access somewhat easier.
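To illustrate why an API that "just returns an integer" is hard to use, here is a minimal sketch with Fiddle. It uses libc's `strlen` as a stand-in because that symbol is exported on common platforms; the proposed `address_of` method is not used here, and the sketch assumes the `fiddle` library is installed.

```ruby
require "fiddle"

# Look up an exported symbol. The result is a plain Integer address,
# which is the same kind of value the proposed API would return.
ADDRESS = Fiddle::Handle::DEFAULT["strlen"]

# The caller has to supply the C signature by hand. Nothing verifies it
# against the real function, so a wrong signature can crash the process.
STRLEN = Fiddle::Function.new(ADDRESS, [Fiddle::TYPE_VOIDP], Fiddle::TYPE_SIZE_T)

LENGTH = STRLEN.call("hello")
puts LENGTH
```

An address alone carries no type information, so every caller takes on the job of matching the C declaration for the Ruby version it runs on.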
Updated by maximecb (Maxime Chevalier-Boisvert) 3 days ago
· Edited
I think it can be both a side project as well as a piece of software that millions of people rely on.
I apologize for the tone of my post which was rather hostile. I woke up with a pretty bad headache this morning and was in a grumpy mood. I should have worded my thoughts more kindly. My main point is that I am afraid that things get merged into Ruby without fully weighing the implications, e.g. Ractors. This was merged because of the enthusiasm of one specific core dev, but it's been in a non-working state until recently.
I am sure that your API will work but as we were discussing in the YJIT meeting, we have a problem where C extensions already have access to lots of things which they really shouldn't have direct access to. We should make sure to tell Ruby extension developers "this is a JIT API and there are no stability guarantees between Ruby versions". This should be made 1000% clear in the documentation with ALL CAPS AND BOLD FONTS but there is still a risk that people will abuse it and some gems could break. We can say "oh well, their fault, they were stupid", but imagine if 5 years from now we make some CRuby change and 3 gems that Shopify depends on blow up. What do we do then? We'd have no choice but to roll back those CRuby changes and it could stall CRuby development in some areas.
My recommendation: guard this API behind a special configure flag that is separate from YJIT's. Something like --enable-jit-gem-api. That way you get to have your cake and eat it too. Ruby devs can build a custom Ruby and do anything they want with it. You get to write your own Ruby JIT gem and embed it into your IoT toaster. You can even build your own Ruby and deploy it into production at your startup if you want to, but you also effectively shield the rest of Ruby users from security risks and avoid Ruby gems that have no JIT needs becoming dependent on this API.
Updated by ufuk (Ufuk Kayserilioglu) 3 days ago
maximecb (Maxime Chevalier-Boisvert) wrote in #note-10:
if 5 years from now we make some CRuby change and 3 gems that Shopify depends on blow up. What do we do then?
We would fix them forward. That's why we have our daily Ruby-head CI running, so that we can catch these kinds of changes as early as possible and fix any code that needs changing. This is something we've been doing for the last 3 years and we've fixed many similar incompatibilities in our codebase and/or our dependencies as appropriate.
My recommendation: guard this API behind a special configure flag that is separate from YJIT's. Something like --enable-jit-gem-api.
I think that would result in RJIT and any other experimental JIT related project being completely irrelevant and kill any kind of experimentation on the platform.
Ruby has always been a language of folks running with scissors, and I don't think we should stop doing that now.
Updated by maximecb (Maxime Chevalier-Boisvert) 3 days ago
· Edited
We would fix them forward. That's why we have our daily Ruby-head CI running, so that we can catch these kinds of changes as early as possible and fix any code that needs changing.
You wouldn't know about the gems that might break until someone actually merges the change into CRuby. At this point, you would be forced to delay deploying new versions of CRuby to Core/SFR until the gem issues are fixed. This would mean either changing our code to not use the broken gem (unknown time / difficulty), or trying to fork the gem or get the author to fix it (unknown time / difficulty). Either way, it's another small crisis that we need to handle and we potentially waste a significant amount of time on.
I think that would result in RJIT and any other experimental JIT related project being completely irrelevant and kill any kind of experimentation on the platform.
No it wouldn't. YJIT was guarded by a configure option at the beginning. Building a custom Ruby is not that difficult for people to do. If you want a slightly less difficult option, then I would say make it a command-line flag. Still has largely the same benefits but it saves you the building a custom Ruby step. Unless you think that having to write --yjit made YJIT irrelevant.
Ruby has always been a language of folks running with scissors, and I don't think we should stop doing that now.
That kind of sounds like: people have made bad decisions in the past, and there have been predictable negative consequences, but we should keep making bad decisions with predictable outcomes because there is a precedent for doing so?
Imo good software engineering requires care and discipline. There is a reason why we make plastic scissors to give to kids, etc. There is a reason why rocket launchers have an arrow that says "this side towards enemy", even though soldiers are presumably adults that have been given prior training.
Updated by Eregon (Benoit Daloze) 3 days ago
· Edited
Would this API be needed for e.g. a JIT in the FFI gem? (https://railsatscale.com/2025-02-12-tiny-jits-for-a-faster-ffi/)
Looking at the PoC I'm unsure but I think not.
I think it's a good idea to have this API not accessible by default because it is deep internals which are meant for experimentation and nothing else.
If there is a convincing non-experimental use case that needs it, then it's a feature that it's not accessible by default, because we would actually want to re-discuss and potentially expose the needed parts properly.
This information can be recovered via dlsym
Why not just use dlsym() then? (e.g. after dlopen(NULL, flags))
It seems basically equivalent to this API (it also takes a function name and returns an integer), and anyway one needs the ability to call native functions to use this API.
Updated by tenderlovemaking (Aaron Patterson) 3 days ago
Eregon (Benoit Daloze) wrote in #note-13:
Would this API be needed for e.g. a JIT in the FFI gem? (https://railsatscale.com/2025-02-12-tiny-jits-for-a-faster-ffi/)
Looking at the PoC I'm unsure but I think not.
It's hard to know for sure until we finish it 😅
I think it's a good idea to have this API not accessible by default because it is deep internals which are meant for experimentation and nothing else.
If there is a convincing non-experimental use case that needs it, then it's a feature that it's not accessible by default, because we would actually want to re-discuss and potentially expose the needed parts properly.
I don't think providing the method hurts anything. It's experimental and unstable, so people who build anything with it should understand the risks of depending on it.
This information can be recovered via dlsym
Why not just use dlsym() then? (e.g. after dlopen(NULL, flags))
It seems basically equivalent to this API (it also takes a function name and returns an integer), and anyway one needs the ability to call native functions to use this API.
Many of the functions would work, but the list of functions I provided above do not. Their symbols are not available via dlsym. They can be recovered via DWARF, but from my experience with TenderJIT v1, it's a huge pain.
[aaron@tc-lan-adapter ~]$ ruby -v -r fiddle -e'p Fiddle::Handle::DEFAULT["rb_vm_throw"]'
ruby 3.5.0dev (2025-02-14T21:16:53Z master ba148e71e5) +PRISM [arm64-darwin24]
-e:1:in 'Fiddle::Handle#[]': unknown symbol "rb_vm_throw" (Fiddle::DLError)
from -e:1:in '<main>'
[aaron@tc-lan-adapter ~]$ ruby -v -r fiddle -e'p Fiddle::Handle::DEFAULT["rb_shape_id"]'
ruby 3.5.0dev (2025-02-14T21:16:53Z master ba148e71e5) +PRISM [arm64-darwin24]
4306064836
Both rb_vm_throw and rb_shape_id are listed in the YJIT bindgen file, but only one symbol is visible via dlsym. If all of the symbols were available via dlsym then I would not propose this API.
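The two commands above can be folded into a small probe. This is only a sketch, assuming the `fiddle` library is installed; `exported_address` is a helper name made up here. Which symbols resolve depends on the Ruby build, so no output is shown.

```ruby
require "fiddle"

# Return the address of a symbol visible through dlsym, or nil when the
# symbol is not exported. Fiddle raises Fiddle::DLError for unknown symbols.
def exported_address(name)
  Fiddle::Handle::DEFAULT[name]
rescue Fiddle::DLError
  nil
end

# Report, for the two symbols from the transcript, which ones this build exports.
%w[rb_shape_id rb_vm_throw].each do |name|
  address = exported_address(name)
  if address
    puts format("%s is exported at 0x%x", name, address)
  else
    puts "#{name} is not visible via dlsym"
  end
end
```

A JIT gem could use a probe like this to find out which of its required functions a given build exposes, but it cannot recover the hidden ones this way.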
Updated by maximecb (Maxime Chevalier-Boisvert) 3 days ago
I don't think providing the method hurts anything. It's experimental and unstable, so people who build anything with it should understand the risks of depending on it.
People do lots of things they shouldn't do though.
What do you think about the idea of guarding it behind a Ruby command-line argument such as --enable-jit-gem-api or something similar?
Updated by tenderlovemaking (Aaron Patterson) 3 days ago
maximecb (Maxime Chevalier-Boisvert) wrote in #note-15:
I don't think providing the method hurts anything. It's experimental and unstable, so people who build anything with it should understand the risks of depending on it.
People do lots of things they shouldn't do though.
What do you think about the idea of guarding it behind a Ruby command-line argument such as --enable-jit-gem-api or something similar?
I'd rather not add more flags and conditionals. But if that's what it takes to add this API then I would do it.
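For readers weighing the two positions, the opt-in guard under discussion could behave roughly like the following sketch. Everything in it is hypothetical: the module name, the `enable!` switch standing in for a command-line or configure flag, and the made-up addresses. It only models the behavior, not how CRuby would implement it.

```ruby
# Hypothetical model of an internals lookup that is off unless opted into.
module GatedInternalsSketch
  class Disabled < StandardError; end

  # Placeholder table; these addresses are invented for illustration.
  ADDRESSES = { rb_vm_throw: 0x1000, rb_str_bytesize: 0x2000 }.freeze

  @enabled = false

  class << self
    # Stand-in for what a command-line or configure flag would decide at boot.
    def enable!
      @enabled = true
    end

    # Refuse every lookup until the API has been explicitly enabled,
    # then resolve only names present in the table.
    def address_of(name)
      raise Disabled, "internals API is off; opt in explicitly" unless @enabled
      ADDRESSES.fetch(name) { raise ArgumentError, "unknown symbol: #{name}" }
    end
  end
end
```

With a guard like this, a gem that calls the lookup fails immediately on a default Ruby, which is the deterrent being argued for; the cost is the extra flag and conditional that this comment would prefer to avoid.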
Updated by Eregon (Benoit Daloze) 3 days ago
tenderlovemaking (Aaron Patterson) wrote in #note-14:
I don't think providing the method hurts anything. It's experimental and unstable, so people who build anything with it should understand the risks of depending on it.
I used to be of that opinion but clearly time has shown that people or gems have abused RubyVM too much already.
A prime example of that is quite a few gems depend or depended on RubyVM::AbstractSyntaxTree, even though it was marked as experimental/unstable/etc (e.g the order of node children could change incompatibly at any time, and no way to access by name).
From my POV it took several years of work on Prism, making it the official API for parsing Ruby, and then migrating those gems to Prism to clean that mess.
I think RubyVM::AbstractSyntaxTree should have never been available by default, it should have been behind a configure flag or so, as a debug/research tool (or only as text output, like --dump=parsetree).
A runtime flag might have been enough (I'm not sure, it feels more risky) to discourage using it for anything but experiments.
Similar for RubyVM::InstructionSequence, it would be possible to design an API which is not tied to CRuby bytecode but can support serializing & deserializing a binary form of source code.
But because RubyVM::InstructionSequence exists it will probably never be attempted.
Things like getting the start/end column and end line got delayed by years because there were workarounds with RubyVM, which other Ruby implementations must not implement to not break more code.
I can see how this request is not really of the same scope, but it is asking to expose deep internals of the VM like non-exported (static) functions.
Those are likely to change so it feels wrong to expose them in any way.
When RJIT was part of core it was more OK to use such internals as it could be evolved with CRuby, but that's no longer the case so it should use more stable APIs/rely less on internals.
How about reviewing the functions RJIT really needs, and those it could work around?
And then maybe having the ones really needed as exported functions (e.g. without declaration in header to make it not too easy) or static inline (in a separate header to make it clear it's not part of the Ruby C API).
BTW for static inline functions those can easily be accessed with a C extension, e.g. rb_ary_entry_internal() in https://github.com/ruby/ruby/blob/27ba268b75bbe461460b31426e377b42d4935f70/internal/array.h#L54-L68.
Updated by tenderlovemaking (Aaron Patterson) 3 days ago
Eregon (Benoit Daloze) wrote in #note-17:
How about reviewing the functions RJIT really needs, and those it could work around?
This seems like a lot of work. Are you volunteering? 😜
The functions listed in RJIT's bindgen file are very similar to YJIT's, which is no surprise as RJIT is based on YJIT. AFAICT, they use the same functions, just that YJIT wraps some (as I mentioned here).
And then maybe having the ones really needed as exported functions (e.g. without declaration in header to make it not too easy) or static inline (in a separate header to make it clear it's not part of the Ruby C API).
This seems reasonable, but I'm worried about getting bogged down debating about each function and whether or not it is "really needed". That's why I wanted to use the set that RJIT/YJIT use already. I assume it's not just using the function "for fun" and actually has a purpose.
Updated by maximecb (Maxime Chevalier-Boisvert) 3 days ago
· Edited
That's why I wanted to use the set that RJIT/YJIT use already. I assume it's not just using the function "for fun" and actually has a purpose.
It's tough because the set of functions a JIT needs tends to grow over time. For example, YJIT never did much to optimize hash access, but in the future we would probably want to do that, so we would need to expose more functions. As such I don't think it makes sense to have some kind of a fixed list if the use case is JIT compilers.
Benoit is making the same argument I've made, which is that it's fragile to expose such a large set of internal functions without any safeguards. It's a basic but unfortunate fact of language design that once you add a feature, it's very hard to take it back... Even if you tell people they shouldn't use it. I think this point is 100% valid. I know that you would prefer no safeguards because it seems more convenient slash fun to play with in the short term, but IMO it's only reasonable to ask for either a configure flag or a command-line flag. It probably should be a configure flag.
Updated by Eregon (Benoit Daloze) 2 days ago
Thinking a bit more about this, how is RJIT going to work since on master it seems everything related to it has been removed?
For example when looking at the FFI JIT PoC it uses RubyVM::RJIT::C (which is only defined when passing --rjit on the command line BTW) but that no longer exists at all on master.
Primitive.cexpr!() (and Primitives in general) as used here is not available outside of core, and so AFAIK there won't be any way to find out struct offsets of rb_control_frame_t.
BTW exposing struct offsets is IMO more dangerous and risky than the list of internal functions above. It reminds me of old debugger gems which used to copy internal headers from some given CRuby version, that's very brittle.
IOW, I think it would be good to figure out how RJIT is going to work as a gem, I suppose right now it cannot work as a gem.
Maybe some part of it not depending on Fiddle should stay in core?
Updated by maximecb (Maxime Chevalier-Boisvert) 2 days ago
Correct me if I'm wrong: for struct offsets I'm assuming they will have to parse C header files or DWARF files?