Actions

Copy link

Feature #21221

open

Proposal to upstream ZJIT

Added by maximecb (Maxime Chevalier-Boisvert) 4 months ago. Updated 3 months ago.

Status:

Assigned

Assignee:

maximecb (Maxime Chevalier-Boisvert)

Target version:

3.5

[ruby-core:121560]

Description

Background¶

For the past 3 months, the YJIT team at Shopify has been working on a next-generation Ruby JIT, which we refer to as ZJIT. This new compiler is currently being developed in a private fork, with the hope that we can eventually upstream it into ruby/ruby. Maxime Chevalier-Boisvert will give a talk at RubyKaigi 2025 to officially announce the project to the Ruby community and the broader public.

Design of ZJIT¶

YJIT compiles YARV bytecode more or less directly to machine code. This has the benefit that YJIT compiles code fast and that it’s a relatively simple architecture, which was helpful in building the compiler incrementally. The downside is that YJIT is difficult to extend and build upon. In particular, YJIT is very limited when it comes to optimizations that cross YARV instruction boundaries. We’ve known for some time that in order to unlock higher levels of performance, a Ruby JIT would need the ability to perform more aggressive inlining, but it is challenging to cleanly do this in YJIT.

The main innovation of ZJIT is that it has its own Static Single Assignment (SSA) Intermediate Representation (IR). YARV bytecode is converted into this IR, which can then be optimized using multiple optimization passes. These passes can be orthogonal and modular to some degree, which makes the design of the compiler easier to reason about, easier to extend and also opens up the possibility of having multiple JIT compilation tiers in the future, which is something that both Matz and Takashi Kokubun have hoped to see in a Ruby JIT for some time.

In addition to this, we are moving to a more traditional method-based JIT compiler design. I (Maxime) have had the chance to help build and deploy Lazy Basic Block Versioning (LBBV), an offshoot of my PhD research into a production compiler, an opportunity which I’m very thankful for. However, I feel like ZJIT might benefit from having a compiler architecture that is more “standard”. With YJIT, we’ve had very few contributions from Ruby Core contributors outside of Shopify. I’m hoping that if we build a compiler with an architecture that is more textbook-like, we could have a project that is more approachable for new contributors and thus more inclusive of other Ruby core members outside of Shopify, which would be great for the long-term future of Ruby.

Current Status¶

It is still early days for ZJIT. We are only 3 months into its development. As such, ZJIT is currently very much incomplete and can only run small tests and microbenchmarks. Nonetheless, we would still like to upstream it because developing in a fork makes it much harder to keep up with upstream changes in Ruby. We’re hoping to bring it much farther along this year, and we believe that ZJIT will be fairly unintrusive in the upstream repo given that it will have no more dependencies than YJIT, and it will also be guarded by a command-line switch.
Ruby 3.5 / 2025 Objectives
Our goal for the end of the year is to bring ZJIT approximately at parity with YJIT in terms of performance. We expect that it will be relatively easy to outperform YJIT on small microbenchmarks, but that matching YJIT’s performance across larger Ruby programs will take several months because of the breadth of Ruby features used. It is non-trivial to efficiently handle megamorphic call sites making use of keyword arguments, for instance.

It should not be difficult to bring ZJIT at parity with YJIT in terms of supported/unsupported Ruby features, because the JIT compiler can always fall back to the interpreter for any feature it doesn’t support.

Some features we aim to implement/complete in time for Ruby 3.5:

Fast JIT-to-JIT calls using a custom calling convention
Polymorphic inline caches
Support for full deoptimization (e.g. for TracePoint)
Ability to deoptimize single functions (e.g. method redefined, caller gets deoptimized)
Side-exit much less often than YJIT (crucial for good performance)
Ability to serialize machine code
Dead-code elimination, constant propagation
Fusion of comparison and branch instructions

Stretch goals and longer-term goals:

Support two JIT compilation tiers
Aggressive inlining of Ruby calls
Optimize GC allocations
Allocation elision to speed up allocations and reduce GC pressure

We are currently using a modified/improved version of the YJIT backend to generate machine code. This means ZJIT is coming out of the gate with support for both x86-64 and arm64, as YJIT did.

The ability to serialize machine code is something that we hope to be able to implement in ZJIT. This would allow us to save compiled code and reuse it in future executions of a given program. This would enable faster startup times. We know from experience that this is important in production environments such as Shopify’s where code can be (re)deployed several times a day, but it also makes sense on a smaller scale where individuals run code on a personal computer and can benefit from software starting up faster.

Merging Logistics¶

Like its predecessor, ZJIT is written in Rust, and has very few dependencies by design. In particular, there are no external dependencies outside of the Rust compiler (rustc) to build ZJIT with Ruby.

Given that it is very early in ZJIT’s development process, we would like to upstream ZJIT without replacing YJIT, so as to ensure that Ruby 3.5 ships with a well-tested, production-ready JIT. As with YJIT, we would like to suggest that ZJIT should be guarded by a --zjit command-line switch. Since using the compiler is opt-in, there is very little risk for the average Ruby user. We can adjust the way we advertise ZJIT at the time of the Ruby 3.5 release and how much we want the broader Ruby community to try it based on its level of maturity at that point. If ZJIT is not sufficiently mature, we can simply tell people that it is experimental and only for enthusiasts, and recommend that they use YJIT instead.

We are currently developing ZJIT using Rust 1.85.0 so that we can use the 2024 edition of Rust. This shouldn’t be a problem since Rust can easily be installed using the rustup tool, and if a sufficiently recent Rust compiler is not available, CRuby can still build without ZJIT, or with YJIT only (YJIT requires rustc 1.58.0).

In terms of build strategy, if a recent version of the Rust cargo tool is installed, it may be possible to automatically build both YJIT and ZJIT in the same binary. Otherwise, YJIT could be built without ZJIT as long as rustc 1.58.0 or more a recent version is available. If neither is available, then CRuby can be built without either JIT as a fallback. Another possibility, if we want to be more conservative for Ruby 3.5, is to only enable building ZJIT if configure is run with an explicit --enable-zjit. We can potentially make this decision closer to the end of the year.

The timeline for upstreaming would be in the 4 to 6 weeks following RubyKaigi. To merge ZJIT upstream, we will rebase the commits on ruby/ruby‘s master branch and generally preserve the commit history. Some commit messages will be cleaned up and improved prior to merging. Some commits which are logically related may be squashed together. We will only enable a small subset of CI tests for ZJIT at first, so as to keep all tests passing.

Actions

Copy link

#1 [ruby-core:121575]

Updated by matz (Yukihiro Matsumoto) 4 months ago

I agree with making ZJIT upstream. And I feel no worry about the migration, since I trust the team with merging process.
I don't think it was BBV's fault that there were so few contributions to YJIT from outside Shopify, but I'll leave it to @maximecb (Maxime Chevalier-Boisvert) to make the technical choices.

Matz.

Actions

Copy link

#2 [ruby-core:121588]

Updated by maximecb (Maxime Chevalier-Boisvert) 4 months ago

Thank you for you trust Matz!

I think it should be helpful to onboard new people if we have an architecture that is more "standard", more like something you would read about in a compiler textbook. I will talk a bit about that at RubyKaigi. Connects with the concept of a "strangeness budget: https://steveklabnik.com/writing/the-language-strangeness-budget/

The most important thing though is that this compiler will have a higher-level intermediate representation, which will allow us to have more of a modular/extensible design, and this is orthogonal with BBV.

Actions

Copy link

#3 [ruby-core:121630]

Updated by Eregon (Benoit Daloze) 4 months ago · Edited

Interesting to see work starting on a CRuby optimizing method-based JIT.

For context YJIT is essentially a baseline JIT given it does little inlining and as you say only one optimization pass (YJIT compiles YARV bytecode more or less directly to machine code).

Side-exit much less often than YJIT

Isn't this the main reason to move to a method-based JIT?

I think the second and connected reason is an IR would be less useful for LBBV as it can only optimize a basic block at a time, since it does not know which branch is taken before generating code and executing it to find out. One could optimize a trace/sequence of basic blocks at once after the trace's code is generated & executed for a method but that would mean generating code twice in succession.
So in summary (correct me if I'm wrong) LBBV like tracing JITs cannot optimize well (e.g. no side exits) multiple frequent paths within a method (or it would effectively becomes a method JIT and very little of LBBV would be used then).

What is the plan for gathering profiling information in ZJIT, e.g. receiver class for calls (the interpreter has an inline cache but only if monomorphic), for branches (to avoid generating code for never used branches), etc?

Actions

Copy link

#4 [ruby-core:121631]

Updated by maximecb (Maxime Chevalier-Boisvert) 4 months ago · Edited

For context YJIT is essentially a baseline JIT given it does little inlining and as you say only one optimization pass

I would say it's somewhere in between given it has access to type information. Not to mention, LBBV can essentially dead code elimination with 100% accuracy, which is also a neat trick. Baseline JITs are typically just template JITs where the focus is pure compilation speed.

Side-exit much less often than YJIT
Isn't this the main reason to move to a method-based JIT?

Not really. Higgs, the VM I built for my PhD thesis had no interpreter, it was a pure-JIT system, so zero side-exits. The reason we side-exit is that Ruby has very very many corner cases, and it's hard to handle them all in a JIT, especially in a context where e compile directly from YARV.

I think the second and connected reason is an IR would be less useful for LBBV as it can only optimize a basic block at a time

Depends how you structure your JIT. Higgs had an SSA IR and I did do some very basic method-based optimizations using SSA before generating code with LBBV. YARV is definitely not an ideal IR for any kind of JIT. It's been optimized for interpreter performance, meaning bigger and more complex instructions with more control flow inside each instruction, which is not what you want for a JIT.

A choice we could have made for ZJIT is to use LBBV for the backend part of the compiler. However, I think there is value in having a JIT that has a more standard/textbook design, as it will make onboarding new people easier. This is a deliberate choice. I may not be technical lead of this project forever, and I want to prioritize what I think is best for the Ruby community over my own personal research agenda or getting more citations on my past publications.

Plus, Marc Feeley and his student Olivier Melançon recently published a paper about static/offline BBV, so I think we could actually have a BBV pass that operates directly on the SSA IR if we wanted, but this can be done in a completely modular self-contained way, where this pass can be turned on and off without impacting the rest of the compiler much. So all this is to say, there is the option of doing BBV-type optimizations within the context of a method JIT as well later, if we choose to.

So in summary (correct me if I'm wrong) LBBV like tracing JITs cannot optimize well (e.g. no side exits) multiple frequent paths within a method (or it would effectively becomes a method JIT and very little of LBBV would be used then).

I'd argue LBBV probably shines when you have a few frequent code paths with different types. In a traditional method-based JIT, the type information you obtain via profiling/sampling tends to be incomplete. It also doesn't take control flow paths into account. LBBV can "unroll" the control-flow graph and discover type information down multiple levels of branches. It can also discover this information as the program is executing. If you profile your interpreter, you run into the issue that some branches may just never have been executed when you hit your threshold and decide to JIT the method, so you have incomplete type information about the paths not yet taken. LBBV doesn't have that problem.

What is the plan for gathering profiling information in ZJIT, e.g. receiver class for calls (the interpreter has an inline cache but only if monomorphic), for branches (to avoid generating code for never used branches), etc?

At the moment we're adding our own YARV profiling instructions that we can install/uninstall as needed. We complement that with some intraprocedural type propagation.

Actions

Copy link

#5 [ruby-core:121724]

Updated by tekknolagi (Maxwell Bernstein) 3 months ago

Eregon (Benoit Daloze) wrote in #note-3:

Isn't this the main reason to move to a method-based JIT?

I think the second and connected reason is an IR would be less useful for LBBV as it can only optimize a basic block at a time, since it does not know which branch is taken before generating code and executing it to find out. One could optimize a trace/sequence of basic blocks at once after the trace's code is generated & executed for a method but that would mean generating code twice in succession.

I don't think this is strictly true. One could imagine doing LBBV in "tracing mode" without generating any code, and then generating code after hitting a threshold. This gives more scope to the potential optimizer than a block at a time. Alternatively, you could use LBBV in "baseline mode" and then build and optimize an IR for a "next tier". This feels like it's more in the eye of the beholder than a fundamental limitation of the tech.

So in summary (correct me if I'm wrong) LBBV like tracing JITs cannot optimize well (e.g. no side exits) multiple frequent paths within a method (or it would effectively becomes a method JIT and very little of LBBV would be used then).

I think there is no inherent reason that LBBV can't side-exit. This is a policy decision. I'm actually not sure what you mean here since half the point of a tracing JIT is to side exit on unexpected data or unexpected control paths.

Actions

Copy link

#6 [ruby-core:121726]

Updated by Eregon (Benoit Daloze) 3 months ago

I (mis)understood side-exit as anything getting out of the current compilation unit, whether that's jumping to other compiled code or deoptimizing to interpreter.
But it's clear it's meant only as deoptimizing to interpreter in both latest comments.
In my reply I was even mostly focusing on the "jumping to other compiled code" part of it.
Probably that clarifies the above.

IOW, tracing and I think LBBV as well occur some overhead for multiple frequent paths within a method because of the need to jump between multiple "compiled traces"/compilation units (and separate compilation units means less optimal register allocation, etc as some knowledge is lost from one compilation to the other when compiling).

If you profile your interpreter, you run into the issue that some branches may just never have been executed when you hit your threshold and decide to JIT the method, so you have incomplete type information about the paths not yet taken.

The JIT can then turn that path in a deoptimization and then that doesn't affect the precision of types. i.e. the (branch) profile knows it's dead code. It depends on having branch profiles though.

I don't think this is strictly true. One could imagine doing LBBV in "tracing mode" without generating any code, and then generating code after hitting a threshold.

Mmh but then you'd need a tracing interpreter (seems a non-trivial addition and kinda separate from LBBV), or a more precisely a variant of the interpreter which calls backs into the compiler on branches (that's simpler). Good point, that would give a scope of a trace/sequence of basic blocks without needing to generate native code twice. In that scenario it feels very close to regular tracing.

Actions

Copy link

#7 [ruby-core:121732]

Updated by tekknolagi (Maxwell Bernstein) 3 months ago

Eregon (Benoit Daloze) wrote in #note-6:

In my reply I was even mostly focusing on the "jumping to other compiled code" part of it.

Maybe I am overly optimistic, but again, I do not think this is a fundamental limitation of the tech. If a callee function can have multiple entrypoints and exit points depending on caller context, why should a trace not be able to do the same? Sure, you lose some of the ability to modify code that has already been emitted/frozen, but Cranelift seems to be exploring this space using aegraphs with great success. Maybe this is a bit of a reaching argument.

[...] In that scenario it feels very close to regular tracing.

I would love for someone someday to write a comprehensive article exploring the spectrum of implementations and research: tracing, trace trees, tracelets, (L)BBV, ... that all pertinent authors agree with or at least have contributed to.

Actions

Copy link

Updated by hsbt (Hiroshi SHIBATA) 3 months ago

Status changed from Open to Assigned

Actions

Copy link

Also available in: Atom PDF

Like4

Like0Like0Like0Like0Like0Like0Like0Like0

Project

General

Profile

Ruby

Tags

Custom queries

Feature #21221

Proposal to upstream ZJIT

Background¶

Design of ZJIT¶

Current Status¶

Merging Logistics¶

Updated by matz (Yukihiro Matsumoto) 4 months ago

Updated by maximecb (Maxime Chevalier-Boisvert) 4 months ago

Updated by Eregon (Benoit Daloze) 4 months ago · Edited

Updated by maximecb (Maxime Chevalier-Boisvert) 4 months ago · Edited

Updated by tekknolagi (Maxwell Bernstein) 3 months ago

Updated by Eregon (Benoit Daloze) 3 months ago

Updated by tekknolagi (Maxwell Bernstein) 3 months ago

Updated by hsbt (Hiroshi SHIBATA) 3 months ago