Feature #20860

Merge Optional Experimental Feature MMTk into Ruby

Added by peterzhu2118 (Peter Zhu) 14 days ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:119676]

Description

GitHub PR: https://github.com/ruby/ruby/pull/11979

Summary

In this ticket, we're proposing upstreaming the current MMTk implementation into the ruby/mmtk repository. This repository will be mirrored into ruby/ruby, adding gc/mmtk.c, gc/mmtk.h, and the Rust implementation in gc/mmtk.

The current MMTk implementation uses the GC API and implements the NoGC and mark-sweep algorithms.

The current implementation is, in many cases, slower than Ruby's default GC, but we have concrete steps to improve performance, which are discussed in the Next Steps section.

Background

In [Feature #20351] we introduced a mechanism to plug an external garbage collector into Ruby using a dynamic shared library and in [Feature #20470] we introduced an API for third-party garbage collectors to plug into Ruby. Using this API, we were able to demonstrate that we can plug NoGC (a GC that allocates but never collects) and a modified version of Ruby's garbage collector into Ruby.

For the past few months, we've been working on plugging MMTk into Ruby using this API.

What's MMTk?

MMTk is a framework that provides a wide variety of garbage collector implementations. Once a language integrates with its API, the language can use any of these GC implementations, from basic algorithms such as mark-sweep (similar to Ruby's current GC) to more complex algorithms such as Immix and its variants.

Kunshan Wang is a researcher at Australian National University and he has been working on implementing Ruby with MMTk. His work is available as a fork of Ruby at mmtk/ruby with Rust bindings at mmtk/mmtk-ruby.

Implementation

Overview

We've taken Kunshan's implementation and rewritten it using the GC API. Compared to Kunshan's original implementation, this makes the changes minimally invasive as it removes the need for MMTk specific code inside Ruby. Instead, all of the MMTk code lives inside of gc/mmtk.c, gc/mmtk.h, and the Rust code inside of the gc/mmtk directory.

However, compared to Kunshan's implementation, we only support a subset of features and have inferior performance. Most notably, we currently only support NoGC (a GC that only allocates but never collects) and mark-sweep (which is similar to Ruby's current GC). We do not support the advanced copying GCs that Kunshan's implementation supports, such as Immix.

Using This Feature

To use this feature, follow these steps:

  1. Configure Ruby with --with-shared-gc=. (you can also change the directory you want to place the GC libraries).
    You should see with shared GC: yes in the configuration summary.
  2. Build MMTk and the Rust binding by running cargo build or cargo build --release in the gc/mmtk directory to build the debug and release versions, respectively.
    This will generate the gc/mmtk/target/debug/libmmtk_ruby.a or gc/mmtk/target/release/libmmtk_ruby.a file for MMTk and the Rust binding.
  3. Run make shared-gc SHARED_GC=mmtk to build the GC library.
    This will generate the librubygc.mmtk.so (on Linux) or librubygc.mmtk.dylib (on macOS) file in the directory that you specified with --with-shared-gc.
  4. Run Ruby with the RUBY_GC_LIBRARY=mmtk environment variable set to use MMTk.
    On debug builds of MMTk, you should see logging output, such as Initialized MMTk with MarkSweep. You can turn this output off by setting the RUST_LOG environment variable to an empty value (RUST_LOG=).

You can also customize MMTk at runtime with the following environment variables:

  • MMTK_PLAN=<NoGC|MarkSweep>: Configures the GC algorithm used by MMTk. Defaults to MarkSweep.
  • MMTK_HEAP_MODE=<fixed|dynamic>: Configures the MMTk heap used. fixed is a fixed-size heap; dynamic is a dynamically sized heap that grows and shrinks based on heuristics using the MemBalancer algorithm. Defaults to dynamic.
  • MMTK_HEAP_MIN=<size>: Configures the lower bound on heap memory usage by MMTk. Only valid when MMTK_HEAP_MODE=dynamic. size is in bytes, but you can also append KiB, MiB, GiB for larger sizes. Defaults to 1MiB.
  • MMTK_HEAP_MAX=<size>: Configures the upper bound on heap memory usage by MMTk. Once this limit is reached and no objects can be garbage collected, Ruby will crash with an out-of-memory error. size is in bytes, but you can also append KiB, MiB, GiB for larger sizes. Defaults to 80% of your system RAM.
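
As a rough illustration of how these variables combine, here is a minimal Ruby sketch that builds the environment for launching a Ruby process with the MMTk shared GC. The helper name mmtk_env and its keyword arguments are hypothetical and not part of the proposal; only the variable names and allowed values are taken from the list above, and the "512MiB" value assumes the size suffix is appended directly to the number.

```ruby
# Hypothetical helper (not part of the proposal): builds the environment
# hash for launching Ruby with the MMTk shared GC. Only the variable names
# and allowed values come from the ticket.
MMTK_PLANS = %w[NoGC MarkSweep].freeze
MMTK_HEAP_MODES = %w[fixed dynamic].freeze

def mmtk_env(plan: "MarkSweep", heap_mode: "dynamic", heap_min: nil, heap_max: nil)
  raise ArgumentError, "unknown plan: #{plan}" unless MMTK_PLANS.include?(plan)
  raise ArgumentError, "unknown heap mode: #{heap_mode}" unless MMTK_HEAP_MODES.include?(heap_mode)
  # The ticket says MMTK_HEAP_MIN is only valid for the dynamic heap;
  # rejecting it here is a choice made by this sketch.
  if heap_min && heap_mode != "dynamic"
    raise ArgumentError, "MMTK_HEAP_MIN is only valid with MMTK_HEAP_MODE=dynamic"
  end

  env = {
    "RUBY_GC_LIBRARY" => "mmtk",
    "MMTK_PLAN" => plan,
    "MMTK_HEAP_MODE" => heap_mode,
  }
  env["MMTK_HEAP_MIN"] = heap_min if heap_min
  env["MMTK_HEAP_MAX"] = heap_max if heap_max
  env
end

env = mmtk_env(heap_mode: "fixed", heap_max: "512MiB")
# On a build that has the MMTk shared GC available, this could be used as:
#   system(env, "ruby", "-e", "puts GC.count")
```

Variables that are left out of the hash fall back to the defaults described above.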

Code Organization

The code is organized into two parts: the C binding and the Rust binding.

The C binding lives in gc/mmtk.c. It implements the GC API that Ruby communicates with and performs Ruby-level operations such as stopping and starting Ractors before and after a GC, marking objects, and freeing objects.

The Rust binding lives in the gc/mmtk directory. It calls the APIs provided by mmtk-core to allocate objects, and also implements traits (including callbacks) required by mmtk-core for stopping/resuming Ractors, scanning roots, and scanning object fields.

mmtk-core is included as a dependency of the Rust binding and contains language agnostic implementations of various garbage collectors.

At compile time, the Rust binding is statically linked to the C binding to form a shared object that can be dynamically loaded by Ruby.

Why We Are Proposing to Upstream This Feature

At this point, we have fully functional implementations of NoGC and mark-sweep algorithms. While we still have a long way to go to improve performance and implement more advanced algorithms (discussed in the Next Steps section), we would like to upstream this to improve collaboration with the Ruby core team and the Ruby community.

This proposal is for an experimental feature and will not be enabled by default. Users who want to try out this feature will have to compile Ruby with the shared GC feature enabled, compile the Rust bindings, and compile the MMTk shared GC.

Additionally, we do not ever anticipate replacing Ruby's default GC with MMTk but instead offer it as an alternative implementation. Ruby's default GC will always be the default GC due to its lack of external dependencies, versatility, and ease of use. As such, similar to YJIT, we will not be introducing a dependency on Rust for normal builds.

Benchmarks and Analysis

We ran yjit-bench (commit 1b298fa) on an Ubuntu 24.04 machine with an Intel Core Ultra 7 155H. Here are the benchmark results:

--------------  -----------  ----------  ---------  ---------  ----------  ---------  ------------  -----------
bench           master (ms)  stddev (%)  RSS (MiB)  mmtk (ms)  stddev (%)  RSS (MiB)  mmtk 1st itr  master/mmtk
activerecord    451.4        0.1         60.8       4667.3     0.9         69.6       0.10          0.10       
chunky-png      1151.1       0.3         41.4       1416.1     0.4         29.4       0.83          0.81       
erubi-rails     1940.7       0.2         105.6      15892.9    3.4         326.3      0.14          0.12       
hexapdf         3570.4       1.1         119.2      7552.2     3.6         159.7      0.37          0.47       
liquid-c        87.7         0.7         26.1       247.8      2.3         33.8       0.35          0.35       
liquid-compile  86.1         3.1         27.0       192.5      14.2        34.6       0.31          0.45       
liquid-render   217.5        0.3         25.8       428.7      1.3         34.4       0.50          0.51       
lobsters        1662.3       1.1         268.1      2455.5     2.6         282.3      0.27          0.68       
mail            194.2        0.6         48.5       585.8      2.3         58.7       0.37          0.33       
psych-load      2976.9       0.0         24.8       12646.1    1.7         33.6       0.23          0.24       
railsbench      4124.3       0.3         96.9       20606.6    0.5         171.5      0.22          0.20
rubocop         245.3        1.6         81.7       474.7      7.1         92.9       0.27          0.52       
ruby-lsp        234.5        0.3         61.5       1337.8     2.0         62.7       0.20          0.18       
sequel          98.3         0.6         30.0       376.7      1.7         44.4       0.26          0.26       
--------------  -----------  ----------  ---------  ---------  ----------  ---------  ------------  -----------

The performance geometric mean is 0.28, so it is almost 4x slower than the default GC.
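
For readers who want to recompute a geometric mean from the table, here is a minimal Ruby sketch. It assumes the summary figure is a geometric mean over per-benchmark ratios, and it uses the rounded values of the two ratio columns (mmtk 1st itr and master/mmtk), so results are approximate. A ratio r corresponds to a slowdown of 1/r. The constant and method names are illustrative only.

```ruby
# Ratios copied from the two right-hand columns of the table above,
# as [mmtk 1st itr, master/mmtk]. Values are rounded to two decimals.
RATIOS = {
  "activerecord"   => [0.10, 0.10],
  "chunky-png"     => [0.83, 0.81],
  "erubi-rails"    => [0.14, 0.12],
  "hexapdf"        => [0.37, 0.47],
  "liquid-c"       => [0.35, 0.35],
  "liquid-compile" => [0.31, 0.45],
  "liquid-render"  => [0.50, 0.51],
  "lobsters"       => [0.27, 0.68],
  "mail"           => [0.37, 0.33],
  "psych-load"     => [0.23, 0.24],
  "railsbench"     => [0.22, 0.20],
  "rubocop"        => [0.27, 0.52],
  "ruby-lsp"       => [0.20, 0.18],
  "sequel"         => [0.26, 0.26],
}.freeze

# Geometric mean computed as exp(mean(log(x))).
def geometric_mean(values)
  Math.exp(values.sum { |v| Math.log(v) } / values.size)
end

first_itr = geometric_mean(RATIOS.values.map(&:first))
overall   = geometric_mean(RATIOS.values.map(&:last))

puts format("mmtk 1st itr geomean: %.2f (%.1fx slower)", first_itr, 1 / first_itr)
puts format("master/mmtk geomean:  %.2f (%.1fx slower)", overall, 1 / overall)
```

With the rounded table values, this gives roughly 0.28 for the mmtk 1st itr column and roughly 0.32 for the master/mmtk column. The ticket does not state which column or raw timings the quoted figure was computed from.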

We analyzed the railsbench benchmark, and we found clear reasons why it's slower:

  1. The default GC runs 823 times during the benchmark, while MMTk runs 10111 times, which is over 10x as many.
  2. Only 11 of the default GC's runs are major GCs; the rest (over 800) are minor GCs. Since the MMTk implementation is not generational, every GC run is a major GC.
  3. As a result, the default GC spends 3397ms in GC while MMTk spends 141386ms (about 41x as long).
  4. In a profile (see the attached screenshot), we can see that most of the time only a single worker thread is performing work. Since parallelism wasn't a priority in this phase of the project, there are significant opportunities for improvement there.
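
The ticket does not say how these counts were collected. As one possible approach on the default GC, the following Ruby sketch reads counters from GC.stat before and after a workload. The helper names are hypothetical, and the keys are looked up defensively because a different GC implementation may not report all of them.

```ruby
# Snapshot the GC counters of interest. Keys that a given GC
# implementation does not report come back as nil.
def gc_snapshot
  stat = GC.stat
  {
    count: stat[:count],          # total GC runs
    minor: stat[:minor_gc_count], # minor (young generation) runs
    major: stat[:major_gc_count], # major (full) runs
    time_ms: stat[:time],         # total GC time in milliseconds
  }
end

# Runs the block and returns how much each counter changed.
def gc_delta
  before = gc_snapshot
  yield
  after = gc_snapshot
  before.to_h do |key, value|
    [key, value && after[key] ? after[key] - value : nil]
  end
end

delta = gc_delta do
  200_000.times { Object.new } # stand-in for a real benchmark workload
end
p delta
```

Comparing the count and major entries shows how many runs were full collections, which is the distinction drawn in point 2 above.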

These are some of the performance bottlenecks that we have identified, and the section below lists concrete steps for addressing them.

Next Steps

Our current roadmap looks like the following:

  • Support a faster non-moving collector such as non-moving Immix.
  • Improve parallelism in the GC cycle so that it is faster.
  • Implement a copying GC such as Immix.
  • Implement generational garbage collectors for better performance.
  • Improve Ruby's data structures (such as object shapes, arrays, and strings) to take advantage of MMTk's ability to allocate dynamic and larger object sizes.

Files

Screenshot 2024-10-25 at 3.07.01 PM.png (528 KB) peterzhu2118 (Peter Zhu), 11/01/2024 04:10 PM
