Feature #22121

Introduce Parallel Sweep feature


Added by luke-gru (Luke Gruber) 2 days ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:125796]

Description

Abstract

Ruby's GC sweep implementation is currently incremental and lazy, which reduces pause times when sweeping. However, it doesn't take advantage of multiple cores. Sweeping in a GC is not an "embarrassingly parallel" problem, but extra threads can help. I would like to introduce such a feature to Ruby so that users can take advantage of their multi-core CPUs to reduce GC pause times.

Design

I would like to have an additional "sweep thread" that sweeps alongside the Ruby GC thread (in parallel) and at the same time as Ruby code is running (concurrent). When the sweep thread is working alongside the Ruby GC thread, both threads grab pages from the current heap. If the sweep thread has already swept a page, the Ruby GC thread finishes it by clearing its bitmaps and adding the page to the free pages or empty pages lists. If there are no swept pages from the sweep thread, the Ruby GC thread sweeps a page by itself instead of waiting for swept pages.
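The hand-off described above can be modeled in a few lines of Ruby. This is only a toy sketch, not code from the PR: `Page`, `sweep_page`, `parallel_sweep` and the page states are hypothetical names, and the slot symbols stand in for real objects. It shows the sweep thread claiming and sweeping pages while the GC thread either finishes an already-swept page or sweeps one itself.

```ruby
# Toy model only. Page, sweep_page and parallel_sweep are hypothetical names,
# not functions from the actual patch.
Page = Struct.new(:id, :slots, :state)

# Stand-in for the real sweep: dead slots become free slots.
def sweep_page(page)
  page.slots.map! { |slot| slot == :dead ? :free : slot }
end

def parallel_sweep(pages)
  lock = Mutex.new
  free_pages = []
  empty_pages = []

  # Sweep thread: claims unswept pages and sweeps them, but leaves the
  # finishing step (list bookkeeping) to the GC thread.
  sweeper = Thread.new do
    loop do
      page = lock.synchronize do
        candidate = pages.find { |pg| pg.state == :unswept }
        candidate.state = :sweeping if candidate
        candidate
      end
      break unless page
      sweep_page(page)
      lock.synchronize { page.state = :swept }
    end
  end

  # GC thread: prefer a page the sweep thread already swept, otherwise
  # sweep an unswept page itself rather than waiting.
  loop do
    page, needs_sweep, done = lock.synchronize do
      if (pg = pages.find { |x| x.state == :swept })
        pg.state = :finishing
        [pg, false, false]
      elsif (pg = pages.find { |x| x.state == :unswept })
        pg.state = :finishing
        [pg, true, false]
      else
        [nil, false, pages.all? { |x| x.state == :finished }]
      end
    end
    break if done
    if page.nil?
      Thread.pass # the sweep thread is mid-page; retry shortly
      next
    end
    sweep_page(page) if needs_sweep
    # "Finishing" a page: put it on the right list.
    if page.slots.all? { |slot| slot == :free }
      empty_pages << page
    elsif page.slots.include?(:free)
      free_pages << page
    end
    lock.synchronize { page.state = :finished }
  end

  sweeper.join
  { free_pages: free_pages, empty_pages: empty_pages }
end
```

Because of the GVL, the two Ruby threads here do not actually run in parallel; the sketch only illustrates who is allowed to do which step. Whichever thread ends up sweeping a given page, the resulting lists are the same.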

When an incremental sweep step is over, the sweep thread sweeps 1 incremental step's budget worth of slots while Ruby code is running. This is so that during the next incremental step, the Ruby GC thread just has to finish the pages off instead of sweeping the objects.
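As a rough illustration of the budget idea above, the following sketch picks how many pages the sweep thread would pre-sweep between steps. The method name, the page sizes, and the rule "stop once at least one budget's worth of slots has been covered" are all assumptions made for the example; the actual patch may count the budget differently.

```ruby
# Hypothetical helper: given the slot count of each unswept page (in heap
# order) and one incremental step's slot budget, return the indexes of the
# pages the sweep thread would sweep ahead of the next step.
def pages_to_presweep(slot_counts, budget)
  covered = 0
  chosen = []
  slot_counts.each_with_index do |slots, index|
    break if covered >= budget # assumed stopping rule
    covered += slots
    chosen << index
  end
  chosen
end
```

With four pages of 100 slots each and a budget of 250 slots, this model pre-sweeps the first three pages, so the next incremental step only has to finish them.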

Limitations

Certain objects aren't safe to be freed by the sweep thread. T_DATA types from native extensions cannot be swept by the sweep thread in general (although most are safe) because the user's free function (dfree) may not be thread-safe. That function may modify global state in such a way that when it is called from both the sweep thread AND the Ruby GC thread at once, it behaves badly. To get around this, we introduce a new TypedData flag. Most T_DATA types internal to the VM have this flag set, and native extension authors can set it, when it is defined, to allow their type to be swept concurrently.
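The effect of the flag can be pictured with a small Ruby model. Everything here is hypothetical: `DataType`, `concurrent_safe` and `split_dead_objects` are invented for the example, and the sketch assumes that unflagged types are simply left for the Ruby GC thread to free.

```ruby
# Hypothetical model of a typed-data type with an opt-in flag.
DataType = Struct.new(:name, :concurrent_safe)

# Decide which dead objects the sweep thread may free and which must be
# left for the GC thread (assumed behavior for unflagged types).
def split_dead_objects(dead_types)
  safe, deferred = dead_types.partition(&:concurrent_safe)
  { sweep_thread: safe.map(&:name), gc_thread: deferred.map(&:name) }
end
```

For example, a type whose free function touches unsynchronized global state would keep the flag unset and land in the `gc_thread` group, so it is only ever freed from one thread.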

This feature is only available for the default GC. Ruby's MMTk garbage collector has its own implementation of concurrent sweeping that is not affected by this feature.

This feature is only available for pthread platforms, although that restriction could be lifted with a bit of work.

This feature is not on by default (see the Building section).

Implementation

The PR is currently a draft, but it's in a working state. Please play with it and tell me what you think!

Building

./configure --enable-parallel-sweep
make -j
./ruby --enable-parallel-sweep -v
#=> ruby 4.1.0dev (2026-06-19T15:49:17Z parallel-sweep cd7e59d45b) +PRISM +Parallel-Sweep [arm64-darwin25]

Benchmarks

The image below is from running the ruby-bench --headline benchmarks 10 times and taking the median of the per-run means.
The ruby-bench command was:

./run_benchmarks.rb --no-pinning --interleave --chruby="ruby_ctl::ruby_ctl --yjit" --chruby="psweep_lockfree::psweep_lockfree --yjit --enable-parallel-sweep" --headline

If you want sweep info per run, you can run it with the GC harness:

./run_benchmarks.rb --no-pinning --interleave --chruby="ruby_ctl::ruby_ctl --yjit" --chruby="psweep_lockfree::psweep_lockfree --yjit --enable-parallel-sweep" --harness=gc --headline
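For anyone aggregating their own runs the same way, "median of means" can be computed as in the sketch below. The helper names are made up for the example and it is not part of ruby-bench; it assumes each run is an array of iteration times.

```ruby
# Arithmetic mean of one run's iteration times.
def mean(values)
  values.sum.to_f / values.size
end

# Median of a list of numbers (average of the two middle values when even).
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# One mean per run, then the median across runs.
def median_of_means(runs)
  median(runs.map { |times| mean(times) })
end
```

Taking the median across runs makes the headline number less sensitive to a single unusually slow or fast run than a plain average of all iterations would be.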


I encourage others to experiment with this feature, run their own benchmarks, and share the results here. When running your benchmarks, make sure not to pin the process to a single CPU with taskset; otherwise the sweep thread will not run in parallel.

Note

Results sometimes regress from run to run. Run the benchmark 5-10 times to get more accurate results.
GC micro-benchmarks may see no improvement, or even slight regressions. The implementation targets workloads where there is sufficient time between sweep steps for the sweep thread to do its work.

Warning

Benchmarking on macOS tends to be less accurate than on Linux. However, if that's all you have and you want to share your results, please do.

Future Work

In the future, I would like to process a page's metadata in the sweep thread. For example, if the sweep thread could clear the bitmaps and unlink an empty page, or even add a page to the free pages list, then the Ruby pause time would get even smaller (or disappear entirely). I did have a prototype of this, but it caused issues because it was creating too many empty pages too fast, and that had unintended consequences for the rest of the GC.

I also believe parallel marking would have a large benefit. Major GCs take a long time due to marking, and if you have lots of threads or fibers then it takes even longer. Marking is a more naturally parallel problem, so it would likely benefit from more than 1 worker thread.


Files

clipboard-202606191327-7umxq.png (342 KB), added by luke-gru (Luke Gruber), 06/19/2026 05:27 PM
