Feature #16648: improve GC performance by 5% with builtin_prefetch - Ruby - Ruby Issue Tracking System

Added by bpowers (Bobby Powers) over 5 years ago. Updated over 5 years ago.

Status: Open

[ruby-core:97240]

Description

The mark phase of non-incremental major GC is (I believe) dominated by pointer chasing. One way to improve that is to prefetch cache lines from memory before they are accessed, which reduces stalls. I did some experimenting, and the attached patch reduces the time spent on a full GC from ~950 milliseconds to ~900 milliseconds (about 5%), a small but stable improvement. I would love to hear from anyone who has other benchmarks (or could point me at them) to see whether this holds up beyond the web service I tested, and whether something like this could be considered for inclusion.
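To make the idea concrete, here is a minimal sketch of prefetching during a mark loop. It is not the attached patch and does not use Ruby's real data structures: `toy_obj`, `toy_mark`, `toy_demo`, `PREFETCH` and the fixed-size stack are all hypothetical names invented for illustration. The only real API used is `__builtin_prefetch`, the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>

/* __builtin_prefetch is a GCC/Clang builtin; fall back to a no-op elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
# define PREFETCH(addr) __builtin_prefetch((addr), 0, 3) /* read, keep in cache */
#else
# define PREFETCH(addr) ((void)(addr))
#endif

#define MAX_CHILDREN 4
#define STACK_CAP    1024

/* Hypothetical toy object; this is not Ruby's RVALUE layout. */
typedef struct toy_obj {
    int marked;
    size_t nchildren;
    struct toy_obj *children[MAX_CHILDREN];
} toy_obj;

/* Marks everything reachable from root and returns how many objects
 * were newly marked. */
static size_t toy_mark(toy_obj *root)
{
    toy_obj *stack[STACK_CAP];
    size_t top = 0, count = 0;

    if (root == NULL) return 0;
    stack[top++] = root;

    while (top > 0) {
        toy_obj *obj = stack[--top];
        if (obj->marked) continue;
        obj->marked = 1;
        count++;

        for (size_t i = 0; i < obj->nchildren; i++) {
            toy_obj *child = obj->children[i];
            if (child == NULL) continue;
            /* Hint the cache line now; the child is only dereferenced
             * after a later pop. */
            PREFETCH(child);
            assert(top < STACK_CAP); /* toy limit; a real mark stack grows */
            stack[top++] = child;
        }
    }
    return count;
}

/* Builds a three-object cycle plus one unreachable object and returns
 * the number of objects marked from the root. */
static size_t toy_demo(void)
{
    toy_obj a = {0}, b = {0}, c = {0}, unreachable = {0};
    a.nchildren = 2; a.children[0] = &b; a.children[1] = &c;
    b.nchildren = 1; b.children[0] = &c;
    c.nchildren = 1; c.children[0] = &a;
    size_t n = toy_mark(&a);
    assert(!unreachable.marked);
    return n;
}
```

The prefetch is only a hint, so it changes timing but never the set of marked objects. With a plain LIFO stack the most recently pushed child is popped almost immediately, so the hint may not have enough lead time for that object.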

I also attempted a more "principled" approach based on an optimization described in the GC Handbook: putting a FIFO queue in front of the mark stack, and prefetching addresses as they enter the queue. However, I wasn't able to see any performance improvement there, despite testing a number of queue sizes from 4 to 64. It's possible I implemented this wrong, or misjudged the access patterns (e.g. if the memory of a VALUE is accessed before push_mark_stack is called, that would invalidate this approach). The code for that alternative is here: https://github.com/bpowers/ruby/commit/d790d0c15047c36c23850a112093fe0e32fd3262
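For readers unfamiliar with the FIFO idea, the following is one possible reading of it as a self-contained toy. It is not the code in the linked commit: `node`, `mark_ctx`, `push_mark`, `next_to_scan`, `mark_with_fifo`, `fifo_demo` and the `FIFO_CAP` value of 8 are all assumptions made for illustration. Entries popped from the mark stack are prefetched as they enter a small ring buffer, and only the oldest entry in the buffer is actually scanned.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
# define PREFETCH(addr) __builtin_prefetch((addr), 0, 3)
#else
# define PREFETCH(addr) ((void)(addr))
#endif

#define MAX_CHILDREN 2
#define STACK_CAP    1024
#define FIFO_CAP     8   /* hypothetical; the issue tried sizes from 4 to 64 */

/* Hypothetical toy object; this is not Ruby's RVALUE layout. */
typedef struct node {
    int marked;
    size_t nchildren;
    struct node *children[MAX_CHILDREN];
} node;

typedef struct {
    node *stack[STACK_CAP]; size_t top;       /* ordinary mark stack */
    node *fifo[FIFO_CAP];   size_t head, len; /* small ring buffer   */
} mark_ctx;

/* Pushing must not dereference n, or the later prefetch comes too late. */
static void push_mark(mark_ctx *ctx, node *n)
{
    assert(ctx->top < STACK_CAP); /* toy limit; a real mark stack grows */
    ctx->stack[ctx->top++] = n;
}

/* Refill the FIFO from the stack, prefetching each entry as it enters,
 * then hand back the oldest entry (prefetched up to FIFO_CAP pops ago). */
static node *next_to_scan(mark_ctx *ctx)
{
    while (ctx->len < FIFO_CAP && ctx->top > 0) {
        node *n = ctx->stack[--ctx->top];
        PREFETCH(n);
        ctx->fifo[(ctx->head + ctx->len) % FIFO_CAP] = n;
        ctx->len++;
    }
    if (ctx->len == 0) return NULL;
    node *oldest = ctx->fifo[ctx->head];
    ctx->head = (ctx->head + 1) % FIFO_CAP;
    ctx->len--;
    return oldest;
}

/* Marks everything reachable from root; returns the number newly marked. */
static size_t mark_with_fifo(node *root)
{
    mark_ctx ctx = {0};
    size_t count = 0;
    node *n;

    if (root == NULL) return 0;
    push_mark(&ctx, root);
    while ((n = next_to_scan(&ctx)) != NULL) {
        if (n->marked) continue;
        n->marked = 1;
        count++;
        for (size_t i = 0; i < n->nchildren; i++)
            if (n->children[i] != NULL) push_mark(&ctx, n->children[i]);
    }
    return count;
}

/* Twenty nodes linked in a ring with extra cross edges, plus one
 * unreachable node; returns the number marked from nodes[0]. */
static size_t fifo_demo(void)
{
    node nodes[21] = {0};
    for (size_t i = 0; i < 20; i++) {
        nodes[i].nchildren = 2;
        nodes[i].children[0] = &nodes[(i + 1) % 20];
        nodes[i].children[1] = &nodes[(i * 7) % 20];
    }
    size_t n = mark_with_fifo(&nodes[0]);
    assert(!nodes[20].marked);
    return n;
}
```

In this toy the delay between the prefetch and the first real access is what the queue buys. If the object's memory has already been touched before `push_mark` runs, as the caveat above suggests might happen in Ruby's marking code, the prefetch has nothing left to hide and the queue only adds overhead.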

Files

0001-gc-prefech-objects-seems-to-improve-full-GC-performa.patch (2.29 KB)

bpowers (Bobby Powers), 02/22/2020 05:53 PM

Updated by alanwu (Alan Wu) over 5 years ago

Updated by bpowers (Bobby Powers) over 5 years ago