Project

General

Profile

Feature #19729

Updated by eightbitraptor (Matthew Valentine-House) 11 months ago

## Github PR 

 [Store object age in a bitmap #7938](https://github.com/ruby/ruby/pull/7938) 

 ## Abstract 

 Ruby currently uses 2 bits of the flags in every object to track how many GC 
 events the object has survived. Objects are created at age 0 and can currently 
 grow to age 3, at which point they are considered "old". 

 Similar to the work carried out for [bitmap marking in Ruby 
 2.0](https://bugs.ruby-lang.org/issues/5839) which moved the mark bit from the 
 flags to a bitmap attached to a heap page; This PR moves the age bits out of the 
 object flags and stores them in a bitmap. 

 ## Description 

 This PR creates a new bitmap `age_bits` on each heap page. the size of the 
 bitmap is controlled by `HEAP_PAGE_BITMAP_LIMIT`, which roughly indicates how 
 many objects need to be considered by that bitmap, and `RVALUE_AGE_BITS_SIZE` 
 which varies how many bits we use per object to store the age.  

 We also introduce functions `RVALUE_AGE_SET`, `RVALUE_AGE_GET`, 
 `RVALUE_AGE_RESET` and `RVALUE_AGE_INC`    to manipulate the age of an object 
 pointed to by a VALUE. 

 ## Impact 

 **Benefits:** 

 * Improved CoW performance, GC will modify the object header fewer times for  
   fewer objects. 
 * Allow configuration of the age at which objects are considered old. Because 
   the number of bits used is configurable, we can support arbitrary numbers of 
   GC events before an object is considered old. We can use this to change 
   major/minor GC timings for workloads with unusually high/low object churn. 
 * Free a flag in each object that can be repurposed. Object flags are a precious 
   resource, we should prefer to store data outside the flags where possible. 
 * Remove GC related concerns from the object structure. This is important for 
   initiatives like a generic GC interface and MMTk in order to keep GC related 
   code as isolated as possible. 

 **Concerns:** 

 * Slightly increased RSS, because we now allocate an extra 2 bits per heap page 
   slot, to create the age_bits bitmap on VM bootup. On my machine  
   `sizeof(struct heap_page)` has increased from 1312 to 1728 bytes. 

 ## Benchmarking 

 Railsbench and Optcarrot benchmarked using `yjit-bench`. Showing small memory 
 increase, but comparable performance. 

 ``` 
 master: ruby 3.3.0dev (2023-06-08T11:22:43Z master 3fe09eba9d) [x86_64-linux] 
 mvh-rvalue-age-bitmap: ruby 3.3.0dev (2023-06-13T06:59:40Z master c74f42a4fb) [x86_64-linux] 

 ----------    -----------    ----------    ---------    --------------------------    ----------    ---------    -----------------------------    ---------------------------- 
 bench         master (ms)    stddev (%)    RSS (MiB)    mvh-rvalue-age-bitmap (ms)    stddev (%)    RSS (MiB)    mvh-rvalue-age-bitmap 1st itr    master/mvh-rvalue-age-bitmap 
 railsbench    2154.3         0.5           101.4        2124.9                        0.5           101.5        1.01                             1.01 
 optcarrot     5372.2         0.6           55.0         5282.3                        0.6           55.1         1.02                             1.02 
 ----------    -----------    ----------    ---------    --------------------------    ----------    ---------    -----------------------------    ---------------------------- 
 Legend: 
 - mvh-rvalue-age-bitmap 1st itr: ratio of master/mvh-rvalue-age-bitmap time for the first benchmarking iteration. 
 - master/mvh-rvalue-age-bitmap: ratio of master/mvh-rvalue-age-bitmap time. Higher is better for mvh-rvalue-age-bitmap. Above 1 represents a speedup. 
 ``` 

 ## Notes 

 **FL_PROMOTED** 

 We still use one of the two age bits. The original `FL_PROMOTED0` has been 
 renamed to `FL_PROMOTED` and has been repurposed to indicate an objects 
 old/young status. 

 We need this because correctly tracking references from old to young objects 
 relies on a write barrier, that's triggered whenever a field on an object is 
 written to. Because this code is a very hot path it needs to be fast. Looking up 
 the heap page and then calculating the old/young status based on the age bits 
 would slow down this part of the code too much. 

 Instead we set `FL_PROMOTED` whenever the object crosses the threshold into the 
 old gen, and unset it if that object ever gets demoted back into the young gen. 
 This way the write barrier can quickly tell whether the object is old or not and 
 whether to add it to the rememberset. 

 **rb_age_reset** 

 I expose a function from gc.c to reset the object age. This is only used from 
 one place: `Init_VM` in `vm.c`. We create a hidden class during boot for the 
 VM's "Frozen Core". This is created using `rb_class_new` as a T_CLASS, then it 
 has it's class path set, and then the flags are overwritten, forcing the type to 
 T_ICLASS. 

 Classes are allocated with age 2, and `rb_set_class_path` allocates, which may 
 trigger GC. If this happens then the class will immediately become old, and have 
 its `FL_PROMOTED` bit set. This will then be immediately wiped over when the 
 flags are forced to T_ICLASS. 

 This will result in the `FL_PROMOTED` bit and the object age being out of sync. 

 I don't know why this code is using `rb_class_new` rather than 
 `rb_include_class_new` but I will follow up this PR with an investigation. 

Back