Project

General

Profile

Feature #10137

Updated by mrkn (Kenta Murata) over 9 years ago

# Abstract 

 Introduce *incremental GC* algorithm to reduce pause time of major/full  
 GC. This ticket includes design and implementation note and a working  
 patch. 

 # Background and problem 

 Ruby 2.1 uses generational GC algorithm (named RGenGC) to improve GC  
 throughput. Genrational GC algorithm separates existing objects into  
 young generation objects and old generation objects. At the most of GC  
 timing, GC only marks young generation objects (called minor GC). If  
 there are no enough memory, then mark all of objects (called major GC or  
 full GC). Minor GC is dramatically fast than major GC. So that total  
 throughput of application improves (10% improvement in my RDoc benchmark,  
 [ruby-list:49896] reported that GC intensive application is 6 times  
 faster!). 

 However, major GC is needed periodically and it pauses same time as GC  
 on Ruby 2.0 and before. This problem hits htis response time intensive  
 application such as web application. 

 # Proposal 

 Introduce *Incremental GC* algorithm for major GC. 

 Incremental GC algorithm is well-known GC algorithm to reduce GC pause  
 time. Scatter GC (marking) process in several phases and run processes  
 interleaving with Ruby's execution process. This is similar to current  
 lazy sweep algorithm (in fact, lazy sweep is a half part of incremental  
 GC algorithm). 

 Running ruby process with marking phase, it is possible to introduce  
 critical bug because marked objects can points un-marked objects (on the  
 incremental GC terminology, marked objects are called "Black" objects  
 and un-marked objects are called "White" objects). Such un-marked  
 objects can be left in un-marking and be swept.  

 ``` 
 # if `ary' is already marked, and obj is not marked 
 ary[0] = obj 
 obj = nil # no-body except `ary' refers `obj' 
 ``` 

 To prevent such destructive bug, we can use write barriers to detect  
 such "marked objects" to "un-marked objects". We can care about such  
 case. 

 Yes, MRI/CRuby has "WB-unprotected" objects such objects does not have  
 write barriers because of compatibility or implementation issues. To  
 care about such WB-unprotected objects, we need to traverse all of  
 living WB-unprotected objects at a time in the last of incremental  
 marking. This is new extending idea against traditional incremental GC  
 algorithm (at least I surveyed). 

 Deisgn and implementation details are here: 
 http://www.atdot.net/fp_store/f.p61can/file.data-incremental-gc.pdf 

 Maybe a diagram at page 10 will help you to understand the flow of all  
 GC process. 

 Code is here: 
 https://github.com/ko1/ruby/tree/rincgc 

 Compare with trunk: 
 https://github.com/ko1/ruby/compare/rincgc 

 # Implementation note 

 ## WB-unprotected bitmap 

 As I said, we need to check all of living WB-unprotected objects at the  
 last of incremental marking phase. To do it lightweight, introduce  
 WB-unprotected bitmap intead of specific bit in RBasic::flags. 

 We can get all living WB-unprotected objects with the following pseudo  
 code: 

 ``` 
 bits = mark_bits[i] & wb_unprotected_bits[i]; 
 while (bits) { 
   if (bits & 1) do something. 
   bits >>= 1; 
 } 
 ``` 

 ## 4 age promotion 

 Because we don't need to use WB-protected bit in RBasic::flags, we have  
 another 1 bit in RBasic::flags. To utilize this bit, we add *age* of an  
 object with exsiting promoted bit. Rename FL_WB_PROTECTED to  
 FL_PROMOTED0 and FL_PROMOTED to FL_PROMOTED1. 

 These two bits represent object's age (0 to 3) and 3 means OLD objects. 

 ## Write barriers 

 We extend write barriers to detect such [marked objects -> un-marked  
 objects] reference in incremental GC. It introduces some overhead. 

 # Evaluation 

 Benchmark results on real Linux machine (*1) 
 http://www.atdot.net/sp/raw/yekban 
 *1: Intel(R) Xeon(R) CPU E5335 @ 2.00GHz, 4GB memory, 2.6.32-5-amd64 (Debian squeeze) 

 In most of case, there are only a few (~5%) performance down.  
 Incremental GC introduces some overhead.    But I think they are  
 acceptable speed-down. 

 Discouse benchmark (only on Virtualbox VM, so accuracy is not good) 
 http://www.atdot.net/sp/raw/g9uban 

 We can recognize reducing worst case seconds. 

 # TODO 

 (1) Clean up codes 

 Now, we can not disable incremental GC codes and generational GC codes.  
 We need to add ability to enable/disable features with macros. 

 (2) Tuning parameters 

 Now the parameters are fixed in codes. mruby already have tuning  
 parameters for incremental GC (matz said they are from Lua),  
 "GC.interval_ratio" and "GC.step_ratio". We can import these functions  
 (or making another interface to tell). 

 (3) Enter GC/ Exit GC internal events 

 This patch also includes function "gc_enter()" and "gc_exit()" which set  
 and reset a "doing GC" flag. 

 If we introduce internal event to hook these functions, we can measure  
 exact GC pause time (and mutators time). 

 # Summary 

 This feature proposal is to introduce incremental GC algorithm with working code. 
 Incremental GC algorithm reduce application's pause time of major GC. 

 Any feedback are welcome! 

 Thanks, 
 Koichi 

Back