Bug #9607

Updated by ko1 (Koichi Sasada) over 10 years ago

Abstract 
 ======== 

 Generational GC (called RGenGC) was introduced in Ruby 2.1.0. RGenGC
 reduces marking time dramatically (about 10x faster). However, RGenGC
 introduces huge memory consumption. This problem has a big impact,
 especially on machines with small memory.

 Ruby 2.1.1 introduced a new environment variable,
 RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR, to control full GC timing. However,
 this does not solve the problem completely.

 To solve this issue, we modify the **full GC timing strategy**:
 (1) Always invoke full GC before extending the heap.
 (2) Extend the heap if there is not enough space for old objects.
 This modification introduces a small slowdown, but reduces memory
 consumption.

 Background and problem 
 ====================== 

 RGenGC algorithm 
 ---------------- 

 Ruby 2.0 and earlier versions use a simple mark and sweep GC. Long marking
 time had been a big issue. To solve this issue, Ruby 2.1.0 introduced a
 new generational GC called RGenGC (restricted generational GC).

 The RGenGC algorithm introduces partial marking (called `minor GC'),
 which marks only newly created objects and skips marking of old
 objects (*1). Sometimes all objects are marked (called `major GC' or
 `full GC'). Many minor GCs and a small number of major GCs make GC faster.

 (*1) RGenGC doesn't skip sweeping for old-objects. This is another issue. 
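
 The sketch below is a minimal, self-contained Ruby model of this idea (it is
 not the gc.c implementation; `Obj', the roots and the remember set are
 illustrative stand-ins): a major GC traverses every reachable object, while a
 minor GC skips traversing into old objects and relies on the write-barrier
 remember set to cover old-to-young references.

     # Illustrative model of minor vs. major marking (not the gc.c code).
     Obj = Struct.new(:refs, :old, :marked)

     def mark_from(start_set, skip_old:)
       stack = start_set.dup
       until stack.empty?
         obj = stack.pop
         next if obj.marked
         obj.marked = true
         # a minor GC does not traverse into old objects; they are assumed live
         stack.concat(obj.refs) unless skip_old && obj.old
       end
     end

     # Major (full) GC: mark everything reachable from the roots.
     def major_mark(roots)
       mark_from(roots, skip_old: false)
     end

     # Minor GC: scan only the young part of the object graph. Old objects
     # that gained a reference to a young object were recorded in the
     # remember set by the write barrier, so their children act as extra roots.
     def minor_mark(roots, remember_set)
       mark_from(roots + remember_set.flat_map(&:refs), skip_old: true)
     end

     young = Obj.new([], false, false)
     old   = Obj.new([young], true, false)   # old -> young reference
     minor_mark([], [old])                   # remembered old object keeps `young' alive
     p young.marked                          #=> true

 Because a minor GC touches only the young part of the heap, its marking cost
 is roughly proportional to the number of recently created objects, which is
 where the large speedup comes from.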


 Full GC timing 
 -------------- 

 There is a question: "When should we invoke full GC?".

 Usually, a generational GC uses the strategy "when the space for old
 objects is full, invoke full GC".

 Ruby 2.1.0 defines the size of the space for old objects with
 `old_object_limit'. At each full GC, old_object_limit is set to double
 the number of old objects (`old_object_count') at that time.

 Before each GC, we decide between minor and major GC by comparing
 `old_object_limit' with the current number of old objects
 (`old_object_count'): do a full GC if old_object_count > old_object_limit.

 Here is pseudo code for RGenGC:

     def gc
       if old_object_count > old_object_limit
         # too many old objects: mark everything (major GC)
         major_gc = true
         major_mark()
       else
         # otherwise mark only young objects (minor GC)
         major_gc = false
         minor_mark()
       end
       sweep() # Actually it is lazy sweep.

       # after a major GC, allow old objects to double before the next one
       old_object_limit = old_object_count * 2 if major_gc
     end
    
 This strategy works fine on memory-rich machines, because only a few
 full GCs are invoked.

 However, this strategy causes more and more memory consumption.
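
 To see why, here is a rough, self-contained simulation of the strategy above.
 All numbers are made up, the promotion rate is a hypothetical constant, and
 the heap-extension trigger is simplified compared to the real one in gc.c;
 it only illustrates that old_object_limit stays far above old_object_count,
 so the heap keeps being extended while the GC stays minor.

     # Rough simulation of the Ruby 2.1.0 strategy (illustrative numbers only).
     GROWTH_FACTOR    = 1.8          # corresponds to GC_HEAP_GROWTH_FACTOR
     total_slots      = 400_000
     old_object_count = 100_000
     old_object_limit = old_object_count * 2

     16.times do |gc_count|
       # pretend a fixed fraction of young objects survives and is promoted
       old_object_count += (total_slots * 0.05).to_i

       if old_object_count > old_object_limit
         kind = "major"
         old_object_limit = old_object_count * 2   # the limit doubles again
       else
         kind = "minor"
       end

       # simplified heap-extension trigger: extend when old objects alone
       # occupy most of the heap (the real trigger is based on free slots)
       if old_object_count > total_slots * 0.7
         total_slots = (total_slots * GROWTH_FACTOR).to_i
       end

       printf("GC %2d (%s): total_slots=%9d old=%9d limit=%9d\n",
              gc_count, kind, total_slots, old_object_count, old_object_limit)
     end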

 Fig.1 is a result of a (modified) discourse benchmark (Thanks Sam
 Saffron!!). The X-axis is the GC count and the Y-axis represents the number
 of slots (objects). `total_slots' is the number of available slots, and
 `old_object' is old_object_count.

 [Fig1. Usage of slots on Ruby 2.2 dev](https://bugs.ruby-lang.org/attachments/download/4278/ruby2_2.JPG)

 As you can see, old_object_limit is too high and total_slots is
 expanded (x1.8, specified by GC_HEAP_GROWTH_FACTOR) before a full GC runs.


 Full GC timing tuning from Ruby 2.1.1 
 ------------------------------------- 

 To solve this issue, Ruby 2.1.1 introduced an environment variable,
 "RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR" (`old_object_limit_factor' for
 short).

 This variable controls how to extend `old_object_limit'.

 In pseudo code, we changed from

     # after a major GC, allow old objects to double before the next one
     old_object_limit = old_object_count * 2 if major_gc

 to

     # after a major GC, allow old objects to grow by old_object_limit_factor
     old_object_limit = old_object_count * old_object_limit_factor if major_gc

 The default value of this environment variable is 2, so the default
 behavior is unchanged.
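
 As a quick illustration with made-up numbers (not benchmark data), the factor
 directly scales how many old objects may accumulate before the next major GC:

     # Effect of old_object_limit_factor on the next major GC threshold.
     old_object_count = 500_000   # hypothetical count at the last major GC
     [2.0, 1.3].each do |factor|
       old_object_limit = (old_object_count * factor).to_i
       puts "factor=#{factor}: next major GC when old_object_count > #{old_object_limit}"
     end
     # factor=2.0: next major GC when old_object_count > 1000000
     # factor=1.3: next major GC when old_object_count > 650000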

 With RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.3, the benchmark result is  
 Fig.2. 

 [Fig2. Usage of slots on Ruby 2.2 dev w/ old_limit_factor=1.3](https://bugs.ruby-lang.org/attachments/download/4279/ruby_2_2_factor_1_3.JPG)

 We can observe that total_slots does not grow as much as with the default
 behavior.

 Try this environment variable if you have trouble with memory usage. 

 Note that if you want to disable generational garbage collection, you
 can specify 0.9 (or any number less than 1.0) for
 RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR. With this technique,
 "old_object_count > old_object_limit" is true on every GC, so every GC
 is a major GC.

 BTW, this variable should be noted in the NEWS file. I forgot to add it.


 More intelligent approach 
 ------------------------- 

 "RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR" with small number can solve this  
 issue, but we need to specify correct value for each application. It is  
 tough work for us. 


 Proposal 
 ======== 

 From these graphs, we get two insights.

 (1) We need to invoke full GC before expanding the heap. If we invoke a full
 GC first, it may become unnecessary to expand the heap.
 (2) The number of old objects grows too fast.

 To invoke full GC before expanding the heap, we set an upper bound for
 old_object_limit of "total_slots * 0.7". This value is the same as the
 threshold used to decide whether to expand the heap.

 After a full GC, "old_object_count > old_object_limit" may still be true,
 but only by a small margin. This situation causes many full GCs. To avoid
 it, we add a few slots if "old_object_limit * 0.7 < old_object_count". In
 this case, "old_object_limit * 0.7" is a minimum space reserved for old
 objects.

 In pseudo code: 

     def gc
       if do_major_gc_at_next_gc || old_object_count > old_object_limit
         # too many old objects (or a major GC was requested): mark everything
         do_major_gc_at_next_gc = false
         major_gc = true
         major_mark()
       else
         major_gc = false
         minor_mark()
       end

       sweep() # Actually it is lazy sweep.

       if major_gc
         if total_slots * 0.7 < using_slots
           # not enough space even after a full GC
           extend_heap(total_slots * (1.8 - 1)) # 1.8 is growth_factor
         elsif old_object_limit * 0.7 < old_object_count
           # not enough space for old objects
           extend_heap(old_object_count - old_object_limit * 0.7)
         end
       elsif total_slots * 0.7 < using_slots
         # not enough space after a minor GC:
         # do a full GC at the next GC instead of extending the heap now
         do_major_gc_at_next_gc = true
       end

       if major_gc
         a = old_object_count * old_object_limit_factor
         b = total_slots * 0.7
         old_object_limit = [a, b].min
       end
     end

 With this proposal, we can reduce total_slots consumption (Fig3, Fig4). 

 [Fig3. Usage of slots on proposal strategy](https://bugs.ruby-lang.org/attachments/download/4280/proposed.JPG)

 [Fig4. Usage of slots on proposal strategy w/ old_limit_factor=1.3](https://bugs.ruby-lang.org/attachments/download/4281/proposed_factor_1_3.JPG)

 However, GC is invoked more often. This is a trade-off, because
 reducing total_slots introduces more frequent GC. We can mitigate this
 issue by making the 0.7 threshold parameter tunable.


 Future work 
 =========== 

 (1) Promotion strategy 

 The current growing speed of the number of old objects is too high, so we
 need to reconsider the promotion strategy. The current strategy is "promote
 young objects when they survive one garbage collection". We have already
 implemented an "RGENGC_THREEGEN" mode, which can filter out unexpected
 promotions.

 NOTE: THREEGEN (= 3 generations) is a strange name because there are only
 two generations. We will change this mode name to AGE2PROMOTION or similar.
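
 As an illustration of this direction (a conceptual sketch, not the actual
 gc.c logic), promotion could require surviving more than one GC by keeping
 a small per-object age counter:

     # Conceptual sketch of age-based promotion (illustrative only).
     PROMOTE_AGE = 2   # promote after surviving two GCs instead of one
     Slot = Struct.new(:age, :old)

     def survive_gc(slot)
       return if slot.old
       slot.age += 1
       slot.old = true if slot.age >= PROMOTE_AGE   # later promotion => fewer old objects
     end

     s = Slot.new(0, false)
     survive_gc(s)   # age 1: still young, can be collected by the next minor GC
     survive_gc(s)   # age 2: promoted to old
     p s.old         #=> true

 Objects that die within their first two GCs are then never promoted, which
 should slow down the growth of old_object_count.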

 (2) Partial sweep 

 We succeeded in using partial marking for minor GC. However, every GC still
 sweeps all available slots. Sweeping time is not so big, but there is
 room to optimize it.

 (3) Incremental major GC 

 With this proposal, we increase the major GC count. To avoid long major GC
 pause times, we need to implement incremental marking for full GC.
