Project

General

Profile

Actions

Feature #15626

closed

Manual Compaction for MRI's GC (`GC.compact`)

Added by tenderlovemaking (Aaron Patterson) about 5 years ago. Updated almost 5 years ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:91634]

Description

Hi,

I've attached a patch that implements a compactor for MRI. I'll try to describe the patch in detail below, but the biggest user facing changes are:

  1. A new method GC.compact that can be called from Ruby
  2. C extensions have a new "after compaction" callback they can register
  3. Introduction of new marking functions that allow objects to be marked, but not "pinned" to the same location
  4. A function to get the new location of an object (to be used from the "after compaction" callback)

Compactor for RGenGC

This document describes the compactor implementation for RGenGC. This patch
adds a new method to Ruby, GC.compact which will compact the objects in the
heap. What follows is a description of the algorithm, along with achievements
and technical challenges.

Steps for Compaction

These are the high level steps taken in GC.compact are as follows:

  1. Full GC
  2. Move objects
  3. Update references
  4. Full GC

Step One, Full GC

Not all objects in Ruby's heap can move. In particular, C extensions may hold
references to VALUE pointers. If the VALUE pointers move, the C extension may
not be able to find them and will get a SEGV or possibly wrong data.

To prevent this, a new "pinning" table was introduced. Any object marked via
rb_gc_mark is "pinned" and is not allowed to move. C extensions must mark
their references in order to keep them alive. Since C extensions use
rb_gc_mark to mark a reference, we know not to move that reference. In order
to ensure all references get marked, we perform a full GC.

New functions, rb_gc_mark_no_pin, etc were introduced and used internally.
These functions keeps the object alive but without pinning the reference.

Summary: Perform a full GC so that objects marked with rb_gc_mark do not move.

Step Two, Move Objects

This compactor uses a "two finger" algorithm which was introduced in "The
Programming Language LISP: Its Operation and Applications" (page 228)1. Two
pointers point at each side of the heap, if one slot is empty, and the other is
moveable, it swaps places and leaves a T_MOVED object that contains a
forwarding address.

In Ruby, the algorithm looks something like this:

def compact
  heap = [ ... ] # many slots
  left  = 0
  right = heap.length - 1

  while left < right
    left_slot = heap[left]
    right_slot = heap[right]

    if is_empty?(left_slot) && !is_empty?(right_slot) && can_move?(right_slot)
      swap(left, right)
      heap[right] = T_MOVED.new(left) # leave forwarding address
    end

    while !is_empty?(heap[left])
      left += 1
    end

    while is_empty?(heap[right]) || !can_move?(heap[right])
      right -= 1
    end
  end
end

If we have a heap with 6 slots, before compaction it might look something like
this:

                     ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─        
                        Before Compaction   │       
                     └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─        
               ┌─────┬─────┬─────┬─────┬─────┬─────┐
 Slot Index    │  1  │  2  │  3  │  4  │  5  │  6  │
               ├─────┼─────┼─────┼─────┼─────┼─────┤
Slot Contents  │  A  │  B  │Empty│Empty│  C  │  D  │
               │     │     │     │     │     │     │
               └─────┴─────┴─────┴─────┴─────┴─────┘

After compaction, but before reference update, it will look like this:

                     ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─        
                         After Compaction   │       
                     └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─        
               ┌─────┬─────┬─────┬─────┬─────┬─────┐
 Slot Index    │  1  │  2  │  3  │  4  │  5  │  6  │
               ├─────┼─────┼─────┼─────┼─────┼─────┤
Slot Contents  │  A  │  B  │  D  │  C  │MOVED│MOVED│
               │     │     │     │     │TO 4 │TO 3 │
               └─────┴─────┴─────┴─────┴─────┴─────┘

Slots 5 and 6 contain T_MOVED objects that point to the new locations of D and
C objects.

Step Three, Update References

After objects have moved, references are updated. This is a matter of updating
any references to use the forwarding address rather than the original location.

For example, if object A has a reference to C, and B a reference to D:

                     ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─        
                        Before Compaction   │       
                     └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─        
               ┌─────┬─────┬─────┬─────┬─────┬─────┐
 Slot Index    │  1  │  2  │  3  │  4  │  5  │  6  │
               ├─────┼─────┼─────┼─────┼─────┼─────┤
Slot Contents  │  A  │  B  │Empty│Empty│  C  │  D  │
               │     │     │     │     │     │     │
               └─────┴─────┴─────┴─────┴─────┴─────┘
                  │     │                 ▲     ▲   
                  │     │                 │     │   
                  └─────┼─────────────────┘     │   
                        └───────────────────────┘   

After compaction, but before updating references, the heap will look like this:

                     ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─        
                         After Compaction   │       
                     └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─        
               ┌─────┬─────┬─────┬─────┬─────┬─────┐
 Slot Index    │  1  │  2  │  3  │  4  │  5  │  6  │
               ├─────┼─────┼─────┼─────┼─────┼─────┤
Slot Contents  │  A  │  B  │  D  │  C  │MOVED│MOVED│
               │     │     │     │     │TO 4 │TO 3 │
               └─────┴─────┴─────┴─────┴─────┴─────┘
                  │     │                 ▲     ▲   
                  │     │                 │     │   
                  └─────┼─────────────────┘     │   
                        └───────────────────────┘   

The update references step looks at each object in the heap, checks to see if it
references any moved slots, then updates the reference to use the new location.
In Ruby, it looks like this:

def update_references
  heap.each do |slot|
    next if is_empty?(slot) || is_moved?(slot)

    slot.references.each_with_index do |child, i|
      if is_moved?(child)
        slot.set_reference(i, child.new_location)
      end
    end
  end
end

After reference updates, the heap will look like this:

                     ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─        
                        After Ref Updates   │       
                     └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─        
               ┌─────┬─────┬─────┬─────┬─────┬─────┐
 Slot Index    │  1  │  2  │  3  │  4  │  5  │  6  │
               ├─────┼─────┼─────┼─────┼─────┼─────┤
Slot Contents  │  A  │  B  │  D  │  C  │MOVED│MOVED│
               │     │     │     │     │TO 4 │TO 3 │
               └─────┴─────┴─────┴─────┴─────┴─────┘
                  │     │     ▲     ▲               
                  │     │     │     │               
                  └─────┼─────┼─────┘               
                        └─────┘                     

Step Four, Full GC

After references have been updated, a full GC is performed. This converts the
T_MOVED slots back in to T_EMPTY slots. The GC automatically takes care of
this because no object should reference a T_MOVED slot.

Non Moving Objects

Non moving objects are:

  1. Objects on the stack
  2. Objects marked with rb_gc_mark
  3. Hash key objects

Objects on the stack

The stack should not be mutated, so these are pinned.

Objects marked with rb_gc_mark

As mentioned earlier, any object marked with rb_gc_mark may not move because a
C extension may hold a reference. Here is an excerpt from Yajl, a JSON parser.
This is the mark function it uses:

static void yajl_encoder_wrapper_mark(void * wrapper) {
    yajl_encoder_wrapper * w = wrapper;
    if (w) {
        rb_gc_mark(w->on_progress_callback);
        rb_gc_mark(w->terminator);
    }
}


obj = Data_Make_Struct(klass, yajl_encoder_wrapper, yajl_encoder_wrapper_mark, yajl_encoder_wrapper_free, wrapper);

The values in this mark function w->on_progress_callback and w->terminator
will not move since they are pinned by rb_gc_mark.

Hash key objects

Certain hash keys will not move. Most objects use their address as the hash
value. If the object moves, the hash value could change, so hash keys are not
allowed to move.

There is an exception for string objects. String objects calculate their hash
based on the bytes in the string. Since the bytes in the string do not change
after move, they are allowed to move. Most hash keys in large programs are
strings, so only allowing string keys to move seemed enough.

Special Considerations

Object ID

Object#object_id bases its value from the address of the object. For example:

x = Object.new
id = x.object_id
GC.compact        # `x` moves
id == x.object_id # `id` should be same as `x.object_id`

We expect the object id to remain the same regardless of compaction. To deal
with object_id, two hashes were introduced. One hash contains a map of the
object to the seen object id. The second map contains all seen object ids.
The maps are updated any time object_id is called, an object moves, or an
object is freed.

object_id implementation in Ruby:

$obj_to_id = {}
$id_to_obj = {}

class Object
  def object_id
    if $obj_to_id.key?(address)
      # Second call to object_id on this object
      return $obj_to_id[address]
    else
      # First call to object_id on this object

      addr = self.address

      loop do
        if $seen_ids.key?(addr)
          # Resolve conflict
          addr += sizeof(RVALUE)
        else
          $obj_to_id[self.address] = addr
          $id_to_obj[addr]         = self
          return addr
        end
      end
    end
  end

  private

  def address
    # returns address in memory of object
  end
end

During compaction:

def compact
  heap = [ ... ] # many slots

  while left < right
    dest_slot = heap[left]
    source_slot = heap[right]

    if moving?(source_slot)
      if $obj_to_id.key?(source_slot.address)
        id = $obj_to_id.delete(source_slot.address)
        $obj_to_id[dest_slot.address] = id
        $id_to_obj[id]                = dest_slot.address
      end

      # etc
    end
  end
end

During Free:

def free(obj)
  if $obj_to_id.key?(obj.address)
    $seen_ids.delete($obj_to_id.delete(obj.address))
  end
end

ObjectSpace._id2ref

When generating an object id, in the case of a collision, the address is
incremented by 40. This enables _id2ref to determine there is a VALUE pointer
and round trip the object correctly.

Compaction Support for C extensions

Any object marked with rb_gc_mark will be "pinned" and cannot move. C
extensions mark objects via rb_gc_mark, so this ensures that pointers in C
stay valid.

However, some C extensions may want to support reference movement.

Reference Movement in C extensions

In order to maintain backwards compatibility, if a C extension holds a reference
to a VALUE, the VALUE should not move. Going forward, C extensions can support
moving VALUEs. To support moving VALUEs, a C extension should change
rb_gc_mark to rb_gc_mark_no_pin, then implement a new callback that is
called during compaction that gives the extension an opportunity to update its
own references. A new function rb_gc_new_location will return the new
location for a particular VALUE.

Here is an example for autoload from variable.c. A new compaction callback is
registered, autoload_i_compact:

static const rb_data_type_t autoload_data_i_type = {
    "autoload_i",
    {autoload_i_mark, autoload_i_free, autoload_i_memsize, autoload_i_compact},
    0, 0, RUBY_TYPED_FREE_IMMEDIATELY
};

The mark function changes to mark references, but not pin them:

static void
autoload_i_mark(void *ptr)
{
    struct autoload_data_i *p = ptr;

    rb_gc_mark_no_pin(p->feature);

    /* allow GC to free us if no modules refer to this via autoload_const.ad */
    if (list_empty(&p->constants)) {
        rb_hash_delete(autoload_featuremap, p->feature);
    }
}

After the heap is compacted, any C extensions that registered a "compaction"
callback will be called and have a chance to update internal references. The
autoload compaction function is like this:

static void
autoload_i_compact(void *ptr)
{
    struct autoload_data_i *p = ptr;
    p->feature = rb_gc_new_location(p->feature);

The compaction callback asks for the new location for the p->feature
reference. rb_gc_new_location will either return the current value of
p->feature (in the case the VALUE did not move), or the new location.

Reference Verification and Testing

After compaction, objects that have moved are changed to T_MOVED objects with
forwarding addresses. Once all references are updated, no object should point
to a T_MOVED slot. We can say that before and after compaction, no object
should ever point at a T_MOVED slot. So if a T_MOVED object is ever pushed
on to the mark queue, there is a bug. push_mark_stack has a check for
T_MOVED objects, and will crash if a T_MOVED object is ever pushed on to the
mark stack.

A method GC.verify_compaction_references was added that doubles the available
heap size, then compacts the heap. Since the heap has doubled in size, any
object that can move will move. Any references to T_MOVED objects will be
caught during the GC phase after compaction.

Known Issue

Safety for C extensions depends entirely on C extensions marking their
references. If a C extension does not mark a reference, the compactor assumes
that the object is safe to move. This can cause an error in the following
situation, when there is an object graph like this:

 ┌───────────────┐       ┌───────────────┐ 
 │  Ruby Object  │       │  Ruby Object  │ 
 │  Implemented  │       │  Implemented  │ 
 │    in Ruby    │       │     in C      │ 
 │               │       │               │ 
 └───────────────┘       └───────────────┘ 
         │                       │         
         │                                 
         │                       │   NOT   
 Marked  │                          Marked 
         │                       │         
         │   ┌───────────────┐             
         │   │               │   │         
         │   │  Ruby Object  │             
         └──▶│               │◀─ ┘         
             │               │             
             └───────────────┘             

If the C extension contains a reference to an object, but expects the object
not to move because a Ruby object contains a reference, then the target Ruby
object may move and the reference in the C extension will be wrong. I like to
call these "ghost references" because the GC cannot see them but they will come
back to haunt you.

The solution to this is either:

  1. Change the C extension to rb_gc_mark the object
  2. Remove the reference from the C extension

One example of this situation was fixed here:

https://github.com/msgpack/msgpack-ruby/issues/133

Summary

Backwards compatibility with C extensions is maintained by "pinning" any
references marked with rb_gc_mark. A full mark must be done before
compacting to discover all pinned references.

A new callback for C extensions is introduced so that C extensions can support
compaction. C extensions that wish to support compaction must use the new
callback, rb_gc_mark_no_pin, and rb_gc_new_location.

C extensions that maintain pointers but do not mark those pointers may SEGV. We
can use GC.verify_compaction_references to discover these issues.

  1. See also "The Garbage Collection Handbook" page 32


Files

before_compact-0.png (111 KB) before_compact-0.png First request, before compaction tenderlovemaking (Aaron Patterson), 02/27/2019 10:26 PM
after_compact-0.png (80.6 KB) after_compact-0.png First request, after compaction tenderlovemaking (Aaron Patterson), 02/27/2019 10:26 PM
before_compact-1.png (81.2 KB) before_compact-1.png Second request, before compaction tenderlovemaking (Aaron Patterson), 02/27/2019 10:26 PM
after_compact-1.png (79.4 KB) after_compact-1.png Second request, after compaction tenderlovemaking (Aaron Patterson), 02/27/2019 10:26 PM
0001-GC-Compaction-for-MRI.patch (77.4 KB) 0001-GC-Compaction-for-MRI.patch tenderlovemaking (Aaron Patterson), 02/27/2019 11:43 PM
ruby_2019-03-22-171839_PikachuEXEs-MBP-2018.crash (71.4 KB) ruby_2019-03-22-171839_PikachuEXEs-MBP-2018.crash PikachuEXE (Pikachu EXE), 03/22/2019 09:20 AM
ruby_2019-03-22-171839-1_PikachuEXEs-MBP-2018.crash (71.4 KB) ruby_2019-03-22-171839-1_PikachuEXEs-MBP-2018.crash PikachuEXE (Pikachu EXE), 03/22/2019 09:20 AM

Updated by tenderlovemaking (Aaron Patterson) about 5 years ago

I've tested the compactor with a Rails application, and I want to share my results.

I added the middleware below:

# This file is used by Rack-based servers to start the application.

require_relative 'config/environment'
require 'objspace'

$COUNT = 0
class CompactHeap
  def initialize app
    @app = app
  end

  def call env
    x = @app.call env
    GC.start
    File.open("before_compact-#{$COUNT}.txt", 'w') do |f|
      ObjectSpace.dump_all(output: f)
    end
    GC.compact
    File.open("after_compact-#{$COUNT}.txt", 'w') do |f|
      ObjectSpace.dump_all(output: f)
    end
    $COUNT += 1
    x
  end
end

use CompactHeap
run Rails.application

The above middleware will:

  1. Run the web request
  2. Run the GC
  3. Dump the heap
  4. Compact the heap
  5. Dump the heap

This gives us an idea what the heap looks like after a request, but before and after compaction.
I've attached visualizations of the heap for two requests. Each visualization
is a PNG file that represents the heap. Each column of the image represents a
page in the heap. Columns are two pixels wide. Each square in the column
represents a slot. Slots are 2px by 2px squares. Red squares are slots that
have an unmovable (pinned) object. Black squares are slots that have a moveable
object. Other squares are empty slots.

The first image "before-compact-0.png" is before a compaction has ever
occurred. You can see there are many unfilled pages. Of the 582 pages, 98 are
full, and 484 have empty slots. After compaction "after-compact-0.png", there
are 576 pages, 411 full, and 165 have empty slots.

Subsequent requests have similar looking heaps to "after-compact-0" (as expected).

The program I used to generate these visualizations can be found here:

https://gist.githubusercontent.com/tenderlove/578cfb4ce677465d0a8ceb4362928a89/raw/41798db55154b235059f1414a8bb18cd955d1ace/heapviz.rb

I'll publish the Rails application I used to generate these graphs and add a link here. :)

Actions #2

Updated by tenderlovemaking (Aaron Patterson) about 5 years ago

  • File 0001-Compacting-GC-for-MRI.patch added

Adding updated patch.

Actions #3

Updated by tenderlovemaking (Aaron Patterson) about 5 years ago

  • File deleted (0001-Compacting-GC-for-MRI.patch)
Actions #4

Updated by tenderlovemaking (Aaron Patterson) about 5 years ago

  • File 0001-GC-Compaction-for-MRI.patch added
Actions #5

Updated by tenderlovemaking (Aaron Patterson) about 5 years ago

  • File deleted (0001-Compacting-GC-for-MRI.patch)
Actions #7

Updated by tenderlovemaking (Aaron Patterson) about 5 years ago

  • File deleted (0001-GC-Compaction-for-MRI.patch)

Updated by tad (Tadashi Saito) about 5 years ago

Really interesting, but what are the benefits for users who writes Ruby?

Did the Rails app get faster enough? Or you have some other motivations?

Updated by duerst (Martin Dürst) about 5 years ago

Agree, really interesting. Several comments/questions:

  • Why a full CG at the end? Wouldn't it be much cheaper to just collect the T_MOVED objects (these objects could contain an internal pointer each, chaining them together).

  • Hash key objects: You say most hash keys are strings, and they can be moved, but what about Symbols?

  • I had a look at the .png files. I see the distinction between red squares and other squares very well, but I can't see any distinction between movable and empty slots, everything that's not red is black.

Updated by ko1 (Koichi Sasada) about 5 years ago

Great post. I think there are no big changes we discussed before.

Points:

  • I'm not sure we can accept the "Known Issue". GC.verify_compaction_references can't discover this issue, right? It can introduce difficult bugs to remove... I have no good idea to discover it. "Time" (not time class, of course) can solve it, but...

  • Can we introduce it just after every major (full) gc?

Minor points:

  • Could you use gist (or something useful) to show your benchmark results with figures?

Updated by matz (Yukihiro Matsumoto) about 5 years ago

The idea of GC.compact sounds great. I understand the issue but it seems to be acceptable.

Matz.

Updated by naruse (Yui NARUSE) about 5 years ago

I want to introduce this feature in Ruby 2.7.0-preview1.
Therefore could you write an item for the release note?

The article should be like "JIT [Experimental]" in 2.6.0.
https://www.ruby-lang.org/en/news/2018/12/25/ruby-2-6-0-released/

The description should explain

  • What is the merit of this Compaction GC.
    (the audience of this sentence is News media)
  • The technical detail (appeal) of this feature.
    (the audience of this sentences is Ruby developer)
  • About the known issue and how to debug and workaround
    • If you hit SEGV, comment out GC.compact and try; if it works it's because of Compaction GC
    • check extension libraries and if you can identify it use rb_gc_mark or something
    • Add more debug mode for example just do compaction but not unmap the old page; set guard page or write garbage data.

Updated by nateberkopec (Nate Berkopec) about 5 years ago

Can we introduce it just after every major (full) gc?

I agree. If the necessary steps are Full GC -> Update -> Move -> Full GC, then it would be faster to just compact automatically after a full GC which is already occurring. I understand this could aggravate the "known issue" however by making compaction mandatory rather than optional.

Aaron, regarding the API: how do you envision this will be called "in the real world"? Unicorn::OOBGC, a similar project and API, was responsible for a lot of wasted CPU cycles because it ran after every request. Either you or Sam blogged about this at some point (I can't remember and can't find it), but that approach caused something like an extra 10% CPU utilization on a big Rails app.

This converts the T_MOVED slots back in to T_EMPTY slots. The GC automatically takes care of this because no object should reference a T_MOVED slot.

Is this strictly necessary? I think the fact that GC.compact requires 2 full GC cycles makes it rather expensive. What if we could just leave "T_MOVED" slots around until the next full GC cycle?

If we could take Koichi's suggestion and also "live with" T_MOVED slots living in the heap until the next full GC, then GC.compact does not add any full GC cycles to an application, and the only added time cost is the cost of updating and moving.

Actions #14

Updated by PikachuEXE (Pikachu EXE) about 5 years ago

nateberkopec (Nate Berkopec) wrote:

Can we introduce it just after every major (full) gc?

I agree. If the necessary steps are Full GC -> Update -> Move -> Full GC, then it would be faster to just compact automatically after a full GC which is already occurring. I understand this could aggravate the "known issue" however by making compaction mandatory rather than optional.

Aaron, regarding the API: how do you envision this will be called "in the real world"? Unicorn::OOBGC, a similar project and API, was responsible for a lot of wasted CPU cycles because it ran after every request. Either you or Sam blogged about this at some point (I can't remember and can't find it), but that approach caused something like an extra 10% CPU utilization on a big Rails app.

This converts the T_MOVED slots back in to T_EMPTY slots. The GC automatically takes care of this because no object should reference a T_MOVED slot.

Is this strictly necessary? I think the fact that GC.compact requires 2 full GC cycles makes it rather expensive. What if we could just leave "T_MOVED" slots around until the next full GC cycle?

If we could take Koichi's suggestion and also "live with" T_MOVED slots living in the heap until the next full GC, then GC.compact does not add any full GC cycles to an application, and the only added time cost is the cost of updating and moving.

As an experimental feature shouldn't it be disabled by default and only enabled via something like env variable?

But I do agree that given a full GC is always required, might as well run (2) and (3) after an existing full GC, leave (4) for next full GC

How do I write my own patch and test it myself?

Actions #15

Updated by headius (Charles Nutter) almost 5 years ago

We have already discussed in https://bugs.ruby-lang.org/issues/15408 the deprecation and eventual removal of _id2ref, so perhaps we can move these things together.

The additional hashing logic to maintain the Object-to-ID maps will add significant overhead to objects in any application using object_id for e.g. logging, debugging, or other possibly unsavory purposes.

Better to just move forward with EOL for _id2ref and go with just simple monotonic object IDs! :-D

https://bugs.ruby-lang.org/issues/15408

Updated by PikachuEXE (Pikachu EXE) almost 5 years ago

Made some changes for "auto compaction"
I am a newbie in C programming
This is just for discussion, not tested nor compiled yet

https://github.com/ruby/ruby/compare/trunk...PikachuEXE:feature/gc-compact/pika

Updated by PikachuEXE (Pikachu EXE) almost 5 years ago

To prevent this, a new "pinning" table was introduced. Any object marked via
rb_gc_mark is "pinned" and is not allowed to move. C extensions must mark
their references in order to keep them alive. Since C extensions use
rb_gc_mark to mark a reference, we know not to move that reference. In order
to ensure all references get marked, we perform a full GC.

I wrote a draft to allow myself to understand the GC flow a bit better
But I don't really understand which step contains code for "ensure all references get marked"

(Most of method calls that seems to be related to profiling logging are ignored)

## `rb_gc`
- `garbage_collect`
- `gc_finalize_deferred` (digged until finalize_list, something like killing zombies?)

## `garbage_collect`
- `gc_rest`
- `gc_start`

## `gc_rest`
- `gc_verify_internal_consistency` (when `RGENGC_CHECK_MODE >= 2`)
- `gc_marks_rest` (when `is_incremental_marking(objspace)`)
- `gc_sweep_rest` (when `is_lazy_sweeping(heap_eden)`)

## `gc_start`
- `gc_verify_internal_consistency` (when `RGENGC_CHECK_MODE >= 2`)
- `gc_prof_setup_new_record` (I skipped)
- `gc_reset_malloc_info` (I skipped)
- `rb_transient_heap_start_marking` (I skipped)
- `gc_marks` (main work)

## `gc_marks`
- `gc_marks_rest` (when `USE_RGENGC` and `!is_incremental_marking(objspace)`)
- `gc_marks_start` + `gc_marks_rest` (when not `USE_RGENGC`)

## `gc_marks_start`
- So many stuff in the middle I don't understand
- `gc_mark_roots` (I read but don't understand)

## `gc_marks_rest`
- `gc_mark_stacked_objects_incremental` + `gc_marks_finish` (if `is_incremental_marking(objspace)`)
- `gc_mark_stacked_objects_all` + `gc_marks_finish` (unless `is_incremental_marking(objspace)`)
- `gc_sweep`

## `gc_sweep`
- Auto Compaction (Added in my branch, when auto compaction allowed, not sure if this is the right place)
- `gc_sweep_start` + `gc_sweep_rest` (if `immediate_sweep`)
- `gc_sweep_start` + `gc_sweep_step` (unless `immediate_sweep`)
- `gc_heap_prepare_minimum_pages` (Prepare enough heap pages for ???)

## `gc_sweep_start`
- `gc_sweep_start_heap` with logging (I read but don't understand)

## `gc_sweep_step`
- Mostly implementation I don't understand
- `gc_sweep_finish`

## `gc_sweep_finish`
- `heap_pages_free_unused_pages`
- `heap_allocatable_pages_set` if `heap_allocatable_pages < heap_tomb->total_pages`
- `gc_verify_internal_consistency` if `heap_allocatable_pages < heap_tomb->total_pages`

Updated by PikachuEXE (Pikachu EXE) almost 5 years ago

I got segfault with this patch
I added 1 extra change to fix compile error on MacOS
https://github.com/ruby/ruby/compare/trunk...PikachuEXE:feature/gc-compact/15626

Installed via
rvm install ruby-head --reconfigure --repo https://github.com/PikachuEXE/ruby.git --branch feature/gc-compact/15626 -n ruby_27_gc_comapct

ruby-head has no such issue
Using passenger open source 6.0.2 for testing

Edit: Forgot to mention, I did not call GC.compact

Updated by noahgibbs (Noah Gibbs) almost 5 years ago

One problem with automatically compacting on full GC: compaction appears to require two full GCs to do its thing. It might be possible to integrate this into Ruby's normal cycle. But I don't think this is designed for that as-is.

Aaron has commented on this in his talks - usually he calls GC.compact once after giving everything a chance to "settle" (running the app for awhile.)

You could call it periodically, but I think this is a heavier-weight operation than a normal full GC.

Updated by duerst (Martin Dürst) almost 5 years ago

noahgibbs (Noah Gibbs) wrote:

Aaron has commented on this in his talks -

Any pointers?

usually he calls GC.compact once after giving everything a chance to "settle" (running the app for awhile.)

Updated by tenderlovemaking (Aaron Patterson) almost 5 years ago

@naruse (Yui NARUSE):

I want to introduce this feature in Ruby 2.7.0-preview1.
Therefore could you write an item for the release note?

Yes. I need to fix a bug with THEAP integration and I'll commit this along with NEWS.

@ko1 (Koichi Sasada):

I'm not sure we can accept the "Known Issue". GC.verify_compaction_references can't discover this issue, right? It can introduce difficult bugs to remove... I have no good idea to discover it. "Time" (not time class, of course) can solve it, but...

Neither do I. Doubling the heap size then compacting should give highest probability to find these references (I think). Unfortunately the way it finds them is by crashing.

Can we introduce it just after every major (full) gc?

I think it's a good idea, but maybe only with GC.stress = true? I don't have a good idea about the performance (wall clock time), but I'm worried about adding it automatically without doing some performance improvements.

Could you use gist (or something useful) to show your benchmark results with figures?

Yes, I'll make a gist containing benchmarks and figures. I'll post them here before committing.

@martin:

Why a full CG at the end? Wouldn't it be much cheaper to just collect the T_MOVED objects (these objects could contain an internal pointer each, chaining them together).

Mainly because I'm lazy. :D

We could definitely do that, I just wanted to prove that we could do compaction in the first place. I have a lot of ideas for performance improvements.

Hash key objects: You say most hash keys are strings, and they can be moved, but what about Symbols?

Symbols probably can be moved. My approach was conservative: assume nothing can move then slowly introduce things I know to be movable. We can probably do Symbols too, I just didn't bother because the results were so good with just strings.

I had a look at the .png files. I see the distinction between red squares and other squares very well, but I can't see any distinction between movable and empty slots, everything that's not red is black.

Sorry, I used translucent pixels for the "empty" locations. If you view the png on a white background it should look right, but I'll regenerate the pngs with a white background instead of translucent.

@nate:

Aaron, regarding the API: how do you envision this will be called "in the real world"?

Initially, just GC.compact before fork (is how we've used it). Once I address performance issues it should just be automatic; it should happen after the mark phase on a full GC (as a mark-compact GC).

@Pikachu:

But I don't really understand which step contains code for "ensure all references get marked"

Full GC marks everything in the mark table. Mark table bits aren't cleared until the next GC.

I got segfault with this patch
I added 1 extra change to fix compile error on MacOS

I'll take a look at the core file, but can you give more info so I can try locally?

Edit: Forgot to mention, I did not call GC.compact

Hmmm. My patch should have no impact without calling GC.compact.

@Noah:

Aaron has commented on this in his talks - usually he calls GC.compact once after giving everything a chance to "settle" (running the app for awhile.)

We call it after loading as much code as possible, and right before forking. This way we have as many heap pages packed with probably long lived objects before forking (to maximize CoW).

Updated by noahgibbs (Noah Gibbs) almost 5 years ago

@duerst (Martin Dürst): looks like he's updated his approach a bit since I last saw the talk (quite awhile ago!) But here's a 2017 version: https://www.youtube.com/watch?v=ptG5HtmvQx8

At around 19:45 he's talking about compacting before fork.

When I saw an older talk, he was talking about (I believe) running a large Rails app for awhile and then just showing that moving around long-lived blocks of memory reduced fragmentation. But I may remember wrong.

Updated by PikachuEXE (Pikachu EXE) almost 5 years ago

I'll take a look at the core file, but can you give more info so I can try locally?

I just start my current project with passenger installed
I will create a sample project to reproduce this issue
Takes time though (stealing work time to do this, though this is also legit work >:)

Any info I can provide without creating the sample project?

Updated by noahgibbs (Noah Gibbs) almost 5 years ago

I'm trying to run this and check speed and stability with Rails Ruby Bench. I'm having an odd bug when I try to use GC.compact. But it may be my fault so I won't post it (yet).

However, I do notice a speed regression when running w/ your patch from recent head-of-master (SHA c92c0a593593da2eb1ff94d83d80f71e7ae5343c before patching.)

I'm seeing unpatched performance of 187.4 req/sec with variance 3.8. But with your patch, I'm seeing 178.8 req/sec with variance 4.7. That's close enough it could be measurement error if I was unlucky, and I'll try running it again. But I think this may currently reduce performance even without calling GC.compact.

Due to the way my benchmark works, increased memory usage can also manifest as reduced speed -- I don't have an easy way to tease those apart with RRB. But one or the other seems likely.

On the plus side, I did not see any segfaults, and RRB is a pretty good stability torture-test for crashes. I ran this with 30 batches of 30,000 HTTP requests per Ruby version (unpatched vs patched.)

Updated by noahgibbs (Noah Gibbs) almost 5 years ago

Okay. Second benchmark run gives essentially the same results:

unpatched: 187.9 reqs/sec, variance 4.1
patched: 179.2 reqs/sec, variance 4.8

So yeah, I'm seeing a performance regression with the patched code with no call to GC.compact.

If I run GC.compact in an initializer in Discourse, I wind up getting "no such method" errors, though I seem to get different ones run-to-run. For instance:

bundler: failed to load command: ./start.rb (./start.rb)
NoMethodError: undefined method `create_id' for #<EmailHelper:0x0000555c88717e30>
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/json/common.rb:156:in `initialize'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/json/common.rb:156:in `new'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/json/common.rb:156:in `parse'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/sprockets-3.7.1/lib/sprockets/manifest.rb:318:in `json_decode'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/sprockets-3.7.1/lib/sprockets/manifest.rb:73:in `initialize'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/sprockets-rails-3.2.0/lib/sprockets/railtie.rb:201:in `new'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/sprockets-rails-3.2.0/lib/sprockets/railtie.rb:201:in `build_manifest'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/sprockets-rails-3.2.0/lib/sprockets/railtie.rb:214:in `block in <class:Railtie>'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/activesupport-4.2.8/lib/active_support/lazy_load_hooks.rb:36:in `execute_hook'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/activesupport-4.2.8/lib/active_support/lazy_load_hooks.rb:45:in `block in run_load_hooks'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/activesupport-4.2.8/lib/active_support/lazy_load_hooks.rb:44:in `each'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/activesupport-4.2.8/lib/active_support/lazy_load_hooks.rb:44:in `run_load_hooks'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/railties-4.2.8/lib/rails/application/finisher.rb:62:in `block in <module:Finisher>'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/railties-4.2.8/lib/rails/initializable.rb:30:in `instance_exec'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/railties-4.2.8/lib/rails/initializable.rb:30:in `run'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/railties-4.2.8/lib/rails/initializable.rb:55:in `block in run_initializers'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:228:in `block in tsort_each'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:350:in `block (2 levels) in each_strongly_connected_component'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:431:in `each_strongly_connected_component_from'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:349:in `block in each_strongly_connected_component'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:347:in `each'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:347:in `call'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:347:in `each_strongly_connected_component'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:226:in `tsort_each'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:205:in `tsort_each'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/railties-4.2.8/lib/rails/initializable.rb:54:in `run_initializers'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/railties-4.2.8/lib/rails/application.rb:352:in `initialize!'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/railties-4.2.8/lib/rails/railtie.rb:194:in `public_send'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/railties-4.2.8/lib/rails/railtie.rb:194:in `method_missing'
  /home/ubuntu/rails_ruby_bench/work/discourse/config/environment.rb:5:in `<top (required)>'
  /home/ubuntu/rails_ruby_bench/start.rb:13:in `require'
  /home/ubuntu/rails_ruby_bench/start.rb:13:in `<top (required)>'

The only difference between this run and the (error-free) patched version speed run above is that I add a single initializer in discourse/config/initializers/900-gc-compact.rb, consisting of only the text "GC.compact".

However, the stack trace is not always the same - I got a different "no-such-method"-type error on the previous run.

Here's another stack trace from running the exact same setup a second time in a row -- let me know if you'd like me to send information, or set you up with an account on a VM to look at it.

NoMethodError: undefined method `create_id' for ".avi.jst.coffee.erb":String
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/json/common.rb:156:in `initialize'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/json/common.rb:156:in `new'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/json/common.rb:156:in `parse'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/sprockets-3.7.1/lib/sprockets/manifest.rb:318:in `json_decode'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/sprockets-3.7.1/lib/sprockets/manifest.rb:73:in `initialize'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/sprockets-rails-3.2.0/lib/sprockets/railtie.rb:201:in `new'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/sprockets-rails-3.2.0/lib/sprockets/railtie.rb:201:in `build_manifest'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/sprockets-rails-3.2.0/lib/sprockets/railtie.rb:214:in `block in <class:Railtie>'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/activesupport-4.2.8/lib/active_support/lazy_load_hooks.rb:36:in `execute_hook'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/activesupport-4.2.8/lib/active_support/lazy_load_hooks.rb:45:in `block in run_load_hooks'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/activesupport-4.2.8/lib/active_support/lazy_load_hooks.rb:44:in `each'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/activesupport-4.2.8/lib/active_support/lazy_load_hooks.rb:44:in `run_load_hooks'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/railties-4.2.8/lib/rails/application/finisher.rb:62:in `block in <module:Finisher>'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/railties-4.2.8/lib/rails/initializable.rb:30:in `instance_exec'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/railties-4.2.8/lib/rails/initializable.rb:30:in `run'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/railties-4.2.8/lib/rails/initializable.rb:55:in `block in run_initializers'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:228:in `block in tsort_each'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:350:in `block (2 levels) in each_strongly_connected_component'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:431:in `each_strongly_connected_component_from'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:349:in `block in each_strongly_connected_component'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:347:in `each'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:347:in `call'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:347:in `each_strongly_connected_component'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:226:in `tsort_each'
  /home/ubuntu/mri-gccompact/lib/ruby/2.7.0/tsort.rb:205:in `tsort_each'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/railties-4.2.8/lib/rails/initializable.rb:54:in `run_initializers'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/railties-4.2.8/lib/rails/application.rb:352:in `initialize!'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/railties-4.2.8/lib/rails/railtie.rb:194:in `public_send'
  /home/ubuntu/.rvm/gems/ext-mri-march-gccompact/gems/railties-4.2.8/lib/rails/railtie.rb:194:in `method_missing'
  /home/ubuntu/rails_ruby_bench/work/discourse/config/environment.rb:5:in `<top (required)>'
  /home/ubuntu/rails_ruby_bench/start.rb:13:in `require'
  /home/ubuntu/rails_ruby_bench/start.rb:13:in `<top (required)>'

Updated by tenderlovemaking (Aaron Patterson) almost 5 years ago

@noah I think I have these errors fixed on my branch (including the performance regression). I slowed down every call to free an object because they may need to be removed from the global object_id table. Should be fixed here though: https://github.com/ruby/ruby/commit/82f3d8b6cc4e491f13ec29f67daa2bb4ae0f1dbd (GC benchmarks are the same speed on trunk vs my branch)

I'll try to merge this this week, but I'm traveling tomorrow so I may commit this on Monday. I'm really sorry for the delay.

Updated by PikachuEXE (Pikachu EXE) almost 5 years ago

@tenderlovemaking (Aaron Patterson)

I try your patch again and no error this time (without the middleware to call GC.compact)
Maybe I forgot to restart server after switching to branch without middleware on my last test

However with middleware (I have restarted server after switching branch this time, yup)
I can see the ruby processes running at 100% after requests finished and it does not happen on app boot

However since I am still using old ruby-head without your commit
https://github.com/ruby/ruby/commit/82f3d8b6cc4e491f13ec29f67daa2bb4ae0f1dbd

What's the best way to install the latest ruby-head with for latest version of this change?

Updated by tenderlovemaking (Aaron Patterson) almost 5 years ago

Hi,

Here are some statistics I gathered. I'll present a few more at RubyKaigi using our application.

https://gist.github.com/tenderlove/99112e9fcc85d9c6c7d9d0ea40063fc6

Actions #30

Updated by tenderlovemaking (Aaron Patterson) almost 5 years ago

  • Status changed from Open to Closed

Applied in changeset trunk|r67479.


Adding GC.compact and compacting GC support.

This commit adds the new method GC.compact and compacting GC support.
Please see this issue for caveats:

https://bugs.ruby-lang.org/issues/15626

[Feature #15626]

Updated by noahgibbs (Noah Gibbs) almost 5 years ago

Tested again w/ merged version of this patch. My "before" and "after" timings are within the margin of measurement error. I no longer see a performance regression. Thanks!

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0