Bug #20022


GC.verify_compaction_references does not actually move all objects

Added by kjtsanaktsidis (KJ Tsanaktsidis) 5 months ago. Updated 5 months ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:115484]

Description

While debugging https://bugs.ruby-lang.org/issues/20021, I ran into a separate issue which I figured was worth resolving whilst I had it in my head.

The intention of GC.verify_compaction_references is, I believe, to force every single movable object to be moved, so that it's possible to debug native extensions which do not correctly update their references to objects they have marked as movable (if this is not the case, you can stop reading now and accept my apologies!)

To do this, it doubles the number of allocated pages for each size pool, and sorts the heap pages so that the free ones are swept first; thus, every object in an old page should be moved into a free slot in one of the new pages.

This worked fine until movement of objects between size pools during compaction was implemented. That causes some problems for verify_compaction_references:

  • We were doubling the number of pages in each size pool, but if some objects need to move into a different size pool, there's no guarantee that there will be enough room in that one.
  • It's possible for the sweep & compact cursors to meet in one size pool before all the objects that want to move into that size pool from another are processed by the compaction.
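As a rough illustration of the first point (the numbers and slot counts here are made up, not taken from gc.c), the shortfall can be seen with simple arithmetic:

```ruby
# Hypothetical figures: doubling a pool's page count adds one free slot per
# existing slot, but that budget ignores objects migrating in from other pools.
slots_per_page = 100

p1_live_objects  = 10 * slots_per_page  # objects staying in P1 (each still needs a free slot to move into)
incoming_from_p0 = 15 * slots_per_page  # objects moving from P0 into P1

free_slots_added = 10 * slots_per_page  # what "doubling" P1 provides

puts free_slots_added                                        # => 1000
puts p1_live_objects + incoming_from_p0                      # => 2500
puts free_slots_added >= (p1_live_objects + incoming_from_p0)  # => false: P1 runs out of room
```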

You can see these problems by changing some of the movement tests in test_gc_compact.rb to try to move e.g. 50,000 objects instead of 500 (by changing HASH_COUNT); the test is then not able to move all of the objects in a single compaction run.
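For reference, a minimal standalone script in the spirit of those tests might look like the following. HASH_COUNT comes from the test mentioned above, but the rest is an assumption: the GC.verify_compaction_references keyword arguments have varied across Ruby versions, and compaction is unsupported on some platforms, so the call is wrapped in a rescue.

```ruby
HASH_COUNT = 500  # bump to e.g. 50_000 to reproduce the failure described above

hashes = HASH_COUNT.times.map { |i| { i => i } }

begin
  # toward: :empty asks the GC to move objects into the newly added pages;
  # expand_heap: is only recognized on newer Rubies (older ones used double_heap:).
  stats = GC.verify_compaction_references(expand_heap: true, toward: :empty)
  puts stats.class
rescue NotImplementedError, ArgumentError, NoMethodError => e
  # Compaction unsupported on this platform, or a different keyword set.
  puts e.class
end

hashes.clear
```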

What I implemented in this PR (https://github.com/ruby/ruby/pull/9041) is two things:

  • Firstly, it examines every object and determines where it wants to be compacted to; we use this information to calculate the desired number of pages to add to each size pool.
  • Secondly, it compacts twice; once in order of ascending size pool, and once descending. This means that we are guaranteed to be able to move everything we want to move into a size pool before we start advancing that pool's own compact cursor.
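The second point can be sketched as a pass order over size pool indices (hypothetical indices; the real loop lives in gc.c):

```ruby
# One compaction pass in ascending size-pool order, then one descending,
# per the description above. Indices 0..4 stand in for Ruby's size pools.
pools = [0, 1, 2, 3, 4]
order = pools + pools.reverse
puts order.inspect  # => [0, 1, 2, 3, 4, 4, 3, 2, 1, 0]
```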

With these fixes in place, I was able to make the compaction tests move any number of objects (previously, test_moving_hashes_down_size_pools would begin failing on my machine if I set HASH_COUNT above about 6,000).


Updated by kjtsanaktsidis (KJ Tsanaktsidis) 5 months ago

  • Subject changed from GC.verify_compaction_references does not actually move alll objects to GC.verify_compaction_references does not actually move all objects

Updated by peterzhu2118 (Peter Zhu) 5 months ago

> Secondly, it compacts twice; once in order of ascending size pool, and once descending. This means that we are guaranteed to be able to move everything we want to move into a size pool before we start advancing that pool's own compact cursor.

If we figure out the optimal size of every object and we set the heap to size (optimal_size * 2) + 1, it should guarantee that every object is moved before the cursors meet. Then, I think, we should only need to compact once.

Updated by peterzhu2118 (Peter Zhu) 5 months ago

Otherwise, I think this patch makes sense. A better implementation of GC.verify_compaction_references has been on my todo list for a while, but I haven't gotten around to implementing it. Thank you for working on this!

Updated by kjtsanaktsidis (KJ Tsanaktsidis) 5 months ago

> If we figure out the optimal size of every object and we set the heap to size (optimal_size * 2) + 1, it should guarantee that every object is moved before the cursors meet.

This sounds like a good idea (not least because it avoids hacking at the actual compaction implementation just for this development tool). Unfortunately, I don't think the right formula is quite that simple. 2N+1 isn't enough pages for all of the test cases: I added some iterations to some of the other move up/down test cases and they fail. They fail worse if I make them move up/down by more than one size pool... so I'm thinking the right factor has something to do with which size pool objects are coming from as well as going to (e.g. something along these lines):

        /* Weight each moving object by the distance between its source and
         * destination size pools: pools that are k apart get a factor of 2^k
         * (slot sizes double between adjacent pools), added to the
         * destination pool's weighted slot requirement. */
        size_t weighting_factor = 1;
        if (dest_pool->slot_size > page->size_pool->slot_size) {
            weighting_factor <<= (dest_pool - page->size_pool);
        } else if (dest_pool->slot_size < page->size_pool->slot_size) {
            weighting_factor <<= (page->size_pool - dest_pool);
        }
        tdata->required_slots_weighted[dest_pool_idx] += weighting_factor;

This isn't quite it either though.

I haven't been able to figure out the right formula yet, but I'll try to have another go over the weekend... in the meantime, I've pushed up some WIP to my branch in case you have a bright idea about the right formula.

Updated by peterzhu2118 (Peter Zhu) 5 months ago

I thought about this again, and my previous formula was incorrect. I think the number of pages we need to add for each size pool is num_pages(num_objects_moving_in + num_objects_in_size_pool) + 1, where num_objects_moving_in is the number of objects moving in from other size pools and num_objects_in_size_pool is the number of objects already in the current size pool (regardless of whether an object will move out or not, because even if it moves out, it leaves a T_MOVED behind).
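That sizing rule can be sketched in Ruby, with num_pages assumed to be a simple ceiling division by slots per page (helper names here are hypothetical, not from gc.c):

```ruby
# Pages to add to one size pool, per the rule above:
# num_pages(objects moving in + objects already in the pool) + 1.
def pages_to_add(num_objects_moving_in, num_objects_in_size_pool, slots_per_page)
  total = num_objects_moving_in + num_objects_in_size_pool
  (total.to_f / slots_per_page).ceil + 1
end

puts pages_to_add(1500, 1000, 100)  # => 26
```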

Updated by kjtsanaktsidis (KJ Tsanaktsidis) 5 months ago

I stared at this and did a bit of printf on it, and I think I worked out that the formula for how many pages to add in one heap actually relates to how many objects there are in all the other heaps, even if they're not moved. Here's my working...

Let us have two size pools, P0 and P1. Let P0 have slot size S0, and P1 have slot size S1. Let the page size be Z bytes.

Let us start with the heap of P0 in the following configuration:

HEAP START (sweep cursor starts here)
(F0 free pages)
(N pages of objects which want to be moved to P1, totaling N * (Z/S0) objects)
(M0 pages of objects which want to remain in P0, totaling M0 * (Z/S0) objects)
HEAP END (compact cursor starts here)

And with P1 in the following configuration:

HEAP START (sweep cursor starts here)
(F1 free pages)
(M1 pages of objects which want to remain in P1, totaling M1 * (Z/S1) objects)
HEAP END (compact cursor starts here)

Let M1 < M0.

Now, consider what happens to the loop in gc_sweep_compact as we compact this heap. The objective is to work out what values for F0 and F1 are required such that the sweep & compact cursors do not meet until all objects have been moved to their desired size pool.

For the first M1 iterations of the loop, we are copying objects only within the two size pools.

  • For P0, the compact cursor will advance by M1 (number of iterations), and the sweep cursor will advance by M1 (all objects are moved within the pool, and the newly-filled pages are swept).
  • For P1, the compact cursor will advance by M1, and the sweep cursor will advance by M1 (by the same logic).

For the subsequent (M0 - M1) loop iterations, we still need to copy objects within P0. P1, however, will simply be compacting empty pages with nothing in them.

  • For P0, the compact cursor will advance by (M0 - M1), and the sweep cursor will advance by (M0 - M1) (by the same logic as the previous iterations)
  • For P1, the compact cursor will advance by (M0 - M1) (since it iterates too), but the sweep cursor will not advance at all (no objects are being copied into free pages in P1).

For the next N iterations, the compaction loop will be copying objects out of P0 and into P1 when compacting P0, and doing nothing again when compacting P1.

  • For P0, the compact cursor will advance by N (number of iterations), and the sweep cursor will not advance at all (no pages are being filled in P0).
  • For P1, the compact cursor will advance by N, and the sweep cursor will advance by ((S1/S0) * N) (we are copying N objects into new pages in P1, and their size changes from S0 to S1, so they occupy a different number of pages).

Thus, during the process:

  • P0 will have had its compact cursor advance by M1 + (M0 - M1) + N == M0 + N, and its sweep cursor advance by M1 + (M0 - M1) == M0
  • P1 will have had its compact cursor advance by M1 + (M0 - M1) + N == M0 + N, and its sweep cursor advance by M1 + (S1/S0)*N

And so, the correct number of free pages F0 and F1 which need to be in P0 and P1 for the cursors not to meet is:

  • F0 == (M0 + N) + (M0) == 2*M0 + N
  • F1 == (M0 + N) + (M1 + (S1/S0)*N) == M0 + M1 + (1 + (S1/S0)) * N
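Those two formulas can be checked with concrete numbers (hypothetical helper; it assumes S0 divides S1 evenly, so integer division is exact):

```ruby
# F0 and F1 from the derivation above: free pages needed so that the sweep
# and compact cursors in P0 and P1 do not meet early.
def required_free_pages(m0, m1, n, s0, s1)
  f0 = 2 * m0 + n                   # compact advance (M0 + N) + sweep advance (M0)
  f1 = m0 + m1 + (1 + s1 / s0) * n  # compact advance (M0 + N) + sweep advance (M1 + (S1/S0)*N)
  [f0, f1]
end

# e.g. M0 = 4, M1 = 2, N = 3 pages moving from P0 (40-byte slots) to P1 (80-byte slots)
puts required_free_pages(4, 2, 3, 40, 80).inspect  # => [11, 15]
```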

I'll have to have a think about how to generalize this to arbitrary heap layouts and five size pools - anyway, I figured I'd show my working as I go through it :P

Updated by kjtsanaktsidis (KJ Tsanaktsidis) 5 months ago

Alright, I think I have figured it out - @peterzhu2118 (Peter Zhu) do you mind taking another look at my PR? The insight is that we need to add pages so that all size pools have the same number of pages (so that their compact cursors can all advance the same number of times), and then add the actual new pages we're going to move objects into (so that there's guaranteed to be space in not-yet-compacted pages for all objects to move).
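A sketch of that two-part sizing rule (hypothetical names, not the actual PR code): top every pool up to the largest pool's page count, then add the pages the incoming objects will actually occupy.

```ruby
# current_pages[i]: pages currently in size pool i.
# pages_needed_for_moves[i]: pages the objects moving into pool i will occupy.
def pages_to_add_per_pool(current_pages, pages_needed_for_moves)
  max_pages = current_pages.max
  current_pages.each_with_index.map do |pages, i|
    # Equalize so every pool's compact cursor can advance the same number of
    # times, then add the real destination pages.
    (max_pages - pages) + pages_needed_for_moves[i]
  end
end

puts pages_to_add_per_pool([10, 4, 2], [3, 5, 1]).inspect  # => [3, 11, 9]
```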


Updated by Anonymous 5 months ago

  • Status changed from Open to Closed

Applied in changeset git|5d832d16d9878766dfe527934ef1ad64698b9bf8.


Add objspace_each_pages to gc.c

This works like objspace_each_obj, except instead of being called with
the start & end address of each page, it's called with the page
structure itself.

[Bug #20022]
