Project

General

Profile

Actions

Feature #18819

closed

Moving Strings between size pools

Added by eightbitraptor (Matt V-H) over 2 years ago. Updated over 2 years ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:108796]

Description

Github PR

Motivation

Using GC Compaction we can move objects around on a Ruby heap, this allows us to decrease memory fragmentation for more efficient memory usage and better copy on write performance.

Ruby currently has several heaps, each heap can hold different sizes of objects. Currently compaction can only move objects within the same heap.

Some objects in Ruby can change size. For example, using String#squeeze! or String#<< will modify an existing String in place to either shrink it, or grow it respectively.

This can result in the String no longer being in the most appropriate sized heap, which can either mean wasted memory (when the string is shrunk), or negation of the benefits of Variable Width Allocation (when the String is grown).

In order to mitigate these problems we need a way of moving objects between heaps with different sized slots.

Example: Growing a String

Allocating a String 16 bytes or less will result in the creation of an embedded RString object in the 40 byte size pool (assuming a 64 bit architecture).

If we now use string << "foobar" to mutate the string, increasing its size by another 6 bytes we can no longer embed the string in a 40 byte slot.

In this case the string gets its NOEMBED bit set, and new memory is allocated for the buffer in the system heap using malloc. This new memory is no longer adjacent to the RString object in the Ruby heap and so any locality benefits are gone.

In addition, if the original String exists in a slot with a size larger than sizeof(RValue) then the additional space at the end of the slot after the ptr to the heap buffer is unused.

Example: Shrinking a String

Assuming a 64 but architecture, allocating a String longer than 16 bytes will result in the creation of an embedded string in one of the larger size pools (80, 160, 320 or 640 bytes).

If we were to shrink that string so that it is possible to fit into one of the smaller size pools (eg. using string.squeeze!), then at least half of the current slot size will be unused space at the end of the string.

PR Summary

This change builds on the work in this feature to reverse the compaction cursor movement to allow movement of objects between size pools during GC Compaction.

The algorithm works as follows

  • During GC Compaction
    • During object movement step
      • If the object being moved is a string
        • Calculate how much space the string would require as an embedded string
        • If the embedded size fits within a size pool
          • move the object to the scan cursor position within the desired pool
        • If the embedded size is larger than any possible size pool
          • The object must be NOEMBED
            • move the object to the scan cursor position within the smallest size pool
    • During reference update step
      • If the Object is a string and is not embedded
        • Calculate its size as if it was embedded
        • If the object can be embedded in its current slot
          • Convert the NOEMBED string into an embedded string
          • Copy the string buffer into the slot
          • Free the original string buffer

Notes

  • Currently we only support movement of T_STRING objects. More objects will be added in further PRs as they're given VWA support.
  • We don't attempt to re-embed shared strings, shared roots, strings that wrap C string literals (Strings with NOFREE set), or "fstrings" (Strings with STR_FAKESTR set)

Testing movement

This PR adds some extra keys to the hash returned by GC.compact. These keys are :moved_up and :moved_down. Each contains a hash of object types and a count of those object types that have either been moved into a larger or a smaller size pool.

Checking this can be done by creating some fragmentation in different heaps, and forcing strings to be resized, then running compaction.

moveables = []
large_slots = []

n = 1500

# Ensure fragmentation in the large heap
base_slot_size = GC.stat_heap[0].fetch(:slot_size)
n.times {
  String.new(+"a" * base_slot_size).downcase
  large_slots << String.new(+"a" * base_slot_size).downcase
}

n.times {
  # strings are created as shared strings when initialized from literals
  # use downcase to force the creation of an embedded string (it calls
  # rb_str_new internally)
  moveables << String.new("a").downcase
}
moveables.map { |s| s << ("bc" * base_slot_size) }

p GC.compact.fetch(:moved_up)

This script outputs the following on my development machine.

{:T_STRING=>319}

Future Work

This PR implements a framework for moving objects between size pools. There are two main areas to focus on following this change

  • Implementing movement for existing mutatable VWA types. Currently this is just Array, as Class objects cannot be mutated and therefore cannot move between pools
  • Add support for VWA to more types, implementing object movement where appropriate
Actions

Also available in: Atom PDF

Like0
Like0