Feature #18819
closedMoving Strings between size pools
Description
Motivation¶
Using GC Compaction we can move objects around on a Ruby heap, this allows us to decrease memory fragmentation for more efficient memory usage and better copy on write performance.
Ruby currently has several heaps, each heap can hold different sizes of objects. Currently compaction can only move objects within the same heap.
Some objects in Ruby can change size. For example, using String#squeeze!
or String#<<
will modify an existing String in place to either shrink it, or grow it respectively.
This can result in the String no longer being in the most appropriate sized heap, which can either mean wasted memory (when the string is shrunk), or negation of the benefits of Variable Width Allocation (when the String is grown).
In order to mitigate these problems we need a way of moving objects between heaps with different sized slots.
Example: Growing a String¶
Allocating a String 16 bytes or less will result in the creation of an embedded RString object in the 40 byte size pool (assuming a 64 bit architecture).
If we now use string << "foobar"
to mutate the string, increasing its size by another 6 bytes we can no longer embed the string in a 40 byte slot.
In this case the string gets its NOEMBED
bit set, and new memory is allocated for the buffer in the system heap using malloc
. This new memory is no longer adjacent to the RString object in the Ruby heap and so any locality benefits are gone.
In addition, if the original String exists in a slot with a size larger than sizeof(RValue)
then the additional space at the end of the slot after the ptr
to the heap buffer is unused.
Example: Shrinking a String¶
Assuming a 64 but architecture, allocating a String longer than 16 bytes will result in the creation of an embedded string in one of the larger size pools (80, 160, 320 or 640 bytes).
If we were to shrink that string so that it is possible to fit into one of the smaller size pools (eg. using string.squeeze!
), then at least half of the current slot size will be unused space at the end of the string.
PR Summary¶
This change builds on the work in this feature to reverse the compaction cursor movement to allow movement of objects between size pools during GC Compaction.
The algorithm works as follows
- During GC Compaction
- During object movement step
- If the object being moved is a string
- Calculate how much space the string would require as an embedded string
- If the embedded size fits within a size pool
- move the object to the scan cursor position within the desired pool
- If the embedded size is larger than any possible size pool
- The object must be
NOEMBED
- move the object to the scan cursor position within the smallest size pool
- The object must be
- If the object being moved is a string
- During reference update step
- If the Object is a string and is not embedded
- Calculate its size as if it was embedded
- If the object can be embedded in its current slot
- Convert the
NOEMBED
string into an embedded string - Copy the string buffer into the slot
- Free the original string buffer
- Convert the
- If the Object is a string and is not embedded
- During object movement step
Notes¶
- Currently we only support movement of
T_STRING
objects. More objects will be added in further PRs as they're given VWA support. - We don't attempt to re-embed shared strings, shared roots, strings that wrap C string literals (Strings with
NOFREE
set), or "fstrings" (Strings withSTR_FAKESTR
set)
Testing movement¶
This PR adds some extra keys to the hash returned by GC.compact
. These keys are :moved_up
and :moved_down
. Each contains a hash of object types and a count of those object types that have either been moved into a larger or a smaller size pool.
Checking this can be done by creating some fragmentation in different heaps, and forcing strings to be resized, then running compaction.
moveables = []
large_slots = []
n = 1500
# Ensure fragmentation in the large heap
base_slot_size = GC.stat_heap[0].fetch(:slot_size)
n.times {
String.new(+"a" * base_slot_size).downcase
large_slots << String.new(+"a" * base_slot_size).downcase
}
n.times {
# strings are created as shared strings when initialized from literals
# use downcase to force the creation of an embedded string (it calls
# rb_str_new internally)
moveables << String.new("a").downcase
}
moveables.map { |s| s << ("bc" * base_slot_size) }
p GC.compact.fetch(:moved_up)
This script outputs the following on my development machine.
{:T_STRING=>319}
Future Work¶
This PR implements a framework for moving objects between size pools. There are two main areas to focus on following this change
- Implementing movement for existing mutatable VWA types. Currently this is just Array, as Class objects cannot be mutated and therefore cannot move between pools
- Add support for VWA to more types, implementing object movement where appropriate