Project

General

Profile

Actions

Bug #21210

open

IO::Buffer gets invalidated on GC compaction

Added by hanazuki (Kasumi Hanazuki) 1 day ago. Updated about 19 hours ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 3.5.0dev (2025-04-01T16:11:01Z master 30e5e7c005) +PRISM [x86_64-linux]
[ruby-core:121498]

Description

6012145299cfa4ab561360c78710c7f2941a7e9d implemented compaction for IO::Buffer.
It looks like this doesn't work well with an IO::Buffer that shares memory region with a String object.
I think the problem is that an IO::Buffer holds the raw pointer to the String content, and now the content can be moved by GC when the String is embedded.

str = +"hello"
buf = IO::Buffer.for(str)
p buf.valid?

GC.verify_compaction_references(expand_heap: true, toward: :empty)

p buf.valid?  #=> should be true

This example should print two trues. Actually:

% ./ruby -v --disable-gems test.rb
ruby 3.5.0dev (2025-04-01T16:11:01Z master 30e5e7c005) +PRISM [x86_64-linux]
true
false
Actions #1

Updated by byroot (Jean Boussier) 1 day ago

  • Backport changed from 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN to 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONTNEED

Updated by hanazuki (Kasumi Hanazuki) 1 day ago

The source field in a struct rb_io_buffer can have a String or an IO::Buffer if not nil.
When source is a String, we need to pin the object (rb_str_locktmp does not keep the embedded String content from being moved by GC).
When source is an IO::Buffer, I believe it can be moved, because the base pointer refers to a malloced or mmapped memory region, which does not get compacted by GC.

Updated by eightbitraptor (Matt V-H) 1 day ago

I checked with the following test.rb that this patch does fixes the Buffer validity after compaction.

str = +"hello"

str_buf = IO::Buffer.for(str)

buf = IO::Buffer.new(128)
slice = buf.slice(8, 32)

GC.verify_compaction_references(expand_heap: true, toward: :empty)

p str_buf.valid?
p buf.valid?
p slice.valid?

Instead of pinning the source string, did you consider allowing the string to move by calculating the base offset, then moving the source string and updating the base pointer using rb_gc_location(buffer->source) and the calculated offset? Would this work?

Updated by alanwu (Alan Wu) about 22 hours ago

Another option that maintains validity across movement (untested):

diff --git a/io_buffer.c b/io_buffer.c
index 0534999319..13102b561d 100644
--- a/io_buffer.c
+++ b/io_buffer.c
@@ -570,7 +570,7 @@ rb_io_buffer_type_for(VALUE klass, VALUE string)
     }
     else {
         // This internally returns the source string if it's already frozen.
-        string = rb_str_tmp_frozen_acquire(string);
+        string = rb_str_tmp_frozen_no_embed_acquire(string);
         return io_buffer_for_make_instance(klass, string, RB_IO_BUFFER_READONLY);
     }
 }

Looks like the mutable string buffer code paths pin using rb_str_locktmp().

Updated by hanazuki (Kasumi Hanazuki) about 20 hours ago ยท Edited

eightbitraptor (Matt V-H) wrote in #note-4:

Instead of pinning the source string, did you consider allowing the string to move by calculating the base offset, then moving the source string and updating the base pointer using rb_gc_location(buffer->source) and the calculated offset?

I think that approach can be possible if we check that the buffer is not locked before allowing the String to move. The base pointer may have been passed to a syscall or C library call, in which case we cannot move it. Extensions using the raw pointer guard the section with rb_io_buffer_lock and rb_io_buffer_unlock. So we can check RB_IO_BUFFER_LOCKED flag to see if such a syscall is running.

Besides, there is another (unreported?) problem to be fixed. In the current implementation, the RB_IO_BUFFER_LOCKED flag of an IO::Buffer is not correctly synced among its slices, which share the same memory region.

str = +"hello"
buf = IO::Buffer.for(str)
slice = buf.slice

slice.locked do
  p buf.locked?  #=> false
  # Moving or freeing `buf` here may cause something bad, as we expect the base pointer of `slice` is pinned.
end

I will file this as another ticket. #=> #21212

Updated by hanazuki (Kasumi Hanazuki) about 19 hours ago

alanwu (Alan Wu) wrote in #note-5:

Another option that maintains validity across movement (untested):

diff --git a/io_buffer.c b/io_buffer.c
index 0534999319..13102b561d 100644
--- a/io_buffer.c
+++ b/io_buffer.c
@@ -570,7 +570,7 @@ rb_io_buffer_type_for(VALUE klass, VALUE string)
     }
     else {
         // This internally returns the source string if it's already frozen.
-        string = rb_str_tmp_frozen_acquire(string);
+        string = rb_str_tmp_frozen_no_embed_acquire(string);
         return io_buffer_for_make_instance(klass, string, RB_IO_BUFFER_READONLY);
     }
 }

So this will copy the embedded String content to a malloc'ed memory, right?

The default GC embeds up to 640 bytes minus RSTRING header (IIUC). I think we need to evaluate the performance impact of adding such size of copies.

packet = "x" * 600
buf = IO::Buffer.for(packet)
buf.get_value(:U8, 32)

Looks like the mutable string buffer code paths pin using rb_str_locktmp().

Ah, yes. rb_str_locktmp was not relevant to IO::Buffer.for without a block.

In case of IO::Buffer.for with a block, the source String is referred to by a C stack variable, and so the object will not be moved.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like1Like0Like0Like0Like0