Bug #14745
Status: closed
High memory usage when using String#replace with IO.copy_stream
Description
I'm using custom IO-like objects that implement #read as the first argument to IO.copy_stream, and I noticed odd memory behaviour when using String#replace on the output buffer versus String#clear. Here is an example of a "fake IO" object where #read uses String#clear on the output buffer:
GC.disable
require "stringio"

class FakeIO
  def initialize(content)
    @io = StringIO.new(content)
  end

  def read(length, outbuf)
    chunk = @io.read(length)
    if chunk
      outbuf.clear
      outbuf << chunk
      chunk.clear
    else
      outbuf.clear
    end
    outbuf unless outbuf.empty?
  end
end

io = FakeIO.new("a" * 50*1024*1024) # 50MB
IO.copy_stream(io, File::NULL)
system "top -pid #{Process.pid}"
This program outputs memory usage of 50MB at the end, as expected – 50MB was loaded into memory at the beginning and any new strings are deallocated. However, if I modify the #read implementation to use String#replace instead of String#clear:
  def read(length, outbuf)
    chunk = @io.read(length)
    if chunk
      outbuf.replace chunk
      chunk.clear
    else
      outbuf.clear
    end
    outbuf unless outbuf.empty?
  end
the memory usage has now doubled to 100MB at the end of the program, indicating that some string bytes weren't successfully deallocated. So, it seems that String#replace has different behaviour compared to String#clear + String#<<.
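The figures above were read from top, which is interactive, and the -pid spelling appears to be the macOS form. As a rough sketch, the resident set size can also be sampled from inside the script. The helper name rss_kb is hypothetical and not part of the original report. It assumes either a Linux-style /proc filesystem or a ps that accepts `-o rss= -p PID`, and returns nil when neither is available.

```ruby
# Hypothetical helper (not from the original report): returns the resident
# set size of a process in kilobytes, or nil if it cannot be determined.
def rss_kb(pid = Process.pid)
  status = "/proc/#{pid}/status"
  if File.readable?(status)
    # Linux exposes a line such as "VmRSS:     1234 kB".
    line = File.foreach(status).find { |l| l.start_with?("VmRSS:") }
    return line[/\d+/].to_i if line
  end
  # Fallback assumption: a ps that prints RSS in kilobytes (macOS, BSD, Linux).
  out = `ps -o rss= -p #{pid} 2>/dev/null`.strip
  out.empty? ? nil : out.to_i
rescue SystemCallError
  nil
end

before = rss_kb
data = "a" * (8 * 1024 * 1024) # allocate 8MB so there is something to observe
after = rss_kb
puts "RSS changed by about #{after - before} kB" if before && after
```

Calling rss_kb right after IO.copy_stream in each variant of the script gives a number that can be pasted into a report, instead of a value read off the top screen.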
I was only able to reproduce this with IO.copy_stream; the following program shows 50MB memory usage regardless of whether the String#clear or String#replace approach is used:
GC.disable
buffer = "a" * 50*1024*1024
chunk = "b" * 50*1024*1024

if ARGV[0] == "clear"
  buffer.clear
  buffer << chunk
else
  buffer.replace chunk
end

chunk.clear
system "top -pid #{Process.pid}"
With this program I also noticed one interesting thing. If I remove chunk.clear, then the "clear" version uses 100MB as expected (because both buffer and chunk strings are 50MB large), but the "replace" version uses only 50MB, which makes it appear that the buffer string doesn't use any memory when in fact it should use 50MB just like the chunk string. I found that odd, and I think it might be a clue to the memory bug with String#replace I experienced when using IO.copy_stream.
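One possible way to probe this clue without top is to ask the interpreter how much memory it attributes to each string. The sketch below uses ObjectSpace.memsize_of from the objspace extension, whose documentation says the result is only a hint. The fill helper and its 1MB default size are hypothetical and chosen to keep the probe cheap. The working assumption, which this does not prove, is that in CRuby String#replace may let the receiver share the source's buffer instead of copying it.

```ruby
require "objspace"

# Hypothetical helper: builds a buffer and a chunk of `size` bytes, fills the
# buffer from the chunk using the given strategy, and returns both strings.
def fill(strategy, size = 1024 * 1024)
  buffer = "a" * size
  chunk  = "b" * size
  if strategy == :clear
    buffer.clear
    buffer << chunk       # appends the bytes of chunk to buffer
  else
    buffer.replace(chunk) # may share chunk's buffer (implementation detail)
  end
  [buffer, chunk]
end

%i[clear replace].each do |strategy|
  buffer, chunk = fill(strategy)
  # memsize_of is documented as a hint only; compare the two strategies
  # with each other rather than trusting the absolute numbers.
  puts format("%-8s buffer=%d chunk=%d bytes reported",
              strategy,
              ObjectSpace.memsize_of(buffer),
              ObjectSpace.memsize_of(chunk))
end
```

If the two strategies report noticeably different sizes for buffer, that would be consistent with the sharing assumption and with the 50MB versus 100MB observation above. The exact numbers depend on the Ruby version.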