Project

General

Profile

Bug #14745

High memory usage when using String#replace with IO.copy_stream

Added by janko (Janko Marohnić) about 2 years ago. Updated 8 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]
[ruby-core:86954]

Description

I'm using custom IO-like objects that implement #read as the first argument to IO.copy_stream, and I noticed odd memory behaviour when using String#replace on the output buffer versus String#clear. Here is an example of a "fake IO" object where #read uses String#clear on the output buffer:

GC.disable

require "stringio"

class FakeIO
  def initialize(content)
    @io = StringIO.new(content)
  end

  def read(length, outbuf)
    chunk = @io.read(length)

    if chunk
      outbuf.clear
      outbuf << chunk
      chunk.clear
    else
      outbuf.clear
    end

    outbuf unless outbuf.empty?
  end
end

io = FakeIO.new("a" * 50*1024*1024) # 50MB

IO.copy_stream(io, File::NULL)

system "top -pid #{Process.pid}"

This program outputs memory usage of 50MB at the end, as expected – 50MB was loaded into memory at the beginning and any new strings are deallocated. However, if I modify the #read implementation to use String#replace instead of String#clear:

  def read(length, outbuf)
    chunk = @io.read(length)

    if chunk
      outbuf.replace chunk
      chunk.clear
    else
      outbuf.clear
    end

    outbuf unless outbuf.empty?
  end

the memory usage has now doubled to 100MB at the end of the program, indicating that some string bytes weren't successfully deallocated. So, it seems that String#replace has different behaviour compared to String#clear + String#<<.

I was only able to reproduce this with IO.copy_stream, the following program shows 50MB memory usage, regardless of whether the String#clear or String#replace approach is used:

GC.disable

buffer = "a" * 50*1024*1024
chunk  = "b" * 50*1024*1024

if ARGV[0] == "clear"
  buffer.clear
  buffer << chunk
else
  buffer.replace chunk
end

chunk.clear

system "top -pid #{Process.pid}"

With this program I also noticed one interesting thing. If I remove chunk.clear, then the "clear" version uses 100MB as expected (because both buffer and chunk strings are 50MB large), but the "replace" version uses only 50MB, which makes it appear that the buffer string doesn't use any memory when in fact it should use 50MB just like the chunk string. I found that odd, and I think it might be a clue to the memory bug with String#replace I experienced when using IO.copy_stream.

Also available in: Atom PDF