Bug #14745

High memory usage when using String#replace with IO.copy_stream

Added by janko (Janko Marohnić) over 1 year ago. Updated about 2 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]
[ruby-core:86954]

Description

I'm using custom IO-like objects that implement #read as the source argument to IO.copy_stream, and I noticed odd memory behaviour when using String#replace on the output buffer compared to String#clear. Here is an example of a "fake IO" object whose #read uses String#clear on the output buffer:

GC.disable

require "stringio"

class FakeIO
  def initialize(content)
    @io = StringIO.new(content)
  end

  def read(length, outbuf)
    chunk = @io.read(length)

    if chunk
      outbuf.clear
      outbuf << chunk
      chunk.clear
    else
      outbuf.clear
    end

    outbuf unless outbuf.empty?
  end
end

io = FakeIO.new("a" * 50*1024*1024) # 50MB

IO.copy_stream(io, File::NULL)

system "top -pid #{Process.pid}"

This program reports memory usage of 50MB at the end, as expected: 50MB was loaded into memory at the beginning, and any new strings are deallocated. However, if I modify the #read implementation to use String#replace instead of String#clear:

  def read(length, outbuf)
    chunk = @io.read(length)

    if chunk
      outbuf.replace chunk
      chunk.clear
    else
      outbuf.clear
    end

    outbuf unless outbuf.empty?
  end

the memory usage at the end of the program doubles to 100MB, indicating that some string bytes weren't deallocated. So it seems that String#replace behaves differently from String#clear + String#<<.

I was only able to reproduce this with IO.copy_stream; the following program shows 50MB memory usage regardless of whether the String#clear or String#replace approach is used:

GC.disable

buffer = "a" * 50*1024*1024
chunk  = "b" * 50*1024*1024

if ARGV[0] == "clear"
  buffer.clear
  buffer << chunk
else
  buffer.replace chunk
end

chunk.clear

system "top -pid #{Process.pid}"

With this program I also noticed one interesting thing. If I remove chunk.clear, the "clear" version uses 100MB as expected (because both the buffer and chunk strings are 50MB large), but the "replace" version uses only 50MB. That makes it appear that the buffer string doesn't use any memory, when in fact it should use 50MB just like the chunk string. I found that odd, and I think it might be a clue to the memory bug with String#replace I experienced when using IO.copy_stream.

History

Updated by normalperson (Eric Wong) over 1 year ago

janko.marohnic@gmail.com wrote:

the memory usage has now doubled to 100MB at the end of the
program, indicating that some string bytes weren't
successfully deallocated. So, it seems that String#replace has
different behaviour compared to String#clear + String#<<.

Yes, this is an unfortunate side effect of the copy-on-write
semantics of String#replace. If the arg (other_str) for
String#replace is non-frozen, a new frozen string is created
using the existing malloc-ed pointer. Both the receiver string
and other_str point to that new, shared string.
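A minimal sketch of that sharing, using CRuby's ObjectSpace.memsize_of to make it visible (the exact numbers are version-dependent, so treat this as illustrative):

```ruby
require "objspace"

# replace with a non-frozen arg: a hidden frozen root ends up owning the bytes
shared = String.new
shared.replace("a" * (1 << 20))

# appending copies the bytes into the receiver's own buffer instead
copied = String.new
copied << ("a" * (1 << 20))

p ObjectSpace.memsize_of(shared)  # small: the bytes are accounted to the hidden root
p ObjectSpace.memsize_of(copied)  # on the order of 1 MiB
```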

I was only able to reproduce this with IO.copy_stream

I suspect part of this is because outbuf becomes a long-lived object
with IO.copy_stream (not sure), and the hidden frozen string
becomes long-lived, as well.

So yeah; a combination of well-intentioned optimizations hurts
when combined.

The other part is that anything using IO#write could create
massive amounts of garbage before 2.5:
https://bugs.ruby-lang.org/issues/13085 And your copy_stream use
hits the IO#write case. Unfortunately, the "fix" for [Bug
#13085] won't work effectively with shared strings, because we
can't free+recycle a string which wasn't created internally
by the VM.

Updated by normalperson (Eric Wong) over 1 year ago

janko.marohnic@gmail.com wrote:

  def read(length, outbuf)
    chunk = @io.read(length)

    if chunk
      outbuf.replace chunk
      chunk.clear

Elaborating on my previous comment, chunk.clear is a no-op in
this case because the shared frozen string is still in play and
now used by outbuf.

else
  outbuf.clear

Likewise, the final #clear is also a no-op because the
hidden+frozen string is still floating around waiting to be
GC-ed.

end

outbuf unless outbuf.empty?

end
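The two no-ops above can be seen without any IO in the picture; a tiny illustrative sketch:

```ruby
buffer = String.new
chunk  = "x" * 4096     # long enough to be heap-allocated, not embedded

buffer.replace(chunk)   # buffer and chunk now share a hidden frozen root
chunk.clear             # "clears" chunk, but the root's bytes stay live

p chunk.bytesize        # => 0
p buffer.bytesize       # => 4096; the memory is still reachable via buffer
```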

I don't think you even need IO.copy_stream or IO#write to trigger this:

 io = FakeIO.new("a" * 50*1024*1024) # 50MB
 buf = ''.b
 while z = io.read(16384, buf)
 end

I tried playing with shortening object lifetimes, but I guess GC
is not aggressive enough by default to matter.

Updated by normalperson (Eric Wong) over 1 year ago

janko.marohnic@gmail.com wrote:

def initialize(content)
@io = StringIO.new(content)
end

def read(length, outbuf)
chunk = @io.read(length)

Finally, I always assumed your example was a contrived case and
that you're dealing with an interface somewhere (not StringIO)
which doesn't accept a destination buffer for .read.

Fwiw, I encountered a similar problem with Net::HTTP and
Zlib::GzipReader earlier this year and made r61665 to ensure
outbuf got used all the way down the call chain.

Sometimes I guess modifying/fixing interfaces is the only way to go.
(unless we can do escape analysis, but I'm not sure how feasible
it is with a dynamic language)

Updated by janko (Janko Marohnić) over 1 year ago

Yes, this is an unfortunate side effect of the copy-on-write
semantics of String#replace. If the arg (other_str) for
String#replace is non-frozen, a new frozen string is created
using the existing malloc-ed pointer. Both the receiver string
and other_str point to that new, shared string.

So yeah; a combination of well-intentioned optimizations hurts
when combined.

That makes sense, thanks for the explanation.

The other part is that anything using IO#write could create
massive amounts of garbage before 2.5

This is on Ruby 2.5.1, so I'm guessing it doesn't apply here.

Finally, I always assumed your example is a contrived case and
you're dealing with an interface somewhere (not StringIO) which
doesn't accept a destination buffer for .read.

The example was simplified for reproducing purposes. The place where I discovered this was in https://github.com/rocketjob/symmetric-encryption/pull/98 (I eventually managed to figure out String#replace was causing the high memory usage, so I switched to String#clear).

In short, the SymmetricEncryption::Reader object wraps an IO object with encrypted content; when calling #read it reads data from the underlying IO object, decrypts it and returns the decrypted data. So it's not patching the lack of an outbuf argument (because the underlying IO object should accept the outbuf argument); rather, it provides an IO#read interface over the incrementally decrypted content of the IO object.

If this behaviour is a necessary consequence of the copy-on-write optimization, feel free to close this ticket.

Updated by normalperson (Eric Wong) over 1 year ago

janko.marohnic@gmail.com wrote:

I wrote:

Finally, I always assumed your example is a contrived case and
you're dealing with an interface somewhere (not StringIO) which
doesn't accept a destination buffer for .read.

The example was simplified for reproducing purposes. The place
where I discovered this was in
https://github.com/rocketjob/symmetric-encryption/pull/98 (I
eventually managed to figure out String#replace was causing
the high memory usage, so I switched to String#clear).

In short, the SymmetricEncryption::Reader object wraps an IO
object with encrypted content, and when calling #read it
reads data from the underlying IO object, decrypts it and
returns the decrypted data. So, it's not patching the lack of
outbuf argument (because the underlying IO object should
accept the outbuf argument), rather it provides an IO#read
interface over incrementally decrypting IO object content.

I may be misreading(*), but it looks like @stream_cipher.update
can take a second destination arg (like IO#read and friends) and
maybe that helps... (that appears to be OpenSSL::Cipher#update)
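For reference, a minimal round-trip showing that two-argument form (this is plain OpenSSL::Cipher usage, not the symmetric-encryption gem's API):

```ruby
require "openssl"

enc = OpenSSL::Cipher.new("aes-256-cbc")
enc.encrypt
key = enc.random_key
iv  = enc.random_iv
ciphertext = enc.update("secret data" * 10) + enc.final

dec = OpenSSL::Cipher.new("aes-256-cbc")
dec.decrypt
dec.key = key
dec.iv  = iv

outbuf = String.new
dec.update(ciphertext, outbuf)   # decrypted bytes land in outbuf; no new chunk string
plaintext = outbuf + dec.final   # final flushes the held-back padded block
```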

If that's not usable somehow, I've sometimes wondered if adding a
String#exchange! method might help:

foo = "hello............................"
bar = "world............................"
foo.exchange!(bar)
p foo # => "world............................"
p bar # => "hello............................"

In your case, you could still use String#clear with this:

dst.clear
dst.exchange!(tmp)
# tmp is now '' because dst was cleared before

For non-embedded strings, it would swap the pointers (zero-copy)
and wouldn't suffer the new-frozen-string-behind-your-back
problem of String#replace.
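The observable semantics (though not the zero-copy property) can be emulated today; a hypothetical helper, purely for illustration:

```ruby
# Emulates what String#exchange! would do observably. Unlike the proposal,
# this copies bytes (dup + replace) rather than swapping pointers.
def exchange!(a, b)
  tmp = a.dup
  a.replace(b)
  b.replace(tmp)
end

foo = "hello"
bar = "world"
exchange!(foo, bar)
p foo  # => "world"
p bar  # => "hello"
```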

(*) I'm looking at your commit 8d41efacff26f3357016d6b611bee174802fba66
after git clone-ing (since I don't do JS-infested web UIs)

Updated by janko (Janko Marohnić) over 1 year ago

it looks like @stream_cipher.update can take a second destination arg (like IO#read and friends) and maybe that helps... (that appears to be OpenSSL::Cipher#update)

Cool, I didn't know that! Thanks, this allowed me to greatly simplify the #read implementation, so now it's both simpler and has lower memory usage than the original implementation.

I've sometimes wondered if adding a String#exchange! method might help

That sounds very useful to have in general. In my particular scenario (the PR) I wouldn't have a use for it after all; this String#clear vs String#replace issue was just something I encountered during the optimization, and the current version doesn't need String#replace. But I can imagine String#exchange! being another useful tool for diligent string management.

Feel free to close the ticket, as this seems to be a necessary consequence of the copy-on-write optimization, and I'm not having problems with it anymore.

Updated by normalperson (Eric Wong) over 1 year ago

Good to know the destination buffer to #update helped :>

But I can imagine String#exchange! being a useful tool for
diligent string management.

Maybe... As it stands, destination buffers for #read and #update
are underutilized, and I'm glad to see them being used more.
Thus I'm not hopeful of #exchange! being widely adopted...
When used, destination buffers would still offer better
performance anyway, as there's no extra object slot used.

Updated by jeremyevans0 (Jeremy Evans) about 2 months ago

  • Status changed from Open to Closed
