Bug #20424


Zlib::GzipReader always double allocates strings when passed outbuf, significantly increasing memory usage

Added by martinemde (Martin Emde) about 2 months ago. Updated 15 days ago.

Target version:
ruby -v:
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]


While working to reduce memory usage during gem installation in rubygems, we previously found a bug in eof?. Further investigation of memory usage while fixing that bug uncovered wasteful string allocation in readpartial and read.

In Zlib, when reading with readpartial or read, a new string is always created for the bytes read from the buffer.

The current approach allocates this string whether or not an outbuf is passed.

# vastly simplified pseudo-implementation
def readpartial(len, dst = nil)
  if buffer.empty?
    buffer = gzipfile.readpartial(len, dst) # adds inflated bytes into dst if passed
  end
  dst = allocate_new_string(len) # always makes a new string for the destination
  dst << buffer # copies the bytes from the buffer into the new string
end

The result is that readpartial always allocates at least double the bytes necessary.
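For context, the caller-side pattern this affects looks roughly like the sketch below: one string is reused as the outbuf for every readpartial call, with the expectation that no per-chunk strings are allocated. The helper name gunzip_with_reused_buffer, the chunk sizes, and the payload are illustrative assumptions, not code from rubygems or zlib.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper (not part of zlib or rubygems): inflate a gzip string
# chunk by chunk, reusing one String as the outbuf for every readpartial call.
def gunzip_with_reused_buffer(gzipped, chunk_size = 16 * 1024)
  reader = Zlib::GzipReader.new(StringIO.new(gzipped))
  outbuf = String.new(capacity: chunk_size) # reused for every chunk
  result = String.new
  begin
    loop do
      reader.readpartial(chunk_size, outbuf) # fills outbuf with up to chunk_size bytes
      result << outbuf
    end
  rescue EOFError
    # readpartial raises EOFError once the stream is exhausted
  ensure
    reader.close
  end
  result
end

# Illustrative ASCII payload, compressed in memory.
payload = "hello gzip " * 1_000
inflated = gunzip_with_reused_buffer(Zlib.gzip(payload), 4096)
puts inflated == payload
```

A caller written this way expects roughly one chunk-sized buffer to stay live. With the behaviour described above, each call still creates an extra string internally even though outbuf is supplied.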

Samuel Giddins submitted a pull request, zlib#61, which I have tested and reviewed. It resolves the issue, vastly improves memory usage, and speeds up GzipReader by avoiding unnecessary memcpy and rb_str_new calls.

This PR also adds an outbuf argument to GzipReader#read for improved memory management, very similar to IO#read.
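As a rough illustration of the IO#read-style outbuf semantics, the sketch below first shows StringIO#read filling a caller-supplied string, then attempts the two-argument form on GzipReader#read. The helper read_chunk is hypothetical, and the fallback assumes that a zlib predating the change rejects the second argument with ArgumentError.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper (not part of zlib): read up to len bytes, passing outbuf
# through when the installed zlib accepts it.
def read_chunk(reader, len, outbuf)
  reader.read(len, outbuf)
rescue ArgumentError
  # Assumption: a zlib without the outbuf argument raises ArgumentError here,
  # so fall back to the one-argument form, which allocates its own string.
  # (A real implementation would check more precisely than a blanket rescue.)
  reader.read(len)
end

# IO#read-style semantics, shown with StringIO: the bytes land in io_buf.
io_buf = String.new
StringIO.new("abcdefghij").read(4, io_buf)

# The same call shape against a GzipReader over an in-memory gzip stream.
reader = Zlib::GzipReader.new(StringIO.new(Zlib.gzip("abcdefghij")))
gz_buf = String.new
chunk = read_chunk(reader, 4, gz_buf)
reader.close

puts io_buf
puts chunk
```

The example uses the return value rather than gz_buf so that it behaves the same whether or not the installed zlib supports the second argument.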

We appreciate your attention to this performance improvement. We believe it will further improve the performance of rubygems gem installs.

Updated by martinemde (Martin Emde) 15 days ago

zlib #61 was merged. It seems like we can consider this ticket complete.

Updated by hsbt (Hiroshi SHIBATA) 15 days ago

  • Status changed from Open to Closed
