Bug #20424
Zlib::GzipReader always double-allocates strings when passed outbuf, significantly increasing memory usage
Description
While working to reduce memory usage during RubyGems installs, we previously found a bug in eof?. Further investigation of memory usage while fixing that bug uncovered wasteful string allocation in readpartial and read.
In Zlib, when reading with readpartial or read, a new string is always created for the bytes read from the buffer.
The current approach allocates this string regardless of whether an outbuf is passed.
```ruby
# vastly simplified pseudo implementation
def readpartial(len, dst = nil)
  if buffer.empty?
    buffer = gzipfile.readpartial(len, dst) # adds inflated bytes into dst if passed
  end
  dst = allocate_new_string(len) # make a new string for the destination
  dst << buffer.read(len)        # read from the buffer into the destination
end
```
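As a result, readpartial always allocates at least twice the bytes necessary: the inflated bytes are first written into one string and then copied into a freshly allocated second string, even when the caller supplied an outbuf.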
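To make the difference concrete, here is a toy Ruby model of the two allocation patterns. It is a sketch under stated assumptions, not the zlib extension's C implementation: ToyReader, fill_buffer, and dst_allocations are hypothetical names, the counter only tracks destination strings the reader itself creates, and fill_buffer still creates one temporary per call in both versions, standing in for the inflate step.

```ruby
# Toy model of the readpartial allocation pattern; NOT the zlib C code.
# ToyReader, fill_buffer, and dst_allocations are hypothetical names.
class ToyReader
  attr_reader :dst_allocations # destination strings this reader created

  def initialize(data)
    @source = data.b # stands in for the compressed stream
    @pos = 0
    @buffer = "".b   # internal buffer that "inflated" bytes land in first
    @dst_allocations = 0
  end

  # Pattern described in the ticket: a fresh destination string is
  # allocated on every call, so a caller-supplied outbuf is ignored.
  def readpartial_old(len, _outbuf = nil)
    fill_buffer(len)
    @dst_allocations += 1
    dst = String.new(capacity: len, encoding: Encoding::BINARY)
    dst << @buffer
    @buffer.clear
    dst
  end

  # Outbuf-respecting pattern: only allocate when no outbuf is given.
  def readpartial_new(len, outbuf = nil)
    fill_buffer(len)
    if outbuf.nil?
      @dst_allocations += 1
      outbuf = String.new(capacity: len, encoding: Encoding::BINARY)
    end
    outbuf.replace(@buffer) # overwrite the caller's string in place
    @buffer.clear
    outbuf
  end

  private

  # Copy up to len bytes of the source into the internal buffer.
  def fill_buffer(len)
    chunk = @source.byteslice(@pos, len)
    raise EOFError, "end of stream reached" if chunk.nil? || chunk.empty?
    @pos += chunk.bytesize
    @buffer << chunk
  end
end
```

Reading the same data with a reused outbuf, readpartial_old creates one destination string per call and hands back a different object each time, while readpartial_new fills and returns the caller's string without creating any.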
Samuel Giddins submitted a pull request, zlib#61, which I have tested and reviewed. It resolves the issue, greatly reducing memory usage and speeding up GzipReader by avoiding wasted memcpy and rb_str_new calls.
The PR also adds an outbuf argument to GzipReader#read for better memory management, much like IO#read.
We appreciate your attention to this performance improvement and believe it will further speed up gem installs with RubyGems.
Updated by martinemde (Martin Emde) over 1 year ago
zlib#61 was merged, so it seems we can consider this ticket complete.
Updated by hsbt (Hiroshi SHIBATA) over 1 year ago
- Status changed from Open to Closed