Feature #6612

Add streaming inflate and deflate to Zlib

Added by Eric Hodel about 3 years ago. Updated almost 3 years ago.

[ruby-core:45724]
Status:Closed
Priority:Normal
Assignee:-

Description

=begin
Currently there is no way to control the inflate output size of a deflate stream from ruby. For example, 50MB of "0"s compress to just under 50KB:

ruby -rzlib -e 'p Zlib.deflate("0" * 50_000_000, Zlib::BEST_COMPRESSION).length' #=> 48611

When inflating this, 50MB are allocated which is undesirable.

The attached patch allows Zlib::Inflate#inflate, Zlib::Deflate#deflate, Zlib::ZStream#finish and other methods that end up calling zstream_expand_buffer or zstream_detach_buffer to be called with a block which gives the user more control over the amount of memory allocated in their process. (A fixed maximum chunk size of 16384 bytes is used in this patch.)

The new API looks like:

z = Zlib::Inflate.new
z.inflate deflate_string do |chunk|
# write chunk to output stream
# nil is returned from inflate
end

footer = z.finish
# flush buffer to output stream

Here's a comparison of resource usage:

$ dd if=/dev/zero of=/dev/stdout bs=1m count=1024 | gzip -c > 1G.gz
$ cat test.rb
require 'zlib'

gzipped = File.read '1G.gz'

z = Zlib::Inflate.new Zlib::MAX_WBITS + 32

z.inflate gzipped do |chunk|
# do nothing with chunk
# current ruby will ignore this block
end

z.finish

With existing ruby (trunk 35758) 1GB of memory is allocated:

$ /usr/bin/time -l ruby20 test.rb
3.86 real 3.24 user 0.61 sys
1080475648 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
263860 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
1 voluntary context switches
34 involuntary context switches

With the patch only 37MB are allocated:

$ /usr/bin/time -l ./ruby20 -I .ext/x86_64-darwin11.4.0 test.rb
3.47 real 3.43 user 0.03 sys
36724736 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
8981 page reclaims
0 page faults
0 swaps
0 block input operations
9 block output operations
0 messages sent
0 messages received
0 signals received
1 voluntary context switches
90 involuntary context switches

Some notes about this patch:

zstream_expand_buffer() yields each block of inflate (or deflate) output instead of expanding the buffer (as in non-block output). zstream_expand_buffer_into() is used to reduce duplication as well.

zstream_detach_buffer() yields the buffer and returns nil if a block was given. OBJ_INFECT was moved into zstream_detach_buffer() as well.

A new stream flag ZSTREAM_FLAG_GZFILE is added to prevent a block given to a gzip method that invokes zstream_expand_buffer() or zstream_detach_buffer() from yielding.

To ensure proper handling of ((|z->buf|)) such as resetting the buffer after yielding a chunk, rb_protect is used in zstream_expand_buffer and zstream_run. In zstream_run, the for loop has been extracted to zstream_run_loop.

In zstream_run_loop, Zlib::BufError is no longer raised when flushing (finishing) the stream. This allows the stream to be flushed into one string when the inflate block is interrupted.

=end

zlib.inflate_deflate_chunked.patch Magnifier - Fixed patch (18.1 KB) Eric Hodel, 06/20/2012 03:43 PM

zlib.inflate_deflate_chunked.2.patch Magnifier (16 KB) Eric Hodel, 07/04/2012 05:38 AM

Associated revisions

Revision 36349
Added by Eric Hodel almost 3 years ago

  • ext/zlib/zlib.c: Added streaming support to inflate processing. This allows zlib streams to be processed without huge memory growth. [Feature #6612]
  • NEWS: ditto
  • ext/zlib/zlib.c (zstream_expand_buffer): Uses rb_yield when a block is given for streaming support. Refactored to use zstream_expand_buffer_into to remove duplicate code.
  • ext/zlib/zlib.c (zstream_expand_buffer_protect): Added wrapper function to pass jump state back through GVL-free section to allow zstream clean-up before terminating the ruby call.
  • ext/zlib/zlib.c (zstream_expand_buffer_without_gvl): Acquire GVL to yield processed chunk of output stream.
  • ext/zlib/zlib.c (zstream_detach_buffer): When a block is given, returns Qnil mid-stream and yields the output buffer at the end of the stream.
  • ext/zlib/extconf.rb: Update INCFLAGS to find internal.h for rb_thread_call_with_gvl
  • test/zlib/test_zlib.rb: Updated tests

Revision 36349
Added by Eric Hodel almost 3 years ago

  • ext/zlib/zlib.c: Added streaming support to inflate processing. This allows zlib streams to be processed without huge memory growth. [Feature #6612]
  • NEWS: ditto
  • ext/zlib/zlib.c (zstream_expand_buffer): Uses rb_yield when a block is given for streaming support. Refactored to use zstream_expand_buffer_into to remove duplicate code.
  • ext/zlib/zlib.c (zstream_expand_buffer_protect): Added wrapper function to pass jump state back through GVL-free section to allow zstream clean-up before terminating the ruby call.
  • ext/zlib/zlib.c (zstream_expand_buffer_without_gvl): Acquire GVL to yield processed chunk of output stream.
  • ext/zlib/zlib.c (zstream_detach_buffer): When a block is given, returns Qnil mid-stream and yields the output buffer at the end of the stream.
  • ext/zlib/extconf.rb: Update INCFLAGS to find internal.h for rb_thread_call_with_gvl
  • test/zlib/test_zlib.rb: Updated tests

Revision 36356
Added by Eric Hodel almost 3 years ago

  • ext/zlib/zlib.c: Added streaming support to inflate processing. This allows zlib streams to be processed without huge memory growth. [Feature #6612]
  • NEWS: ditto
  • ext/zlib/zlib.c (zstream_expand_buffer): Uses rb_yield when a block is given for streaming support. Refactored to use zstream_expand_buffer_into to remove duplicate code.
  • ext/zlib/zlib.c (zstream_expand_buffer_protect): Added wrapper function to pass jump state back through GVL-free section to allow zstream clean-up before terminating the ruby call.
  • ext/zlib/zlib.c (zstream_expand_buffer_without_gvl): Acquire GVL to yield processed chunk of output stream.
  • ext/zlib/zlib.c (zstream_detach_buffer): When a block is given, returns Qnil mid-stream and yields the output buffer at the end of the stream.
  • test/zlib/test_zlib.rb: Updated tests

Revision 36356
Added by Eric Hodel almost 3 years ago

  • ext/zlib/zlib.c: Added streaming support to inflate processing. This allows zlib streams to be processed without huge memory growth. [Feature #6612]
  • NEWS: ditto
  • ext/zlib/zlib.c (zstream_expand_buffer): Uses rb_yield when a block is given for streaming support. Refactored to use zstream_expand_buffer_into to remove duplicate code.
  • ext/zlib/zlib.c (zstream_expand_buffer_protect): Added wrapper function to pass jump state back through GVL-free section to allow zstream clean-up before terminating the ruby call.
  • ext/zlib/zlib.c (zstream_expand_buffer_without_gvl): Acquire GVL to yield processed chunk of output stream.
  • ext/zlib/zlib.c (zstream_detach_buffer): When a block is given, returns Qnil mid-stream and yields the output buffer at the end of the stream.
  • test/zlib/test_zlib.rb: Updated tests

History

#1 Updated by Eric Hodel about 3 years ago

  • File deleted (zlib.inflate_deflate_chunked.patch)

#2 Updated by Eric Hodel about 3 years ago

Oops, I uploaded the wrong patch.

#3 Updated by Eric Hodel almost 3 years ago

Updated patch that will apply after #6615

I committed the OBJ_INFECT refactoring separately.

#4 Updated by Nobuyoshi Nakada almost 3 years ago

Don't modify a String object with realloc().
It causes a crash when CALC_EXACT_MALLOC_SIZE is set.

#5 Updated by Eric Hodel almost 3 years ago

It looks like using ruby_xrealloc() will solve this, correct?

#6 Updated by Eric Hodel almost 3 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r36349.
Eric, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • ext/zlib/zlib.c: Added streaming support to inflate processing. This allows zlib streams to be processed without huge memory growth. [Feature #6612]
  • NEWS: ditto
  • ext/zlib/zlib.c (zstream_expand_buffer): Uses rb_yield when a block is given for streaming support. Refactored to use zstream_expand_buffer_into to remove duplicate code.
  • ext/zlib/zlib.c (zstream_expand_buffer_protect): Added wrapper function to pass jump state back through GVL-free section to allow zstream clean-up before terminating the ruby call.
  • ext/zlib/zlib.c (zstream_expand_buffer_without_gvl): Acquire GVL to yield processed chunk of output stream.
  • ext/zlib/zlib.c (zstream_detach_buffer): When a block is given, returns Qnil mid-stream and yields the output buffer at the end of the stream.
  • ext/zlib/extconf.rb: Update INCFLAGS to find internal.h for rb_thread_call_with_gvl
  • test/zlib/test_zlib.rb: Updated tests

Also available in: Atom PDF