Bug #9790

Zlib::GzipReader only decompressed the first of concatenated files

Added by quainjn (Jake Quain) over 5 years ago. Updated 8 days ago.

Status:
Assigned
Priority:
Normal
Target version:
-
ruby -v:
2.1.1
[ruby-core:62257]

Description

I came across an old, similar issue in Node that describes the situation in Ruby exactly:

https://github.com/joyent/node/issues/6032

In Ruby, given the following setup:

echo "1" > 1.txt
echo "2" > 2.txt
gzip 1.txt
gzip 2.txt
cat 1.txt.gz 2.txt.gz > 3.txt.gz

Calling:

Zlib::GzipReader.open("3.txt.gz") do |gz|
  print gz.read
end

would just print:

1
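A possible workaround for the behaviour described above, as a minimal sketch: `Zlib::GzipReader#unused` returns the bytes that were read from the underlying IO but lie past the end of the current gzip stream, so the IO can be rewound by that amount and a new reader started on the next stream. The helper name `read_all_gzip_members` is hypothetical, and the sketch assumes a seekable IO (a File or StringIO) with no trailing garbage after the last stream.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper (not part of Zlib): reads every concatenated gzip
# stream in a seekable IO and returns the decompressed data as one String.
def read_all_gzip_members(io)
  out = +""
  until io.eof?
    gz = Zlib::GzipReader.new(io)
    out << gz.read                        # decompresses the current stream only
    unused = gz.unused                    # bytes read past the end of this stream, or nil
    gz.finish                             # release the reader, keep the IO open
    io.pos -= unused.bytesize if unused   # rewind to the start of the next stream
  end
  out
end

# In-memory equivalent of `cat 1.txt.gz 2.txt.gz > 3.txt.gz`
data = Zlib.gzip("1\n") + Zlib.gzip("2\n")
print read_all_gzip_members(StringIO.new(data))
```

With a file on disk, the same helper could be called as `File.open("3.txt.gz", "rb") { |f| print read_all_gzip_members(f) }`.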

Files

zlib-gzreader-each_file-9790.patch (3.47 KB), jeremyevans0 (Jeremy Evans), 11/27/2019 03:35 PM

Related issues

Related to Ruby master - Bug #14804: GzipReader cannot read Freebase dump (but gzcat/zless can) (Open)
Has duplicate Ruby master - Bug #11180: Missing lines with Zlib::GzipReader (Open)

History

Updated by drbrain (Eric Hodel) over 5 years ago

  • Category set to ext
  • Status changed from Open to Assigned
  • Assignee set to drbrain (Eric Hodel)
  • Target version set to 2.2.0

Updated by akostadinov (Aleksandar Kostadinov) almost 5 years ago

Because the gzip format allows multiple entries, each with its own filename, I'd suggest supporting a method like Java's ZipInputStream getNextEntry() [1]. This way the programmer can choose to read everything as one chunk of data or as multiple chunks, each with its own name. This would allow storing multiple files in one gz and retrieving them again.

On the other hand, the command-line gzip utility only supports reading the whole thing as one stream, so a convenience method to read everything in one go would also be nice.

[1] http://docs.oracle.com/javase/7/docs/api/java/util/zip/ZipInputStream.html

Updated by duerst (Martin Dürst) almost 5 years ago

Aleksandar Kostadinov wrote:

Because the gzip format allows multiple entries, each with its own filename, I'd suggest supporting a method like Java's ZipInputStream getNextEntry() [1]. This way the programmer can choose to read everything as one chunk of data or as multiple chunks, each with its own name. This would allow storing multiple files in one gz and retrieving them again.

Good idea, but it should be more Ruby-like, such as .each_file or so.

Updated by exAspArk (Evgeny Li) over 4 years ago

Hey guys, are there any updates?

I created a small gem yesterday that can read multiple files: https://github.com/exAspArk/multiple_files_gzip_reader

> MultipleFilesGzipReader.open("3.txt.gz") do |gz|
>   puts gz.read
> end

# 1
# 2
# => nil

Updated by nagachika (Tomoyuki Chikanaga) over 4 years ago

  • Has duplicate Bug #11180: Missing lines with Zlib::GzipReader added

Updated by jeremyevans0 (Jeremy Evans) about 1 month ago

  • Related to Bug #14804: GzipReader cannot read Freebase dump (but gzcat/zless can) added

Updated by jeremyevans0 (Jeremy Evans) 8 days ago

Attached is a patch that adds Zlib::GzipReader.each_file, which will handle multiple concatenated gzip streams in the same file, similar to common tools that operate on .gz files. Zlib::GzipReader.each_file yields one Zlib::GzipReader instance per gzip stream in the file.
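
To illustrate the "one reader per gzip stream" idea in plain Ruby, here is a rough sketch. It is not the code from the attached patch and makes no claim about the patch's actual signature; `each_gzip_member` is a hypothetical helper built only on the existing `Zlib::GzipReader#unused` and `#finish` methods, and it assumes a seekable IO whose block does not close the reader itself.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper (not the patch's implementation): yields one
# Zlib::GzipReader per gzip stream found in a seekable IO.
def each_gzip_member(io)
  until io.eof?
    gz = Zlib::GzipReader.new(io)
    yield gz
    gz.read                               # drain whatever the block left unread
    unused = gz.unused                    # bytes belonging to the next stream, or nil
    gz.finish                             # finish the reader without closing the IO
    io.pos -= unused.bytesize if unused   # reposition at the next stream's header
  end
end

# Two concatenated streams, built in memory for the example
data = Zlib.gzip("1\n") + Zlib.gzip("2\n")
each_gzip_member(StringIO.new(data)) do |gz|
  print gz.read
end
```

Each yielded reader is only valid inside the block, since it is finished before the next stream is opened.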
