Project

General

Profile

Actions

Bug #16672

closed

net/http leaves original content-length header intact after inflating response

Added by jmreid (Justin Reid) almost 5 years ago. Updated over 2 years ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin19]
[ruby-core:97359]

Description

When using net/http to make a request to a resource, the default request headers are the following (when you have ZLIB available):
"accept-encoding"=>["gzip;q=1.0,deflate;q=0.6,identity;q=0.3"], "accept"=>["*/*"], "user-agent"=>["Ruby"]

This means that a resource will return a gzipped response if it can provide it. Take this URL for example:
https://storage.googleapis.com/justin-reid-test/test.js

This is a JS file that has a content-length of 2733 when gzipped and 9995 when inflated:

curl "https://storage.googleapis.com/justin-reid-test/test.js" -H "accept-encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3" | wc -c
2733

curl "https://storage.googleapis.com/justin-reid-test/test.js" | wc -c
9995

When making a simple request for this asset using net/http:

uri = URI('https://storage.googleapis.com/justin-reid-test/test.js')
res = Net::HTTP.get_response(uri)

Ruby will (https://github.com/ruby/ruby/blob/f08cd708b11dd5b293986b92bb5e227731665b36/lib/net/http/response.rb#L264-L278):

  • Delete the content-encoding header
  • inflate the body
  • return the inflated body

The issue here is that Ruby also leaves the content-length header set to the original request's value:

require 'net/http'

uri = URI('https://storage.googleapis.com/justin-reid-test/test.js')
res = Net::HTTP.get_response(uri)

puts "Fetching: https://storage.googleapis.com/justin-reid-test/test.js"
puts "Body size using String#bytesize: #{res.body.to_s.bytesize}"
puts "Content-Length response header: #{res.content_length}"

Results in:

Fetching: https://storage.googleapis.com/justin-reid-test/test.js
Body size using String#bytesize: 9995
Content-Length response header: 2733

This means that an incorrect content-length header is passed back when net/http makes requests for gzip objects and inflates them.

This issue was noticed when Rack changed their behaviour in how they compute content-length. They used to compute the content-length for each body, but that changed in 2.0.8:
https://github.com/rack/rack/commit/8c62821f4a464858a6b6ca3c3966ec308d2bb53e#diff-10b933d2c1fdc82ceecade456c64e1c2L92
https://github.com/rack/rack/issues/1472#issuecomment-574362342

Using Rack::ContentLength is now the method they prefer if you need to compute the content-length. However, Rack::ContentLength will not try to re-compute the value if that header already exists:
https://github.com/rack/rack/blob/6196377654b7ff7ce7abaecea62bb285d77d53aa/lib/rack/content_length.rb#L21

Should Ruby:

Actions #1

Updated by jmreid (Justin Reid) almost 5 years ago

  • Description updated (diff)

Updated by ioquatix (Samuel Williams) almost 5 years ago

Are you using res.content_length? Why can't you just use res.body.to_s.bytesize?

Updated by jmreid (Justin Reid) almost 5 years ago

ioquatix (Samuel Williams) wrote in #note-2:

Are you using res.content_length? Why can't you just use res.body.to_s.bytesize?

res.content_length is just the content-length response header from the net/http request:

uri = URI('https://storage.googleapis.com/justin-reid-test/test.js')
res = Net::HTTP.get_response(uri)

res.to_hash
{"x-guploader-uploadid"=>["AEnB2UrJSkEHDPO5iCRug2Qp0bzweRd8pd05d5eRq0fIUtRVZuibaGfQZhLHBN58g0ZW1N-qMcsUiZACutpDoObHTijYs3_DrUNOy8SH9HA1hhTW0RtIbco"],
 "date"=>["Thu, 05 Mar 2020 01:35:08 GMT"],
 "expires"=>["Fri, 05 Mar 2021 01:35:08 GMT"],
 "last-modified"=>["Wed, 04 Mar 2020 20:53:40 GMT"],
 "etag"=>["\"e5994ce974ae1b8e426810037812e7d5\""],
 "x-goog-generation"=>["1583355220705723"],
 "x-goog-metageneration"=>["2"],
 "x-goog-stored-content-encoding"=>["gzip"],
 "x-goog-stored-content-length"=>["2733"],
 "content-type"=>["application/javascript; charset=utf-8"],
 "x-goog-hash"=>["crc32c=VCx7Dg==", "md5=5ZlM6XSuG45CaBADeBLn1Q=="],
 "x-goog-storage-class"=>["STANDARD"],
 "accept-ranges"=>["bytes"],
 "vary"=>["Accept-Encoding"],
 "content-length"=>["2733"],
 "server"=>["UploadServer"],
 "cache-control"=>["public, max-age=31536000, immutable"],
 "age"=>["6"],
 "alt-svc"=>["quic=\":443\"; ma=2592000; v=\"46,43\",h3-Q050=\":443\"; ma=2592000,h3-Q049=\":443\"; ma=2592000,h3-Q048=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000"]}

The issue here is that this "content-length"=>["2733"] value is 2733. The res.body at this point is 9995, so content-length needs to match that or not exist. Otherwise it causes browsers to only partially download the file.

Updated by jeremyevans0 (Jeremy Evans) almost 5 years ago

jmreid (Justin Reid) wrote in #note-3:

The issue here is that this "content-length"=>["2733"] value is 2733. The res.body at this point is 9995, so content-length needs to match that or not exist. Otherwise it causes browsers to only partially download the file.

I don't think I agree with that logic. content_length is documented to return the value of the header, not the size of the decompressed body. So the method appears to be operating exactly as documented.

I'm also not sure why you are mentioning browsers in this context. If I open up Chrome and go into Development Tools, when I request https://storage.googleapis.com/justin-reid-test/test.js and look at the response headers, I see 2733 for Content-Length, not 9995.

The only possible interpretation I can think of where browsers come into play is if you were writing a server, using net/http inside a request handler, and taking the response status, headers, and body returned by net/http and returning them directly as the response. In which case the proper fix would be not decompressing the body automatically:

res = Net::HTTP.get_response(uri){|res| res.decode_content = false}
Actions #5

Updated by jmreid (Justin Reid) almost 5 years ago

So the method appears to be operating exactly as documented.

I'm not saying that method isn't working as intended. It's working as intended and I'm just using it to show the size of the header for the request net/http made. My comment saying that content-length needs to match 9995 meant: The Content-Length header that net/http returns needs to actually match the content length of the body for that request.

If I open up Chrome and go into Development Tools, when I request https://storage.googleapis.com/justin-reid-test/test.js and look at the response headers, I see 2733 for Content-Length, not 9995.

This is because Chrome reports the size of the file over the network and not the decompressed size in dev tools. If you find the resource in the network panel and hover over the "size" column, you'll see that "resource size" is the uncompressed size.

It's best to use curl to test these URLs.

My core concern is that Ruby shouldn't be leaving an incorrect header value around after it mutates the response. I can't see a reason why leaving Content-Length: 2733 makes sense when the actual body is 9995.

The only possible interpretation I can think of where browsers come into play is if you were writing a server, using net/http inside a request handler, and taking the response status, headers, and body returned by net/http and returning them directly as the response. In which case the proper fix would be not decompressing the body automatically.

There are valid reasons to let net/http inflate the body. Just disabling gzip inflation because Ruby passes incorrect values in a mutated response doesn't seem like a proper fix.

Updated by jeremyevans0 (Jeremy Evans) almost 5 years ago

jmreid (Justin Reid) wrote in #note-5:

So the method appears to be operating exactly as documented.

I'm not saying that method isn't working as intended. It's working as intended and I'm just using it to show the size of the header for the request net/http made. My comment saying that content-length needs to match 9995 meant: The Content-Length header that net/http returns needs to actually match the content length of the body for that request.

I think I understand your reasoning better now. net/http is currently deleting the Content-Encoding header, so deleting or modifying the Content-Length header makes sense in that light. Modifying the Content-Length header is tricky because you do not know the decoded size until after decoding. Anyway, the current implementation, by deleting Content-Encoding and not changing Content-Length, is inconsistent.

I don't think net/http should be modifying the response headers. I think the response headers should remain exactly as sent by the server.

The deletion of the Content-Encoding header has been present since the initial support was merged in ff7f462bf49a1199b1657de6a73a0dc91deae1fa back in 2007. My guess as to why is so that existing callers that decoded bodies based on the value of the Content-Encoding header would not attempt to decode an already decoded body after the support was merged. And that does seem a reasonable way of keeping backwards compatibility while adding transparent decoding of bodies. I'm not sure how much code still exists that uses net/http and manually decodes bodies based on the Content-Encoding header, but maybe we can consider whether to remove the automatic deletion of the Content-Encoding header when decoding, possibly indicating whether decoding happened using a separate method.

Updated by jmreid (Justin Reid) almost 5 years ago

jeremyevans0 (Jeremy Evans) wrote in #note-6:

I don't think net/http should be modifying the response headers. I think the response headers should remain exactly as sent by the server.

I do agree in general. However, in this case net/http is actually mutating the response itself. To me that feels like a time when modifying the headers should happen as well to match those changes.

Deleting Content-Encoding makes sense because the encoding is no longer gzip. Same logic should be applied to Content-Length, in my opinion.

I do understand that net/http shouldn't be modifying headers for just any reason, but again, changing the actual response contents necessitates changing these two headers.

maybe we can consider whether to remove the automatic deletion of the Content-Encoding header when decoding, possibly indicating whether decoding happened using a separate method.

This is definitely another option, but feels like it might set people up for more misunderstanding. If net/http inflates the body but leaves Content-Encoding: gzip, that opens the door for even more confusion in my opinion.

Modifying the Content-Length header is tricky because you do not know the decoded size until after decoding

This is true, but Zlib::Inflate does have a total_out method that could be used (just a hacky patch I made while testing this locally):
https://github.com/ruby/ruby/compare/master...jmreid:content-length

Updated by jeremyevans0 (Jeremy Evans) almost 5 years ago

jmreid (Justin Reid) wrote in #note-7:

Modifying the Content-Length header is tricky because you do not know the decoded size until after decoding

This is true, but Zlib::Inflate does have a total_out method that could be used (just a hacky patch I made while testing this locally):
https://github.com/ruby/ruby/compare/master...jmreid:content-length

total_out doesn't give you the full size of the output until after the input is fully processed. So if there is an exception or other early exit from the block passed to inflater before the body is fully inflated, you can end up with an incorrect result. Also, the Content-Length header inside the block would still be wrong. You could remove it before the block and only set it on success after the block, that's probably the best way to handle it if you want to modify it.

s = Zlib::Deflate.deflate('a'* 100)
s.bytesize # => 12
io = Zlib::Inflate.new
io.total_out # => 0
io << s.byteslice(0,6)
io.total_out # => 2
io << s.byteslice(6,7)
io.total_out # => 100

Updated by jmreid (Justin Reid) almost 5 years ago

jeremyevans0 (Jeremy Evans) wrote in #note-8:

total_out doesn't give you the full size of the output until after the input is fully processed. So if there is an exception or other early exit from the block passed to inflater before the body is fully inflated, you can end up with an incorrect result. Also, the Content-Length header inside the block would still be wrong. You could remove it before the block and only set it on success after the block, that's probably the best way to handle it if you want to modify it.

Ah, understood!

Updated by jeremyevans0 (Jeremy Evans) almost 4 years ago

  • Status changed from Open to Assigned
  • Assignee set to naruse (Yui NARUSE)

After some time to think about this, I agree with @jmreid (Justin Reid) that updating the Content-Length is the simplest way to address this issue. I've added a pull request that implements this change: https://github.com/ruby/net-http/pull/16

Actions #11

Updated by Eregon (Benoit Daloze) almost 4 years ago

Looks good to me.
I think it's important to still remove the Content-Encoding headers, otherwise a caller of net/http might inflate again (notably RubyGems had code related to that until recently).

Actions #12

Updated by jeremyevans (Jeremy Evans) over 2 years ago

  • Status changed from Assigned to Closed

Applied in changeset git|58adb1636be32fb95173f01e448673dbae4511b0.


[ruby/net-http] Update the content-length heading when decoding bodies

Previously, the content-encoding header was removed and the body
was modified, but the content-length header was not modified,
resulting in the content-length header not matching the body
length.

Fixes [Bug #16672]

https://github.com/ruby/net-http/commit/a7cb30124c

Updated by MSP-Greg (Greg L) over 2 years ago

I posted a comment in the commit, see https://github.com/ruby/ruby/commit/58adb1636be32fb95173f01e448673dbae4511b0#commitcomment-70298450

Noticed failures in another repo this morning, happened in Rubygems/Bundler operations, and only with Ruby master. Tried some Net::HTTP things locally on both Windows ucrt and WSL2/Ubuntu, and they failed. Simple repo:

ruby -ropen-uri -e "puts URI.parse('http://raw.githubusercontent.com/ruby/ruby/master/version.h').read"

Reverting the change, and it works as expected... Haven't had time to review the comments here.

Updated by st0012 (Stan Lo) over 2 years ago

@MSP-Greg (Greg L) I've also noticed this today and reported it in the net/http repo: https://github.com/ruby/net-http/issues/49
And also proposed a fix for it: https://github.com/ruby/net-http/pull/50

Updated by jeremyevans0 (Jeremy Evans) over 2 years ago

  • Status changed from Closed to Open

I reverted the commit for now. I'll try testing the proposed fix in the coming week.

Updated by st0012 (Stan Lo) over 2 years ago

That's great, thank you!

Actions #17

Updated by jeremyevans (Jeremy Evans) over 2 years ago

  • Status changed from Open to Closed

Applied in changeset git|0579486f154e80d17521494003dcd2499ef74688.


[ruby/net-http] Update the content-length heading when decoding bodies

Previously, the content-encoding header was removed and the body
was modified, but the content-length header was not modified,
resulting in the content-length header not matching the body
length.

Don't delete content-length before yielding inflate body, as that
causes a switch to read the entire body instead of reading in
chunks.

Fixes [Bug #16672]

https://github.com/ruby/net-http/commit/58284e9710

Co-authored-by: st0012

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0