Bug #13926: Non-UTF-8 response headers raise an ArgumentError since 2.4.2p198
Status: Closed

Description
When setting headers using Net::HTTPHeader#add_field or Net::HTTPHeader#[]= in v2.4.2, an ArgumentError (invalid byte sequence in UTF-8) is raised.
This behaviour didn't exist in 2.4.1; it looks like it was introduced in one of the revisions associated with https://bugs.ruby-lang.org/issues/13852, where the header value is matched against a regular expression to prevent newline injection.
Previously, Net::HTTP would accept non-UTF-8 header values and simply return them as strings containing invalid UTF-8 bytes; it was then up to the user of Net::HTTP to handle this. With this change, there is no longer any way for the user to handle receiving non-UTF-8 header values, because Net::HTTP raises an error before the value ever reaches them.
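The failure mode can be reproduced outside of Net::HTTP. A minimal sketch, assuming the newline check is an ordinary Regexp match as described above (the header value below is a made-up example of ISO-8859-1 bytes):

```ruby
# A header value containing an ISO-8859-1 byte (\xE9 = "é"), as a server
# might legitimately send it. The literal is tagged UTF-8, so it is an
# invalid UTF-8 string.
value = "attachment; filename=\"r\xE9sum\xE9\""

puts value.encoding          # UTF-8 (the literal's source encoding)
puts value.valid_encoding?   # false -- \xE9 is not valid UTF-8

begin
  # Matching an invalid UTF-8 string against a regexp raises, which is
  # what the newline-injection check added for #13852 runs into.
  value =~ /[\r\n]/
rescue ArgumentError => e
  puts e.message             # "invalid byte sequence in UTF-8"
end
```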
RFC 2616 allowed HTTP header field content to consist of any non-whitespace octets. Because of this, RFC 7230 makes an allowance for all characters in the ISO-8859-1 charset (both lower and extended ASCII characters).
Specifically, this section of RFC 7230 suggests that although ideally response header values would be compatible with UTF-8, we can't assume this to be the case:
Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.
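One way a recipient can treat such octets as opaque data, and a possible direction for a fix, is to run the newline check against the raw bytes rather than the tagged encoding. A hypothetical sketch (not the actual patch) using a binary copy of the value:

```ruby
# Treat the header bytes as opaque: in ASCII-8BIT every byte sequence is
# valid, and an ASCII-only regexp can match it without raising.
raw = "attachment; filename=\"r\xE9sum\xE9\"".b  # ASCII-8BIT copy

puts raw.valid_encoding?   # true -- binary strings are always "valid"
puts (raw =~ /[\r\n]/).inspect  # nil -- the injection check runs safely
```

This preserves the newline-injection protection from #13852 while still letting ISO-8859-1 (obs-text) values pass through for the caller to interpret.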
I'm not entirely sure where to go from here or what the fix should be, but given that this is a behaviour change, it'd be great to hear your thoughts.