Incorrect wrapping of base64 output of Array.pack()
String format directive m for Array.pack() is documented as:
m | String | base64 encoded string (see RFC 2045, count is width)
  |        | (if count is 0, no line feed are added, see RFC 4648)
While the description of the count argument is rather limited, it seems to mean the maximum length of an output line before a line break is added. However, that's not what actually happens:
$ ruby -e 'print ["a"*40].pack("m20")'
YWFhYWFhYWFhYWFhYWFhYWFh
YWFhYWFhYWFhYWFhYWFhYWFh
YWFhYQ==
In this example, output lines have 24 characters. To get 20-character output lines, m15 has to be specified:
$ ruby -e 'print ["a"*40].pack("m15")'
YWFhYWFhYWFhYWFhYWFh
YWFhYWFhYWFhYWFhYWFh
YWFhYWFhYWFhYQ==
This is caused by the following in
len = len / 3 * 3;
This looks like a typo / thinko. Base64 encoding produces 4 bytes of output for every 3 bytes of input. Hence, to get an output line of length N, the encoder should process N / 4 * 3 input bytes before inserting a line break. The len argument passed to encodes() is the number of input bytes to process to generate one output line.
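The arithmetic above can be sketched in Ruby as a workaround on the caller's side. This is only an illustration under stated assumptions: base64_wrapped and line_lengths are hypothetical helper names, not part of Ruby, and the sketch assumes the m count keeps its current meaning of input bytes per line.

```ruby
# Hypothetical helper, not part of Ruby: wrap Base64 output at a maximum
# output width by converting that width to the input-byte count "m" uses.
def base64_wrapped(data, width)
  # Widths below 4 cannot hold one 4-character Base64 group, and a count
  # of 0 would disable line feeds entirely, so reject them here.
  raise ArgumentError, "width must be at least 4" if width < 4

  input_bytes = width / 4 * 3 # 3 input bytes per 4 output characters
  [data].pack("m#{input_bytes}")
end

# Length of each output line, without the trailing line feed.
def line_lengths(encoded)
  encoded.lines.map { |line| line.chomp.length }
end

p line_lengths(["a" * 40].pack("m20"))       # => [24, 24, 8]
p line_lengths(base64_wrapped("a" * 40, 20)) # => [20, 20, 16]
```

A width that is not a multiple of 4 is rounded down by the integer division, so the output lines never exceed the requested width.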
The same applies to UU-encoding (the u format), with the difference that every line starts with an additional character specifying the line length. Hence, even with the above fixed, u20 would produce lines with 21 characters.
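For comparison, the line lengths that the u format produces today can be measured with a small script. This is a sketch, and uu_line_lengths is a hypothetical helper name. It assumes the current behaviour, where the count is rounded down to a multiple of 3 and treated as input bytes per line.

```ruby
# Hypothetical helper, not part of Ruby: length of each UU-encoded line,
# without the trailing line feed.
def uu_line_lengths(data, count)
  [data].pack("u#{count}").lines.map { |line| line.chomp.length }
end

# With u20, each full line carries 18 input bytes: 24 encoded characters
# plus 1 leading length character.
p uu_line_lengths("a" * 40, 20) # => [25, 25, 9]
```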
Updated by jeremyevans0 (Jeremy Evans) 12 months ago
- Status changed from Open to Feedback
- File pack-m-width-output.patch added
- File pack-doc.patch added
I agree this is a bug. I am not sure whether it is a documentation bug or a code bug. The existing documentation for Array#pack does suggest the count should specify output bytes (width of the resulting field), while the m count currently specifies input bytes between each line break.
Attached are two patches: one considering this a documentation bug (which tries to make the documentation clearer), and one considering this a code bug (which fixes the calculation to use output bytes instead of input bytes).
I'm leaning toward considering this a documentation bug, since that is a better choice for backwards compatibility.