Bug #19108
closed
Format routines like pack blindly treat a string as ASCII-encoded
Added by chrisseaton (Chris Seaton) about 2 years ago.
Updated about 2 years ago.
Description
Format routines like pack and unpack blindly treat a string as ASCII-encoded, even if they aren't ASCII or ASCII-compatible.
I tried to construct code that was misleading using ASCII-incompatible-encodings but couldn't do it in practice (no ASCII-incompatible encodings have a pack directive ASCII byte that is encoded as a printable character.)
But I could demonstrate at least some strange behaviour:
p ['foo'].pack('u').encoding # => #<Encoding:US-ASCII>
p ['foo'].pack('u'.encode('UTF-32BE')).encoding # => #<Encoding:ASCII-8BIT>
This is because the NUL characters in the second one (which aren't really NUL characters - they're part of the directive characters) explicitly trigger the encoding to change to binary.
There is a warning, but the warning is only for unexpected directives. How about disallowing or warning for non-ascii compatible format strings?
Possibly we should raise an exception if the string is not ascii_only?
I agree that at the very least the unknown pack directive
warning should be made non-verbose (displayed even with $VERBOSE=false
, and would make sense as ArgumentError
.
Agreed, I think it should be ArgumentError since it's otherwise silently ignoring characters in the pack format string.
A non-verbose warning is better than the current state if ArgumentError is deemed too incompatible.
Here is real case where the silent warning caused confusion for [1].pack('<L')
: https://github.com/oracle/truffleruby/issues/2791
EDIT: extracted to #19150
I think you want to mean "if the string is not ASCII-compatible".
Can you explain why?
I think a string is only a valid pack format string if it is ascii_only?
- if it isn't ascii_only?
then there is a silent warning and the output encoding is changed. We're proposing raising an error up front if the string is not ascii_only?
.
Checking ascii_only?
would reject non-ascii comments which are fine:
p [2, 89].pack(<<~PACK)
C # 🚗
c
PACK
p [2, 89].pack('Cc')
# Same output
- Related to Bug #19150: pack/unpack silently ignores unknown directives added
Template strings should be ASCII compatible, exceptions otherwise.
Matz.
- Status changed from Open to Closed
Also available in: Atom
PDF
Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0