Format routines like pack and unpack blindly treat the format string as ASCII-encoded, even if its encoding isn't ASCII or ASCII-compatible.
I tried to construct misleading code using ASCII-incompatible encodings but couldn't do it in practice (no ASCII-incompatible encodings have a pack directive ASCII byte that is encoded as a printable character).
But I could demonstrate at least some strange behaviour:
p ['foo'].pack('u').encoding # => #<Encoding:US-ASCII>
p ['foo'].pack('u'.encode('UTF-32BE')).encoding # => #<Encoding:ASCII-8BIT>
This is because the NUL characters in the second one (which aren't really NUL characters - they're part of the directive characters) explicitly trigger the encoding to change to binary.
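To make the NUL bytes visible, here is a small sketch that only inspects the format string and does not call pack. The variable name `fmt` is just for illustration.

```ruby
# 'u' re-encoded as UTF-32BE occupies four bytes: three zero bytes, then 0x75 ('u').
fmt = 'u'.encode('UTF-32BE')

p fmt.bytes                      # => [0, 0, 0, 117]
p fmt.encoding.ascii_compatible? # => false
p fmt.ascii_only?                # => false
```

A byte-wise reader of this string sees three NUL bytes before it ever reaches the `u` directive.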
There is a warning, but the warning is only for unexpected directives. How about disallowing or warning for ASCII-incompatible format strings?
I agree that at the very least the unknown pack directive warning should be made non-verbose (displayed even with $VERBOSE=false), and it would make sense as an ArgumentError.
Agreed, I think it should be ArgumentError since it's otherwise silently ignoring characters in the pack format string.
A non-verbose warning is better than the current state if ArgumentError is deemed too incompatible.
I think you mean "if the string is not ASCII-compatible".
Can you explain why?
I think a string is only a valid pack format string if it is ascii_only? - if it isn't ascii_only? then there is a silent warning and the output encoding is changed. We're proposing raising an error up front if the string is not ascii_only?.
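As a rough illustration of the proposed up-front check, here is a user-level sketch. `checked_pack` is a hypothetical helper invented for this example, not an existing or proposed Ruby API, and the error message wording is an assumption; the real change would live inside pack itself.

```ruby
# Hypothetical wrapper: reject any format string that is not ascii_only?
# before handing it to Array#pack.
def checked_pack(values, format)
  unless format.ascii_only?
    raise ArgumentError,
          "pack format must be ASCII-only (got #{format.encoding})"
  end
  values.pack(format)
end

# An ASCII-only format passes straight through to pack.
checked_pack(['foo'], 'u')

# A UTF-32BE format is rejected before pack ever sees it.
begin
  checked_pack(['foo'], 'u'.encode('UTF-32BE'))
rescue ArgumentError => e
  puts e.message
end
```

Checking `ascii_only?` rather than only the encoding's ASCII compatibility also rejects format strings in ASCII-compatible encodings that contain non-ASCII characters, which is the stricter condition suggested above.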