Project

General

Profile

Actions

Bug #19108

closed

Format routines like pack blindly treat a string as ASCII-encoded

Added by chrisseaton (Chris Seaton) 3 months ago. Updated about 2 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:110636]

Description

Format routines like pack and unpack blindly treat a string as ASCII-encoded, even if they aren't ASCII or ASCII-compatible.

I tried to construct code that was misleading using ASCII-incompatible-encodings but couldn't do it in practice (no ASCII-incompatible encodings have a pack directive ASCII byte that is encoded as a printable character.)

But I could demonstrate at least some strange behaviour:

p ['foo'].pack('u').encoding # => #<Encoding:US-ASCII>
p ['foo'].pack('u'.encode('UTF-32BE')).encoding # => #<Encoding:ASCII-8BIT>

This is because the NUL characters in the second one (which aren't really NUL characters - they're part of the directive characters) explicitly trigger the encoding to change to binary.

There is a warning, but the warning is only for unexpected directives. How about disallowing or warning for non-ascii compatible format strings?


Related issues 1 (0 open1 closed)

Related to Ruby master - Bug #19150: pack/unpack silently ignores unknown directivesClosedActions

Updated by chrisseaton (Chris Seaton) 3 months ago

Possibly we should raise an exception if the string is not ascii_only?

Updated by byroot (Jean Boussier) 3 months ago

I agree that at the very least the unknown pack directive warning should be made non-verbose (displayed even with $VERBOSE=false, and would make sense as ArgumentError.

Updated by Eregon (Benoit Daloze) 2 months ago

Agreed, I think it should be ArgumentError since it's otherwise silently ignoring characters in the pack format string.
A non-verbose warning is better than the current state if ArgumentError is deemed too incompatible.

Here is real case where the silent warning caused confusion for [1].pack('<L'): https://github.com/oracle/truffleruby/issues/2791

EDIT: extracted to #19150

Updated by nobu (Nobuyoshi Nakada) 2 months ago

chrisseaton (Chris Seaton) wrote in #note-1:

Possibly we should raise an exception if the string is not ascii_only?

I think you want to mean "if the string is not ASCII-compatible".

https://github.com/ruby/ruby/pull/6785

Updated by chrisseaton (Chris Seaton) 2 months ago

I think you want to mean "if the string is not ASCII-compatible".

Can you explain why?

I think a string is only a valid pack format string if it is ascii_only? - if it isn't ascii_only? then there is a silent warning and the output encoding is changed. We're proposing raising an error up front if the string is not ascii_only?.

Updated by alanwu (Alan Wu) 2 months ago

Checking ascii_only? would reject non-ascii comments which are fine:

p [2, 89].pack(<<~PACK)
  C # 🚗
  c
PACK
p [2, 89].pack('Cc')
# Same output
Actions #7

Updated by Eregon (Benoit Daloze) 2 months ago

  • Related to Bug #19150: pack/unpack silently ignores unknown directives added

Updated by matz (Yukihiro Matsumoto) about 2 months ago

Template strings should be ASCII compatible, exceptions otherwise.

Matz.

Actions #9

Updated by nobu (Nobuyoshi Nakada) about 2 months ago

  • Status changed from Open to Closed

Applied in changeset git|9869bd1d612b489df806cf95bcb56965a02424e0.


[Bug #19108] Check for the encoding of pack/unpack format

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0