Bug #10132
Closed
unpack() ignores default encoding when generating strings, always uses ASCII-8BIT
Description
New strings are generated in the default encoding:
...but not if they're generated by unpack:
irb> "ünicode".split.pack('M*').unpack('M*').first
=> "\xC3\xBCnicode"
irb> "ünicode".split.pack('M*').unpack('M*').first.encoding.name
=> "ASCII-8BIT"
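For contrast, here is a minimal sketch comparing a string literal with the result of the same round trip. The helper name `qp_round_trip` is hypothetical and used only for illustration, and the sketch assumes the source file is UTF-8 (the default since Ruby 2.0).

```ruby
# Hypothetical helper: round-trip a string through quoted-printable.
def qp_round_trip(str)
  [str].pack('M*').unpack('M*').first
end

literal    = "ünicode"              # a literal takes the source encoding
round_trip = qp_round_trip(literal)

puts literal.encoding                          # the source encoding
puts round_trip.encoding == Encoding::BINARY   # true (BINARY is an alias of ASCII-8BIT)
puts round_trip.bytes == literal.bytes         # true: the bytes survive the round trip
puts round_trip == literal                     # false: encodings differ and the content is non-ASCII
```

The bytes are identical in both strings; only the encoding tag differs, which is enough to make the comparison fail.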
The workaround is to force the encoding on every string that unpack generates:
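A sketch of what such a workaround might look like, assuming the decoded bytes are known to be UTF-8. The wrapper `unpack_qp` is a hypothetical name, not part of Ruby.

```ruby
# Hypothetical wrapper: decode quoted-printable and retag the result.
# force_encoding only changes the encoding label; it does not transcode bytes.
def unpack_qp(encoded, encoding = Encoding::UTF_8)
  encoded.unpack('M*').first.force_encoding(encoding)
end

encoded = ["ünicode"].pack('M*')
text    = unpack_qp(encoded)

puts text.encoding == Encoding::UTF_8   # true
puts text.valid_encoding?               # true
puts text == "ünicode"                  # true, assuming a UTF-8 source file
```

Because `force_encoding` does not validate the bytes, checking `valid_encoding?` afterwards is a reasonable safeguard when the input is not trusted.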
Updated by meta (mathew murphy) almost 12 years ago
Updated by nobu (Nobuyoshi Nakada) almost 12 years ago
- Status changed from Open to Rejected
pack("M*") (and pack("C*")) are for binary data primarily.
Updated by meta (mathew murphy) almost 12 years ago
The Ruby documentation says:
M | String | quoted printable, MIME encoding (see RFC2045)
And RFC 2045 section 6.7 says:
The Quoted-Printable encoding is intended to represent data that largely consists of octets that correspond to printable characters in the US-ASCII character set.
So the Ruby documentation itself says that it's a string, not binary data, and it refers to an RFC that says the encoding is intended for textual (printable) characters.
Perhaps you were thinking of base64? I don't think I've ever seen quoted-printable used for binary data.
Updated by meta (mathew murphy) almost 12 years ago
Now that I read the documentation on encodings more carefully, I think the real problem is more fundamental: __ENCODING__ doesn't determine the encoding of all created strings; it only affects strings created using string constants in the source code.
So:
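A minimal sketch of that distinction: it collects the encoding of strings from three different origins. The method name `encodings_by_origin` is hypothetical.

```ruby
# Hypothetical helper: report the encoding of strings by how they were created.
def encodings_by_origin
  {
    source:     __ENCODING__,                                     # encoding of this source file
    literal:    "text".encoding,                                  # literals follow the source encoding
    string_new: String.new.encoding,                              # String.new with no arguments is binary
    unpacked:   ["text"].pack('M*').unpack('M*').first.encoding   # unpack('M*') output is binary
  }
end

encodings_by_origin.each { |origin, enc| puts "#{origin}: #{enc}" }
```

Only the literal follows `__ENCODING__`; the other two strings are tagged as binary regardless of the source encoding.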
So Ruby is actually behaving as documented; it's just that I find the behavior surprising. Maybe I'm alone in that, though.
Any chance we could have a way to specify a default encoding for all created strings?