Bug #10132: unpack() ignores default encoding when generating strings, always uses ASCII-8BIT - Ruby - Ruby Issue Tracking System

closed

Added by meta (mathew murphy) almost 12 years ago. Updated almost 12 years ago.

Status: Rejected
Assignee:
Target version:
ruby -v: ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-linux]
Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN

[ruby-core:64359]

Description

New strings are generated in the default encoding:

irb> __ENCODING__.name
=> "UTF-8"
irb> "ünicode".encoding.name
=> "UTF-8"

...but not if they're generated by unpack:

irb> "ünicode".split.pack('M*').unpack('M*').first
=> "\xC3\xBCnicode"
irb> "ünicode".split.pack('M*').unpack('M*').first.encoding.name
=> "ASCII-8BIT"

Workaround is to force the encoding on every string unpack generates:

irb> "ünicode".split.pack('M*').unpack('M*').first.force_encoding(__ENCODING__.name)
=> "ünicode"
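
The workaround above can be wrapped in a small helper so the encoding is applied in one place. This is only a sketch: `unpack_qp` is a hypothetical name, not part of Ruby, and it assumes the caller knows which encoding the original text used (UTF-8 here). `force_encoding` only relabels the bytes, so the helper also checks `valid_encoding?` before returning.

```ruby
# Hypothetical helper (not a Ruby built-in): decode quoted-printable data
# and tag the result with the encoding the caller expects.
def unpack_qp(data, encoding = Encoding::UTF_8)
  decoded = data.unpack('M*').first   # unpack tags this string ASCII-8BIT
  decoded.force_encoding(encoding)    # relabels the bytes, no transcoding
  unless decoded.valid_encoding?
    raise ArgumentError, "decoded bytes are not valid #{encoding}"
  end
  decoded
end

original = "\u00FCnicode"             # "ünicode", written with an escape
packed   = [original].pack('M*')
restored = unpack_qp(packed)
```

With this, `restored == original` holds and `restored.encoding` is UTF-8, while the raw result of `unpack` is still binary.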

Updated by meta (mathew murphy) almost 12 years ago
#1 [ruby-core:64360]

In case there's confusion because of the strange splits in my examples:

["ünicode"].pack('M*').unpack('M*').first.encoding.name
=> "ASCII-8BIT"

Updated by nobu (Nobuyoshi Nakada) almost 12 years ago
#2 [ruby-core:64368]

Status changed from Open to Rejected

pack("M*") (and pack("C*")) are for binary data primarily.

Updated by meta (mathew murphy) almost 12 years ago
#3 [ruby-core:64404]

The Ruby documentation says:

M | String | quoted printable, MIME encoding (see RFC2045)

And RFC 2045 section 6.7 says:

The Quoted-Printable encoding is intended to represent data that largely consists of octets that correspond to printable characters in the US-ASCII character set.

So the Ruby documentation itself says that the result is a string, not binary data, and it refers to an RFC that says the encoding is intended for textual (printable) characters.

Perhaps you were thinking of base64? I don't think I've ever seen quoted-printable used for binary data.
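
To make the discussion concrete, here is a small sketch of what `pack('M')` emits for the string used in this report. It only illustrates the wire format; it does not settle which encoding `unpack` should assign. The variable names are illustrative.

```ruby
text    = "\u00FCnicode"              # "ünicode", written with an escape
encoded = [text].pack('M')

# The two UTF-8 bytes of "ü" become =C3=BC, printable ASCII passes through
# unchanged, and a soft line break ("=\n") is appended at the end.
puts encoded.inspect                  # "=C3=BCnicode=\n"

decoded = encoded.unpack('M').first
puts decoded.bytes == text.bytes      # true: same bytes, different encoding tag
```

The encoded form is pure ASCII, but it carries no record of which character encoding the original bytes were in, which is why the decoded string comes back untagged.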

Updated by meta (mathew murphy) almost 12 years ago
#4 [ruby-core:64405]

Now that I read the documentation on encodings more carefully, I think the real problem is more fundamental: __ENCODING__ doesn't determine the encoding of all created strings; it only affects strings created using string constants in the source code.

String.new.encoding => #<Encoding:ASCII-8BIT>
"".encoding         => #<Encoding:UTF-8>

So:

> String.new == ""
=> true
> String.new.encoding == "".encoding
=> false

So Ruby is actually behaving as documented; I just find the behavior surprising. Maybe I'm alone in that, though.

Any chance we could have a way to specify a default encoding for all created strings?
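
Short of a global setting, the encoding can be stated explicitly wherever a string is created. The sketch below assumes Ruby 2.3 or later, where `String.new` accepts an `encoding:` keyword; on the 2.1 release this report was filed against, `force_encoding` would be needed instead.

```ruby
literal = ""                                      # takes the source encoding
bare    = String.new                              # ASCII-8BIT (binary)
tagged  = String.new(encoding: Encoding::UTF_8)   # explicit encoding, Ruby 2.3+
copied  = String.new(literal)                     # copies the argument's encoding

puts bare.encoding == Encoding::BINARY            # true
puts tagged.encoding == Encoding::UTF_8           # true
puts copied.encoding == literal.encoding          # true
```

This keeps the choice of encoding visible at each call site rather than relying on a process-wide default.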
