Bug #2411

closed

String#encode fails but eval("#coding:") works

Added by sunaku (Suraj Kurapati) almost 15 years ago. Updated over 13 years ago.

Status:
Rejected
Assignee:
-
Target version:
-
ruby -v:
ruby 1.9.1p243 (2009-07-16 revision 24175) [i686-linux]
Backport:
[ruby-core:26941]

Description

=begin
Hello,

[Summary] String#encode() should internally try the eval() approach
shown below before giving up hope and raising
Encoding::UndefinedConversionError.

I found a surprising (POLS please!) workaround for encoding conversion
errors in Ruby 1.9 while trying to understand why some Chinese text
returned by screen scraping (via Net::HTTP) was appearing in escaped
form when it was written to a file.

$ irb

ruby 1.9.1p243 (2009-07-16 revision 24175) [i686-linux]

Encoding.default_external
=> #<Encoding:UTF-8>

s = "%s\xE5\x92\x8C%s"
=> "%s和%s"

s.encoding
=> #<Encoding:UTF-8>

This works because my IRB session began in UTF-8 mode, and anything I
enter there is naturally treated with UTF-8 encoding.

To simulate the actual problem I faced when Net::HTTP returned that
string to me with ASCII-8BIT encoding, I tried the following conversion:

ascii_8bit = s.encode('ascii-8bit')
Encoding::UndefinedConversionError: "\xE5\x92\x8C" from UTF-8 to ASCII-8BIT
from (irb):4:in `encode'
from (irb):4
from /usr/bin/irb:12:in `<main>'

No luck. Let us try eval() because Ruby 1.9 has per-file encoding:

ascii_8bit = eval("# encoding: ascii-8bit\n#{ s.inspect }")
=> "%s\xE5\x92\x8C%s"

ascii_8bit.encoding
=> #<Encoding:ASCII-8BIT>
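
For comparison, here is a minimal sketch (not part of the original report) of what the eval() trick appears to be doing. It seems to keep the bytes as they are and change only the encoding label, which is what String#force_encoding does directly, whereas String#encode tries to map each character into the target encoding. The variable names are illustrative.

```ruby
# Build the string from the report and make sure it is tagged UTF-8.
s = "%s\xE5\x92\x8C%s".dup.force_encoding(Encoding::UTF_8)

# encode transcodes characters. ASCII-8BIT has no mapping for
# non-ASCII characters, so the conversion raises.
transcode_error =
  begin
    s.encode(Encoding::ASCII_8BIT)
    nil
  rescue Encoding::UndefinedConversionError => e
    e
  end

# force_encoding leaves the bytes untouched and changes only the tag.
ascii_8bit = s.dup.force_encoding(Encoding::ASCII_8BIT)

puts transcode_error.class
puts ascii_8bit.bytes == s.bytes
```

The dup calls keep the original string unchanged, since force_encoding modifies its receiver in place.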

That worked! Surprising! I wonder immediately: couldn't
String#encode() fall back to the eval() approach internally instead
of giving up & raising Encoding::UndefinedConversionError?

And now, if we take that ASCII-8BIT string and try to convert into
UTF-8 (just like the problem I faced with the result of Net::HTTP),
we face a similar, but opposite problem:

utf_8 = ascii_8bit.encode('utf-8')
Encoding::UndefinedConversionError: "\xE5" from ASCII-8BIT to UTF-8
from (irb):7:in `encode'
from (irb):7
from /usr/bin/irb:12:in `<main>'

No luck again. Let us try eval():

utf_8 = eval("# encoding: utf-8\n#{ ascii_8bit.inspect }")
=> "%s和%s"

utf_8.encoding
=> #<Encoding:UTF-8>
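
The reverse direction can be sketched the same way without eval(), assuming the bytes received really are UTF-8 (as they were in this report). Because force_encoding performs no validation, the sketch checks valid_encoding? before trusting the result. The name raw is a stand-in for a response body, not something from the original session.

```ruby
# Simulate bytes that arrived tagged as ASCII-8BIT, e.g. a raw HTTP body.
raw = "%s\xE5\x92\x8C%s".b

# Re-tag a copy as UTF-8. No bytes are changed.
utf_8 = raw.dup.force_encoding(Encoding::UTF_8)

# force_encoding does not validate, so check explicitly.
if utf_8.valid_encoding?
  puts utf_8.length    # counts characters under the new tag
  puts raw.length      # counts bytes, since raw is binary
else
  warn "bytes are not valid UTF-8"
end
```

If the bytes were in some other encoding, the usual pattern would be to tag them with their real encoding first and then call encode('utf-8') to transcode.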

Thanks for your consideration.
=end


Related issues 1 (0 open, 1 closed)

Is duplicate of Ruby master - Bug #2313: Incomplete encoding conversion? (Rejected, 10/30/2009)