Bug #2411
[closed] String#encode fails but eval("#coding:") works
Description
=begin
Hello,
[Summary] String#encode() should internally try the eval() approach
shown below before giving up and raising
Encoding::UndefinedConversionError.
I found a surprising (POLS please!) workaround for encoding conversion
errors in Ruby 1.9 while trying to understand why some Chinese text
returned by screen scraping (via Net::HTTP) was appearing in escaped
form when it was written to a file.
$ irb
ruby 1.9.1p243 (2009-07-16 revision 24175) [i686-linux]
Encoding.default_external
=> #<Encoding:UTF-8>
s = "%s\xE5\x92\x8C%s"
=> "%s和%s"
s.encoding
=> #<Encoding:UTF-8>
This works because my IRB session began in UTF-8 mode, and anything I
enter there is naturally treated with UTF-8 encoding.
To simulate the actual problem I faced when Net::HTTP returned that
string to me with ASCII-8BIT encoding, I tried the following conversion:
ascii_8bit = s.encode('ascii-8bit')
Encoding::UndefinedConversionError: "\xE5\x92\x8C" from UTF-8 to ASCII-8BIT
from (irb):4:in `encode'
from (irb):4
from /usr/bin/irb:12:in `<main>'
No luck. Let us try eval() because Ruby 1.9 has per-file encoding:
ascii_8bit = eval("# encoding: ascii-8bit\n#{ s.inspect }")
=> "%s\xE5\x92\x8C%s"
ascii_8bit.encoding
=> #<Encoding:ASCII-8BIT>
That worked! Surprising! I immediately wonder: couldn't
String#encode() fall back to the eval() approach internally instead
of giving up and raising Encoding::UndefinedConversionError?
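For comparison, here is a minimal sketch (not part of my original session) suggesting that the eval() route leaves the bytes alone and only changes the encoding label. It uses the standard String#force_encoding method; the variable names are purely illustrative, and the string is written with a \u escape on the assumption that this yields the same UTF-8 text as above.

```ruby
# Sketch only: relabel the same bytes as ASCII-8BIT without eval().
s = "%s\u548C%s"   # same text as above; the \u escape guarantees a UTF-8 string

# force_encoding changes the encoding tag in place, so work on a copy.
relabeled = s.dup.force_encoding(Encoding::ASCII_8BIT)

# The bytes are untouched; only the label attached to them differs.
same_bytes = (relabeled.bytes == s.bytes)

puts relabeled.encoding
puts same_bytes
```

If that reading is right, the eval() trick is a relabeling of bytes rather than a character conversion, which may be why String#encode() treats the two cases differently.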
And now, if we take that ASCII-8BIT string and try to convert it back
into UTF-8 (just like the problem I faced with the result of Net::HTTP),
we face a similar but opposite problem:
utf_8 = ascii_8bit.encode('utf-8')
Encoding::UndefinedConversionError: "\xE5" from ASCII-8BIT to UTF-8
from (irb):7:in `encode'
from (irb):7
from /usr/bin/irb:12:in `<main>'
No luck again. Let us try eval():
utf_8 = eval("# encoding: utf-8\n#{ ascii_8bit.inspect }")
=> "%s和%s"
utf_8.encoding
=> #<Encoding:UTF-8>
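The reverse direction can be sketched the same way, assuming the binary payload really does contain UTF-8 bytes (as the Net::HTTP response did in my case). This is only an illustration with made-up variable names: it shows encode() failing, then relabels the bytes with String#force_encoding and checks the result with String#valid_encoding?.

```ruby
# Sketch only: go from binary bytes back to UTF-8 by relabeling, then validate.
raw = "%s\xE5\x92\x8C%s".b   # String#b returns an ASCII-8BIT copy of the bytes

# Transcoding fails because bytes >= 0x80 have no defined mapping from ASCII-8BIT.
begin
  raw.encode(Encoding::UTF_8)
  transcode_failed = false
rescue Encoding::UndefinedConversionError
  transcode_failed = true
end

# Relabeling keeps the bytes and only changes the encoding tag.
text = raw.dup.force_encoding(Encoding::UTF_8)

# false here would mean the payload was not actually UTF-8.
usable = text.valid_encoding?

puts transcode_failed
puts text.encoding
puts usable
```

The valid_encoding? check matters because relabeling never fails by itself; it will happily tag arbitrary bytes as UTF-8.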
Thanks for your consideration.
=end