Bug #1098
closed
Unclear encoding error: #<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1>
Description
=begin
The test script below exits with the error: #<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1>
This is weird/unclear/incomprehensible because I fail to see what makes Ruby think I'm working with UTF-8. If this isn't a bug, I would kindly ask that the error message be made slightly more intelligible by adding information about what is set to UTF-8, what to ISO-8859-1, etc. The way it is now, this message is slightly esoteric.
Test script:
# Encoding: CP850
p Encoding.default_internal, Encoding.default_external # => nil, CP850
s = "weiß"
p s, s.encoding
p s.encode('ISO-8859-1')
=end
Updated by matz (Yukihiro Matsumoto) almost 16 years ago
=begin
Hi,
In message "Re: [ruby-core:21802] [Bug #1098] Unclear encoding error: #<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1>"
on Tue, 3 Feb 2009 22:53:34 +0900, Tom Link redmine@ruby-lang.org writes:
|The test script below exits with the error: #<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1>
First, since we haven't implemented a direct conversion path from CP850
to ISO-8859-1 (yet), Ruby converts strings via UTF-8, hence the
message. If you have a suggestion for a better description, we are open.
Second, I couldn't reproduce the problem from your test script. The
conversion process goes from CP850 to UTF-8, then from UTF-8 to
ISO-8859-1. The message says the resulting UTF-8 text is "\xE2\x96\x80",
which has no corresponding character in ISO-8859-1 at all.
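The two-step path described above can be reproduced by hand. The following is only a minimal sketch: it assumes the offending input byte was 0xDF (inferred from the UTF-8 bytes in the message, not shown in the report) and a Ruby build that ships the CP850 transcoder. The helper name via_utf8 is made up for illustration.

```ruby
# Reproduce the CP850 -> UTF-8 -> ISO-8859-1 path one step at a time.
# `via_utf8` is a hypothetical helper name, not part of Ruby.
def via_utf8(bytes, from, to)
  intermediate = bytes.dup.force_encoding(from).encode("UTF-8")
  [intermediate, intermediate.encode(to)]
end

# Byte 0xE1 is U+00DF in CP850 and survives both steps.
utf8, latin1 = via_utf8("\xE1".b, "CP850", "ISO-8859-1")
p utf8.bytes
p latin1.bytes

# Byte 0xDF is U+2580 in CP850; the second step has no target character.
begin
  via_utf8("\xDF".b, "CP850", "ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  p e.error_char.bytes   # the UTF-8 bytes that could not be converted
end
```

The exception is raised by the second step, which is why the failing character is reported as UTF-8 bytes even though the caller never asked for UTF-8.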
We have no further clues from which to draw a conclusion. There are a lot of
possibilities, from a bug in your script, to a bug in Cygwin, of
course including a bug in the transcoding engine.
matz.
=end
Updated by naruse (Yui NARUSE) almost 16 years ago
- Category set to M17N
- Status changed from Open to Rejected
- Assignee set to naruse (Yui NARUSE)
=begin
You declared CP850 in the magic comment, but the actual encoding of your script seems to be ISO-8859-1.
Your character, U+00DF (LATIN SMALL LETTER SHARP S), is:
\xDF in ISO-8859-1
\xE1 in CP850
"\xdf".encode("iso-8859-1","CP850")
Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1
from (irb):24:in `encode'
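The byte values above can be checked directly. This is a small sketch assuming a UTF-8 source file and a Ruby build with the CP850 transcoder; the variable names are illustrative only.

```ruby
# U+00DF has a different byte value in each of the two encodings.
sharp_s = "\u00DF"
p sharp_s.encode("ISO-8859-1").bytes
p sharp_s.encode("CP850").bytes

# Reading the ISO-8859-1 byte as if it were CP850 yields a different character.
misread = "\xDF".b.force_encoding("CP850").encode("UTF-8")
puts misread.unpack("U*").map { |cp| format("U+%04X", cp) }.join(" ")
```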
=end
Updated by tomel (Tom Link) almost 16 years ago
=begin
> First, since we haven't implemented a direct conversion path from CP850
> to ISO-8859-1 (yet), Ruby converts strings via UTF-8, hence the
> message. If you have a suggestion for a better description, we are open.
I'd suggest some duplication of information:
UndefinedConversionError: "..." from UTF-8 to ISO-8859-1 in indirect
conversion from CP850 to UTF-8 to ISO-8859-1
or "in indirect conversion from CP850 to ISO-8859-1 via UTF-8"
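To make the suggested wording concrete, here is a rough sketch of how such a message could be assembled in user code from the attributes of the exception. It does not reflect Ruby's own message; describe_failure is a hypothetical helper, and the sketch assumes the CP850 transcoder is available.

```ruby
# Hypothetical formatter for the wording suggested above; not Ruby's own message.
def describe_failure(text, from, to)
  text.encode(to, from)   # interpret `text` as `from`, convert to `to`
  nil
rescue Encoding::UndefinedConversionError => e
  step = "#{e.source_encoding_name} to #{e.destination_encoding_name}"
  message = "#{e.error_char.dump} from #{step}"
  # The failing step starts at a different encoding than the caller asked for,
  # so the conversion went through an intermediate encoding.
  if e.source_encoding != Encoding.find(from)
    message += " in indirect conversion from #{from} to #{to} via #{e.source_encoding_name}"
  end
  message
end

puts describe_failure("\xDF".b, "CP850", "ISO-8859-1")
```

The helper returns nil when the conversion succeeds, so it can double as a quick check.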
> Second, I couldn't reproduce the problem from your test script.
Well, the problem was that the input really wasn't CP850 but Latin-1,
and that setting LANG to xx_XX.ISO-8859-1 doesn't seem to make Ruby
set the external encoding properly -- although I had assumed that
http://redmine.ruby-lang.org/issues/show/956 would make that possible.
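For anyone hitting the same situation, a quick way to see what Ruby actually derived from the environment is sketched below, together with tagging data explicitly when its real encoding is known. The sample bytes are an assumption standing in for the Latin-1 input; nothing here changes how LANG is interpreted.

```ruby
# Inspect what Ruby derived from the environment before trusting it.
p Encoding.find("locale")     # encoding derived from the process locale
p Encoding.default_external   # encoding assumed for external data
p __ENCODING__                # encoding of this source file

# When the real encoding of the data is known, tag it explicitly.
raw = "wei\xDF".b                          # stand-in for bytes from a Latin-1 source
text = raw.force_encoding("ISO-8859-1")
p text.encode("UTF-8")
```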
=end
Updated by tomel (Tom Link) almost 16 years ago
=begin
> You declared CP850 in the magic comment, but the actual encoding of your script seems to be ISO-8859-1.
It wasn't there in the original script. But you're right.
=end
Updated by duerst (Martin Dürst) almost 16 years ago
=begin
At 10:38 09/02/04, you wrote:
> Hi,
>
> In message "Re: [ruby-core:21802] [Bug #1098] Unclear encoding error: #<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1>"
> on Tue, 3 Feb 2009 22:53:34 +0900, Tom Link redmine@ruby-lang.org writes:
>
> |The test script below exits with the error: #<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1>
>
> First, since we haven't implemented a direct conversion path from CP850
> to ISO-8859-1 (yet),
Frankly speaking, I don't think we ever will. It's simply unrealistic
to expect Ruby to have N*(N-1) data tables for N encodings. No
transcoding engine I know does that. We can always add direct
conversions between two non-UTF-8 encodings if it turns out to
be really necessary, but I don't see the reason in this case,
and there's definitely no sense to do it just for improving
error messages.
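As a rough illustration of the trade-off, the sketch below asks Ruby which steps it would chain for this pair of encodings and compares the number of tables needed with and without a pivot. It assumes a Ruby build with the CP850 transcoder; direct_tables and pivot_tables are made-up names for the arithmetic.

```ruby
# Ask Ruby which conversion steps it would chain for this pair.
path = Encoding::Converter.search_convpath("CP850", "ISO-8859-1")
path.each { |from, to| puts "#{from} -> #{to}" }

# Tables needed for n encodings: one per ordered pair ...
def direct_tables(n)
  n * (n - 1)
end

# ... versus one table to and one from the pivot for each other encoding.
def pivot_tables(n)
  2 * (n - 1)
end

puts "direct: #{direct_tables(100)}, via pivot: #{pivot_tables(100)}"
```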
> Ruby converts strings via UTF-8, hence the
> message. If you have a suggestion for a better description, we are open.
Yes indeed. I think one step is to explain better what happened.
> Second, I couldn't reproduce the problem from your test script. The
> conversion process goes from CP850 to UTF-8, then from UTF-8 to
> ISO-8859-1. The message says the resulting UTF-8 text is "\xE2\x96\x80",
> which has no corresponding character in ISO-8859-1 at all.
Yes, this is character U+2580 (for a handy conversion script,
I use http://people.w3.org/rishida/scripts/uniview/conversion.php),
UPPER HALF BLOCK. It doesn't exist in ISO-8859-1, and therefore
the script produces the above error. It simply says that there
is no defined conversion between UTF-8 and ISO-8859-1 for that
character, which by extension means that there is no defined
conversion from CP850 to ISO-8859-1 for this character.
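The code point can also be checked in Ruby itself. The following is a minimal sketch; the use of the :undef and :replace options is just one possible way to avoid the exception, not something proposed in this thread.

```ruby
# Decode the UTF-8 bytes from the error message into a code point.
char = "\xE2\x96\x80".dup.force_encoding("UTF-8")
puts format("U+%04X", char.ord)

# ISO-8859-1 has no such character, so a plain encode raises.
# With undef: :replace, a placeholder is substituted instead.
p char.encode("ISO-8859-1", undef: :replace, replace: "?")
```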
> We have no further clues from which to draw a conclusion. There are a lot of
> possibilities, from a bug in your script, to a bug in Cygwin, of
> course including a bug in the transcoding engine.
This part is wrong. The conclusion is very clear. The script,
Cygwin, and the transcoding engine all are okay (at least as
far as this issue is concerned).
Regards, Martin.
#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
=end