Bug #18679
Closed
Encoding::UndefinedConversionError: "\xE2" from ASCII-8BIT to UTF-8
Description
We are facing an issue only when running Ruby on ARM under Amazon Linux. In some cases, when we puts a string, we receive the above error message. When we run the same data through puts on Intel, we do not get this error. I am not sure whether this is a Ruby issue or perhaps an iconv issue. What would be the best way to capture more data to help narrow it down?
Updated by taf2 (Todd Fisher) about 3 years ago
I found some additional insight. On Intel we can run puts File.read("this-file-contains-utf8") without a crash.
On ARM, in some cases, the same call
puts File.read("this-file-contains-utf8") # crashes with an encoding error
Passing encoding: 'UTF-8' to File.read does resolve this. However, in some cases, if we receive bytes (say, from an HTTP request) and puts them, it still crashes on ARM but not on Intel.
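A minimal sketch of the two cases described above, assuming the data really is UTF-8: reading a file with an explicit encoding: option, and relabeling raw bytes (for example an HTTP body, which HTTP clients often tag as ASCII-8BIT) before printing. The helper name as_utf8 and the sample file contents are hypothetical; String#scrub replaces any bytes that are not valid UTF-8 so the result is always printable.

```ruby
require "tempfile"

# Hypothetical helper: relabel raw bytes as UTF-8 without transcoding,
# replacing any invalid byte sequences with U+FFFD.
def as_utf8(bytes)
  str = bytes.dup.force_encoding(Encoding::UTF_8)
  str.valid_encoding? ? str : str.scrub("\uFFFD")
end

# Write a small UTF-8 file so the example is self-contained.
file = Tempfile.new("utf8-sample")
file.binmode
file.write("caf\u00E9 \u2019quoted\u2019\n")
file.close

# Without encoding:, File.read tags the result with Encoding.default_external
# (when Encoding.default_internal is nil), which comes from the locale:
# US-ASCII under LANG=C, UTF-8 under a UTF-8 locale.

# With an explicit encoding, the tag no longer depends on the locale.
explicit = File.read(file.path, encoding: "UTF-8")

raw = File.binread(file.path)   # always tagged ASCII-8BIT
fixed = as_utf8(raw)            # same bytes, now tagged UTF-8

puts explicit.encoding   # UTF-8
puts raw.encoding        # ASCII-8BIT
puts fixed.encoding      # UTF-8

file.unlink
```

Relabeling with force_encoding is only appropriate when the bytes are known to be UTF-8; if they come from a source with a different charset, String#encode with the correct source encoding is the right tool instead.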
Updated by duerst (Martin Dürst) about 3 years ago
First, if the error says Encoding::UndefinedConversionError, then I think it's not related to iconv, because iconv only gets used when you explicitly say so. Ruby has its own internal character conversion code.
Second, it's very clear that you get a conversion error when you try to convert "\xE2" from ASCII-8BIT to UTF-8. In ASCII-8BIT, "\xE2" is just a binary byte, without any character defined on it. There's no way to convert that to a character in UTF-8.
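To illustrate the point above: a lone \xE2 byte tagged ASCII-8BIT (binary) has no character meaning, so asking Ruby's own transcoder (String#encode) to convert it to UTF-8 raises exactly the error from this report. This is a minimal reproduction of the error message, not the reporter's actual code path.

```ruby
# A single byte tagged ASCII-8BIT ("binary"): no character is defined for it.
data = "\xE2".b

message =
  begin
    # String#encode uses Ruby's built-in transcoders, not iconv.
    data.encode(Encoding::UTF_8)
    "no error"
  rescue Encoding::UndefinedConversionError => e
    "#{e.class}: #{e.message}"
  end

puts message
# => Encoding::UndefinedConversionError: "\xE2" from ASCII-8BIT to UTF-8
```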
The "\xE2" byte may be the start of a UTF-8 byte sequence, somewhere between U+2000 (E2 80 80) and U+2FFF (E2 BF BF). In that case, there would be no need to convert, only a need to label the encoding correctly. Of course, the "\xE2" byte may also be something else.
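The relabeling described above can be sketched with String#force_encoding, which changes only the encoding tag and leaves the bytes untouched. The three-byte sequence E2 80 99 (U+2019, RIGHT SINGLE QUOTATION MARK) is an assumed example of a UTF-8 sequence starting with \xE2; valid_encoding? reports whether the relabeled bytes really form valid UTF-8, which covers the case where the byte "may also be something else".

```ruby
# Bytes as they might arrive from a socket: tagged ASCII-8BIT (binary).
bytes = "\xE2\x80\x99".b

# Relabel only; no conversion happens and the bytes are unchanged.
text = bytes.dup.force_encoding(Encoding::UTF_8)

puts text.valid_encoding?             # true: E2 80 99 is a complete UTF-8 sequence
puts text.codepoints.first.to_s(16)   # 2019
puts text.bytes == bytes.bytes        # true

# A lone \xE2 is only the start of a sequence, so relabeling alone is not enough.
lone = "\xE2".b.force_encoding(Encoding::UTF_8)
puts lone.valid_encoding?             # false
```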
Updated by byroot (Jean Boussier) about 3 years ago
You might want to check whether Encoding.default_internal and Encoding.default_external match on your two platforms.
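One way to compare the two platforms is to print the relevant settings on each machine and diff the output. This sketch only reads existing values (Encoding.default_external, Encoding.default_internal, Encoding.locale_charmap, the stdout encodings, and the locale environment variables); the method name encoding_report is hypothetical.

```ruby
# Hypothetical helper: collect the settings that decide how Ruby tags and
# converts strings, so the output can be compared across machines.
def encoding_report
  {
    "default_external" => Encoding.default_external&.name,
    "default_internal" => Encoding.default_internal&.name, # nil when unset
    "locale_charmap"   => Encoding.locale_charmap,
    "stdout_external"  => $stdout.external_encoding&.name, # often nil for stdout
    "stdout_internal"  => $stdout.internal_encoding&.name,
    "LANG"             => ENV["LANG"],
    "LC_ALL"           => ENV["LC_ALL"],
    "LC_CTYPE"         => ENV["LC_CTYPE"]
  }
end

encoding_report.each { |key, value| puts format("%-18s %s", key, value.inspect) }
```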
Updated by Eregon (Benoit Daloze) about 3 years ago
My bet would be that the locale is not set properly on the ARM machine. Running locale probably shows C or POSIX, and many things don't work with that. You probably need export LANG=en_US.UTF-8 or similar.
I think CRuby should warn in that case. TruffleRuby already does.
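The effect of the locale can be checked by starting child Ruby processes with different locale variables and printing Encoding.default_external in each. This is a sketch under assumptions: it targets Linux or macOS, and C.UTF-8 is assumed to be installed (if it is not, the C library falls back to the plain C locale and both lines may show US-ASCII). LC_ALL is set explicitly because it takes precedence over LANG; the helper name default_external_under is hypothetical.

```ruby
require "rbconfig"

# Hypothetical helper: run a child Ruby under the given locale and return the
# name of Encoding.default_external, which Ruby derives from the locale at startup.
def default_external_under(locale)
  env = { "LC_ALL" => locale, "LANG" => locale }
  cmd = [RbConfig.ruby, "-e", "print Encoding.default_external.name"]
  IO.popen(env, cmd, &:read)
end

puts "LC_ALL=C       -> #{default_external_under("C")}"       # typically US-ASCII
puts "LC_ALL=C.UTF-8 -> #{default_external_under("C.UTF-8")}" # UTF-8 if the locale exists
```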
Updated by taf2 (Todd Fisher) about 3 years ago
@byroot (Jean Boussier) thank you! That was it. On Intel:
Encoding.default_internal
=> #<Encoding:UTF-8>
On ARM:
Encoding.default_internal
=> nil
Updated by byroot (Jean Boussier) about 3 years ago
@taf2, in that case it's indeed a $LANG problem.
Updated by duerst (Martin Dürst) about 3 years ago
- Status changed from Open to Rejected
It seems clear that this isn't a Ruby bug. So I'm closing this issue. But please feel free to continue discussing the solution here if that helps.