Feature #2043
incompatible character encodings
| Status: | Rejected | Start date: | 09/04/2009 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | | % Done: | 0% |
| Category: | - | | |
| Target version: | 3.0 | | |
Description
Why does the following example fail with the "Encoding::CompatibilityError: incompatible character encodings: Windows-1250 and UTF-8" exception?

    s = "\u017Elu\u0165ou\u010dk\u00fd"
    a = s.encode 'cp1250'
    b = s.encode 'utf-8'
    c = a + b

I would expect that if the strings are not in the same encoding, Ruby would do everything it can to satisfy me, but it just checks whether there is possible conversion to ASCII; otherwise an exception is raised. This is really annoying behavior. Have you considered allowing such a string merge?
History
Updated by Yui NARUSE over 2 years ago
- Status changed from Open to Assigned
- Assignee set to Yui NARUSE
Sorry, what is "possible conversion to ASCII" ?
Updated by Yuki Sonoda over 2 years ago
- Target version set to 3.0
Updated by Vit Ondruch over 2 years ago
In the following example, only characters from US-ASCII are used, and in this case the addition works well.

    s = 'abc'
    a = s.encode 'cp1250'
    b = s.encode 'utf-8'
    c = a + b
Updated by Yui NARUSE over 2 years ago
Ruby 1.9 doesn't do automatic conversion. The ASCII character set is special because ASCII characters are the same characters in every ASCII-compatible encoding. On Ruby 1.9's view, Unicode is not a superset of Windows-1252.
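The rule described above can be observed directly. The following is a minimal sketch, not taken from the thread; the variable names are illustrative, and it relies only on `String#encode`, `String#ascii_only?`, and `Encoding.compatible?` from Ruby core.

```ruby
# Strings from the report: the same text in two different encodings.
s = "\u017Elu\u0165ou\u010dk\u00fd"
mixed_a = s.encode('cp1250')
mixed_b = s.encode('utf-8')

# ASCII-only strings in the same two encodings.
ascii_a = 'abc'.encode('cp1250')
ascii_b = 'abc'.encode('utf-8')

# Both operands are ASCII-only, so Ruby reports a compatible encoding.
ascii_only = ascii_a.ascii_only? && ascii_b.ascii_only?
ascii_result = Encoding.compatible?(ascii_a, ascii_b)

# Non-ASCII content in different encodings: no compatible encoding (nil).
mixed_result = Encoding.compatible?(mixed_a, mixed_b)

# Concatenation succeeds in the ASCII-only case...
ascii_sum = ascii_a + ascii_b

# ...and raises in the mixed case; capture the error instead of aborting.
mixed_error = begin
  mixed_a + mixed_b
  nil
rescue Encoding::CompatibilityError => e
  e
end
```

Checking `Encoding.compatible?` before concatenating is one way to decide whether an explicit `encode` call is needed first.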
Updated by Nobuyoshi Nakada over 2 years ago
- Status changed from Assigned to Rejected
Updated by Vit Ondruch over 2 years ago
> On Ruby 1.9's view, Unicode is not a superset of Windows-1252.

Is "Ruby 1.9's view" described in detail somewhere? I still have the feeling that it is just half-baked :/
Updated by Yui NARUSE over 2 years ago
http://jp.rubyist.net/magazine/?0025-Ruby19_m17n
http://yokolet.blogspot.com/2009/07/design-and-implementation-of-ruby-m17n.html
I wrote the first article above, but it is in Japanese; the second is its translation.
http://github.com/candlerb/string19/tree/master
James' and string19 are also well documented.
People used to ISO 8859 may wonder why Unicode is not a superset of Windows-1252. In Japan, because there are no standard conversion tables between the Japanese legacy encodings (Shift_JIS, EUC-JP, ISO-2022-JP) and Unicode, vendors use different tables. This sad situation means that Unicode is not a simple superset of the legacy encodings. Ruby 1.9 inherits this. If a wide consensus on a standard table is reached before Ruby 2.0, Ruby 2.0 may have automatic conversion (or Unicode may become the internal code).
Updated by Vit Ondruch over 2 years ago
Thank you for the links. It was interesting.
I'm looking forward to Ruby 2.0 and its automatic conversions, since writing c = a.encode('utf-8') + b.encode('utf-8') to safely concatenate two strings is not sexy at all.
Vit
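The explicit-transcoding workaround mentioned in the last comment can be wrapped in a small helper. This is only a sketch: `concat_as` is a hypothetical name, not part of Ruby, and it assumes every part can be transcoded to the target encoding (otherwise `String#encode` raises).

```ruby
# Hypothetical helper (not part of Ruby): transcode every part to one
# target encoding, then join the results.
def concat_as(target, *parts)
  parts.map { |part| part.encode(target) }.join
end

s = "\u017Elu\u0165ou\u010dk\u00fd"
a = s.encode('cp1250')
b = s.encode('utf-8')

# Equivalent to a.encode('utf-8') + b.encode('utf-8').
c = concat_as('utf-8', a, b)
```

The caller still has to choose the target encoding explicitly, which is consistent with Ruby 1.9 not picking one automatically.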