Feature #2043: incompatible character encodings - Ruby - Ruby Issue Tracking System

Feature #2043

closed

incompatible character encodings

Added by vo.x (Vit Ondruch) almost 16 years ago. Updated over 14 years ago.

Status:

Rejected

Assignee:

naruse (Yui NARUSE)

Target version:

3.0

[ruby-core:25360]

Description

=begin
Why does the following example fail with the "Encoding::CompatibilityError: incompatible character encodings: Windows-1250 and UTF-8" exception?

s = "\u017Elu\u0165ou\u010dk\u00fd"  # "žluťoučký"
a = s.encode 'cp1250'                # Windows-1250 bytes
b = s.encode 'utf-8'                 # UTF-8 bytes
c = a + b                            # raises Encoding::CompatibilityError

I would expect that, if the strings are not in the same encoding, Ruby would do everything it can to make the concatenation work. Instead, it only checks whether a conversion to ASCII is possible, and otherwise raises an exception. This is really annoying behavior.

Have you considered allowing such string concatenation?
=end

Updated by naruse (Yui NARUSE) almost 16 years ago

Status changed from Open to Assigned
Assignee set to naruse (Yui NARUSE)

=begin
Sorry, what do you mean by "possible conversion to ASCII"?
=end

Updated by yugui (Yuki Sonoda) almost 16 years ago

Target version set to 3.0

=begin

=end

Updated by vo.x (Vit Ondruch) almost 16 years ago

=begin
In the following example, only US-ASCII characters are used, and in this case the concatenation works.

s = 'abc'              # US-ASCII characters only
a = s.encode 'cp1250'
b = s.encode 'utf-8'
c = a + b              # works, no exception
=end

Updated by naruse (Yui NARUSE) almost 16 years ago

=begin
Ruby 1.9 doesn't do automatic conversion.
The ASCII character set is special
because its characters are the same characters in every ASCII-compatible encoding.

In Ruby 1.9's view, Unicode is not a superset of Windows-1252.
=end
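
The rule described in this comment can be checked directly. The following is a minimal sketch, not part of the original thread, that uses `Encoding.compatible?` and `String#ascii_only?` to show when Ruby accepts a concatenation of Windows-1250 and UTF-8 strings; the variable names are illustrative only.

```ruby
# Sketch: strings in different ASCII-compatible encodings can be
# concatenated only when at least one side contains ASCII characters only.
text  = "\u017Elu\u0165ou\u010dk\u00fd"  # non-ASCII sample from the report
ascii = "abc"

cp1250_ascii = ascii.encode("Windows-1250")
utf8_ascii   = ascii.encode("UTF-8")
cp1250_text  = text.encode("Windows-1250")
utf8_text    = text.encode("UTF-8")

# Both sides are ASCII-only: an encoding is returned, so + is allowed.
ascii_pair = Encoding.compatible?(cp1250_ascii, utf8_ascii)

# Both sides contain non-ASCII characters: nil, so + raises.
text_pair = Encoding.compatible?(cp1250_text, utf8_text)

# One ASCII-only side is enough; the result takes the other side's encoding.
mixed = cp1250_ascii + utf8_text

error_message = nil
begin
  cp1250_text + utf8_text
rescue Encoding::CompatibilityError => e
  error_message = e.message
end

puts "ASCII pair compatible:     #{!ascii_pair.nil?}"
puts "non-ASCII pair compatible: #{!text_pair.nil?}"
puts "mixed result encoding:     #{mixed.encoding}"
puts "error: #{error_message}"
```

No bytes are transcoded in any of these cases. Ruby only decides whether the existing bytes can be placed side by side under a single encoding label.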

Updated by nobu (Nobuyoshi Nakada) almost 16 years ago

Status changed from Assigned to Rejected

=begin

=end

Updated by vo.x (Vit Ondruch) almost 16 years ago

=begin

"In Ruby 1.9's view, Unicode is not a superset of Windows-1252."

Is "Ruby 1.9's view" described in detail somewhere? I still have the feeling that it is just half-baked :/
=end

Updated by naruse (Yui NARUSE) almost 16 years ago

=begin
http://jp.rubyist.net/magazine/?0025-Ruby19_m17n
http://yokolet.blogspot.com/2009/07/design-and-implementation-of-ruby-m17n.html
I wrote the first article, which is in Japanese; the second is its translation.

http://github.com/candlerb/string19/tree/master
James' articles and string19 are also good documentation.

People used to the ISO 8859 encodings may wonder why Unicode is not a superset of Windows-1252.
In Japan, because there are no standard conversion tables
between the Japanese legacy encodings (Shift_JIS, EUC-JP, ISO-2022-JP) and Unicode,
vendors use different tables.
Because of this sad situation, Unicode is not a simple superset of the legacy encodings.
Ruby 1.9 inherits this.

If a wide consensus on a standard table is reached before Ruby 2.0,
Ruby 2.0 may have automatic conversion (or Unicode may become the internal encoding).
=end
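
The conversion-table problem mentioned in this comment can be observed in Ruby itself. The sketch below is not from the original thread and assumes Ruby 2.0 or later for `String#b`. It decodes the same two bytes once as `Shift_JIS` and once as `Windows-31J` (Microsoft's Shift_JIS variant) and compares the resulting Unicode code points.

```ruby
# Sketch: the same legacy bytes map to different Unicode characters
# depending on which conversion table (encoding label) is chosen.
bytes = "\x81\x60".b  # a single double-byte character in Shift_JIS variants

via_shift_jis   = bytes.dup.force_encoding("Shift_JIS").encode("UTF-8")
via_windows_31j = bytes.dup.force_encoding("Windows-31J").encode("UTF-8")

# Print the code points each table produces for the identical bytes.
puts format("Shift_JIS   -> U+%04X", via_shift_jis.ord)
puts format("Windows-31J -> U+%04X", via_windows_31j.ord)
puts "same character? #{via_shift_jis == via_windows_31j}"
```

Because the two tables disagree, an implicit conversion to Unicode would have to pick one of them silently, which is the situation the comment describes.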

Updated by vo.x (Vit Ondruch) almost 16 years ago

=begin
Thank you for the links. It was interesting.

I'm looking forward to Ruby 2.0 and its automatic conversions, since writing c = a.encode('utf-8') + b.encode('utf-8') to safely concatenate two strings is not sexy at all.

Vit
=end
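
The explicit transcoding mentioned in the last comment can be wrapped in a small helper. This is a minimal sketch, not something proposed in the thread; `concat_in` is a hypothetical name, and it assumes every input can be represented in the target encoding.

```ruby
# Hypothetical helper: transcode every argument to one target encoding,
# then join the results. String#encode raises if a character cannot be
# represented in the target encoding.
def concat_in(target, *strings)
  strings.map { |str| str.encode(target) }.join
end

s = "\u017Elu\u0165ou\u010dk\u00fd"
a = s.encode("Windows-1250")
b = s.encode("UTF-8")

c = concat_in(Encoding::UTF_8, a, b)
puts c.encoding
puts c == s + s
```

Keeping the target encoding as an explicit argument leaves the choice of conversion with the caller, which matches the behavior Ruby 1.9 settled on.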
