Bug #7742: System encoding (Windows-1258) is not recognized by Ruby to convert back to UTF-8 - Ruby - Ruby Issue Tracking System

Bug #7742

open

System encoding (Windows-1258) is not recognized by Ruby to convert back to UTF-8

Added by Mars (Hong Ha Dang ) over 13 years ago. Updated over 2 years ago.

Status:

Assigned

Assignee:

duerst (Martin Dürst)

Target version:

ruby -v:

1.9.3

Backport:

[ruby-core:51702]

Tags:

encoding

Description

I installed RailsInstaller on Windows 8. After the install completed, the screen switched to
configuring RailsInstaller in cmd (step 2). I gave the user name DHH Mars and the
email dhhma...@gmail.com. It ran and produced the following message:

C:/RailsInstaller/scripts/config_check.rb:64:in 'exist?': code converter not
found Encoding::ConverterNotFoundError from
C:/RailsInstaller/scripts/config_check.rb:64:in 'main'

C:\Sites>

Related issues 1 (1 open — 0 closed)

Updated by duerst (Martin Dürst) over 13 years ago
#1 [ruby-core:51703]

Mars (Hong Ha Dang ) wrote:

C:/RailsInstaller/scripts/config_check.rb:64:in 'exist?': code converter not
found

Yes, windows-1258 (for Vietnamese) is currently not supported. The reason is that conversion from windows-1258 to UTF-8 should produce output in Unicode Normalization Form C. As an example, the sequence 0xE3 0xEC (LATIN SMALL LETTER A WITH BREVE followed by COMBINING ACUTE ACCENT) should not be converted to the sequence U+0103 U+0301, but to the single character U+1EAF (LATIN SMALL LETTER A WITH BREVE AND ACUTE).
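
The difference between the two outputs can be checked with Ruby's built-in `String#unicode_normalize` (added in Ruby 2.2, so later than this comment). This is a minimal sketch of the Unicode side only; it does not involve any windows-1258 converter.

```ruby
# One-to-one result of converting the bytes 0xE3 0xEC:
# U+0103 (a with breve) followed by U+0301 (combining acute accent).
decomposed = "\u0103\u0301"

# NFC composes the pair into the single character U+1EAF.
composed = decomposed.unicode_normalize(:nfc)

puts decomposed.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
puts composed.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
```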

This means that this bug depends on bug #6351. Unfortunately, I don't have time now to work on that bug; this will have to wait for March, sorry.

Updated by duerst (Martin Dürst) over 13 years ago
#2 [ruby-core:51704]

Assignee set to duerst (Martin Dürst)
Target version set to 2.6

Updated by thegcat (Felix Schäfer) over 12 years ago
#3 [ruby-core:59645]

We (Planio, https://plan.io) are also in need of Windows-1258 to UTF-8 conversion. Is there anything we can do to help?

Updated by duerst (Martin Dürst) over 12 years ago
#4 [ruby-core:59655]

thegcat (Felix Schäfer) wrote:

We (Planio, https://plan.io) are also in need of Windows-1258 to UTF-8 conversion. Is there anything we can do to help?

As explained above, the problem is with normalization. If you are okay with a version that just does one-to-one conversion, then that can be produced rather quickly (maybe even over the weekend). But most Vietnamese content, e.g. on the Web, is normalized (NFC), and I guess you'd want to have that, too. But then you also have to be careful with respect to round-tripping, because windows-1258->UTF-8 will be .encode('UTF-8', 'windows-1258').to_nfc or so, but backwards conversion would need special code because neither NFC nor NFD can directly be converted to windows-1258.
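
To make the "one-to-one conversion, then normalize" idea concrete, here is a rough sketch. It is not Ruby's transcoder and not a proposed implementation: `CP1258_SAMPLE` and `cp1258_sample_to_utf8` are hypothetical names, and the table deliberately contains only the two bytes mentioned in this thread, so any other non-ASCII byte raises.

```ruby
# Hypothetical, deliberately partial table: only the two windows-1258
# bytes discussed in this thread are mapped.
CP1258_SAMPLE = {
  0xE3 => 0x0103, # LATIN SMALL LETTER A WITH BREVE
  0xEC => 0x0301  # COMBINING ACUTE ACCENT
}.freeze

# Maps each byte to one code point, then optionally applies NFC.
def cp1258_sample_to_utf8(bytes, normalize: true)
  codepoints = bytes.bytes.map do |b|
    next b if b < 0x80 # ASCII range passes through unchanged

    CP1258_SAMPLE.fetch(b) do
      raise ArgumentError, format("byte 0x%02X is not in the sample table", b)
    end
  end
  text = codepoints.pack("U*") # builds a UTF-8 string
  normalize ? text.unicode_normalize(:nfc) : text
end

raw = "\xE3\xEC".b
one_to_one = cp1258_sample_to_utf8(raw, normalize: false) # U+0103 U+0301
nfc        = cp1258_sample_to_utf8(raw)                   # U+1EAF

# NFD is not the one-to-one form either: it splits the breve off the base letter.
nfd_codepoints = nfc.unicode_normalize(:nfd).codepoints   # U+0061 U+0306 U+0301
```

The last line illustrates why the backwards direction needs special handling: the byte sequence corresponds neither to the NFC form (one code point) nor to the NFD form (three code points), but to something in between.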

A slightly more elaborate version would do one-to-one conversion from windows-1258 to UTF-8, but would convert both that kind of data and data in NFC (though not arbitrarily non-normalized data) back to windows-1258. Such a converter might be relatively easy to produce, or it might be more difficult; I can't say which off the top of my head.
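
One possible shape for that backwards direction is sketched below, purely as an illustration. It assumes the approach of normalizing to NFC first, then splitting each character into a base letter plus a Vietnamese tone mark. `TONE_MARKS`, `UNICODE_TO_CP1258_SAMPLE` and `utf8_to_cp1258_sample` are hypothetical names, and the byte table again covers only the two code points from this thread. A real converter would also have to decide what to do with letters that windows-1258 encodes precomposed, which this sketch ignores.

```ruby
# The five combining tone marks used in Vietnamese (Unicode code points).
TONE_MARKS = [0x0300, 0x0301, 0x0303, 0x0309, 0x0323].freeze

# Hypothetical, deliberately partial reverse table.
UNICODE_TO_CP1258_SAMPLE = {
  0x0103 => 0xE3, # LATIN SMALL LETTER A WITH BREVE
  0x0301 => 0xEC  # COMBINING ACUTE ACCENT
}.freeze

# Accepts both the one-to-one form and NFC, and returns binary bytes.
def utf8_to_cp1258_sample(text)
  bytes = []
  text.unicode_normalize(:nfc).each_char do |char|
    # Fully decompose, pull out the tone marks, recompose what is left.
    tones, rest = char.unicode_normalize(:nfd).codepoints
                      .partition { |cp| TONE_MARKS.include?(cp) }
    base = rest.pack("U*").unicode_normalize(:nfc).codepoints
    (base + tones).each do |cp|
      bytes << (cp < 0x80 ? cp : UNICODE_TO_CP1258_SAMPLE.fetch(cp) do
        raise ArgumentError, format("U+%04X is not in the sample table", cp)
      end)
    end
  end
  bytes.pack("C*")
end

from_nfc        = utf8_to_cp1258_sample("\u1EAF")
from_one_to_one = utf8_to_cp1258_sample("\u0103\u0301")
```

Both inputs end up as the same two bytes, 0xE3 0xEC, which is the round-tripping property described above.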

So if you use a normalization library after conversion, that might work out, but it would be somewhat of a special case. Also, when we later introduce a different (more normalizing) converter, that may be seen as a non-backwards-compatible change.

One solution to backwards-compatibility would be to use different encoding labels to differentiate versions of conversion. But this has the problem that in the current state of affairs, it introduces additional "encodings" that are not really different, but just variants produced by different conversions. That's the problem e.g. with the current UTF8-MAC, and I don't want to create more of these.

A more long-term solution would be to introduce a difference between encodings and conversions, where e.g. one could use .encode('windows-1258--non-normalized', 'utf-8') or so to indicate a non-normalized version of conversion. But that would need some more general discussion among the Ruby experts in this field.

So Felix, if you tell me what you need, and we can make sure that it doesn't affect later backwards-compatibility, I might be able to work on something.

Updated by phasis68 (Heesob Park) over 12 years ago
#5 [ruby-core:59662]

As far as I know, VISCII (Vietnamese Standard Code for Information Interchange) can round-trip with UTF-8. So implementing a converter between VISCII and UTF-8 might be easy.

I am not sure, but if a converter between Windows-1258 and VISCII is possible, Windows-1258 could be supported via VISCII:
Windows-1258 <-> VISCII <-> UTF-8

Anyway, it would be nice if Ruby supported the VISCII encoding.

Updated by duerst (Martin Dürst) over 12 years ago
#6 [ruby-core:59663]

phasis68 (Heesob Park) wrote:

As far as I know, VISCII (Vietnamese Standard Code for Information Interchange) can round-trip with UTF-8. So implementing a converter between VISCII and UTF-8 might be easy.

Yes, it should be easy. Can you open a separate ticket? I'll give it a try over the weekend.

I am not sure, but if a converter between Windows-1258 and VISCII is possible, Windows-1258 could be supported via VISCII.

Conversion between Windows-1258 and VISCII is actually as difficult as the conversion between Windows-1258 and NFC-normalized UTF-8, which is the most difficult variant as I have explained above.

Updated by naruse (Yui NARUSE) over 8 years ago
#7

Target version deleted (~~2.6~~)

Updated by JesseJohnson (Jesse Johnson) over 2 years ago
#8 [ruby-core:115363]

If I understand correctly, this test case should convert correctly and not raise an Encoding::ConverterNotFoundError.

"\xE3\xEC".force_encoding(Encoding::Windows_1258).encode(Encoding::UTF_8)

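For anyone who wants to check a given Ruby build without relying on an exception at the call site, a small probe along these lines may help. `converter_available?` is a hypothetical helper name; it only relies on `Encoding::Converter.new` raising `Encoding::ConverterNotFoundError` when no conversion path exists. The result for Windows-1258 depends on the Ruby version, so no output is claimed for it here.

```ruby
# Returns true if this Ruby can build a converter between the two encodings.
def converter_available?(from, to)
  Encoding::Converter.new(from, to)
  true
rescue Encoding::ConverterNotFoundError
  false
end

latin1_ok = converter_available?("ISO-8859-1", "UTF-8")
cp1258_ok = converter_available?("Windows-1258", "UTF-8")

puts "ISO-8859-1 -> UTF-8:   #{latin1_ok}"
puts "Windows-1258 -> UTF-8: #{cp1258_ok}"
```
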
Updated by hsbt (Hiroshi SHIBATA) over 2 years ago
#9

Status changed from Open to Assigned
