Bug #10149
Some characters in EUC-KR do not encode to UTF-8 properly
Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin13.0]
Backport:
Description
This bug is confirmed on 2.1.2p95.
At least two valid EUC-KR characters are not converted to UTF-8 properly:
1. "\xA2\xE6" should convert to U+20AC (Euro Sign)
Current behavior:
irb(main):001:0> "\xA2\xE6".encode('UTF-8', 'EUC-KR')
Encoding::UndefinedConversionError: "\xA2\xE6" from EUC-KR to UTF-8
2. "\xA2\xE7" should convert to U+00AE (Registered Sign)
Current behavior:
irb(main):002:0> "\xA2\xE7".encode('UTF-8', 'EUC-KR')
Encoding::UndefinedConversionError: "\xA2\xE7" from EUC-KR to UTF-8
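Both cases can be checked in one script. This is only a sketch: `EXPECTED` and `check_euckr` are hypothetical names, and the expected code points are the ones stated in this report. On an affected build each line should report `undefined`; on a build whose table contains the two characters it should report `ok`.

```ruby
# EUC-KR byte sequence => expected Unicode code point (values from this report).
EXPECTED = {
  "\xA2\xE6" => 0x20AC, # Euro Sign
  "\xA2\xE7" => 0x00AE, # Registered Sign
}

# Hypothetical helper: returns :ok, :mismatch, or :undefined.
def check_euckr(bytes, expected_codepoint)
  # Tag the raw bytes as EUC-KR, then transcode to UTF-8.
  converted = bytes.dup.force_encoding('EUC-KR').encode('UTF-8')
  converted.unpack('U*') == [expected_codepoint] ? :ok : :mismatch
rescue Encoding::UndefinedConversionError
  # This is the failure described in the report.
  :undefined
end

EXPECTED.each do |bytes, codepoint|
  puts format('%s -> U+%04X: %s', bytes.b.inspect, codepoint, check_euckr(bytes, codepoint))
end
```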
I confirmed that both characters convert correctly in Python:
>>> "\xA2\xE7".decode('euc-kr')
u'\xae'
I am guessing this is because these two characters are missing from this mapping: http://svn.ruby-lang.org/repos/ruby/trunk/enc/trans/euckr-tbl.rb
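Until the table is updated, a possible workaround is the `fallback:` option of `String#encode`, which is consulted for characters the transcoder cannot map. The following is a minimal sketch, not a fix: `EUCKR_FALLBACK` and `euckr_to_utf8` are hypothetical names, and the table covers only the two characters from this report.

```ruby
# Keys are raw EUC-KR bytes (binary strings); values are the intended UTF-8 text.
EUCKR_FALLBACK = {
  "\xA2\xE6".b => "\u20AC", # Euro Sign
  "\xA2\xE7".b => "\u00AE", # Registered Sign
}.freeze

# Hypothetical helper: transcode EUC-KR bytes to UTF-8, patching the two
# characters listed above if the built-in table does not know them.
def euckr_to_utf8(bytes)
  bytes.dup.force_encoding('EUC-KR').encode(
    'UTF-8',
    # The fallback receives the undefined character in the source encoding;
    # it is compared as binary so the lookup does not depend on encoding tags.
    fallback: ->(char) { EUCKR_FALLBACK[char.b] }
  )
end

puts euckr_to_utf8("\xA2\xE6") # Euro Sign
puts euckr_to_utf8("\xA2\xE7") # Registered Sign
```

On a build where the mapping already contains these characters, the fallback is simply never called, so the helper should behave the same either way.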