Bug #10149

Some characters in EUC-KR do not encode to UTF-8 properly

Added by paingo (Eric Seo) over 9 years ago. Updated over 9 years ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin13.0]
[ruby-core:64452]

Description

This bug is confirmed on Ruby 2.1.2p95.
At least two valid EUC-KR characters are not converted to UTF-8 properly:

1. "\xA2\xE6" should convert to U+20AC (Euro Sign)
Current behavior:

irb(main):001:0> "\xA2\xE6".encode('UTF-8', 'EUC-KR')
Encoding::UndefinedConversionError: "\xA2\xE6" from EUC-KR to UTF-8

2. "\xA2\xE7" should convert to U+00AE (Registered Sign)
Current behavior:

irb(main):002:0> "\xA2\xE7".encode('UTF-8', 'EUC-KR')
Encoding::UndefinedConversionError: "\xA2\xE7" from EUC-KR to UTF-8

I confirmed that both characters convert correctly in Python:

>>> "\xA2\xE7".decode('euc-kr')
u'\xae'

I am guessing this is because these two characters are missing from this mapping table: http://svn.ruby-lang.org/repos/ruby/trunk/enc/trans/euckr-tbl.rb
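On an affected Ruby, a possible workaround is the `:fallback` option of `String#encode`, which is consulted only when a character has no defined conversion. The following is a minimal sketch, not a fix for the table itself: the constant `EUC_KR_FALLBACK` and the helper `euc_kr_to_utf8` are hypothetical names introduced here, and the table covers only the two byte sequences from this report.

```ruby
# Hypothetical workaround table: the two EUC-KR byte pairs from this report,
# mapped to the code points they are expected to produce.
EUC_KR_FALLBACK = {
  [0xA2, 0xE6] => "\u20AC", # Euro Sign
  [0xA2, 0xE7] => "\u00AE"  # Registered Sign
}.freeze

# Hypothetical helper: treats +str+ as EUC-KR bytes and converts to UTF-8.
# The fallback lambda is called only for characters the built-in table
# cannot convert; it looks the character up by its raw bytes so the lookup
# does not depend on the string's encoding tag. Returning nil leaves the
# usual Encoding::UndefinedConversionError in place for anything else.
def euc_kr_to_utf8(str)
  str.encode('UTF-8', 'EUC-KR',
             fallback: ->(char) { EUC_KR_FALLBACK[char.bytes] })
end

puts euc_kr_to_utf8("\xA2\xE6")
puts euc_kr_to_utf8("\xA2\xE7")
```

Because the fallback is skipped whenever the conversion is already defined, the helper should behave the same on a Ruby whose mapping table includes these characters.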
