Bug #10149

Some characters in EUC-KR do not encode to UTF-8 properly

Added by paingo (Eric Seo) almost 8 years ago. Updated almost 8 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin13.0]
[ruby-core:64452]

Description

This bug is confirmed on 2.1.2p95.
There are at least two valid EUC-KR characters that do not convert to UTF-8 properly:

1. "\xA2\xE6" should convert to U+20AC (Euro Sign)
Current behavior:

irb(main):001:0> "\xA2\xE6".encode('UTF-8', 'EUC-KR')
Encoding::UndefinedConversionError: "\xA2\xE6" from EUC-KR to UTF-8

2. "\xA2\xE7" should convert to U+00AE (Registered Sign)
Current behavior:

irb(main):002:0> "\xA2\xE7".encode('UTF-8', 'EUC-KR')
Encoding::UndefinedConversionError: "\xA2\xE7" from EUC-KR to UTF-8
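
Both failing cases can be probed together with a short Ruby script. This is a sketch: the SAMPLES hash and the check_euckr helper are names introduced here for illustration. Going by the report, 2.1.2p95 should print the UndefinedConversionError text for both lines, while a build that includes the fix should print the expected code points.

```ruby
# Probe the two EUC-KR byte sequences from this report.
# On an affected build (e.g. 2.1.2p95) the rescue branch is expected to run;
# on a build with the fix, each line should show the Unicode code point.
SAMPLES = {
  "\xA2\xE6" => 'EURO SIGN, expected U+20AC',
  "\xA2\xE7" => 'REGISTERED SIGN, expected U+00AE',
}.freeze

# Returns "U+XXXX" for a successful conversion, or the error text otherwise.
def check_euckr(bytes)
  utf8 = bytes.encode('UTF-8', 'EUC-KR')
  format('U+%04X', utf8.ord)
rescue Encoding::UndefinedConversionError => e
  "#{e.class}: #{e.message}"
end

SAMPLES.each do |bytes, label|
  hex = bytes.unpack('H*').first.upcase
  puts "#{hex} (#{label}): #{check_euckr(bytes)}"
end
```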

I confirmed that both characters convert correctly in Python:

>>> "\xA2\xE7".decode('euc-kr')
u'\xae'

I suspect this is because these two characters are missing from this mapping table: http://svn.ruby-lang.org/repos/ruby/trunk/enc/trans/euckr-tbl.rb
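
The hypothesis is that conversion is driven by a lookup table, so a byte sequence with no entry has nothing to convert to. A toy model of that idea follows. It assumes a simple list of hex pairs, chosen for illustration only; Ruby's real transcoder tables are generated into C at build time, and TOY_TABLE, ToyUndefinedConversion and toy_decode are hypothetical names.

```ruby
# Toy, table-driven decoder illustrating the missing-entry hypothesis.
# NOT Ruby's actual transcoder; all names here are hypothetical.
class ToyUndefinedConversion < StandardError; end

# Pairs of [EUC-KR bytes as hex, Unicode code point as hex].
TOY_TABLE = [
  %w[A1A1 3000], # IDEOGRAPHIC SPACE
  %w[B0A1 AC00], # HANGUL SYLLABLE GA
].freeze

# The same table plus the two entries the report says are missing.
TOY_TABLE_FIXED = (TOY_TABLE + [%w[A2E6 20AC], %w[A2E7 AE]]).freeze

# Looks up one EUC-KR character (given as hex) and returns it as UTF-8,
# raising when the table has no entry for it.
def toy_decode(table, hex)
  cp = table.to_h.fetch(hex) do
    raise ToyUndefinedConversion, "#{hex} has no mapping"
  end
  [cp.to_i(16)].pack('U')
end

begin
  toy_decode(TOY_TABLE, 'A2E6')
rescue ToyUndefinedConversion => e
  puts "without entry: #{e.message}"
end
puts "with entry:    #{toy_decode(TOY_TABLE_FIXED, 'A2E6')}"
```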

Updated by nobu (Nobuyoshi Nakada) almost 8 years ago

  • Description updated (diff)
  • Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN to 2.0.0: REQUIRED, 2.1: REQUIRED

Updated by nobu (Nobuyoshi Nakada) almost 8 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

Applied in changeset r47221.

euckr-tbl.rb: euro and registered signs
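
The changeset adds the two missing mappings to the table. On a Ruby build that includes neither this changeset nor one of its backports, one possible stopgap is to supply those two mappings through the :fallback option of String#encode. This sketch assumes the build accepts a Proc for :fallback; MISSING_EUCKR and euckr_to_utf8 are hypothetical names. On a patched build the fallback is never consulted, because the transcoder maps both characters itself.

```ruby
# Hypothetical stopgap for Ruby builds whose EUC-KR table lacks the two
# entries added by r47221. Keys are the raw EUC-KR bytes of each character;
# values are the intended UTF-8 replacements.
MISSING_EUCKR = {
  [0xA2, 0xE6] => "\u20AC", # EURO SIGN
  [0xA2, 0xE7] => "\u00AE", # REGISTERED SIGN
}.freeze

# Converts EUC-KR bytes to UTF-8. The :fallback proc is only called for
# characters the transcoder cannot map; it returns nil for anything not
# listed above, so String#encode raises UndefinedConversionError as usual.
def euckr_to_utf8(bytes)
  bytes.encode('UTF-8', 'EUC-KR',
               fallback: ->(ch) { MISSING_EUCKR[ch.bytes] })
end

puts euckr_to_utf8("\xA2\xE6\xA2\xE7")
```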

Updated by nagachika (Tomoyuki Chikanaga) almost 8 years ago

  • Backport changed from 2.0.0: REQUIRED, 2.1: REQUIRED to 2.0.0: REQUIRED, 2.1: DONE

Backported into ruby_2_1 branch at r47485.

Updated by usa (Usaku NAKAMURA) almost 8 years ago

  • Backport changed from 2.0.0: REQUIRED, 2.1: DONE to 2.0.0: DONE, 2.1: DONE

Backported into ruby_2_0_0 at r47502.
