Project

General

Profile

Bug #14934

Unicode: Hangul normalize bug

Added by MaLin (Lin Ma) over 1 year ago. Updated over 1 year ago.

Status:
Closed
Priority:
Normal
Target version:
-
[ruby-core:88070]

Description

I was involved to fix a similar bug in Python, I found Ruby also has bug code.

We should fix this line[1] like this:
[1] https://github.com/ruby/ruby/blob/96db72ce38b27799dd8e80ca00696e41234db6ba/lib/unicode_normalize/normalize.rb#L73

-if length>2 and 0 <= (trail=string[2].ord-TBASE) and trail < TCOUNT
+if length>2 and 0 < (trail=string[2].ord-TBASE) and trail < TCOUNT


There was a change of Unicode Standard's demonstration code.

Before Unicode 4.1.0 (draft), here is: TBase <= code <= TBase+TCount
see: http://www.unicode.org/reports/tr15/tr15-24.html#hangul_composition

After Unicode 4.1.0, here is TBase < code < TBase+TCount, which in line with Unicode 10.0
see: http://www.unicode.org/reports/tr15/tr15-25.html#hangul_composition

This change happened in 2005.

Please note: The normalize algorithm didn't changed, only the demonstration code changed, see this discussion[2] about this point.
[2] https://bugs.python.org/issue29456


Here is some test code[3] for Python, maybe useful for this fix.
[3] https://github.com/python/cpython/commit/d134809cd3764c6a634eab7bb8995e3e2eff14d5

Also available in: Atom PDF