Bug #14934
closed
I'm not a Ruby expert, but I would suggest examining the Hangul normalization code; it looks much simpler than the Unicode Standard's demonstration code.
- Assignee set to duerst (Martin Dürst)
Can you provide some test case(s)?
That is what frustrated me. I simply translated Python's test-cases for this issue[1] to Ruby.
[1] https://github.com/python/cpython/commit/d134809cd3764c6a634eab7bb8995e3e2eff14d5
But they passed without raising an exception.
Ruby's code seems to be related to the \u11a7 character.
I won't have much time to look at this issue this week. I'll get around to it next week (maybe even this Friday).
No need to hurry. It's a very old bug, and the test cases mysteriously pass.
I think I have figured things out:
The patch is technically correct. While LBASE and VBASE are the values of the first actual leading and vowel jamos, the value of TBASE is one smaller than the first actual trailing jamo at 0x11A8. This is to account for the fact that the lowest value of the "trailing digit" of the Hangul syllable representation indicates the absence of a trailing jamo. So in contrast to the <= tests related to LBASE and VBASE, it is indeed technically correct to have a < comparison operator in the comparison related to TBASE.
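To make the role of TBASE concrete, here is a minimal sketch of Hangul composition following the constants used in the Unicode Standard's sample code. The module name `HangulSketch` and the method `compose` are hypothetical names for illustration only; this is not the code in Ruby's normalization library.

```ruby
# A sketch of Hangul syllable composition, assuming the constant names
# from the Unicode Standard's sample code. HangulSketch and compose are
# hypothetical names, not part of Ruby.
module HangulSketch
  SBASE  = 0xAC00
  LBASE  = 0x1100           # first leading jamo
  VBASE  = 0x1161           # first vowel jamo
  TBASE  = 0x11A7           # one BELOW the first trailing jamo (U+11A8)
  LCOUNT = 19
  VCOUNT = 21
  TCOUNT = 28               # 27 trailing jamos + 1 slot for "no trailing jamo"
  NCOUNT = VCOUNT * TCOUNT
  SCOUNT = LCOUNT * NCOUNT

  module_function

  def compose(string)
    out = []
    string.each_codepoint do |cp|
      last = out.last
      if last
        # L + V -> LV syllable; index 0 is a real jamo, so 0 is allowed.
        l_index = last - LBASE
        v_index = cp - VBASE
        if (0...LCOUNT).cover?(l_index) && (0...VCOUNT).cover?(v_index)
          out[-1] = SBASE + (l_index * VCOUNT + v_index) * TCOUNT
          next
        end
        # LV + T -> LVT syllable; index 0 means "no trailing jamo",
        # so the lower bound must be strict (TBASE < cp).
        s_index = last - SBASE
        t_index = cp - TBASE
        if (0...SCOUNT).cover?(s_index) && (s_index % TCOUNT).zero? &&
           t_index > 0 && t_index < TCOUNT
          out[-1] = last + t_index
          next
        end
      end
      out << cp
    end
    out.pack('U*')
  end
end
```

With a `<=` comparison against TBASE, `t_index` could be 0 and U+11A7 would be silently swallowed into the preceding syllable; with `<`, it is left as a separate character.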
However, I have also figured out why this apparent bug doesn't actually affect Ruby. The reason is that we use regular expressions to extract "normalization runs" from the string to be normalized. We know that a U+11A7 character can never participate in a normalization operation because it is a classical Hangul Jamo not used in modern Hangul. So U+11A7 never appears in a normalization run, and there's thus no error.
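The effect of the run extraction can be illustrated roughly as follows. The pattern `HANGUL_RUN` and the helper `hangul_runs` are simplified, hypothetical stand-ins, not the regular expression actually used by the library; only `String#unicode_normalize` is the real API.

```ruby
# Hypothetical, simplified run pattern: a modern leading jamo, a modern
# vowel jamo, and optionally a modern trailing jamo (U+11A8..U+11C2).
# The real pattern in Ruby's library covers far more than Hangul.
HANGUL_RUN = /[\u1100-\u1112][\u1161-\u1175][\u11A8-\u11C2]?/

# Return the substrings that would be handed to the composition step.
def hangul_runs(text)
  text.scan(HANGUL_RUN)
end

sample = "\u1100\u1161\u11A7"

# U+11A7 lies outside U+11A8..U+11C2, so it is not part of any run
# and never reaches the TBASE comparison.
p hangul_runs(sample).map { |run| run.codepoints.map { |c| format('U+%04X', c) } }

# The built-in normalization composes the first two jamos and leaves
# U+11A7 as a separate character.
p sample.unicode_normalize(:nfc).codepoints.map { |c| format('U+%04X', c) }
```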
I committed the tests adapted from Python and the fix of the comparison operator, because it's technically correct and we never know when this would lead to an actual bug if something somewhere else in the code gets changed.
@MaLin (Lin Ma), thanks again for the report. It helped me find another (real!) bug with file names, fixed at r64085, and make an improvement at r64086.
- Status changed from Open to Closed
- Backport changed from 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN to 2.3: DONTNEED, 2.4: DONTNEED, 2.5: DONTNEED
Closed. Because there is no actual bug, there is no need to backport this.