Bug #21559
Unicode normalization nfd -> nfc -> nfd is not reversible
Status: Open
Description
I expect nfd(nfc(str)) == nfd(str) to hold for every string, but I found strings for which it doesn't.
# Ruby 3.1 - 3.5
str = "s\u{11930}\u{323}\u{11930}\u{307}"
p str.unicode_normalize(:nfd) == str.unicode_normalize(:nfc).unicode_normalize(:nfd)
#=> false
# Ruby 3.5.0dev
str = "s\u{1611e}\u{323}\u{1611e}\u{307}\u{1611f}"
p str.unicode_normalize(:nfd) == str.unicode_normalize(:nfc).unicode_normalize(:nfd)
#=> false
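To look for more strings like these, the property can be checked mechanically. The sketch below is one possible approach, not necessarily how the strings above were found: it brute-forces every short string over a small, hand-picked alphabet and keeps the ones where NFD(NFC(x)) differs from NFD(x). The helper names `nfc_nfd_stable?`, `codepoints`, and `find_counterexamples` are hypothetical; only `String#unicode_normalize` is Ruby's own API.

```ruby
# Property from UAX #15: NFD(NFC(x)) must equal NFD(x).
def nfc_nfd_stable?(str)
  str.unicode_normalize(:nfd) == str.unicode_normalize(:nfc).unicode_normalize(:nfd)
end

# Hypothetical helper: render code points in the U+XXXX style used in this report.
def codepoints(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
end

# Hypothetical helper: try every string of 1..max_len characters drawn from
# `alphabet` and collect the ones that violate the property.
def find_counterexamples(alphabet, max_len)
  (1..max_len).flat_map do |len|
    alphabet.repeated_permutation(len).map(&:join).reject { |s| nfc_nfd_stable?(s) }
  end
end

# Alphabet chosen (as an assumption) from the characters in the first example.
alphabet = ["s", "\u{323}", "\u{307}", "\u{11930}"]
find_counterexamples(alphabet, 5).each { |s| puts codepoints(s) }
```

On an affected Ruby this should list "s\u{11930}\u{323}\u{11930}\u{307}" among others; once the bug is fixed it should print nothing.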
Updated by nobu (Nobuyoshi Nakada) 2 days ago
"s\u{11930 323 11930 307}".unicode_normalize(:nfc).dump #=> "\u1E69\u{11930}\u{11930}"
"s\u{323 307}".unicode_normalize(:nfc).dump #=> "\u1E69"
Are U+0323 and U+0307 being composed with s, jumping over U+11930?
Updated by ima1zumi (Mari Imaizumi) 2 days ago
- Assignee set to ima1zumi (Mari Imaizumi)
This looks like a bug. Per Unicode TR15, the identity toNFD(x) == toNFD(toNFC(x)) must be maintained. https://unicode.org/reports/tr15/#Design_Goals
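It seems the NFC process combines characters across U+11930, even though its Canonical Combining Class (CCC) is 0. A CCC 0 character between a starter and a later combining mark blocks the two from composing.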
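To make the blocking rule concrete, here is a toy sketch of the canonical composition step described in UAX #15. It assumes the input is already in NFD and uses a hand-picked CCC table and a two-entry composition table covering only the characters in this report (every other code point is treated as CCC 0, as stated above for U+11930). `compose`, `CCC`, and `COMPOSE` are hypothetical names; this is not Ruby's internal implementation.

```ruby
# Hand-picked subset of UCD data (assumption): only the marks used in this report.
CCC = { 0x0323 => 220, 0x0307 => 230 }.freeze # all other code points: CCC 0

# Hand-picked primary composites: [starter, next char] => composite.
COMPOSE = {
  [0x0073, 0x0323] => 0x1E63, # s + COMBINING DOT BELOW
  [0x1E63, 0x0307] => 0x1E69, # U+1E63 + COMBINING DOT ABOVE
}.freeze

def ccc(cp)
  CCC.fetch(cp, 0)
end

# Canonical composition over an array of code points assumed to be NFD.
def compose(cps)
  out = []
  starter = nil # index in `out` of the last starter (CCC 0), if any
  cps.each do |c|
    if starter
      # c is blocked from the starter if some B between them has
      # ccc(B) == 0 or ccc(B) >= ccc(c).
      between = out[(starter + 1)..]
      blocked = between.any? { |b| ccc(b) == 0 || ccc(b) >= ccc(c) }
      composite = COMPOSE[[out[starter], c]]
      if !blocked && composite
        out[starter] = composite # merge c into the starter and drop c
        next
      end
    end
    starter = out.size if ccc(c) == 0 # an uncomposed CCC 0 char becomes the new starter
    out << c
  end
  out
end

input = "s\u{11930 323 11930 307}".codepoints
p compose(input).map { |cp| format("U+%04X", cp) }
# => ["U+0073", "U+11930", "U+0323", "U+11930", "U+0307"]
```

In this sketch the first U+11930 becomes the new starter, so U+0323 and U+0307 can no longer reach s, and the string stays unchanged under composition. Without the intervening U+11930, "s\u{323 307}" composes to U+1E69 as expected.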
Updated by duerst (Martin Dürst) 1 day ago
- Assignee changed from ima1zumi (Mari Imaizumi) to duerst (Martin Dürst)
@ima1zumi (Mari Imaizumi) I'm not sure this is even allowed, but I'm sure I'm responsible for this behavior and want to fix it myself, so I'm changing the assignee to myself.
Updated by ima1zumi (Mari Imaizumi) about 22 hours ago
@duerst (Martin Dürst) Thank you, I appreciate you taking care of it.