Bug #15343
closedString#each_grapheme_cluster wrongly splits some emoji (genie, zombie, wrestling)
Description
All the codepoint combinations that turn up in the various emoji files provided by Unicode (currently we use those at https://www.unicode.org/Public/emoji/5.0/) are recognized as grapheme clusters by String#each_grapheme_cluster
, except those relating to genies, zombies, and wrestling (THIS IS NOT A JOKE!).
Taking an example from https://www.unicode.org/Public/emoji/5.0/emoji-zwj-sequences.txt, line 396:
$ ./ruby -e '"\u{1F9DE 200D 2640 FE0F}".each_grapheme_cluster.to_a.length.display'
2
The correct result is 1, not 2. The sequence of codepoints represents a woman genie.
I will commit the file test/ruby/enc/test_emoji_breaks.rb, which excludes genie, zombie, and wrestling emoji to make sure the tests pass.
I would like to make sure that this is correct for Unicode 10.0.0 before moving to Unicode 11.0.0. I will try to find out how to fix this by myself, but would definitely appreciate help.
Files