test/ruby/enc/test_emoji_breaks.rb does not deal with Unicode ranges in file emoji-sequences.txt
While working on issue #17750, I found out that test_emoji_breaks.rb does not deal with Unicode ranges in the file emoji-sequences.txt. That means that the tests may not cover all emoji. This should eventually be fixed, but requires some rewriting of the code, which I plan to do independently of the Unicode/Emoji version upgrade.
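For illustration, here is a minimal sketch of how range lines could be expanded, assuming the usual data-file layout in which the first semicolon-separated field is either a space-separated code point sequence or a range written as `XXXX..YYYY`. The helper name `expand_emoji_field` is hypothetical and is not the code currently in test_emoji_breaks.rb.

```ruby
# Hypothetical helper (not part of test_emoji_breaks.rb).
# Takes one line of emoji-sequences.txt and returns an array of UTF-8
# strings: one per code point for a range line, or a single string
# for a line that lists a code point sequence.
def expand_emoji_field(line)
  data = line.sub(/\s*#.*/, '').strip      # drop the trailing comment
  return [] if data.empty?                 # blank or comment-only line

  field = data.split(';').first.strip      # first field: code points or range
  if field =~ /\A(\h+)\.\.(\h+)\z/
    # Range such as "231A..231B": one single-code-point emoji per entry
    ($1.hex..$2.hex).map { |cp| [cp].pack('U') }
  else
    # Sequence such as "0023 FE0F 20E3": one multi-code-point emoji
    [field.split.map(&:hex).pack('U*')]
  end
end
```

With something along these lines, each emoji in a range would yield its own test string instead of the range line being skipped or misread.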
Updated by duerst (Martin Dürst) 11 months ago
One of the testing scripts (test/ruby/enc/test_emoji_breaks.rb) checks that the version declared internally in a data file matches the version we expect. In that context, I ran into the following problem, reported via standard channels to the Unicode Consortium:
Emoji data files in https://www.unicode.org/Public/emoji/13.1/ internally say they are for version 13.1. But the files moved to https://www.unicode.org/Public/13.0.0/ucd/emoji/ say "# Version: 13.0". We keep both a Unicode version and an Emoji version (available in Ruby via RbConfig::CONFIG['UNICODE_VERSION'] and RbConfig::CONFIG['UNICODE_EMOJI_VERSION']). But neither of them matches 13.0. The files moved under https://www.unicode.org/Public/13.0.0/ucd/emoji/ really should indicate the Unicode version, not the Emoji version, because they are updated in sync with Unicode versions, and not updated when only Emoji versions get updated.
As a temporary measure, I plan to ignore the version in the moved file(s).
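For illustration, a version check of this kind might look roughly like the sketch below. The helper names `declared_version` and `version_matches?` are hypothetical, and the assumption that a shorter declared version such as "13.0" should match a longer expected one such as "13.0.0" by prefix is mine; the actual test code may compare differently.

```ruby
require 'rbconfig'

# Hypothetical helper: pull the version out of a "# Version: 13.0" header line.
# Returns nil when no such line is present.
def declared_version(header_text)
  header_text[/^#\s*Version:\s*(\d+(?:\.\d+)*)/, 1]
end

# Hypothetical helper: compare component by component, so that a declared
# "13.0" matches an expected "13.0.0" but "13.1" does not.
def version_matches?(declared, expected)
  parts = declared.split('.')
  parts == expected.split('.').first(parts.size)
end

# Values the declared version would be compared against
# (may be nil on Ruby builds that do not define them).
unicode_version = RbConfig::CONFIG['UNICODE_VERSION']
emoji_version   = RbConfig::CONFIG['UNICODE_EMOJI_VERSION']
```

Ignoring the version for the moved file(s) would then amount to skipping the `version_matches?` call for those files only.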