Update extended grapheme cluster implementation for Unicode 11
Reported by naruse (Yui NARUSE) at https://bugs.ruby-lang.org/issues/14802#change-74213:
The definition of extended grapheme cluster was changed in Unicode 11 (Unicode® Standard Annex #29,
UNICODE TEXT SEGMENTATION, revision 33: https://www.unicode.org/reports/tr29/tr29-33.html).
This affects Regexp /\X/, which is hardcoded in node_extended_grapheme_cluster() in regparse.c.
Before:
( CRLF | Prepend* ( RI-sequence | Hangul-Syllable | !Control ) ( Grapheme_Extend | SpacingMark )* | . )
After (Unicode 11):
crlf | Control | precore* core postcore*
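For readers less familiar with /\X/, here is a minimal Ruby sketch of what matching extended grapheme clusters looks like from the user side. The helper name clusters_of and the sample strings are illustrative only and are not taken from the issue. The samples were chosen because, as far as I know, they segment the same way under both the old and the new definition.

```ruby
# Hypothetical helper (not part of Ruby or of this issue):
# split a string into extended grapheme clusters using /\X/.
def clusters_of(str)
  str.scan(/\X/)
end

# Sample string => number of clusters /\X/ is expected to find.
samples = {
  "\r\n"         => 1, # CRLF is kept together as one cluster
  "e\u0301"      => 1, # base letter followed by a combining acute accent
  "\u1100\u1161" => 1, # Hangul leading consonant + vowel jamo
  "a\u0301b"     => 2, # "a" with its accent, then a separate "b"
}

samples.each do |str, expected|
  found = clusters_of(str).size
  puts "#{str.dump}: #{found} cluster(s), expected #{expected}"
end
```

Sequences whose handling did change in Unicode 11 (for example emoji joined by ZWJ) may give different results depending on the Ruby version, so they are deliberately left out of this sketch.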
Updated by duerst (Martin Dürst) over 1 year ago
- Assignee set to duerst (Martin Dürst)
- Status changed from Open to Closed
Implemented through a long series of patches, centered on regparse.c.
Related patches start at r65085 and end at r66269. The main patch is r66213.
New tests are at test/ruby/enc/test_grapheme_breaks.rb and test/ruby/enc/test_emoji_breaks.rb.
enc/unicode.c is also modified.
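As a rough illustration of how grapheme break behavior can be checked against data, the sketch below parses one line written in the notation of Unicode's GraphemeBreakTest.txt ("÷" for a boundary, "×" for no boundary, hex code points in between) and compares it with what /\X/ produces. This is an assumption about the general approach, not a copy of the test files named above. The method name expected_clusters and the sample line are made up for the example.

```ruby
# Hypothetical helper: turn one break-test style line into the list of
# clusters it describes. Anything after "#" is treated as a comment.
def expected_clusters(line)
  spec = line.sub(/#.*/, "").strip
  clusters = []
  current = +""
  spec.split.each do |token|
    case token
    when "÷"
      # Boundary: close the cluster collected so far, if any.
      clusters << current unless current.empty?
      current = +""
    when "×"
      # No boundary: the next code point joins the current cluster.
    else
      current << token.hex.chr(Encoding::UTF_8)
    end
  end
  clusters
end

# Made-up sample line: SPACE, COMBINING DIAERESIS (attached), SPACE.
line = "÷ 0020 × 0308 ÷ 0020 ÷\t# hypothetical sample line"
expected = expected_clusters(line)
actual = expected.join.scan(/\X/)
puts(actual == expected ? "ok" : "mismatch: #{actual.inspect}")
```

Joining the expected clusters back into one string and re-splitting it with /\X/ keeps the check self-contained: the input string and the expected segmentation both come from the same line.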