Feature #15182

Update extended grapheme cluster implementation for Unicode 11

Added by duerst (Martin Dürst) 10 months ago. Updated 8 months ago.

Status:
Closed
Priority:
Normal
Target version:
[ruby-core:89224]

Description

Reported by naruse (Yui NARUSE) at https://bugs.ruby-lang.org/issues/14802#change-74213:

The definition of extended grapheme cluster changed in Unicode 11 (Unicode® Standard Annex #29,
UNICODE TEXT SEGMENTATION, revision 33: https://www.unicode.org/reports/tr29/tr29-33.html).
This affects Regexp /\X/, which is hardcoded in node_extended_grapheme_cluster() in regparse.c.

Pattern before the change (as hardcoded in regparse.c):

( CRLF
| Prepend*
  ( RI-sequence | Hangul-Syllable | !Control )
  ( Grapheme_Extend | SpacingMark )*
| . )

Definition as of Unicode 11:

crlf
| Control
| precore* core postcore*
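
For illustration, here is a minimal sketch of how the three alternatives of the Unicode 11 definition (crlf, Control, precore* core postcore*) can be observed through /\X/. The helper name `clusters` is hypothetical and not part of Ruby, and the emoji case assumes a Ruby version that already includes this change (2.6 or later, as far as I know).

```ruby
# Hypothetical helper: split a string into extended grapheme clusters
# by scanning with \X (one match per cluster).
def clusters(str)
  str.scan(/\X/)
end

# crlf: CR followed by LF is kept together as one cluster.
p clusters("\r\n").size

# Control: a control character is a cluster of its own and
# breaks the text before and after it.
p clusters("a\u0000b").size

# precore* core postcore*: a base letter followed by a combining mark
# (postcore) forms one cluster.
p clusters("e\u0301").size

# core may also be an emoji sequence joined with ZWJ (U+200D);
# assumes Ruby 2.6 or later.
family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"
p clusters(family).size
```

String#each_grapheme_cluster should report the same boundaries, so `family.each_grapheme_cluster.to_a` can be used instead of the helper.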

Related issues

Blocks Ruby master - Feature #14802: Update Unicode data to Unicode Version 11.0.0 (Closed)
Blocked by Ruby master - Feature #15341: Provide emoji version as RbConfig::CONFIG['UNICODE_EMOJI_VERSION'] (Closed)
Blocked by Ruby master - Bug #15343: String#each_grapheme_cluster wrongly splits some emoji (genie, zombie, wrestling) (Closed)

History

#1

Updated by duerst (Martin Dürst) 10 months ago

  • Blocks Feature #14802: Update Unicode data to Unicode Version 11.0.0 added

#2

Updated by duerst (Martin Dürst) 8 months ago

  • Blocked by Feature #15341: Provide emoji version as RbConfig::CONFIG['UNICODE_EMOJI_VERSION'] added

#3

Updated by duerst (Martin Dürst) 8 months ago

  • Blocked by Bug #15343: String#each_grapheme_cluster wrongly splits some emoji (genie, zombie, wrestling) added

#4

Updated by duerst (Martin Dürst) 8 months ago

  • Assignee set to duerst (Martin Dürst)
  • Status changed from Open to Closed

Implemented through a long series of patches, centered on regparse.c.

Related patches start at r65085 and end at r66269. The main patch is r66213.
New tests are at test/ruby/enc/test_grapheme_breaks.rb and test/ruby/enc/test_emoji_breaks.rb.
enc/unicode.c is also modified.
