https://bugs.ruby-lang.org/https://bugs.ruby-lang.org/favicon.ico?17113305112018-11-18T09:36:28ZRuby Issue Tracking SystemRuby master - Feature #15317: How to deal with obsolete property values in Unicode 11.0.0https://bugs.ruby-lang.org/issues/15317?journal_id=749132018-11-18T09:36:28Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul><li><strong>Blocks</strong> <i><a class="issue tracker-2 status-5 priority-4 priority-default closed" href="/issues/14802">Feature #14802</a>: Update Unicode data to Unicode Version 11.0.0</i> added</li></ul> Ruby master - Feature #15317: How to deal with obsolete property values in Unicode 11.0.0https://bugs.ruby-lang.org/issues/15317?journal_id=749162018-11-18T10:43:45Zshevegen (Robert A. Heiler)shevegen@gmail.com
<ul></ul><p>Could a warning be issued as well, at least for a transition period?</p>
<p>On a side note, does anyone happen to know how perl5/perl6 and python handle<br>
these situations? Perhaps if what they do makes sense, we could have a<br>
consistent behaviour in this regard across the languages (but only if it<br>
makes sense what they do in this context).</p> Ruby master - Feature #15317: How to deal with obsolete property values in Unicode 11.0.0https://bugs.ruby-lang.org/issues/15317?journal_id=749382018-11-19T10:57:53Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul></ul><p>shevegen (Robert A. Heiler) wrote:</p>
<blockquote>
<p>Could a warning be issued as well, at least for a transition period?</p>
</blockquote>
<p>A warning might make sense, but then we would get into the question of whether we also need a warning for those cases where property values changed (because obsoleting a property value is essentially the same as changing the value of the property for those characters that previously had the now-obsoleted property value).</p>
<p>Often, property values change only for newly assigned characters (a defined character usually has different properties from an unassigned code point), which would not need a warning, and in edge cases. Warning on every such change could lead to many superfluous warnings. It would also be quite difficult to implement, because the implementation would have to look at two or more sets of property files in parallel.</p> Ruby master - Feature #15317: How to deal with obsolete property values in Unicode 11.0.0https://bugs.ruby-lang.org/issues/15317?journal_id=749842018-11-20T10:22:39Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul></ul><p>Some pointers obtained from a Unicode-internal discussion:</p>
<ul>
<li>
<p>All (including past) property values are available from the Relax NG schema for UCD in XML at <a href="http://www.unicode.org/reports/tr42/tr42-23.rnc" class="external">http://www.unicode.org/reports/tr42/tr42-23.rnc</a>, linked off <a href="https://www.unicode.org/reports/tr42/" class="external">https://www.unicode.org/reports/tr42/</a>.</p>
</li>
<li>
<p>PropertyAliases.txt lists all the properties, and PropertyValueAliases.txt provides lists of property values for enumerated properties. We already download these files as part of the Ruby make process.</p>
</li>
<li>
<p>Hiragana_or_Katakana is an old, obsolete script property value, which currently leads to an error with <code>'abc' =~ /\p{hiragana_or_katakana}/</code></p>
</li>
</ul> Ruby master - Feature #15317: How to deal with obsolete property values in Unicode 11.0.0https://bugs.ruby-lang.org/issues/15317?journal_id=750532018-11-22T07:51:30Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul></ul><p>duerst (Martin Dürst) wrote:</p>
<blockquote>
<ul>
<li>Hiragana_or_Katakana is an old, obsolete script property value, which currently leads to an error with <code>'abc' =~ /\p{hiragana_or_katakana}/</code>
</li>
</ul>
</blockquote>
<p>A more recent example: <code>'abc' =~ /\p{Grapheme_Cluster_Break=E_Modifier}/</code>. This will work with Unicode 10.0.0, but may produce an error with Unicode 11.0.0.</p>
<p>The data for this property is available at <a href="https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/GraphemeBreakProperty.txt" class="external">https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/GraphemeBreakProperty.txt</a> (latest, i.e. 11.0.0) or the versioned <a href="https://www.unicode.org/Public/10.0.0/ucd/auxiliary/GraphemeBreakProperty.txt" class="external">https://www.unicode.org/Public/10.0.0/ucd/auxiliary/GraphemeBreakProperty.txt</a>.</p>
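<p>As a rough sketch of how the two versions of such a data file could be compared, the following collects the property values named in each and reports those that disappeared. The helper name <code>property_values</code> and the sample excerpts are assumptions made for illustration; the excerpts only imitate the file format and are not copied from the real files, and this is not how Ruby's build tooling works.</p>

```ruby
require "set"

# Hypothetical helper: collect the property values named in a UCD-style data
# file, whose data lines look like "1F3FB..1F3FF  ; E_Modifier # comment".
def property_values(text)
  text.each_line.with_object(Set.new) do |line, values|
    data = line.sub(/#.*/, "").strip   # drop comments and surrounding whitespace
    next if data.empty?                # skip blank and comment-only lines
    _range, value = data.split(";").map(&:strip)
    values << value if value
  end
end

# Illustrative excerpts only (assumed for this sketch, not the real data).
OLD_SAMPLE = <<~DATA
  # older version, excerpt
  0600..0605    ; Prepend # Cf
  1F3FB..1F3FF  ; E_Modifier # Sk
DATA

NEW_SAMPLE = <<~DATA
  # newer version, excerpt
  0600..0605    ; Prepend # Cf
  1F3FB..1F3FF  ; Extend # Sk
DATA

# Values present in the older file but absent from the newer one.
removed = property_values(OLD_SAMPLE) - property_values(NEW_SAMPLE)
puts removed.to_a.inspect
```

<p>With real input, the two strings would be the contents of the versioned and latest files linked above.</p>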
<p>Property values that are not used anymore do not show up in this data file. So E_Modifier is in the 10.0.0 version, but not in the latest version. This results in <code>Grapheme_Cluster_Break=E_Modifier</code> not showing up in enc/unicode/11.0.0/name2ctype.h, which produces an error.</p> Ruby master - Feature #15317: How to deal with obsolete property values in Unicode 11.0.0https://bugs.ruby-lang.org/issues/15317?journal_id=750552018-11-22T07:54:06Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul></ul><p>The opinions at the committer meeting were tending towards producing an error or a warning, because this would make it possible to find places that need to be rewritten to produce whatever may have been the desired result.</p>
<p>The discussion on the Unicode expert mailing list on the other hand tended towards not producing an error.</p> Ruby master - Feature #15317: How to deal with obsolete property values in Unicode 11.0.0https://bugs.ruby-lang.org/issues/15317?journal_id=750632018-11-22T08:21:00Znaruse (Yui NARUSE)naruse@airemix.jp
<ul></ul><p>duerst (Martin Dürst) wrote:</p>
<blockquote>
<p>duerst (Martin Dürst) wrote:</p>
<blockquote>
<ul>
<li>Hiragana_or_Katakana is an old, obsolete script property value, which currently leads to an error with <code>'abc' =~ /\p{hiragana_or_katakana}/</code>
</li>
</ul>
</blockquote>
<p>A more recent example: <code>'abc' =~ /\p{Grapheme_Cluster_Break=E_Modifier}/</code>. This will work with Unicode 10.0.0, but may produce an error with Unicode 11.0.0.</p>
<p>The data for this property is available at <a href="https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/GraphemeBreakProperty.txt" class="external">https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/GraphemeBreakProperty.txt</a> (latest, i.e. 11.0.0) or the versioned <a href="https://www.unicode.org/Public/10.0.0/ucd/auxiliary/GraphemeBreakProperty.txt" class="external">https://www.unicode.org/Public/10.0.0/ucd/auxiliary/GraphemeBreakProperty.txt</a>.</p>
<p>Property values that are not used anymore do not show up in this data file. So E_Modifier is in the 10.0.0 version, but not in the latest version. This results in <code>Grapheme_Cluster_Break=E_Modifier</code> not showing up in enc/unicode/11.0.0/name2ctype.h, which produces an error.</p>
</blockquote>
<p><code>/\p{Grapheme_Cluster_Break=E_Modifier}/</code> was introduced specifically for <code>/\X/</code>.<br>
But the definition in the source of \X, Unicode Text Segmentation (<a href="https://unicode.org/reports/tr29/" class="external">https://unicode.org/reports/tr29/</a>), has changed.<br>
Therefore compatibility here is not very important.</p>
<p>So just error seems ok.</p> Ruby master - Feature #15317: How to deal with obsolete property values in Unicode 11.0.0https://bugs.ruby-lang.org/issues/15317?journal_id=754752018-12-07T10:06:23Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul><p>naruse (Yui NARUSE) wrote:</p>
<blockquote>
<p><code>/\p{Grapheme_Cluster_Break=E_Modifier}/</code> was introduced specifically for <code>/\X/</code>.<br>
But the definition in the source of \X, Unicode Text Segmentation (<a href="https://unicode.org/reports/tr29/" class="external">https://unicode.org/reports/tr29/</a>), has changed.<br>
Therefore compatibility here is not very important.</p>
<p>So just error seems ok.</p>
</blockquote>
<p>Ok. <code>/\p{Grapheme_Cluster_Break=E_Modifier}/</code> now produces an error. If we get any bug reports, we can still revisit this issue (actually, better open a new one).</p>
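<p>For code that has to run on Ruby versions built against different Unicode data, one possible way to cope with the error is to probe for a property at runtime. This is only a minimal sketch; the method name <code>property_supported?</code> is made up for illustration and is not part of Ruby.</p>

```ruby
# Hypothetical helper: returns true if this Ruby's regexp engine accepts
# \p{name}, and false if compiling the pattern raises RegexpError.
def property_supported?(name)
  Regexp.new("\\p{#{name}}")
  true
rescue RegexpError
  false
end

# The result for an obsoleted value depends on the Unicode data the running
# Ruby was built with, so it is printed here rather than assumed.
puts property_supported?("Grapheme_Cluster_Break=E_Modifier")
puts property_supported?("Hiragana")
```

<p>A caller could use the result to choose between a pattern with the old property value and a rewritten one.</p>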