Ruby Issue Tracking System https://bugs.ruby-lang.org/ 2022-02-21T03:34:02Z
Ruby master - Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE https://bugs.ruby-lang.org/issues/18590?journal_id=96596 2022-02-21T03:34:02Z mame (Yusuke Endoh) mame@ruby-lang.org
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Assigned</i></li><li><strong>Assignee</strong> set to <i>duerst (Martin Dürst)</i></li></ul><p>The document of Unicode case folding (<a href="http://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt" class="external">http://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt</a>) says:</p>
<pre><code>0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
</code></pre>
<p>"F" is for "full case folding", and "T" is for "Turkic languages".</p>
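<p>As a rough illustration of how those two entries are laid out, the sketch below splits each one into code point, status, and mapping. The constant <code>SAMPLE</code> and the helper <code>parse_case_folding</code> are hypothetical names used only for this example, and the sample holds just the two lines quoted above, not the whole file.</p>

```ruby
# The two entries quoted above, in the "code; status; mapping; # name" layout.
SAMPLE = <<~TXT
  0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
  0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
TXT

# Hypothetical helper: returns { status => mapped string } for the given entries.
def parse_case_folding(text)
  text.each_line.with_object({}) do |line, table|
    data = line.split('#', 2).first.strip        # drop the trailing comment
    next if data.empty?
    _code, status, mapping = data.split(';').map(&:strip)
    # Turn the hex code points of the mapping into a UTF-8 string.
    table[status] = mapping.split.map { |hex| hex.to_i(16) }.pack('U*')
  end
end

folds = parse_case_folding(SAMPLE)
folds.each do |status, str|
  puts "#{status}: #{str.codepoints.map { |c| format('U+%04X', c) }.join(' ')}"
end
```

<p>The "F" entry yields two code points and the "T" entry yields one, which is the difference discussed in this thread.</p>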
<p>String#downcase uses full Unicode case mapping by default (see <a href="https://docs.ruby-lang.org/en/3.0/String.html#method-i-downcase" class="external">https://docs.ruby-lang.org/en/3.0/String.html#method-i-downcase</a>). You can get the result you expected with the <code>:turkic</code> option.</p>
<pre><code>'İ'.downcase(:turkic).chars
=> ["i"]
</code></pre>
Ruby master - Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE https://bugs.ruby-lang.org/issues/18590?journal_id=96597 2022-02-21T03:44:59Z mame (Yusuke Endoh) mame@ruby-lang.org
<ul></ul><p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/50">@duerst (Martin Dürst)</a> It looks like this document <a href="https://www.unicode.org/charts/case/" class="external">https://www.unicode.org/charts/case/</a> (which is referenced by <a href="https://docs.ruby-lang.org/en/master/doc/case_mapping_rdoc.html" class="external">https://docs.ruby-lang.org/en/master/doc/case_mapping_rdoc.html</a>) says that the lowercase of U+0130 is U+0069. Which is correct?</p>
Ruby master - Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE https://bugs.ruby-lang.org/issues/18590?journal_id=96650 2022-02-22T23:15:33Z andrykonchin (Andrew Konchin)
<ul></ul><p>Thank you for the suggestion.</p>
<p>I am wondering whether <code>String#downcase</code> (when called without arguments) follows only the Unicode case mapping rules (as stated in the <a href="https://ruby-doc.org/core-3.0.0/String.html#method-i-downcase" class="external">documentation</a>), or also the folding ones.</p>
<p>I would expect a call to <code>String#downcase</code> without arguments to use the one-to-one case mapping rules that are specified in the <a href="https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt" class="external">UnicodeData.txt</a> file.</p>
Ruby master - Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE https://bugs.ruby-lang.org/issues/18590?journal_id=96654 2022-02-23T08:17:52Z duerst (Martin Dürst) duerst@it.aoyama.ac.jp
<ul><li><strong>Status</strong> changed from <i>Assigned</i> to <i>Closed</i></li></ul><p>andrykonchin (Andrew Konchin) wrote in <a href="#note-3">#note-3</a>:</p>
<blockquote>
<p>Thank you for the suggestion.</p>
<p>I am wondering whether <code>String#downcase</code> (when called without arguments) follows only the Unicode case mapping rules (as stated in the <a href="https://ruby-doc.org/core-3.0.0/String.html#method-i-downcase" class="external">documentation</a>), or also the folding ones.</p>
<p>I would expect a call to <code>String#downcase</code> without arguments to use the one-to-one case mapping rules that are specified in the <a href="https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt" class="external">UnicodeData.txt</a> file.</p>
</blockquote>
<p>It should use the mappings in <a href="https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt" class="external">https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt</a>.</p>
<p>And that is 0069 0307 (i.e. 'i' followed by dot above) for 'İ'.downcase.</p>
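<p>A minimal sketch for checking this locally, assuming a Ruby version with Unicode-aware case mapping (2.4 or later). The variable names are illustrative only.</p>

```ruby
# Default (full) case mapping: U+0130 lowercases to 'i' followed by COMBINING DOT ABOVE.
full = "\u0130".downcase
p full.codepoints.map { |c| format('U+%04X', c) }    # ["U+0069", "U+0307"], as stated above

# With the :turkic option, U+0130 lowercases to a plain 'i'.
turkic = "\u0130".downcase(:turkic)
p turkic.codepoints.map { |c| format('U+%04X', c) }  # ["U+0069"]
```

<p>Printing code points rather than the string itself avoids depending on how a terminal renders the combining dot.</p>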
<blockquote>
</blockquote>
<p>The data in UnicodeData is restricted to simple case mappings (i.e. mappings that don't change the length of the string in terms of number of codepoints). In Ruby, there is no need for such a restriction. See also <a href="https://www.sw.it.aoyama.ac.jp/2016/pub/RubyKaigi/" class="external">https://www.sw.it.aoyama.ac.jp/2016/pub/RubyKaigi/</a>, slide 23.</p>
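<p>To make the difference between simple and full mappings concrete, here is a small sketch, again assuming Ruby 2.4 or later. The sharp s example and the scanned range (U+00C0 to U+024F) are my own choices for illustration and are not taken from this thread.</p>

```ruby
# Full case mapping may change the number of code points in a string.
sharp_s = "\u00DF"              # LATIN SMALL LETTER SHARP S
puts sharp_s.upcase             # SS
puts sharp_s.upcase.length      # 2, although the input has length 1

# Scan a range for code points whose default lowercase is longer than one code point.
longer = (0x00C0..0x024F).select { |cp| [cp].pack('U').downcase.length > 1 }
puts longer.map { |cp| format('U+%04X', cp) }.join(' ')
```

<p>A scan like this is one way to build a small table of length-changing mappings directly from the Ruby in use, rather than from an external chart.</p>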
<p>I'm closing this, because it works as intended/described, as far as I can see.</p>
Ruby master - Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE https://bugs.ruby-lang.org/issues/18590?journal_id=96655 2022-02-23T09:27:25Z andrykonchin (Andrew Konchin)
<ul></ul><p>Thank you for your clarification.</p>
Ruby master - Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE https://bugs.ruby-lang.org/issues/18590?journal_id=96661 2022-02-24T03:05:26Z mame (Yusuke Endoh) mame@ruby-lang.org
<ul><li><strong>Status</strong> changed from <i>Closed</i> to <i>Open</i></li></ul><p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/50">@duerst (Martin Dürst)</a> Let me confirm. The rdoc of 3.1 and master refers to <a href="https://www.unicode.org/charts/case/" class="external">https://www.unicode.org/charts/case/</a>.</p>
<blockquote>
<p>Default Case Mapping<br>
By default, all of these methods use full Unicode case mapping, which is suitable for most languages. See <a href="https://www.unicode.org/charts/case/" class="external">Unicode Latin Case Chart</a>.</p>
</blockquote>
<p>It is not clear to me that the document says "0069 0307 for 'İ'.downcase". Is it okay? Should it be replaced with <a href="https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt" class="external">https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt</a>?</p>
Ruby master - Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE https://bugs.ruby-lang.org/issues/18590?journal_id=96662 2022-02-24T03:09:01Z mame (Yusuke Endoh) mame@ruby-lang.org
<ul></ul><p>BTW, the rdoc of String#downcase in 3.1 and master is much less informative, and has a broken link (which may be the same issue as <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: Tutorial Link for Optionparser is broken (Closed)" href="https://bugs.ruby-lang.org/issues/18468">#18468</a>). It was changed at <a class="changeset" title="Enhanced RDoc for case mapping (#5245) Adds file doc/case_mapping.rdoc, which describes case map..." href="https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/f7e266e6d2ccad63e4245a106a80c82ef2b38cbf">f7e266e6d2ccad63e4245a106a80c82ef2b38cbf</a> between 3.0 and 3.1. Personally I strongly prefer <a href="https://ruby-doc.org/core-3.0.0/String.html#method-i-downcase" class="external">the 3.0 style</a>.</p>
Ruby master - Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE https://bugs.ruby-lang.org/issues/18590?journal_id=96663 2022-02-24T09:50:28Z duerst (Martin Dürst) duerst@it.aoyama.ac.jp
<ul></ul><p>mame (Yusuke Endoh) wrote in <a href="#note-7">#note-7</a>:</p>
<blockquote>
<p>BTW, the rdoc of String#downcase in 3.1 and master is much less informative, and has a broken link (which may be the same issue as <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: Tutorial Link for Optionparser is broken (Closed)" href="https://bugs.ruby-lang.org/issues/18468">#18468</a>). It was changed at <a class="changeset" title="Enhanced RDoc for case mapping (#5245) Adds file doc/case_mapping.rdoc, which describes case map..." href="https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/f7e266e6d2ccad63e4245a106a80c82ef2b38cbf">f7e266e6d2ccad63e4245a106a80c82ef2b38cbf</a> between 3.0 and 3.1. Personally I strongly prefer <a href="https://ruby-doc.org/core-3.0.0/String.html#method-i-downcase" class="external">the 3.0 style</a>.</p>
</blockquote>
<p>I also prefer the 3.0 version, but that's probably because I wrote that documentation of these methods (when I implemented them). Anyway, I think the 3.1 way of documenting things could also work, but the options link on each casing method should include a fragment and point to <a href="https://ruby-doc.org/core-3.1.0/doc/case_mapping_rdoc.html#label-Default+Case+Mapping" class="external">https://ruby-doc.org/core-3.1.0/doc/case_mapping_rdoc.html#label-Default+Case+Mapping</a>, not just to <a href="https://ruby-doc.org/core-3.1.0/doc/case_mapping_rdoc.html" class="external">https://ruby-doc.org/core-3.1.0/doc/case_mapping_rdoc.html</a>. <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/52355">@burdettelamar (Burdette Lamar)</a></p>
<p>mame (Yusuke Endoh) wrote in <a href="#note-6">#note-6</a>:</p>
<blockquote>
<p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/50">@duerst (Martin Dürst)</a> Let me confirm. The rdoc of 3.1 and master refers to <a href="https://www.unicode.org/charts/case/" class="external">https://www.unicode.org/charts/case/</a>.</p>
<blockquote>
<p>Default Case Mapping<br>
By default, all of these methods use full Unicode case mapping, which is suitable for most languages. See <a href="https://www.unicode.org/charts/case/" class="external">Unicode Latin Case Chart</a>.</p>
</blockquote>
<p>It is not clear to me that the document says "0069 0307 for 'İ'.downcase".</p>
</blockquote>
<p>That document does NOT say "0069 0307 for 'İ'.downcase".</p>
<blockquote>
<p>Is it okay?</p>
</blockquote>
<p>I reported to Unicode that they should check it and clarify how this chart was made.</p>
<blockquote>
<p>Should it be replaced with <a href="https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt" class="external">https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt</a> ?</p>
</blockquote>
<p>In the Ruby documentation, probably yes. SpecialCasing.txt is an official Unicode data file. The case charts are just a Web page. But the case charts may be easier to understand for non-experts.</p>
Ruby master - Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE https://bugs.ruby-lang.org/issues/18590?journal_id=96664 2022-02-24T13:37:50Z mame (Yusuke Endoh) mame@ruby-lang.org
<ul></ul><p>duerst (Martin Dürst) wrote in <a href="#note-8">#note-8</a>:</p>
<blockquote>
<blockquote>
<p>Is it okay?</p>
</blockquote>
<p>I reported to Unicode that they should check it and clarify how this chart was made.</p>
</blockquote>
<p>I see, thanks!</p>
<blockquote>
<blockquote>
<p>Should it be replaced with <a href="https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt" class="external">https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt</a> ?</p>
</blockquote>
<p>In the Ruby documentation, probably yes. SpecialCasing.txt is an official Unicode data file. The case charts are just a Web page. But the case charts may be easier to understand for non-experts.</p>
</blockquote>
<p>It's certainly easy to understand, but if it's wrong, I don't think it's even worth considering.</p>
<p>I wanted to create a PR to fix the document, but I am unsure which document is the best reference for full case mapping. <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/50">@duerst (Martin Dürst)</a> Could you please fix it? Or should we wait until the chart is fixed?</p>
Ruby master - Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE https://bugs.ruby-lang.org/issues/18590?journal_id=96678 2022-02-27T05:36:28Z duerst (Martin Dürst) duerst@it.aoyama.ac.jp
<ul></ul><p>mame (Yusuke Endoh) wrote in <a href="#note-9">#note-9</a>:</p>
<blockquote>
<p>I wanted to create a PR to fix the document, but I am unsure which document is the best reference for full case mapping. <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/50">@duerst (Martin Dürst)</a> Could you please fix it? Or should we wait until the chart is fixed?</p>
</blockquote>
<p>The best reference is section 3.13 (Default Case Algorithms) of <a href="https://www.unicode.org/versions/latest/ch03.pdf" class="external">https://www.unicode.org/versions/latest/ch03.pdf</a>. This is a lot of text, not as easy to understand as a table. But maybe this is better. People don't need a table, it's easy to create one with Ruby :-).<br>
[Please note that this URI currently redirects to https://www.unicode.org/versions/Unicode14.0.0/ch03.pdf, but I still have to upgrade Ruby to Unicode 14.0.0; I hope to be able to do this in the next couple of weeks.]</p>
Ruby master - Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE https://bugs.ruby-lang.org/issues/18590?journal_id=96683 2022-02-28T08:04:43Z mame (Yusuke Endoh) mame@ruby-lang.org
<ul></ul><p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/50">@duerst (Martin Dürst)</a> Thanks, I have created a PR. <a href="https://github.com/ruby/ruby/pull/5607" class="external">https://github.com/ruby/ruby/pull/5607</a></p>
Ruby master - Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE https://bugs.ruby-lang.org/issues/18590?journal_id=97902 2022-06-09T09:25:36Z mame (Yusuke Endoh) mame@ruby-lang.org
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul><p>Fixed at <a class="changeset" title="doc/case_mapping.rdoc: Fix references for case mapping The chart (https://www.unicode.org/charts..." href="https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/bda4d91f0599a8e2d278bc13660a5576d4ced353">bda4d91f0599a8e2d278bc13660a5576d4ced353</a></p>