Ruby Issue Tracking System https://bugs.ruby-lang.org/ 2013-11-14T23:17:57Z
Ruby master - Feature #9111: Encoding-free String comparison https://bugs.ruby-lang.org/issues/9111?journal_id=42935 2013-11-14T23:17:57Z nobu (Nobuyoshi Nakada) nobu@ruby-lang.org
<ul></ul><p>sawa (Tsuyoshi Sawada) wrote:</p>
<blockquote>
<p>I suggest that the comparison <code>String#&lt;=&gt;</code> should not depend on the respective encodings of the strings; instead, all strings should be internally converted to UTF-8 for the purpose of comparison.</p>
</blockquote>
<p>Always converting all strings to UTF-8 is unacceptable; the conversion should be restricted to comparisons involving an ASCII-8BIT string.</p>
Ruby master - Feature #9111: Encoding-free String comparison https://bugs.ruby-lang.org/issues/9111?journal_id=42937 2013-11-15T00:04:21Z sawa (Tsuyoshi Sawada)
<ul></ul><p>Following nobu's suggestion, I came up with several possibilities:</p>
<p>When two strings with different encodings are compared by <code>String#&lt;=&gt;</code>, one of the following options should be taken:</p>
<ul>
<li>Issue a warning.</li>
<li>Raise an error.</li>
<li>Convert one of the strings to the encoding of the other.</li>
</ul>
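<p>A rough sketch of the third option, assuming a hypothetical helper named <code>compare_transcoded</code> (not part of Ruby) that transcodes the second string to the encoding of the first before comparing. It only illustrates the idea; it is not a proposal for how <code>String#&lt;=&gt;</code> itself would be implemented:</p>

```ruby
# Hypothetical helper for illustration only; not part of Ruby's String API.
# Transcodes `b` to the encoding of `a`, then compares with String#<=>.
# Returns nil when the transcoding fails, mirroring how <=> returns nil
# for operands it cannot compare.
def compare_transcoded(a, b)
  return a <=> b if a.encoding == b.encoding

  a <=> b.encode(a.encoding)
rescue EncodingError
  nil
end

utf8   = "\u00E9"                    # "é" as UTF-8 (bytes C3 A9)
latin1 = utf8.encode("ISO-8859-1")   # "é" as ISO-8859-1 (byte E9)

p utf8 <=> latin1                    # non-zero: the raw bytes differ
p compare_transcoded(utf8, latin1)   # 0 after transcoding
```

<p>Returning <code>nil</code> on a failed conversion is one possible choice here; raising an error or issuing a warning, as in the other two options, would be equally easy to substitute.</p>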
<p>I am not sure which option would be best, but I feel the current behavior should not be left as it is.</p>
Ruby master - Feature #9111: Encoding-free String comparison https://bugs.ruby-lang.org/issues/9111?journal_id=42941 2013-11-15T05:20:18Z Hanmac (Hans Mackowiak) hanmac@gmx.de
<ul></ul><p>What about strings with the same encoding and different content that nevertheless render the same?<br>
For example, "â" can somehow be made from "a" + "^"; should they also be treated as equal?</p>
Ruby master - Feature #9111: Encoding-free String comparison https://bugs.ruby-lang.org/issues/9111?journal_id=42951 2013-11-15T14:41:49Z sawa (Tsuyoshi Sawada)
<ul></ul><blockquote>
<p>Hanmac: "â" can be made from "a" + "^"</p>
</blockquote>
<p>Treating them as the same goes too far, I think. There are various ways of writing such marks; for example, <code>â</code> is marked up differently in TeX. They should be treated as different strings.</p>
Ruby master - Feature #9111: Encoding-free String comparison https://bugs.ruby-lang.org/issues/9111?journal_id=42961 2013-11-15T17:15:40Z Hanmac (Hans Mackowiak) hanmac@gmx.de
<ul></ul><p>I found the Wikipedia article: <a href="http://en.wikipedia.org/wiki/Combining_character" class="external">http://en.wikipedia.org/wiki/Combining_character</a><br>
It's not about treating "^a" or "a^" the same as "â"; rather, there is a way to glue the characters together.</p>
<p>I think that's also a reason for <a href="http://api.rubyonrails.org/classes/String.html#method-i-mb_chars" class="external">http://api.rubyonrails.org/classes/String.html#method-i-mb_chars</a>?</p>
<p>I found another interesting gem: <a href="http://rubygems.org/gems/unicode_utils" class="external">http://rubygems.org/gems/unicode_utils</a><br>
With it, it is also possible to do something like this: "ä".upcase => "Ä"</p>
<p>There is another page about combining characters: <a href="http://sbp.so/supercombiner" class="external">http://sbp.so/supercombiner</a></p>
Ruby master - Feature #9111: Encoding-free String comparison https://bugs.ruby-lang.org/issues/9111?journal_id=43054 2013-11-21T16:35:20Z naruse (Yui NARUSE) naruse@airemix.jp
<ul></ul><p>Hanmac (Hans Mackowiak) wrote:</p>
<blockquote>
<p>What about strings with the same encoding and different content that nevertheless render the same?<br>
For example, "â" can somehow be made from "a" + "^"; should they also be treated as equal?</p>
</blockquote>
<p>The standard practice is to compare the normalized forms: NFD("â") == NFD("a" + "^").<br>
To apply NFD, you can use one of several libraries.<br>
See also <a href="http://bibwild.wordpress.com/2013/11/19/benchmarking-ruby-unicode-normalization-alternatives/" class="external">http://bibwild.wordpress.com/2013/11/19/benchmarking-ruby-unicode-normalization-alternatives/</a></p>
Ruby master - Feature #9111: Encoding-free String comparison https://bugs.ruby-lang.org/issues/9111?journal_id=47981 2014-07-23T10:11:25Z duerst (Martin Dürst) duerst@it.aoyama.ac.jp
<ul><li><strong>Related to</strong> <i><a class="issue tracker-2 status-5 priority-4 priority-default closed" href="/issues/10084">Feature #10084</a>: Add Unicode String Normalization to String class</i> added</li></ul>
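<p>The NFD comparison mentioned earlier in the thread can be sketched with <code>String#unicode_normalize</code>, which is built into Ruby 2.2 and later (at the time of the discussion an external library was needed). The helper name <code>nfd_equal?</code> is hypothetical, and the "^" in the thread is taken here to mean the combining circumflex U+0302, which is an assumption about what was intended:</p>

```ruby
# Hypothetical helper for illustration; not part of Ruby.
# Compares two strings after normalizing both to NFD (canonical decomposition).
def nfd_equal?(a, b)
  a.unicode_normalize(:nfd) == b.unicode_normalize(:nfd)
end

composed   = "\u00E2"    # "â" as a single precomposed code point
decomposed = "a\u0302"   # "a" followed by U+0302 COMBINING CIRCUMFLEX ACCENT

p composed == decomposed            # false: the code point sequences differ
p nfd_equal?(composed, decomposed)  # true: both decompose to "a" + U+0302
p nfd_equal?(composed, "a^")        # false: "^" (U+005E) is not a combining mark
```

<p>Plain <code>String#==</code> and <code>String#&lt;=&gt;</code> still compare the raw sequences, so any normalization remains an explicit step taken by the caller.</p>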