Ruby master - Feature #9111: Encoding-free String comparison</h1> <article> <p>2013-11-14T23:17:57Z</p> <p>sawa (Tsuyoshi Sawada) wrote:</p> <blockquote> <p>I suggest that the comparison <code>String#<=></code> should not be based on the respective encoding of the strings, but all the strings should be internally converted to UTF-8 for the purpose of comparison.</p> </blockquote> <p>Always converting all strings to UTF-8 is unacceptable; the conversion should be restricted to comparisons with an ASCII-8BIT string.</p> </article> <article> <p>2013-11-15T00:04:21Z</p> <p>Following nobu's suggestion, I came up with the following possibilities:</p> <p>When two strings with different encodings are compared by <code>String#<=></code>, one of the following options should be taken:</p> <ul> <li>Issue a warning</li> <li>Raise an error</li> <li>Convert one of the strings to the encoding of the other</li> </ul> <p>I am not sure which option would be best, but I feel the behavior should not be left as it is now.</p> </article> <article> <p>2013-11-15T05:20:18Z</p> <p>What about strings with the same encoding but different content that render the same?<br> For example, "â" can be made from "a" + "^" somehow; should they also be treated as equal?</p> </article> <article> <p>2013-11-15T14:41:49Z</p> <blockquote> <p>Hanmac: "â" can be made from "a" + "^"</p> </blockquote> <p>Treating them as the same goes too far, I think. There are various notations. For example, <code>â</code> would be written differently in TeX. Assuming they are equal is going too far.
They should be treated differently.</p> </article> <article> <p>2013-11-15T17:15:40Z</p> <p>I found the Wikipedia source: <a href="http://en.wikipedia.org/wiki/Combining_character" class="external">http://en.wikipedia.org/wiki/Combining_character</a><br> It's not about treating "^a" or "a^" the same as "â", but there is a way to glue the characters together.</p> <p>I think that's also a reason for <a href="http://api.rubyonrails.org/classes/String.html#method-i-mb_chars" class="external">http://api.rubyonrails.org/classes/String.html#method-i-mb_chars</a>?</p> <p>I found another interesting gem: <a href="http://rubygems.org/gems/unicode_utils" class="external">http://rubygems.org/gems/unicode_utils</a><br> With it, it is also possible to do something like this: "ä".upcase => "Ä"</p> <p>There is another page about combining characters: <a href="http://sbp.so/supercombiner" class="external">http://sbp.so/supercombiner</a></p> </article> <article> <p>2013-11-21T16:35:20Z</p> <p>Hanmac (Hans Mackowiak) wrote:</p> <blockquote> <p>What about strings with the same encoding but different content that render the same?<br> For example, "â" can be made from "a" + "^" somehow; should they also be treated as equal?</p> </blockquote> <p>The standard practice is to compare normalized forms: NFD("â") == NFD("a" + "^"), with "^" standing for a combining circumflex (U+0302).<br> To normalize to NFD, you can use one of several libraries.<br> See also <a href="http://bibwild.wordpress.com/2013/11/19/benchmarking-ruby-unicode-normalization-alternatives/" class="external">http://bibwild.wordpress.com/2013/11/19/benchmarking-ruby-unicode-normalization-alternatives/</a></p> </article> <article> <p>2014-07-23T10:11:25Z</p> <ul><li><strong>Related to</strong> <i><a class="issue tracker-2 status-5 priority-4 priority-default closed" href="/issues/10084">Feature #10084</a>: Add Unicode
String Normalization to String class</i> added</li></ul> </article> </main></body></html>