Feature #9111

Encoding-free String comparison

Added by sawa (Tsuyoshi Sawada) about 7 years ago. Updated about 7 years ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
[ruby-core:58337]

Description

=begin
Currently, strings with the same content but different encodings count as different strings. This causes strange behaviour, as shown below (noted in this Stack Overflow question: http://stackoverflow.com/questions/19977788/strange-behavior-in-packed-ruby-strings#19978206):

  [128].pack("C")             # => "\x80"
  [128].pack("C") == "\x80"   # => false

Since [128].pack("C") has the encoding ASCII-8BIT and "\x80" (by default) has the encoding UTF-8, the two strings are not equal.

Also, ordering strings that have different encodings can produce confusing, unintended results.
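
To make the two observations above concrete, here is a minimal sketch. It assumes a UTF-8 source encoding; the `force_encoding` call is only there to make that assumption explicit, and the byte-level workarounds at the end are my own addition, not part of the Stack Overflow question. The claim about `<=>` reflects what current CRuby appears to do and may not hold for other implementations.

```ruby
packed  = [128].pack("C")
# Stand-in for the literal "\x80" as it appears in a UTF-8 source file
# (assumption: the default source encoding is UTF-8).
literal = "\x80".dup.force_encoding(Encoding::UTF_8)

same_encoding = packed.encoding == literal.encoding  # ASCII-8BIT vs UTF-8
equal         = packed == literal                     # same byte, still not equal
ordering      = packed <=> literal                    # nonzero in CRuby, despite identical bytes

# Workarounds that compare the raw bytes and ignore the encoding tag.
bytes_equal  = packed.bytes == literal.bytes
binary_equal = packed == literal.b   # String#b returns a copy tagged ASCII-8BIT

puts "same encoding: #{same_encoding}"
puts "==           : #{equal}"
puts "<=>          : #{ordering}"
puts "bytes equal  : #{bytes_equal}"
puts "== after #b  : #{binary_equal}"
```

Comparing `bytes` or the result of `String#b` works today, but it has to be done by hand at every comparison site, which is the inconvenience this request is about.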

I suggest that String#<=> should not depend on the encodings of the strings being compared. Instead, both strings should be converted to UTF-8 internally for the purpose of comparison.
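
As a rough illustration of the suggested behaviour, the sketch below emulates it in user code. `utf8_cmp` is a hypothetical helper, not an existing or proposed method name, and it assumes both strings can actually be transcoded to UTF-8; strings tagged ASCII-8BIT that contain bytes above 0x7F would need separate handling, which is not covered here.

```ruby
# Hypothetical helper (not part of Ruby): transcode both operands to UTF-8,
# then compare the results with the ordinary String#<=>.
def utf8_cmp(a, b)
  a.encode(Encoding::UTF_8) <=> b.encode(Encoding::UTF_8)
end

utf8   = "\u00E9"                            # "é" encoded as UTF-8 (2 bytes)
latin1 = utf8.encode(Encoding::ISO_8859_1)   # the same character as 1 byte (0xE9)

plain_equal = utf8 == latin1           # false: different encodings, different bytes
converted   = utf8_cmp(utf8, latin1)   # 0: same content once both sides are UTF-8

puts "== without conversion: #{plain_equal}"
puts "utf8_cmp             : #{converted}"
```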

=end


Related issues

Related to CommonRuby - Feature #10084: Add Unicode String Normalization to String class (Closed; duerst (Martin Dürst); 07/23/2014)
