Bug #13950
String#tr incorrectly marks strings as CR_7BIT
Status: Closed
Assignee: -
Target version: -
ruby -v: ruby 2.4.2p198 (2017-09-14 revision 59899) [x86_64-linux]
Backport:
Description
String#tr has a curious bit of code attributable to r22547, dating back to Ruby 1.9.2. It seems to blindly change the calculated code range from CR_VALID to CR_7BIT:
From tr_trans in string.c:
if (cr == ENC_CODERANGE_VALID)
    cr = ENC_CODERANGE_7BIT;
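One way to read the snippet is that tr_trans starts from an optimistic 7-bit code range and only downgrades it when it writes a non-ASCII character. That reading is an assumption here, not something confirmed from the MRI source. The Ruby sketch below models that pattern with a hypothetical translated_coderange helper that works on codepoints, plus an equally hypothetical guard_encoding: flag that refuses the 7-bit starting point for encodings that are not ASCII-compatible.

```ruby
# Hypothetical model of code range tracking during a tr-style translation.
# This is NOT MRI's implementation; the symbols stand in for CR_7BIT / CR_VALID.
def translated_coderange(codepoints, encoding, guard_encoding: true)
  # guard_encoding: false mirrors the reported behaviour (always start at 7-bit).
  # guard_encoding: true only allows 7-bit for ASCII-compatible encodings.
  cr = if !guard_encoding || encoding.ascii_compatible?
         :"7bit"
       else
         :valid
       end

  # Downgrade to :valid as soon as a non-ASCII codepoint is emitted.
  codepoints.each do |cp|
    cr = :valid if cr == :"7bit" && cp >= 0x80
  end
  cr
end

out = "*".encode("utf-16le").codepoints # the translated output from the report
p translated_coderange(out, Encoding::UTF_16LE, guard_encoding: false) # => :"7bit"
p translated_coderange(out, Encoding::UTF_16LE)                        # => :valid
```

The unguarded call reproduces the mismatch in the report: the only codepoint is 42, which is ASCII, so nothing triggers the downgrade even though the underlying bytes are UTF-16LE.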
The net result is that strings which cannot possibly be CR_7BIT, simply by virtue of their encoding, end up incorrectly marked as CR_7BIT. For example:
s = "b".encode("utf-16le")
from = "a-z".encode("utf-16le")
to = "*".encode("utf-16le")
result = s.tr(from, to)
p to
p to.encoding
p to.bytes
p to.ascii_only?
puts
p result
p result.encoding
p result.bytes
p result.ascii_only?
puts
p Encoding::UTF_16LE.ascii_compatible?
That produces the following output:
"*"
#<Encoding:UTF-16LE>
[42, 0]
false

"*"
#<Encoding:UTF-16LE>
[42, 0]
true

false
In this case, the original to string is identical to the result string: they have the same encoding and the same bytes. However, the result is marked as CR_7BIT (as indicated by its String#ascii_only? value) while to is not. UTF-16LE is not ASCII-compatible, so no string in that encoding should ever be CR_7BIT.
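A quick way to spot the symptom from Ruby code is to check the invariant stated above: a string whose encoding is not ASCII-compatible should never report ascii_only?. The helper below (coderange_consistent? is a hypothetical name, not part of Ruby's API) is a minimal sketch of that check, using only String#encoding, Encoding#ascii_compatible? and String#ascii_only?.

```ruby
# Hypothetical diagnostic, not a Ruby API: flags strings whose reported
# ASCII-only status contradicts their encoding.
def coderange_consistent?(str)
  # ASCII-compatible encodings can legitimately be ASCII-only (CR_7BIT).
  return true if str.encoding.ascii_compatible?

  # Anything else (e.g. UTF-16LE) should never report ascii_only?.
  !str.ascii_only?
end

p coderange_consistent?("abc")                  # => true
p coderange_consistent?("*".encode("utf-16le")) # => true
```

Applied to the reproduction on the 2.4.2 build from the report, coderange_consistent?(to) would return true and coderange_consistent?(result) would return false, since result.ascii_only? is true there.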