Bug #13950

String#tr incorrectly marks strings as CR_7BIT

Added by nirvdrum (Kevin Menard) about 7 years ago. Updated almost 7 years ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: ruby 2.4.2p198 (2017-09-14 revision 59899) [x86_64-linux]
[ruby-core:83056]

Description

String#tr has a curious bit of code attributable to r22547, dating back to Ruby 1.9.2. It seems to blindly change the calculated code range from CR_VALID to CR_7BIT:

From tr_trans in string.c:

if (cr == ENC_CODERANGE_VALID)
    cr = ENC_CODERANGE_7BIT;
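For context, the rule that this override bypasses can be modeled in a few lines of Ruby. The sketch below is a simplified model, not MRI's implementation, and the helper name expected_coderange and the symbols it returns are hypothetical. It assumes that a string can only be CR_7BIT when its encoding is ASCII-compatible and every byte is below 0x80. Under that rule, forcing CR_VALID to CR_7BIT is unsound for an encoding like UTF-16LE.

```ruby
# Simplified model of how a string's code range follows from its bytes and
# encoding. `expected_coderange` is a hypothetical helper, not a Ruby API.
def expected_coderange(str)
  # Rebuild from raw bytes so no code range cached on `str` is reused.
  fresh = str.bytes.pack("C*").force_encoding(str.encoding)
  return :cr_broken unless fresh.valid_encoding?

  # CR_7BIT is only possible for an ASCII-compatible encoding whose bytes
  # are all below 0x80; any other valid string is CR_VALID.
  if fresh.encoding.ascii_compatible? && fresh.bytes.all? { |b| b < 0x80 }
    :cr_7bit
  else
    :cr_valid
  end
end

p expected_coderange("abc")                  # => :cr_7bit (ASCII-only UTF-8)
p expected_coderange("*".encode("utf-16le")) # => :cr_valid (bytes [42, 0], UTF-16LE)
```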

The net result is that strings which can't possibly be CR_7BIT, simply by virtue of their encoding, end up incorrectly marked as CR_7BIT. For example:

s = "b".encode("utf-16le")
from = "a-z".encode("utf-16le")
to = "*".encode("utf-16le")
result = s.tr(from, to)

p to
p to.encoding
p to.bytes
p to.ascii_only?

puts

p result
p result.encoding
p result.bytes
p result.ascii_only?

puts
p Encoding::UTF_16LE.ascii_compatible?

That produces the following output:

"*"
#<Encoding:UTF-16LE>
[42, 0]
false

"*"
#<Encoding:UTF-16LE>
[42, 0]
true

false

In this case, the original "to" string and the result string are identical: they have the same encoding and the same bytes. However, the result is marked as CR_7BIT (as indicated by its String#ascii_only? value), while the original "to" string is not. UTF-16LE is not ASCII-compatible, so no UTF-16LE string should ever be CR_7BIT.
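The invariant above can be turned into a quick check. This is a minimal sketch: coderange_consistent? is a hypothetical helper, not part of Ruby, and it assumes that rebuilding a string from its raw bytes with Array#pack and String#force_encoding yields a copy whose code range is computed from scratch. It compares String#ascii_only?, which is true exactly when the code range is CR_7BIT, between the original string and that fresh copy.

```ruby
# Hypothetical helper (not a Ruby API): does the code range cached on `str`
# agree with a fresh scan of the same bytes in the same encoding?
def coderange_consistent?(str)
  # Array#pack builds a brand-new binary string from the raw bytes, and
  # force_encoding attaches the original encoding, so no code range is
  # copied over from `str`.
  fresh = str.bytes.pack("C*").force_encoding(str.encoding)
  # ascii_only? reports whether the code range is CR_7BIT.
  str.ascii_only? == fresh.ascii_only?
end

s = "b".encode("utf-16le")
from = "a-z".encode("utf-16le")
to = "*".encode("utf-16le")
result = s.tr(from, to)

p coderange_consistent?(to)     # string produced by String#encode
p coderange_consistent?(result) # string produced by String#tr
```

On an affected Ruby such as the 2.4.2 build above, the second call would be expected to print false, because result reports ascii_only? as true while a fresh scan of the same UTF-16LE bytes cannot. On a version without this behavior, both calls should print true.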
