Bug #13950

String#tr incorrectly marks strings as CR_7BIT

Added by nirvdrum (Kevin Menard) over 2 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 2.4.2p198 (2017-09-14 revision 59899) [x86_64-linux]
[ruby-core:83056]

Description

String#tr has a curious bit of code attributable to r22547, dating back to Ruby 1.9.2. It seems to blindly change the calculated code range from CR_VALID to CR_7BIT:

From tr_trans in string.c:

if (cr == ENC_CODERANGE_VALID)
    cr = ENC_CODERANGE_7BIT;

The net result is that strings that can't possibly be CR_7BIT, simply by virtue of their encoding, end up incorrectly marked as CR_7BIT. For example:

s = "b".encode("utf-16le")
from = "a-z".encode("utf-16le")
to = "*".encode("utf-16le")
result = s.tr(from, to)

p to
p to.encoding
p to.bytes
p to.ascii_only?

puts

p result
p result.encoding
p result.bytes
p result.ascii_only?

puts
p Encoding::UTF_16LE.ascii_compatible?

That produces the following output:

"*"
#<Encoding:UTF-16LE>
[42, 0]
false

"*"
#<Encoding:UTF-16LE>
[42, 0]
true

false

In this case, the original to string is identical to the result string. They have the same encoding and the same bytes. However, the result is marked as CR_7BIT (indicated by the String#ascii_only? value). UTF-16LE is not ASCII-compatible and should never have strings that are CR_7BIT.
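The expectation stated above can be written as a small invariant check. This is only a sketch: `coderange_consistent?` is a hypothetical helper name, not part of Ruby, and it relies only on `String#ascii_only?` and `Encoding#ascii_compatible?`. On an affected build, the check would be expected to fail for the `tr` result.

```ruby
# Hypothetical helper (not part of Ruby): a string may only report
# ascii_only? when its encoding is ASCII-compatible.
def coderange_consistent?(str)
  !str.ascii_only? || str.encoding.ascii_compatible?
end

s    = "b".encode("utf-16le")
from = "a-z".encode("utf-16le")
to   = "*".encode("utf-16le")

p coderange_consistent?(to)              # freshly encoded string
p coderange_consistent?(s.tr(from, to))  # false on an affected build
```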

Updated by nirvdrum (Kevin Menard) over 2 years ago

For what it's worth, I may have the root cause of this wrong. It looks like setting the code range to CR_7BIT might be designed to help out the CHECK_IF_ASCII macro. But that macro is invoked on a byte-by-byte basis, ignoring both the result's encoding and the clen value, either of which would help determine the correct code range. In this case, the two bytes being inspected (42 and 0) each fall in the ASCII range, but clen is 2, so the bytes should be considered as a pair.
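The difference between a byte-by-byte check and an encoding-aware one can be modelled in Ruby. Both method names below are hypothetical and only illustrate the idea; they are not how tr_trans or CHECK_IF_ASCII are written in string.c.

```ruby
# Hypothetical model of a byte-by-byte check that ignores the encoding.
def bytewise_ascii?(str)
  str.bytes.all? { |b| b < 0x80 }
end

# Hypothetical encoding-aware check: an ASCII-incompatible encoding
# can never yield an ASCII-only string, whatever its bytes are.
def encoding_aware_ascii?(str)
  str.encoding.ascii_compatible? && bytewise_ascii?(str)
end

star = "*".encode("utf-16le")   # bytes: [42, 0]

p bytewise_ascii?(star)         # => true  (42 and 0 are both below 0x80)
p encoding_aware_ascii?(star)   # => false (UTF-16LE is not ASCII-compatible)
```

Only the second check agrees with what String#ascii_only? reports for a freshly encoded UTF-16LE string.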

Updated by nobu (Nobuyoshi Nakada) over 2 years ago

  • Status changed from Open to Closed

Applied in changeset trunk|r60060.


string.c: ASCII-incompatible is not ASCII only

  • string.c (tr_trans): ASCII-incompatible encoding strings cannot be ASCII-only even if valid. [ruby-core:83056] [Bug #13950]

Updated by nagachika (Tomoyuki Chikanaga) over 2 years ago

  • Backport changed from 2.3: UNKNOWN, 2.4: UNKNOWN to 2.3: REQUIRED, 2.4: REQUIRED

Updated by nagachika (Tomoyuki Chikanaga) over 2 years ago

  • Backport changed from 2.3: REQUIRED, 2.4: REQUIRED to 2.3: REQUIRED, 2.4: DONE

ruby_2_4 r61453 merged revision(s) 60060.

Updated by usa (Usaku NAKAMURA) over 2 years ago

  • Backport changed from 2.3: REQUIRED, 2.4: DONE to 2.3: DONE, 2.4: DONE

ruby_2_3 r62137 merged revision(s) 60060.
