Bug #13950 (Closed)
String#tr incorrectly marks strings as CR_7BIT
Description
String#tr has a curious bit of code attributable to r22547, dating back to Ruby 1.9.2. It seems to blindly change the calculated code range from CR_VALID to CR_7BIT.
From tr_trans in string.c:
    if (cr == ENC_CODERANGE_VALID)
        cr = ENC_CODERANGE_7BIT;
The net result is that strings that can't possibly be CR_7BIT, simply by virtue of their encoding, end up incorrectly marked as CR_7BIT. For example:
s = "b".encode("utf-16le")
from = "a-z".encode("utf-16le")
to = "*".encode("utf-16le")
result = s.tr(from, to)
p to
p to.encoding
p to.bytes
p to.ascii_only?
puts
p result
p result.encoding
p result.bytes
p result.ascii_only?
puts
p Encoding::UTF_16LE.ascii_compatible?
That produces the following output:
"*"
#<Encoding:UTF-16LE>
[42, 0]
false

"*"
#<Encoding:UTF-16LE>
[42, 0]
true

false
In this case, the original to string is identical to the result string: they have the same encoding and the same bytes. However, the result is marked as CR_7BIT (indicated by the String#ascii_only? value). UTF-16LE is not ASCII-compatible and should never have strings that are CR_7BIT.
Updated by nirvdrum (Kevin Menard) about 7 years ago
For what it's worth, I may have the root cause of this wrong. It looks like setting the code range to CR_7BIT might be designed to help out the CHECK_IF_ASCII macro. But that macro is invoked on a byte-by-byte basis, ignoring the result's encoding and the clen value, both of which would help determine the correct code range. In this case, the two bytes being inspected (42 and 0) are both in the ASCII range, but clen is 2, so the bytes should be considered as a pair.
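To illustrate the difference between a byte-by-byte check and one that takes the encoding into account, here is a rough Ruby sketch. Both helper methods are hypothetical names made up for this example; they are not part of Ruby or string.c, and they only approximate what the C code does.

```ruby
# Hypothetical helpers for illustration only; neither exists in Ruby or string.c.

# Approximates a byte-by-byte check: true when every byte is below 0x80.
def bytewise_ascii?(str)
  str.bytes.all? { |b| b < 0x80 }
end

# Encoding-aware variant: a string can only be 7-bit if its encoding is
# ASCII-compatible in the first place.
def encoding_aware_ascii?(str)
  str.encoding.ascii_compatible? && bytewise_ascii?(str)
end

star = "*".encode("utf-16le")
p star.bytes                                    # [42, 0]
p bytewise_ascii?(star)                         # true  (each byte looks like ASCII)
p encoding_aware_ascii?(star)                   # false (UTF-16LE is not ASCII-compatible)
p encoding_aware_ascii?("*".encode("utf-8"))    # true
```

Looking at each byte in isolation, 42 and 0 both pass, which is why the pair gets misclassified unless the encoding is consulted as well.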
Updated by nobu (Nobuyoshi Nakada) about 7 years ago
- Status changed from Open to Closed
Applied in changeset trunk|r60060.
string.c: ASCII-incompatible is not ASCII only
- string.c (tr_trans): ASCII-incompatible encoding strings cannot
be ASCII-only even if valid. [ruby-core:83056] [Bug #13950]
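A small check along the following lines could be used to confirm the behaviour described in the commit message on a Ruby build that includes the change. It reuses the reproduction from the description; the results noted in the comments are what is expected with the fix applied, not output captured from a particular build.

```ruby
from = "a-z".encode("utf-16le")
to   = "*".encode("utf-16le")
result = "b".encode("utf-16le").tr(from, to)

# The result is a valid UTF-16LE string with the same bytes as `to`...
p result == to             # expected: true
p result.valid_encoding?   # expected: true
# ...but it must not be reported as ASCII-only once the fix is in place.
p result.ascii_only?       # expected: false
```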
Updated by nagachika (Tomoyuki Chikanaga) about 7 years ago
- Backport changed from 2.3: UNKNOWN, 2.4: UNKNOWN to 2.3: REQUIRED, 2.4: REQUIRED
Updated by nagachika (Tomoyuki Chikanaga) about 7 years ago
- Backport changed from 2.3: REQUIRED, 2.4: REQUIRED to 2.3: REQUIRED, 2.4: DONE
ruby_2_4 r61453 merged revision(s) 60060.
Updated by usa (Usaku NAKAMURA) almost 7 years ago
- Backport changed from 2.3: REQUIRED, 2.4: DONE to 2.3: DONE, 2.4: DONE
ruby_2_3 r62137 merged revision(s) 60060.