Bug #13950
String#tr incorrectly marks strings as CR_7BIT
Status: Closed
Description
String#tr contains a curious bit of code, attributable to r22547 and dating back to Ruby 1.9.2, that appears to blindly change the calculated code range from CR_VALID to CR_7BIT.
From tr_trans in string.c:
if (cr == ENC_CODERANGE_VALID)
    cr = ENC_CODERANGE_7BIT;
The net result is that strings that cannot possibly be CR_7BIT, simply by virtue of their encoding, end up incorrectly marked as CR_7BIT. For example:
s = "b".encode("utf-16le")
from = "a-z".encode("utf-16le")
to = "*".encode("utf-16le")
result = s.tr(from, to)
p to
p to.encoding
p to.bytes
p to.ascii_only?
puts
p result
p result.encoding
p result.bytes
p result.ascii_only?
puts
p Encoding::UTF_16LE.ascii_compatible?
That produces the following output:
"*"
#<Encoding:UTF-16LE>
[42, 0]
false

"*"
#<Encoding:UTF-16LE>
[42, 0]
true

false
In this case, the original to string is identical to the result string. They have the same encoding and the same bytes. However, the result is marked as CR_7BIT (indicated by the String#ascii_only? value). UTF-16LE is not ASCII-compatible and should never have strings that are CR_7BIT.
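The reproduction above can be condensed into a regression-style check. This is only a sketch: the helper name `tr_result_ascii_only?` is made up for illustration, and the expected values assume an interpreter that already includes the fix (an affected build would report `true` for UTF-16LE).

```ruby
# Hypothetical helper (not part of Ruby): runs the same String#tr call as the
# reproduction in the given encoding and reports String#ascii_only? on the result.
def tr_result_ascii_only?(encoding_name)
  s    = "b".encode(encoding_name)
  from = "a-z".encode(encoding_name)
  to   = "*".encode(encoding_name)
  s.tr(from, to).ascii_only?
end

# UTF-16LE is not ASCII-compatible, so the result must never be ASCII-only.
p tr_result_ascii_only?("UTF-16LE") # false on an interpreter with the fix

# UTF-8 is ASCII-compatible, so "*" is legitimately ASCII-only there.
p tr_result_ascii_only?("UTF-8")    # true
```

Comparing against UTF-8 keeps the check honest: the flag should only differ because of the encoding, not because of the characters involved.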
        
Updated by nirvdrum (Kevin Menard) about 8 years ago
For what it's worth, I may have the root cause of this wrong. It looks like setting the code range to CR_7BIT might be designed to help out the CHECK_IF_ASCII macro. But that macro is invoked on a byte-by-byte basis, ignoring the result's encoding and the clen value, both of which would help determine the correct code range. In this case, the two bytes being inspected (42 and 0) are both in the ASCII range, but clen is 2, so the bytes should be considered as a pair.
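To illustrate the distinction described here, the following Ruby sketch contrasts a byte-by-byte check with one that also consults the encoding. It is a model of the idea only, not the C code in string.c; the method names `bytewise_ascii?`, `char_width`, and `encoding_aware_ascii?` are invented for this example.

```ruby
# Byte-wise view, analogous to inspecting each byte in isolation:
# every byte of the string is below 0x80.
def bytewise_ascii?(str)
  str.bytes.all? { |b| b < 0x80 }
end

# Width in bytes of the first character, a rough analog of clen.
def char_width(str)
  str.chars.first.bytesize
end

# Encoding-aware view: a string can only be 7-bit if its encoding is
# ASCII-compatible in the first place.
def encoding_aware_ascii?(str)
  str.encoding.ascii_compatible? && bytewise_ascii?(str)
end

result = "*".encode("UTF-16LE")

p result.bytes                  # [42, 0]
p bytewise_ascii?(result)       # true: 42 and 0 are each below 0x80
p char_width(result)            # 2: the two bytes form one character
p encoding_aware_ascii?(result) # false: UTF-16LE is not ASCII-compatible
```

The byte-wise check alone reaches the wrong conclusion for this string; taking the encoding into account is enough to rule out the 7-bit classification.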
        
Updated by nobu (Nobuyoshi Nakada) about 8 years ago
- Status changed from Open to Closed
Applied in changeset trunk|r60060.
string.c: ASCII-incompatible is not ASCII only
- string.c (tr_trans): ASCII-incompatible encoding strings cannot
 be ASCII-only even if valid. [ruby-core:83056] [Bug #13950]
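The rule stated in the commit message can be modeled in a few lines of Ruby. This is a hedged sketch rather than the actual patch: the method `initial_code_range` and the symbols `:valid` and `:seven_bit` are stand-ins chosen for illustration, loosely mirroring ENC_CODERANGE_VALID and ENC_CODERANGE_7BIT.

```ruby
# Hypothetical model of the rule: only downgrade a valid code range to
# 7-bit when the encoding is ASCII-compatible. For ASCII-incompatible
# encodings the code range is left as it was.
def initial_code_range(code_range, encoding)
  if code_range == :valid && encoding.ascii_compatible?
    :seven_bit
  else
    code_range
  end
end

p initial_code_range(:valid, Encoding::UTF_8)    # :seven_bit
p initial_code_range(:valid, Encoding::UTF_16LE) # :valid
```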
        
Updated by nagachika (Tomoyuki Chikanaga) about 8 years ago
- Backport changed from 2.3: UNKNOWN, 2.4: UNKNOWN to 2.3: REQUIRED, 2.4: REQUIRED
        
Updated by nagachika (Tomoyuki Chikanaga) almost 8 years ago
- Backport changed from 2.3: REQUIRED, 2.4: REQUIRED to 2.3: REQUIRED, 2.4: DONE
ruby_2_4 r61453 merged revision(s) 60060.
        
Updated by usa (Usaku NAKAMURA) over 7 years ago
- Backport changed from 2.3: REQUIRED, 2.4: DONE to 2.3: DONE, 2.4: DONE
ruby_2_3 r62137 merged revision(s) 60060.