Bug #13949

Updated by nirvdrum (Kevin Menard) about 7 years ago

I've noticed that `String#unpack` with the `'M'` directive can create strings that should be `CR_7BIT` as `CR_VALID`. The issue appears to have been introduced in r30542, which assumes that all `ASCII-8BIT` strings must be `CR_VALID`. It's possible this was correct back during Ruby 1.9.3 development and just wasn't updated. I'm not familiar enough with the history to tell.

 A simple reproduction showing the issue is: 

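 ```
 # Single quotes: the string ends with a literal backslash and 'n', not a newline.
 res = '0123456789=\n'.unpack('M').first
 p res
 p res.encoding
 p res.bytes
 p res.ascii_only?

 puts

 # Rebuild a string from the same bytes via Array#pack.
 packed = res.bytes.pack('c*')
 p packed
 p packed.encoding
 p packed.bytes
 p packed.ascii_only?
 ```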

 This yields the following output: 

 ``` 
 "0123456789=\\n" 
 #<Encoding:ASCII-8BIT> 
 [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 61, 92, 110] 
 false 

 "0123456789=\\n" 
 #<Encoding:ASCII-8BIT> 
 [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 61, 92, 110] 
 true 
 ``` 

 Both strings have exactly the same contents with the same encoding. But, depending on how you construct them, one is considered to be `CR_7BIT` (indicated by the `String#ascii_only?` output), and one is considered to be `CR_VALID`. I believe `CR_7BIT` is the correct code range value in this situation.
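
 For comparison, here is a minimal sketch that checks the bytes directly rather than going through `String#ascii_only?`. The helper name `seven_bit_bytes?` is hypothetical (it is not part of Ruby); it only confirms that every byte in both strings is below `0x80`, which is the condition I would expect to yield `CR_7BIT`.

 ```ruby
 # Hypothetical helper (not part of Ruby): scan the bytes directly
 # instead of asking String#ascii_only?.
 def seven_bit_bytes?(str)
   str.each_byte.all? { |byte| byte < 0x80 }
 end

 res    = '0123456789=\n'.unpack('M').first
 packed = res.bytes.pack('c*')

 # Both strings hold the same 7-bit bytes, so both checks should agree.
 p seven_bit_bytes?(res)
 p seven_bit_bytes?(packed)
 p res == packed
 ```

 If the byte scan and `String#ascii_only?` disagree for the same string, that points at the code range set by `unpack` rather than at the string's contents.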
