Bug #18955: Kernel#sprintf - %c ignores a non-ASCII character's encoding - Ruby - Ruby Issue Tracking System


Updated by andrykonchin (Andrew Konchin) over 3 years ago

I haven't found any similar existing issue, so I decided to create a new one.

I noticed that `sprintf("%c", string)` doesn't behave as expected when the encodings of the format string and the string argument differ and the string contains a non-ASCII character.

In this case it seems that `sprintf` simply takes the binary representation of the character and labels it with the format string's encoding, i.e. it reinterprets the raw bytes in that encoding instead of transcoding the character.
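The byte-level reinterpretation suspected above can be reproduced explicitly with `String#force_encoding`, using the same strings as the first example. This sketch does not call `sprintf`; it only mimics what `sprintf` appears to do, under the assumption that the bytes are copied unchanged and relabeled.

```ruby
# The KOI8-U "Й" is the single byte 0xEA (234).
string = "Й".encode(Encoding::KOI8_U)

# Relabel the same bytes as Windows-1251 without transcoding them.
reinterpreted = string.dup.force_encoding("Windows-1251")

reinterpreted.bytes       # => [234], byte left untouched
reinterpreted.codepoints  # => [234], same value as in KOI8-U

# In Windows-1251 the byte 0xEA is the lowercase "к", not "Й".
reinterpreted == "к".encode("Windows-1251")  # => true
```

This matches the codepoint `234` reported for `sprintf`'s result in the first example.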

I would expect `sprintf` to negotiate a common encoding and convert everything (both the character and the format string) to it, e.g. to the format string's encoding.
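A minimal sketch of the expected result, assuming the negotiated encoding is simply the format string's encoding. `sprintf_char_transcoded` is a hypothetical helper (not part of Ruby); it transcodes the `%c` argument with `String#encode` before formatting, so `sprintf` only ever sees a single encoding.

```ruby
# Hypothetical helper: transcode the %c argument into the format
# string's encoding first, instead of relying on its raw bytes.
def sprintf_char_transcoded(fmt, char)
  sprintf(fmt, char.encode(fmt.encoding))
end

format = "%c".encode("Windows-1251")
string = "Й".encode(Encoding::KOI8_U)

r = sprintf_char_transcoded(format, string)
r.encoding                       # => #<Encoding:Windows-1251>
r.codepoints                     # => [201]
r == "Й".encode("Windows-1251")  # => true
```

Transcoding raises `Encoding::UndefinedConversionError` if the character has no representation in the target encoding, which seems preferable to silently producing a different character.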

 Examples to illustrate this behavior: 

 ```ruby 
 format = "%c".encode("Windows-1251") 
 string = "Й".encode(Encoding::KOI8_U) 
 r = sprintf(format, string) 
 r.encoding 
 # => #<Encoding:Windows-1251> 

 r == "Й".encode("Windows-1251") 
 # => false 

 r.codepoints 
 # => [234] 
 string.codepoints 
 # => [234] 
 ``` 

In this example the result has the format string's encoding, but the codepoint is unchanged: it equals the character's codepoint in the original string's encoding (KOI8-U). It should be different, because "Й" has another codepoint in Windows-1251:

 ```ruby 
 "Й".encode("Windows-1251").codepoints 
 # => [201] 
 ``` 

 Another example: 

 ```ruby 
 string = "À".encode(Encoding::CP1252) 
 sprintf("%c", string) 
 # => in `sprintf': invalid byte sequence in UTF-8 (ArgumentError) 
 ``` 

In this example the error means that `sprintf` doesn't properly transcode the character from the string's encoding (CP1252) to UTF-8; it just uses the raw bytes. "À" is the single byte 0xC0 in CP1252, which is not a valid UTF-8 sequence.
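The following sketch checks that explanation and shows a workaround, assuming the format string literal is UTF-8 (the default source encoding). `force_encoding` stands in for the suspected raw-byte copy; it does not reproduce `sprintf`'s internals.

```ruby
string = "À".encode(Encoding::CP1252)
string.bytes  # => [192], i.e. 0xC0

# Reading that byte as UTF-8 gives an invalid string: 0xC0 can never
# appear in well-formed UTF-8, hence the ArgumentError.
raw = string.dup.force_encoding(Encoding::UTF_8)
raw.valid_encoding?  # => false

# Workaround: transcode the argument to the format's encoding first.
r = sprintf("%c", string.encode(Encoding::UTF_8))
r           # => "À"
r.encoding  # => #<Encoding:UTF-8>
```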
