Bug #15497

Updated by nobu (Nobuyoshi Nakada) almost 6 years ago

This seems to happen mostly for errors raised internally, since `raise` in Ruby code simply uses the encoding of the String passed to it for the message. 

 Example: 
 ```ruby 
 name = "été" 
 p name.encoding 

 begin 
   Module.new.const_set(name, 1) 
 rescue => e 
   p e 
   p e.message.encoding 
 end 
 ``` 

 When run, it gives: 
 ``` 
 $ LANG=en_US.UTF-8 ruby c.rb 
 #<Encoding:UTF-8> 
 #<NameError: wrong constant name été> 
 #<Encoding:UTF-8> 

 $ LANG=C ruby c.rb    
 #<Encoding:UTF-8> 
 #<NameError: wrong constant name "\u00E9t\u00E9"> 
 #<Encoding:US-ASCII> 
 ``` 

 Depending on the locale encoding, the encoding of the message changes! 
 This seems very unexpected, is inconvenient for testing (e.g., https://github.com/ruby/spec/commit/a6101a6e and any test checking exception messages with non-US-ASCII characters), 
 and does not represent what is in the source code (here it's clearly a valid UTF-8 String). 
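 To see which locale-derived settings differ between the two runs, a small script like the one below can be run under both `LANG=en_US.UTF-8` and `LANG=C`. This is only a diagnostic sketch: it prints the inputs that vary with `LANG`, and it does not claim which of them the `NameError` message construction actually consults internally.

 ```ruby
 # Encodings Ruby derived from the environment at startup.
 # Encoding.find("locale") follows LANG/LC_ALL; default_external is
 # normally initialized from the locale unless overridden with -E.
 locale   = Encoding.find("locale")
 external = Encoding.default_external
 internal = Encoding.default_internal # nil unless set explicitly

 p locale
 p external
 p internal
 ```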

 I think for such a case, the encoding of the constant name should be used, i.e., UTF-8. 
 Another way to see it: the message should be built like `"wrong constant name ".force_encoding('us-ascii') + constant_name`. 
 Indeed, if we build the message manually like that, it works as expected: 

 ```ruby 
 name = "été" 
 p name.encoding 

 begin 
   raise "wrong constant name ".force_encoding('US-ASCII') + name 
 rescue => e 
   p e 
   p e.message.encoding 
 end 
 ``` 
 gives 
 ``` 
 $ LANG=en_US.UTF-8 ruby c.rb 
 #<Encoding:UTF-8> 
 #<RuntimeError: wrong constant name été> 
 #<Encoding:UTF-8> 

 $ LANG=C ruby c.rb           
 #<Encoding:UTF-8> 
 #<RuntimeError: wrong constant name \u00E9t\u00E9> 
 #<Encoding:UTF-8> 
 ``` 

 Note that the message still looks different, but that's the effect of `Kernel#p`, because it does not know how to display UTF-8 characters in a US-ASCII terminal. 
 Nevertheless, both messages have the same bytes and encoding, which fixes all 3 problems mentioned above. 
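 The reason the manual construction is stable can be sketched with Ruby's encoding compatibility rules: an ASCII-only US-ASCII prefix is compatible with a UTF-8 name, so `String#+` yields a UTF-8 result no matter what the locale is. The sketch below uses `"\u00E9t\u00E9"` (assumed here only so the name is UTF-8 independently of the source file's encoding) and shows that a test can compare encoding and raw bytes instead of relying on the locale-dependent output of `p`.

 ```ruby
 # US-ASCII prefix plus a UTF-8 constant name, as suggested above.
 prefix = "wrong constant name ".encode(Encoding::US_ASCII)
 name   = "\u00E9t\u00E9" # a \u escape always produces a UTF-8 literal

 # An ASCII-only US-ASCII string is compatible with UTF-8,
 # so the concatenation takes the UTF-8 encoding.
 p Encoding.compatible?(prefix, name) # => #<Encoding:UTF-8>

 message = prefix + name
 p message.encoding                   # => #<Encoding:UTF-8>

 # Comparing encoding and raw bytes does not depend on the terminal or LANG.
 expected = "wrong constant name \u00E9t\u00E9"
 same = message.encoding == expected.encoding && message.b == expected.b
 p same                               # => true
 ```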

 Setting `Encoding.default_internal` can work around this, but it is a bad workaround: it cannot work reliably in a multithreaded Ruby application, 
 it affects many more things than just error messages, and the default behavior should be error messages with a deterministic encoding, just like `raise` in Ruby code.
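 The multithreading concern comes from `Encoding.default_internal` being a single process-wide setting rather than a per-thread one. A minimal sketch of that property: a value assigned on the main thread is what every other thread observes, so one thread cannot change it just for its own error messages. The block restores the previous value afterwards.

 ```ruby
 previous = Encoding.default_internal
 begin
   Encoding.default_internal = Encoding::UTF_8
   # Another thread sees the same, process-wide value.
   seen_in_thread = Thread.new { Encoding.default_internal }.value
   p seen_in_thread # => #<Encoding:UTF-8>
 ensure
   # Restore the global setting so nothing else in the process is affected.
   Encoding.default_internal = previous
 end
 ```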
