Bug #15497
Updated by nobu (Nobuyoshi Nakada) almost 6 years ago
This seems to happen mostly for internal errors, as `raise` in Ruby code of course just uses the passed String's encoding for the message.

Example:

```ruby
name = "été"
p name.encoding
begin
  Module.new.const_set(name, 1)
rescue => e
  p e
  p e.message.encoding
end
```

When run, it gives:

```
$ LANG=en_US.UTF-8 ruby c.rb
#<Encoding:UTF-8>
#<NameError: wrong constant name été>
#<Encoding:UTF-8>

$ LANG=C ruby c.rb
#<Encoding:UTF-8>
#<NameError: wrong constant name "\u00E9t\u00E9">
#<Encoding:US-ASCII>
```

Depending on the locale encoding, the encoding of the message changes! This seems very unexpected, is inconvenient for testing (e.g., https://github.com/ruby/spec/commit/a6101a6e and any test checking exception messages with non-US-ASCII characters), and does not represent what is in the source code (here it's clearly a valid UTF-8 String).

I think for such a case, the encoding of the constant name should be used, i.e., UTF-8. Another way to see it is the message should be built like `"wrong constant name ".force_encoding('us-ascii') + constant_name`.

Indeed, if we do build the message manually like that it works as expected:

```ruby
name = "été"
begin
  raise "wrong constant name ".force_encoding('US-ASCII') + name
rescue => e
  p e
  p e.message.encoding
end
```

gives

```
$ LANG=en_US.UTF-8 ruby c.rb
#<Encoding:UTF-8>
#<RuntimeError: wrong constant name été>
#<Encoding:UTF-8>

$ LANG=C ruby c.rb
#<Encoding:UTF-8>
#<RuntimeError: wrong constant name \u00E9t\u00E9>
#<Encoding:UTF-8>
```

Note that the message still looks different, but that's the effect of `Kernel#p`, because it does not know how to display UTF-8 characters in a US-ASCII terminal. Nevertheless, both messages have the same bytes and encoding, which fixes all 3 problems mentioned above.
Setting `Encoding.default_internal` can work around this, but it is a bad workaround: it cannot work reliably in a multithreaded Ruby application, it affects many more things than just error messages, and the default behavior should be error messages with a deterministic encoding, just like `raise` in Ruby code.
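To make the expected behavior concrete, here is a minimal sketch of the concatenation rule the proposal relies on: an ASCII-only US-ASCII prefix joined with a UTF-8 name yields a UTF-8 string, independent of `LANG`. The helper `build_message` is hypothetical and only illustrates the idea; it is not how the interpreter builds the message internally.

```ruby
# Hypothetical helper (not part of Ruby): builds the message the way
# the report suggests, from a US-ASCII prefix plus the constant name.
def build_message(name)
  # String.new avoids mutating a (possibly frozen) string literal.
  prefix = String.new("wrong constant name ", encoding: Encoding::US_ASCII)
  # An ASCII-only prefix is compatible with any ASCII-compatible
  # encoding, so the result takes the encoding of the name.
  prefix + name
end

name = "été" # source files default to UTF-8, whatever the locale is
message = build_message(name)

p message.encoding         # expected: UTF-8, regardless of LANG
p message.valid_encoding?  # expected: true
# Same bytes as the plain UTF-8 literal, so tests can compare exactly.
p message.bytes == "wrong constant name été".bytes
```

A test that compares exception messages against a UTF-8 literal could then pass under both `LANG=C` and `LANG=en_US.UTF-8`, since neither the bytes nor the encoding depend on the locale.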