Bug #15497

Encoding of error messages should not depend on the locale encoding

Added by Eregon (Benoit Daloze) over 5 years ago. Updated over 5 years ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-linux]
[ruby-core:90853]

Description

This seems to happen mostly for internal errors, since raise in Ruby code simply uses the encoding of the String passed as the message.

Example:

name = "été"
p name.encoding

begin
  Module.new.const_set(name, 1)
rescue => e
  p e
  p e.message.encoding
end

When run, it gives:

$ LANG=en_US.UTF-8 ruby c.rb
#<Encoding:UTF-8>
#<NameError: wrong constant name été>
#<Encoding:UTF-8>

$ LANG=C ruby c.rb
#<Encoding:UTF-8>
#<NameError: wrong constant name "\u00E9t\u00E9">
#<Encoding:US-ASCII>

Depending on the locale encoding, the encoding of the message changes!
This is very unexpected, makes testing inconvenient (e.g., https://github.com/ruby/spec/commit/a6101a6e and any test checking exception messages with non-US-ASCII characters),
and does not reflect what is in the source code (here, clearly a valid UTF-8 String).

I think that in such a case the encoding of the constant name should be used, i.e., UTF-8.
Another way to see it is that the message should be built like "wrong constant name ".force_encoding('us-ascii') + constant_name.
Indeed, building the message manually like that works as expected:

name = "été"
p name.encoding

begin
  raise "wrong constant name ".force_encoding('US-ASCII') + name
rescue => e
  p e
  p e.message.encoding
end

When run, it gives:

$ LANG=en_US.UTF-8 ruby c.rb
#<Encoding:UTF-8>
#<RuntimeError: wrong constant name été>
#<Encoding:UTF-8>

$ LANG=C ruby c.rb
#<Encoding:UTF-8>
#<RuntimeError: wrong constant name \u00E9t\u00E9>
#<Encoding:UTF-8>

Note that the printed message still looks different, but that is only an effect of Kernel#p, which escapes the UTF-8 characters it cannot display in a US-ASCII locale.
Nevertheless, both messages have the same bytes and the same encoding, which fixes all three problems mentioned above.
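
As a minimal sketch of the construction proposed above, the message can be checked by its encoding and bytes rather than by how p prints it. The helper name build_message is hypothetical and only for illustration; it is not how Ruby builds the message internally.

```ruby
# Hypothetical helper (not part of Ruby): builds the message as suggested in
# this report, from a US-ASCII prefix plus the name in its own encoding.
def build_message(name)
  # dup so that the literal itself is not mutated by force_encoding
  prefix = "wrong constant name ".dup.force_encoding(Encoding::US_ASCII)
  prefix + name
end

# "été" written with \u escapes, so the literal is UTF-8 whatever the locale is
name = "\u00E9t\u00E9"
message = build_message(name)

# The prefix is ASCII-only, so the result takes the encoding of the name.
p message.encoding        # prints #<Encoding:UTF-8>
p message.valid_encoding?
p message.bytes == ("wrong constant name ".bytes + name.bytes)
```

Comparing encoding and bytes avoids depending on the escaping that Kernel#p applies under a US-ASCII locale.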

Setting Encoding.default_internal can work around this, but it is a bad workaround: it cannot work reliably in a multithreaded Ruby application
and it affects many more things than just error messages. The default behavior should be error messages with a deterministic encoding, just like raise in Ruby code.
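
The following sketch illustrates only why the workaround is fragile in a multithreaded program: Encoding.default_internal is a process-wide setting, so a change made in one thread is visible in every other thread. It does not check the effect on the error message itself, and the variable names are illustrative.

```ruby
# Remember the current value so it can be restored afterwards.
original = Encoding.default_internal

seen_in_other_thread = nil
begin
  # Change the setting in the main thread...
  Encoding.default_internal = Encoding::UTF_8
  # ...and read it back from another thread: the change is global.
  seen_in_other_thread = Thread.new { Encoding.default_internal }.value
ensure
  # Restore the previous value (possibly nil).
  Encoding.default_internal = original
end

p seen_in_other_thread    # prints #<Encoding:UTF-8>
```

Because the setting is shared, temporarily changing it around a single operation can alter the behavior of unrelated code running concurrently.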
