Bug #15497
Closed: Encoding of error messages should not depend on the locale encoding
Description
This seems to happen mostly for internal errors, as raise in Ruby code of course just uses the passed String's encoding for the message.
Example:
name = "été"
p name.encoding
begin
  Module.new.const_set(name, 1)
rescue => e
  p e
  p e.message.encoding
end
When run, it gives:
$ LANG=en_US.UTF-8 ruby c.rb
#<Encoding:UTF-8>
#<NameError: wrong constant name été>
#<Encoding:UTF-8>
$ LANG=C ruby c.rb
#<Encoding:UTF-8>
#<NameError: wrong constant name "\u00E9t\u00E9">
#<Encoding:US-ASCII>
Depending on the locale encoding, the encoding of the message changes!
This seems very unexpected, is inconvenient for testing (e.g., https://github.com/ruby/spec/commit/a6101a6e and any test checking exception messages with non-US-ASCII characters), and does not represent what is in the source code (here it's clearly a valid UTF-8 String).
I think for such a case, the encoding of the constant name should be used, i.e., UTF-8.
Another way to see it: the message should be built like "wrong constant name ".force_encoding('us-ascii') + constant_name.
Indeed, if we build the message manually like that, it works as expected:
name = "été"
p name.encoding
begin
  raise "wrong constant name ".force_encoding('US-ASCII') + name
rescue => e
  p e
  p e.message.encoding
end
gives:
$ LANG=en_US.UTF-8 ruby c.rb
#<Encoding:UTF-8>
#<RuntimeError: wrong constant name été>
#<Encoding:UTF-8>
$ LANG=C ruby c.rb
#<Encoding:UTF-8>
#<RuntimeError: wrong constant name \u00E9t\u00E9>
#<Encoding:UTF-8>
Note that the message still looks different, but that's the effect of Kernel#p, because it does not know how to display UTF-8 characters in a US-ASCII terminal.
Nevertheless, both messages have the same bytes and encoding, which fixes all three problems mentioned above.
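The manual construction works because of how String#+ chooses the result encoding: when the left operand is an ASCII-only US-ASCII string and the right operand is a UTF-8 string containing non-ASCII characters, the result is UTF-8 and the bytes are simply concatenated. A minimal sketch of that idea; build_message is a hypothetical helper for illustration, not a Ruby API.

```ruby
# Hypothetical helper (not part of Ruby): builds an error message from an
# ASCII-only prefix and a user-supplied name, as suggested in the report.
def build_message(prefix, name)
  # Copy the prefix and tag the copy as US-ASCII; the caller's string is untouched.
  ascii_prefix = String.new(prefix, encoding: Encoding::US_ASCII)
  # String#+ negotiates encodings: with an ASCII-only US-ASCII left side,
  # a right side containing non-ASCII characters decides the result's encoding.
  ascii_prefix + name
end

msg = build_message("wrong constant name ", "été")
p msg.encoding  # => #<Encoding:UTF-8>
```

Because the result does not depend on the locale, a test can compare such a message byte for byte regardless of LANG.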
Setting Encoding.default_internal can work around this, but it is a bad workaround: it cannot work reliably in a multithreaded Ruby application, it affects many more things than just error messages, and the default behavior should be error messages with a deterministic encoding, just like raise in Ruby code.
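To illustrate the multithreading problem: Encoding.default_internal is a single process-wide setting, so a value assigned in one thread is immediately visible to every other thread. A minimal sketch; the helper name default_internal_seen_by_another_thread is hypothetical and only reports what a second thread observes.

```ruby
# Encoding.default_internal is process-wide, not per-thread.
def default_internal_seen_by_another_thread
  Thread.new { Encoding.default_internal }.value
end

saved = Encoding.default_internal
begin
  # Changing it here also changes what every other thread sees.
  Encoding.default_internal = Encoding::UTF_8
  p default_internal_seen_by_another_thread  # => #<Encoding:UTF-8>
ensure
  # Restore the previous value (often nil) so the rest of the program is unaffected.
  Encoding.default_internal = saved
end
```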
Updated by duerst (Martin Dürst) about 6 years ago
I agree that the locale encoding should only be taken into account when the message is actually output, not while it is passed around inside Ruby.
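One way to picture this suggestion: keep the message in its original encoding and convert it to the locale encoding only at the point of output. A minimal sketch; render_for_output is a hypothetical helper, and replacing unconvertible characters is just one possible policy.

```ruby
# Hypothetical helper: convert a message for output only, leaving the
# original String (and its encoding) unchanged.
def render_for_output(message, target = Encoding.find("locale"))
  # String#encode returns a new String; characters the target encoding
  # cannot represent are replaced ("?" for non-Unicode targets).
  message.encode(target, invalid: :replace, undef: :replace)
end

message = "wrong constant name été"
$stderr.puts render_for_output(message)
p message.encoding  # => #<Encoding:UTF-8>
```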
Updated by nobu (Nobuyoshi Nakada) about 6 years ago
- Description updated
It is intended not only for characters in a different encoding, but also for control characters, e.g., "\0".
The message is for display, and it is not a good idea to show such characters directly, I guess.
And NameError#name is available when the bare name is needed.
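For code that needs the exact, unescaped name, reading NameError#name avoids depending on the display-oriented message. A minimal sketch using the example from the report; since the return type of name may differ between Ruby versions (String or Symbol), the sketch normalizes it with to_s.

```ruby
name = "été"
begin
  # "été" starts with a lowercase letter, so it is not a valid constant name.
  Module.new.const_set(name, 1)
rescue NameError => e
  # e.message is meant for display; e.name carries the offending name itself.
  bare = e.name.to_s
  p bare
  p bare.encoding
end
```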
Updated by nobu (Nobuyoshi Nakada) about 6 years ago
- Status changed from Open to Closed
Applied in changeset trunk|r66753.
Defer escaping control char in error messages
- eval_error.c (print_errinfo): defer escaping control char in
error messages until writing to stderr, instead of quoting at
building the message. [ruby-core:90853] [Bug #15497]