Bug #15497

Encoding of error messages should not depend on the locale encoding

Added by Eregon (Benoit Daloze) almost 3 years ago. Updated almost 3 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-linux]
[ruby-core:90853]

Description

This seems to happen mostly for errors raised internally by the interpreter, since raise in Ruby code simply uses the encoding of the String passed as the message.

Example:

name = "été"
p name.encoding

begin
  Module.new.const_set(name, 1)
rescue => e
  p e
  p e.message.encoding
end

When run, it gives:

$ LANG=en_US.UTF-8 ruby c.rb
#<Encoding:UTF-8>
#<NameError: wrong constant name été>
#<Encoding:UTF-8>

$ LANG=C ruby c.rb
#<Encoding:UTF-8>
#<NameError: wrong constant name "\u00E9t\u00E9">
#<Encoding:US-ASCII>

Depending on the locale encoding, the encoding of the message changes!
This seems very unexpected, is inconvenient for testing (e.g., https://github.com/ruby/spec/commit/a6101a6e and any test checking exception messages with non-US-ASCII characters), and does not reflect what is in the source code (here the name is clearly a valid UTF-8 String).

I think that in such a case the encoding of the constant name should be used, i.e., UTF-8.
Another way to see it: the message should be built like "wrong constant name ".force_encoding('US-ASCII') + constant_name.
Indeed, if we build the message manually like that, it works as expected:

name = "été"
p name.encoding

begin
  raise "wrong constant name ".force_encoding('US-ASCII') + name
rescue => e
  p e
  p e.message.encoding
end

gives:

$ LANG=en_US.UTF-8 ruby c.rb
#<Encoding:UTF-8>
#<RuntimeError: wrong constant name été>
#<Encoding:UTF-8>

$ LANG=C ruby c.rb
#<Encoding:UTF-8>
#<RuntimeError: wrong constant name \u00E9t\u00E9>
#<Encoding:UTF-8>

Note that the printed message still looks different, but that is an effect of Kernel#p, which cannot display UTF-8 characters in a US-ASCII locale and escapes them instead.
Nevertheless, both messages have the same bytes and the same encoding, which fixes all three problems mentioned above.
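Why does the manual concatenation end up in UTF-8? Ruby's encoding-compatibility rule applies here. An ASCII-only string is compatible with any ASCII-compatible encoding, so the result takes the encoding of the non-ASCII operand. The sketch below checks this with Encoding.compatible?. The helper build_message is hypothetical and only illustrates the suggestion above; it is not how the interpreter builds its messages.

```ruby
# Hypothetical helper: build the message as suggested above, with an
# ASCII-only US-ASCII prefix followed by the (possibly non-ASCII) name.
def build_message(prefix, name)
  # dup so that a frozen or chilled string literal is never mutated
  prefix.dup.force_encoding(Encoding::US_ASCII) + name
end

name = "été" # UTF-8, since Ruby source files are UTF-8 by default
ascii_prefix = "wrong constant name ".dup.force_encoding(Encoding::US_ASCII)

# An ASCII-only string is compatible with any ASCII-compatible encoding,
# so the concatenation takes the encoding of the non-ASCII operand.
p Encoding.compatible?(ascii_prefix, name) # => #<Encoding:UTF-8>
p build_message("wrong constant name ", name).encoding # => #<Encoding:UTF-8>
```

Only the operands' encodings are involved here, and the locale is never consulted. The result is therefore the same under LANG=C and LANG=en_US.UTF-8.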

Setting Encoding.default_internal can work around this, but it is a bad workaround. It cannot work reliably in a multithreaded Ruby application, and it affects many more things than just error messages. Moreover, the default behavior should be error messages with a deterministic encoding, just like raise in Ruby code.
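Why is this workaround unreliable with threads? Encoding.default_internal is a single process-wide setting rather than a per-thread one. The sketch below makes that concrete. The helper default_internal_seen_by_other_thread is hypothetical and exists only for illustration. Running with -w may print a "setting Encoding.default_internal" warning.

```ruby
# Hypothetical helper: change Encoding.default_internal in the current thread
# and report what a separate thread observes while the change is in effect.
def default_internal_seen_by_other_thread(encoding)
  previous = Encoding.default_internal
  Encoding.default_internal = encoding
  # The new thread sees the value set by the calling thread: the setting is global.
  Thread.new { Encoding.default_internal }.value
ensure
  # Restore the global setting so this demonstration leaves no lasting effect.
  Encoding.default_internal = previous
end

p default_internal_seen_by_other_thread(Encoding::UTF_8) # => #<Encoding:UTF-8>
```

Between the assignment and the restore, every other running thread is affected as well. One part of an application therefore cannot safely opt into this workaround on its own.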

Updated by duerst (Martin Dürst) almost 3 years ago

I agree that the locale encoding should only be taken into account when the message is actually output, not as long as it is passed around inside Ruby.
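The approach described in this comment keeps the message in its own encoding inside Ruby and converts it only at output time. It can be sketched as below. The helper message_for_output is hypothetical, and the choice to escape unrepresentable characters as \uXXXX is an assumption made to mirror the transcript above. The code relies only on String#encode with its fallback: option.

```ruby
# Hypothetical helper: convert a message only when it is written out,
# escaping characters the target encoding cannot represent as \uXXXX.
def message_for_output(message, target_encoding)
  message.encode(target_encoding, fallback: ->(char) { format("\\u%04X", char.ord) })
end

message = "wrong constant name " + "été"
puts message_for_output(message, Encoding::US_ASCII) # prints: wrong constant name \u00E9t\u00E9
puts message_for_output(message, Encoding::UTF_8)    # prints: wrong constant name été
p message.encoding # => #<Encoding:UTF-8> (the message itself is untouched)
```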

Updated by nobu (Nobuyoshi Nakada) almost 3 years ago

  • Description updated

This escaping is intended not only for characters in a different encoding, but also for control characters, e.g., "\0".
The message is meant for display, and I guess it is not a good idea to show such characters directly.
For the raw value, NameError#name is available.
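The split described in this comment is that the message is for display while the name holds the raw value. A minimal sketch of it is below. For simplicity the exception is built directly with NameError.new(message, name), which is an assumption for illustration. It does not go through the interpreter's internal path used by Module#const_set.

```ruby
# The display message and the raw name are separate attributes of NameError.
name = "été"

begin
  raise NameError.new("wrong constant name #{name}", name)
rescue NameError => e
  raw = e.name # the raw value, for programmatic use
  p raw == name  # => true
  p raw.encoding # => #<Encoding:UTF-8>
end
```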

Updated by nobu (Nobuyoshi Nakada) almost 3 years ago

  • Status changed from Open to Closed

Applied in changeset trunk|r66753.


Defer escaping control char in error messages

  • eval_error.c (print_errinfo): defer escaping control char in error messages until writing to stderr, instead of quoting at building the message. [ruby-core:90853] [Bug #15497]
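The actual change is in C (print_errinfo in eval_error.c). The Ruby sketch below only illustrates the same idea: the message keeps its raw bytes, including control characters, and escaping happens only when the message is written to a stream. The helper write_error_message is hypothetical. Escaping each ASCII control character via String#dump is an assumed choice, not the committed implementation.

```ruby
require "stringio"

# Hypothetical helper: escape ASCII control characters only at write time,
# leaving the message string itself (bytes and encoding) unchanged.
def write_error_message(io, message)
  # String#dump yields e.g. "\"\\x00\"" for NUL; strip the surrounding quotes.
  escaped = message.gsub(/[\x00-\x1f\x7f]/) { |c| c.dump[1..-2] }
  io.puts(escaped)
end

message = "wrong constant name a\0b"
out = StringIO.new
write_error_message(out, message)

p message.include?("\0") # => true (the message still contains the raw NUL)
p out.string             # => "wrong constant name a\\x00b\n"
```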