Bug #566

String encoding error messages are inconsistent

Added by Michael Selig over 3 years ago. Updated 10 months ago.

[ruby-core:18600]
Status: Closed
Start date: 09/15/2008
Priority: Normal
Due date:
Assignee: Yui NARUSE
% Done: 100%
Category: M17N
Target version: 1.9.1 Release Candidate
ruby -v:

Description

Please compare:
"abc".encode("UTF-16BE") << "abc"
==> EncodingCompatibilityError: incompatible character encodings: UTF-16BE and US-ASCII
and:
"abc".encode("UTF-16BE") =~ /abc/
==> ArgumentError: incompatible encoding regexp match (US-ASCII regexp with UTF-16BE string)

Also, the handling of broken (illegal) string encodings is not consistent:
"abc".force_encoding("UTF-16BE") =~ /abc/
==> ArgumentError: broken UTF-16BE string
and:
"abc".force_encoding("UTF-16BE") == "abc"
==> false (no error)
and:
"abc".encode("UTF-16BE").count("b".force_encoding("UTF-16BE"))
==> ArgumentError: invalid byte sequence in UTF-16BE

Associated revisions

Revision 20626
Added by Yukihiro Matsumoto about 3 years ago

* re.c (reg_enc_error): raise EncodingCompatibilityError for encoding incompatibility. [ruby-core:18600]
* re.c (rb_reg_prepare_enc): more consistent error message. [ruby-core:18611]
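
For readers who want to check the effect of this change on a current Ruby, here is a minimal sketch. It assumes a released Ruby, where the exception class is spelled Encoding::CompatibilityError rather than EncodingCompatibilityError as in this report. The helper name raised_class is made up for this illustration and is not part of Ruby or of the changeset.

```ruby
# Hypothetical helper: returns the class of the exception raised by the
# block, or nil if the block finishes without raising.
def raised_class
  yield
  nil
rescue StandardError => e
  e.class
end

utf16 = "abc".encode("UTF-16BE")

# Appending a string in an ASCII-compatible encoding to a UTF-16BE string.
concat_error = raised_class { utf16.dup << "abc" }

# Matching an ASCII-only regexp literal against a UTF-16BE string.
match_error = raised_class { utf16 =~ /abc/ }

puts "<< raised: #{concat_error.inspect}"
puts "=~ raised: #{match_error.inspect}"
```

If the change behaves as described, both lines should report the same exception class instead of one EncodingCompatibilityError and one ArgumentError.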

History

Updated by Yukihiro Matsumoto over 3 years ago

Hi,

In message "Re: [ruby-core:18600] [Bug #566] String encoding error messages are inconsistent"
    on Mon, 15 Sep 2008 15:50:17 +0900, Michael Selig <redmine@ruby-lang.org> writes:

|Please compare:
|"abc".encode("UTF-16BE") << "abc"
|==> EncodingCompatibilityError: incompatible character encodings: UTF-16BE and US-ASCII
|and:
|"abc".encode("UTF-16BE") =~ /abc/
|==> ArgumentError: incompatible encoding regexp match (US-ASCII regexp with UTF-16BE string)
|
|also handling of broken (illegal) string encodings is not consistent:
|"abc".force_encoding("UTF-16BE") =~ /abc/
|==> ArgumentError: broken UTF-16BE string
|and:
|"abc".force_encoding("UTF-16BE") == "abc"
|==> false (no error)
|and:
|"abc".encode("UTF-16BE").count("b".force_encoding("UTF-16BE"))
|==> ArgumentError: invalid byte sequence in UTF-16BE

I am not sure what you mean by "inconsistent".  What are your ideal
messages (or behavior) for each case?

							matz.

Updated by Koichi Sasada over 3 years ago

  • Assignee set to Yui NARUSE

Updated by Yuki Sonoda over 3 years ago

  • Category set to M17N
  • Target version set to 1.9.1 Release Candidate

Updated by Yukihiro Matsumoto about 3 years ago

Hi,

Sorry for being late.

In message "Re: [ruby-core:18611] Re: [Bug #566] String encoding error messages are inconsistent"
    on Tue, 16 Sep 2008 07:38:13 +0900, "Michael Selig" <michael.selig@fs.com.au> writes:

|I would expect these to both be "EncodingCompatibilityError"

OK, I will.

|> |also handling of broken (illegal) string encodings is not consistent:
|> |"abc".force_encoding("UTF-16BE") =~ /abc/
|> |==> ArgumentError: broken UTF-16BE string
|> |and:
|> |"abc".force_encoding("UTF-16BE") == "abc"
|> |==> false (no error)
|> |and:
|> |"abc".encode("UTF-16BE").count("b".force_encoding("UTF-16BE"))
|> |==> ArgumentError: invalid byte sequence in UTF-16BE
|
|I guess in this group there are 2 issues:
|1) (This is minor) I would expect both error messages to have the same  
|text - I think the "invalid byte sequence in XXX" is the better.
|2) It seems inconsistent to me that the 1st & 2nd expressions look almost  
|the same as each other (a regexp match & a string compare) yet only the  
|regexp match raises an error.

(1) I changed the messages to be more consistent.
(2) "=~" and "==" are different operations.  The former requires more
    preconditions than the latter, e.g. both operands should be
    strings or regexps, and they should be encoding compatible.  On
    the other hand, the latter should give either true or
    false, and should not raise any exception.

							matz.
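
The distinction drawn in point (2) can be tried out with a short sketch like the one below. It is only an illustration under the assumption of a current Ruby. The helper name outcome is invented here, and the exact exception class and message for the regexp match may differ between Ruby versions, so the script prints whatever is raised rather than assuming it.

```ruby
# Hypothetical helper: runs the block and describes what happened, either
# the inspected return value or the class of the exception that was raised.
def outcome
  yield.inspect
rescue StandardError => e
  "raised #{e.class}"
end

# Three bytes cannot form a whole number of UTF-16BE code units, so this
# string is broken. dup avoids mutating the string literal itself.
broken = "abc".dup.force_encoding("UTF-16BE")

puts "valid_encoding?: #{broken.valid_encoding?}"
puts "==: #{outcome { broken == 'abc' }}"   # comparison only answers true or false
puts "=~: #{outcome { broken =~ /abc/ }}"   # matching checks its operands first
```

The comparison simply answers false, while the match refuses to run on the broken string, which is the asymmetry the reply defends.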

Updated by Yukihiro Matsumoto about 3 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100
Applied in changeset r20626.
