Bug #12431

Strange behavior of String#encode('UTF-8', 'UTF-8', ...) when the encoding of the source string is not UTF-8

Added by pdg137 (Paul Grayson) about 3 years ago. Updated almost 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]
[ruby-core:75732]

Description

When the dst_encoding and src_encoding arguments of String#encode are the same, it appears to ignore the given source encoding and instead operate on the actual encoding of the string. Examples:

"abcdÁ".force_encoding('ASCII').encode('UTF-8', 'UTF-8', invalid: :replace, undef: :replace)
=> "abcd??"

"abcdÁ".force_encoding('ASCII').encode('UTF-8', 'UTF-8', invalid: :replace, undef: :replace, replace: '�')
Encoding::CompatibilityError: incompatible character encodings: US-ASCII and UTF-8

"abcdÁ\xff".encode('ASCII', 'ASCII', invalid: :replace, undef: :replace).force_encoding('UTF-8')
=> "abcdÁ�"

Also, without the "replace" options, exceptions are not raised as they should be:

"\xff".force_encoding('ASCII').encode('UTF-8', 'UTF-8')
=> "\xFF"
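For comparison, here is a minimal sketch of the kind of exception the report expects. It uses a different source and destination encoding, so an actual conversion step runs and the invalid byte is detected. The variable names are only illustrative.

```ruby
# US-ASCII -> UTF-8 goes through a real converter, so the invalid byte
# 0xFF is detected and Encoding::InvalidByteSequenceError is raised.
s = "\xff".dup.force_encoding('ASCII')

result =
  begin
    s.encode('UTF-8')
    :no_error
  rescue Encoding::InvalidByteSequenceError => e
    e.class
  end

p result
```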

I looked a little at the code, and I think the problem might be in this block where the given string is passed to rb_str_scrub without any other encoding information.

What I would expect is for s.dup.force_encoding('X').encode('Y', opts) to behave identically to s.encode('Y', 'X', opts), but that is clearly not the case.

Verified on Ruby 2.1.5, 2.3.0, and 2.3.1.
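
The expectation stated above (that s.dup.force_encoding('X').encode('Y', opts) and s.encode('Y', 'X', opts) agree) can be checked with a small script. This is only a sketch: try_encode and encode_both_ways are hypothetical helper names, not part of Ruby, and the sample input is an assumption chosen so that the two encodings differ and both forms are known to agree.

```ruby
# Hypothetical helper: run a block and map any encoding exception to its class.
def try_encode
  yield
rescue EncodingError => e
  e.class
end

# Hypothetical helper: return the results of both call forms side by side.
def encode_both_ways(s, dst, src, **opts)
  [
    try_encode { s.dup.force_encoding(src).encode(dst, **opts) },
    try_encode { s.encode(dst, src, **opts) }
  ]
end

# Byte 0xE9 is "é" in ISO-8859-1; both forms should transcode it to UTF-8.
forced, explicit = encode_both_ways("caf\xE9".b, 'UTF-8', 'ISO-8859-1')
p forced == explicit
```

Passing the same encoding for dst and src, together with the options from the examples above, is how the mismatch described in this report would show up on an affected version.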


Related issues

Related to Ruby master - Bug #13874: String#valid_encoding? has side effects (Closed)
Related to Ruby master - Bug #8123: Transcoding exception when using replace along with universal_newline (Closed)

Associated revisions

Revision 4fad63da
Added by nobu (Nobuyoshi Nakada) about 3 years ago

transcode.c: scrub in the given encoding

  • transcode.c (str_transcode0): scrub in the given encoding when the source encoding is given, not in the encoding of the receiver. [ruby-core:75732] [Bug #12431]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55181 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 55181
Added by nobu (Nobuyoshi Nakada) about 3 years ago

transcode.c: scrub in the given encoding

  • transcode.c (str_transcode0): scrub in the given encoding when the source encoding is given, not in the encoding of the receiver. [ruby-core:75732] [Bug #12431]

Revision 9b71251e
Added by nagachika (Tomoyuki Chikanaga) almost 3 years ago

merge revision(s) 55181: [Backport #12431]

    * transcode.c (str_transcode0): scrub in the given encoding when
      the source encoding is given, not in the encoding of the
      receiver.  [ruby-core:75732] [Bug #12431]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_3@55905 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 55905
Added by nagachika (Tomoyuki Chikanaga) almost 3 years ago

merge revision(s) 55181: [Backport #12431]

* transcode.c (str_transcode0): scrub in the given encoding when
  the source encoding is given, not in the encoding of the
  receiver.  [ruby-core:75732] [Bug #12431]

Revision b76d7aff
Added by usa (Usaku NAKAMURA) almost 3 years ago

merge revision(s) 55181: [Backport #12431]

    * transcode.c (str_transcode0): scrub in the given encoding when
      the source encoding is given, not in the encoding of the
      receiver.  [ruby-core:75732] [Bug #12431]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_2@55936 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 55936
Added by usa (Usaku NAKAMURA) almost 3 years ago

merge revision(s) 55181: [Backport #12431]

* transcode.c (str_transcode0): scrub in the given encoding when
  the source encoding is given, not in the encoding of the
  receiver.  [ruby-core:75732] [Bug #12431]
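
The revisions above all describe the same change: when a source encoding is given, scrubbing happens in that encoding rather than in the receiver's own encoding. A rough Ruby-level model of that idea follows. The real change is in C (str_transcode0 in transcode.c); scrub_in_given_encoding is a hypothetical name used only for illustration and does not reproduce the actual implementation.

```ruby
# Hypothetical model: interpret the bytes in the given source encoding
# first, then scrub, instead of scrubbing in the receiver's encoding.
def scrub_in_given_encoding(s, src, replacement = nil)
  s.dup.force_encoding(src).scrub(replacement)
end

# The receiver is binary, but 0xFF is judged against UTF-8 and replaced.
scrubbed = scrub_in_given_encoding("abcd\xff".b, 'UTF-8')
p scrubbed
p scrubbed.encoding
```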

History

#1

Updated by nobu (Nobuyoshi Nakada) about 3 years ago

  • Status changed from Open to Closed

Applied in changeset r55181.


transcode.c: scrub in the given encoding

  • transcode.c (str_transcode0): scrub in the given encoding when the source encoding is given, not in the encoding of the receiver. [ruby-core:75732] [Bug #12431]

Updated by usa (Usaku NAKAMURA) about 3 years ago

  • Backport changed from 2.1: UNKNOWN, 2.2: UNKNOWN, 2.3: UNKNOWN to 2.1: WONTFIX, 2.2: REQUIRED, 2.3: REQUIRED

Updated by nagachika (Tomoyuki Chikanaga) almost 3 years ago

  • Backport changed from 2.1: WONTFIX, 2.2: REQUIRED, 2.3: REQUIRED to 2.1: WONTFIX, 2.2: REQUIRED, 2.3: DONE

ruby_2_3 r55905 merged revision(s) 55181.

Updated by usa (Usaku NAKAMURA) almost 3 years ago

  • Backport changed from 2.1: WONTFIX, 2.2: REQUIRED, 2.3: DONE to 2.1: WONTFIX, 2.2: DONE, 2.3: DONE

ruby_2_2 r55936 merged revision(s) 55181.

#5

Updated by nobu (Nobuyoshi Nakada) almost 2 years ago

  • Related to Bug #13874: String#valid_encoding? has side effects added

#6

Updated by duerst (Martin Dürst) 7 months ago

  • Related to Bug #8123: Transcoding exception when using replace along with universal_newline added
