Bug #12431: Strange behavior of String#encode('UTF-8', 'UTF-8', ...) when the encoding of the source string is not UTF-8 - Ruby - Ruby Issue Tracking System

Bug #12431

closed

Added by pdg137 (Paul Grayson) over 9 years ago. Updated about 9 years ago.

Status:

Closed

Assignee:

Target version:

ruby -v:

ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]

Backport:

2.1: WONTFIX, 2.2: DONE, 2.3: DONE

[ruby-core:75732]

Description

When the dst_encoding and src_encoding arguments of String#encode are the same, it appears to ignore the given src_encoding and instead operate on the actual encoding of the string. Examples:

"abcdÁ".force_encoding('ASCII').encode('UTF-8', 'UTF-8', invalid: :replace, undef: :replace)
=> "abcd??"

"abcdÁ".force_encoding('ASCII').encode('UTF-8', 'UTF-8', invalid: :replace, undef: :replace, replace: '�')
Encoding::CompatibilityError: incompatible character encodings: US-ASCII and UTF-8

"abcdÁ\xff".encode('ASCII', 'ASCII', invalid: :replace, undef: :replace).force_encoding('UTF-8')
=> "abcdÁ�"

Also, without the "replace" options, exceptions are not raised as they should be:

"\xff".force_encoding('ASCII').encode('UTF-8', 'UTF-8')
=> "\xFF"
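
If the strict behavior is needed regardless of what encode does here, one option is to re-tag the bytes and validate them explicitly. This is a minimal sketch, not the fix applied in the tracker; the helper names strict_reinterpret and lenient_reinterpret are hypothetical, and the code relies only on String#force_encoding, String#valid_encoding? and String#scrub (available since Ruby 2.1):

```ruby
# Hypothetical helper (not part of Ruby): treat +bytes+ as +encoding+ and
# raise if they are not valid in that encoding.
def strict_reinterpret(bytes, encoding)
  tagged = bytes.dup.force_encoding(encoding)
  unless tagged.valid_encoding?
    raise ArgumentError, "invalid byte sequence in #{tagged.encoding}"
  end
  tagged
end

# Hypothetical lenient counterpart: replace invalid bytes instead of raising.
def lenient_reinterpret(bytes, encoding, replacement = "?")
  bytes.dup.force_encoding(encoding).scrub(replacement)
end
```

Both helpers work on a copy, so the receiver keeps its original encoding tag.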

I looked a little at the code, and I think the problem might be in this block where the given string is passed to rb_str_scrub without any other encoding information.

What I would expect is for s.dup.force_encoding('X').encode('Y', opts) to behave identically to s.encode('Y', 'X', opts), but that is clearly not the case.
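
The expectation above can be checked mechanically. The following is a sketch with a hypothetical helper, compare_encode_forms, that runs both spellings and reports whether they agree; the result for the same-encoding case depends on the Ruby version, so only the call is shown, not an expected outcome:

```ruby
# Hypothetical helper (not part of Ruby): run both spellings of
# "interpret the bytes as +src+, then convert to +dst+" and compare them.
def compare_encode_forms(bytes, dst, src, **opts)
  two_step = begin
    bytes.dup.force_encoding(src).encode(dst, **opts)
  rescue EncodingError => e
    e.class
  end

  one_step = begin
    bytes.encode(dst, src, **opts)
  rescue EncodingError => e
    e.class
  end

  { two_step: two_step, one_step: one_step, same: two_step == one_step }
end

# Different source and destination encodings: the two forms agree.
latin1 = compare_encode_forms("\xC1".b, 'UTF-8', 'ISO-8859-1')

# Same source and destination encoding: the case this report is about.
utf8 = compare_encode_forms("abcd\xff".b, 'UTF-8', 'UTF-8',
                            invalid: :replace, undef: :replace)
```

Inspecting utf8[:same] on a given interpreter shows whether that version still has the discrepancy described here.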

Verified on Ruby 2.1.5, 2.3.0, and 2.3.1.

Related issues 2 (0 open — 2 closed)

#1 Updated by nobu (Nobuyoshi Nakada) over 9 years ago

#2 [ruby-core:75899] Updated by usa (Usaku NAKAMURA) over 9 years ago

#3 [ruby-core:76885] Updated by nagachika (Tomoyuki Chikanaga) about 9 years ago

#4 [ruby-core:76926] Updated by usa (Usaku NAKAMURA) about 9 years ago

#5 Updated by nobu (Nobuyoshi Nakada) about 8 years ago

#6 Updated by duerst (Martin Dürst) almost 7 years ago

Related to Ruby - Bug #13874: String#valid_encoding? has side effects (Closed)
Related to Ruby - Bug #8123: Transcoding exception when using replace along with universal_newline (Closed)
