Backport #6190

String#encode returns a string containing invalid chars but marked as valid

Added by pplr (Pierre PLR) over 7 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
[ruby-core:43557]

Description

a = " \xE9 ".encode('UTF-8', 'UTF-8', :invalid => :replace, :replace => "?")
a.valid_encoding?
=> true
a
=> " \xE9 "
a.squeeze
ArgumentError: invalid byte sequence in UTF-8
from (irb):32:in `squeeze'
from (irb):32
from /usr/bin/irb:12:in `<main>'

The expected string is " ? ", as the documentation for the ":replace" option says:
If the value is :replace, encode replaces invalid byte sequences in str with the replacement character.
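
For comparison, the result the reporter expected can be produced explicitly with String#scrub. This is a sketch that assumes Ruby 2.1 or later, since scrub was added after this ticket and is not available in 1.9.3:

```ruby
# Assumption: Ruby 2.1+ (String#scrub does not exist in 1.9.3).
# force_encoding makes sure the literal is labelled UTF-8 whatever the
# source encoding is; the lone \xE9 byte is an invalid UTF-8 sequence.
a = " \xE9 ".dup.force_encoding('UTF-8')

p a.valid_encoding?        # => false
fixed = a.scrub('?')       # replace each invalid byte sequence with "?"
p fixed                    # => " ? "
p fixed.valid_encoding?    # => true
```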

Associated revisions

Revision 2b846261
Added by nobu (Nobuyoshi Nakada) over 7 years ago

  • transcode.c (str_encode_bang, encoded_dup): if nothing was transcoded, just set encoding but leave coderange unchanged as force_encoding. [ruby-core:43557][Bug #6190]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@35112 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
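
The fix makes a same-encoding encode behave like force_encoding: the string is relabelled, and Ruby determines validity from the bytes instead of assuming the string is valid. The helper below, same_encoding_noop, is hypothetical and only models that behaviour with public methods; it is not the code in transcode.c. String#b assumes Ruby 2.0 or later.

```ruby
# Hypothetical helper: models the post-fix behaviour of a same-encoding
# String#encode, which only relabels the string and transcodes nothing.
def same_encoding_noop(str, enc)
  # force_encoding changes the encoding label but not the bytes; validity
  # is then computed from the bytes rather than assumed.
  str.dup.force_encoding(enc)
end

raw = " \xE9 ".b                     # binary copy of the reporter's bytes (Ruby 2.0+)
result = same_encoding_noop(raw, 'UTF-8')

p result.encoding                   # => #<Encoding:UTF-8>
p result.bytes == raw.bytes         # => true
p result.valid_encoding?            # => false
```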

Revision 35112
Added by nobu (Nobuyoshi Nakada) over 7 years ago

  • transcode.c (str_encode_bang, encoded_dup): if nothing was transcoded, just set encoding but leave coderange unchanged as force_encoding. [ruby-core:43557][Bug #6190]

Revision 463633e4
Added by duerst (Martin Dürst) over 7 years ago

transcode.c (documentation for str_encode): Explain
that transcoding to the same encoding is a no-op
(i.e. no exceptions, no replacements,...).
[ruby-core:43557][Bug #6190]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@35121 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
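
Since a same-encoding call is a no-op, one workaround on 1.9.3 is to transcode through a different encoding so that a real conversion runs and :invalid => :replace takes effect. A minimal sketch, assuming the input is meant to be UTF-8; replace_invalid_utf8 is a hypothetical helper name and UTF-16LE is an arbitrary intermediate encoding:

```ruby
# Hypothetical helper: replace invalid UTF-8 byte sequences with `replacement`.
# UTF-8 -> UTF-8 would be a no-op, so convert to UTF-16LE (a real transcoding,
# where :invalid => :replace is honoured) and then back to UTF-8.
def replace_invalid_utf8(str, replacement = '?')
  utf16 = str.encode('UTF-16LE', 'UTF-8', :invalid => :replace, :replace => replacement)
  utf16.encode('UTF-8')
end

bad = " \xE9 ".dup.force_encoding('UTF-8')
p replace_invalid_utf8(bad)                   # => " ? "
p replace_invalid_utf8(bad).valid_encoding?   # => true
```

Passing 'UTF-8' as the explicit source encoding means the bytes are read as UTF-8 even if the string carries a different label. Every valid UTF-8 character has a UTF-16LE form, so only invalid sequences are affected.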

Revision 35121
Added by duerst (Martin Dürst) over 7 years ago

transcode.c (documentation for str_encode): Explain
that transcoding to the same encoding is a no-op
(i.e. no exceptions, no replacements,...).
[ruby-core:43557][Bug #6190]

Revision be36df7f
Added by usa (Usaku NAKAMURA) over 6 years ago

merge revision(s) 35112,35121: [Backport #6190]

    * transcode.c (str_encode_bang, encoded_dup): if nothing was
      transcoded, just set encoding but leave coderange unchanged as
      force_encoding.  [ruby-core:43557][Bug #6190]

    * transcode.c (documentation for str_encode): Explain
      that transcoding to the same encoding is a no-op
      (i.e. no exceptions, no replacements,...).
      [ruby-core:43557][Bug #6190]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_1_9_3@40056 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 40056
Added by usa (Usaku NAKAMURA) over 6 years ago

merge revision(s) 35112,35121: [Backport #6190]

* transcode.c (str_encode_bang, encoded_dup): if nothing was
  transcoded, just set encoding but leave coderange unchanged as
  force_encoding.  [ruby-core:43557][Bug #6190]

* transcode.c (documentation for str_encode): Explain
  that transcoding to the same encoding is a no-op
  (i.e. no exceptions, no replacements,...).
  [ruby-core:43557][Bug #6190]

History

#1

Updated by nobu (Nobuyoshi Nakada) over 7 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r35112.
Pierre, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • transcode.c (str_encode_bang, encoded_dup): if nothing was transcoded, just set encoding but leave coderange unchanged as force_encoding. [ruby-core:43557][Bug #6190]

#2

Updated by duerst (Martin Dürst) over 7 years ago

  • Description updated (diff)

pplr (Pierre PLR) wrote:

> a = " \xE9 ".encode('UTF-8', 'UTF-8', :invalid => :replace, :replace => "?")
> a.valid_encoding?
> => true

Nobu fixed this so that valid_encoding? no longer returns true here, which would be a lie.

> a
> => " \xE9 "
>
> The expected string is " ? ", as the documentation for the ":replace" option says:
> If the value is :replace, encode replaces invalid byte sequences in str with the replacement character.

I added documentation stating that transcoding from an encoding A to the same encoding A is a no-op. Changing this behavior would not be impossible, but it would involve considerable work and would make these operations slower.

#3

Updated by naruse (Yui NARUSE) over 6 years ago

  • Tracker changed from Bug to Backport
  • Project changed from Ruby master to Backport193

#4

Updated by naruse (Yui NARUSE) over 6 years ago

  • Status changed from Closed to Assigned

#5

Updated by zzak (Zachary Scott) over 6 years ago

  • Assignee set to naruse (Yui NARUSE)

naruse-san, what would you like to do with this ticket?

#6

Updated by usa (Usaku NAKAMURA) over 6 years ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r40056.
Pierre, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


merge revision(s) 35112,35121: [Backport #6190]

* transcode.c (str_encode_bang, encoded_dup): if nothing was
  transcoded, just set encoding but leave coderange unchanged as
  force_encoding.  [ruby-core:43557][Bug #6190]

* transcode.c (documentation for str_encode): Explain
  that transcoding to the same encoding is a no-op
  (i.e. no exceptions, no replacements,...).
  [ruby-core:43557][Bug #6190]