Backport #6190
closed
String#encode returns a string containing invalid chars but marked as valid
Added by pplr (Pierre PLR) almost 13 years ago.
Updated over 11 years ago.
Description
a = " \xE9 ".encode('UTF-8', 'UTF-8', :invalid => :replace, :replace => "?")
a.valid_encoding?
=> true
a
=> " \xE9 "
a.squeeze
ArgumentError: invalid byte sequence in UTF-8
from (irb):32:in `squeeze'
from (irb):32
from /usr/bin/irb:12:in `<main>'
The expected string is " ? ", as the documentation for the ":replace" option says:
If the value is :replace, encode replaces invalid byte sequences in str with the replacement character.
- Status changed from Open to Closed
- % Done changed from 0 to 100
This issue was solved with changeset r35112.
Pierre, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
- transcode.c (str_encode_bang, encoded_dup): if nothing was
transcoded, just set encoding but leave coderange unchanged as
force_encoding. [ruby-core:43557][Bug #6190]
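To illustrate what "as force_encoding" means here, the following is a minimal sketch of how String#force_encoding only retags a string without validating or changing its bytes. The helper name retag_as_utf8 is hypothetical and does not come from the changeset.

```ruby
# Hypothetical helper (not from the changeset): retag a copy of the bytes
# as UTF-8 without transcoding or validating them.
def retag_as_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8)
end

binary   = " \xE9 ".b            # ASCII-8BIT copy; any byte sequence is valid
retagged = retag_as_utf8(binary)

puts binary.valid_encoding?      # true
puts retagged.encoding           # UTF-8
puts retagged.valid_encoding?    # false: a lone 0xE9 is not valid UTF-8
puts retagged.bytes == binary.bytes  # true: the bytes themselves are untouched
```

The point of the sketch is that validity is a property of the bytes under the new tag, so a string whose encoding was merely set should not be reported as valid unless it really is.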
- Description updated (diff)
pplr (Pierre PLR) wrote:
a = " \xE9 ".encode('UTF-8', 'UTF-8', :invalid => :replace, :replace => "?")
a.valid_encoding?
=> true
Nobu fixed this so it won't return true anymore, which would be a lie.
a
=> " \xE9 "
The expected string is " ? ", as the documentation for the ":replace" option says:
If the value is :replace, encode replaces invalid byte sequences in str with the replacement character.
I added documentation to say that encoding from encoding A to the same encoding A is a no-op. Changing this would not be impossible, but would involve quite some work, and would make these operations slower.
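For readers who need the replacement behaviour the reporter expected, one possible workaround is sketched below. It assumes a Ruby version that provides String#scrub (Ruby 2.1 or later), which is not discussed in this ticket, and the helper name replace_invalid_bytes is hypothetical.

```ruby
# Hypothetical helper (not part of this ticket): replace invalid byte
# sequences with String#scrub, assumed available (Ruby 2.1 or later).
def replace_invalid_bytes(str, replacement = "?")
  # scrub returns a new string; the receiver is left untouched.
  str.scrub(replacement)
end

broken = " \xE9 "                # tagged UTF-8, but a lone 0xE9 is invalid
fixed  = replace_invalid_bytes(broken)

puts broken.valid_encoding?      # false
puts fixed.inspect               # " ? "
puts fixed.valid_encoding?       # true
puts fixed.squeeze.inspect       # " ? " (no ArgumentError on the scrubbed string)
```

Unlike a same-encoding call to encode as described above, scrub inspects the bytes explicitly, so it does not depend on any transcoding taking place.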
- Tracker changed from Bug to Backport
- Project changed from Ruby master to Backport193
- Status changed from Closed to Assigned
- Assignee set to naruse (Yui NARUSE)
naruse-san, what do you want to do with this ticket?
- Status changed from Assigned to Closed
This issue was solved with changeset r40056.
merge revision(s) 35112,35121: [Backport #6190]
* transcode.c (str_encode_bang, encoded_dup): if nothing was
transcoded, just set encoding but leave coderange unchanged as
force_encoding. [ruby-core:43557][Bug #6190]
* transcode.c (documentation for str_encode): Explain
that transcoding to the same encoding is a no-op
(i.e. no exceptions, no replacements,...).
[ruby-core:43557][Bug #6190]
- Has duplicate Bug #19342: String#encode does not always throw exceptions for invalid source encodings added