Bug #19342

String#encode does not always throw exceptions for invalid source encodings

Added by mathieu451 (Math Ieu) over 1 year ago. Updated over 1 year ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: ruby 3.0.5p211 (2022-11-24 revision ba5cf0f7c5) [amd64-freebsd13]
[ruby-core:111815]

Description

The documentation says that String#encode raises Encoding::InvalidByteSequenceError when the string isn't valid in the source encoding, but it does not always do so:

"\x99".encode('UTF-8', 'UTF-8')
"\x99".force_encoding('UTF-8').encode('UTF-8')

In both cases, it returns a string with invalid encoding.

But these do raise an exception:

"\x99".encode('ISO8859-1', 'UTF-8')
"\x99".force_encoding('UTF-8').encode('ISO8859-1')

I suppose it's debatable whether this should be considered a bug. Converting to and from the same encoding is an odd thing to ask for, but it happened to me in a loop that tried to interpret a binary string under several candidate encodings:

input_string = "\x99".force_encoding('US-ASCII')
want_encoding = 'UTF-8'
%w{ISO8859-1 UTF-8}.each do |try_encoding|
  s = begin
        input_string.encode(want_encoding, try_encoding)
      rescue EncodingError
        next
      end
  process_string s
end

I expected to get an Encoding::InvalidByteSequenceError exception during the conversion, but instead I got exceptions later on, while trying to work on the invalid string that #encode returned.
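
One way to sidestep this in a loop like the one above is to validate the bytes against the candidate source encoding before calling #encode. The following is a minimal sketch and not part of the original report: `strict_encode` and `InvalidSourceError` are hypothetical names, and the helper relies only on String#force_encoding, String#valid_encoding? and String#encode.

```ruby
# Hypothetical error class. It subclasses EncodingError so that an
# existing `rescue EncodingError` clause still catches it.
class InvalidSourceError < EncodingError; end

# Hypothetical helper: transcode str from the encoding `from` to the
# encoding `to`, but refuse input that is not valid in the source
# encoding, even when both encodings are the same.
def strict_encode(str, to, from)
  candidate = str.dup.force_encoding(from)
  unless candidate.valid_encoding?
    raise InvalidSourceError, "input is not valid #{from}"
  end
  candidate.encode(to)
end

input_string = "\x99".dup.force_encoding('US-ASCII')
results = %w[ISO8859-1 UTF-8].map do |try_encoding|
  begin
    strict_encode(input_string, 'UTF-8', try_encoding)
  rescue EncodingError
    nil # the bytes are not valid in try_encoding
  end
end
```

With this guard, the ISO8859-1 attempt yields a valid UTF-8 string, while the UTF-8 attempt is rejected up front instead of handing back an invalid string.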


Related issues 1 (0 open, 1 closed)

Is duplicate of Backport193 - Backport #6190: String#encode return string containing invalid chars but marked as valid. Closed, naruse (Yui NARUSE), 03/22/2012

Updated by duerst (Martin Dürst) over 1 year ago

  • Is duplicate of Backport #6190: String#encode return string containing invalid chars but marked as valid added

Updated by duerst (Martin Dürst) over 1 year ago

This was discussed in issue 6190. As you already say, it's somehow a weird case. The decision was to make transcoding from an encoding to the same encoding a no-op for performance. There was also some documentation (now in git commit 463633e4a934a00f869086a6ffbf84c6cb8ad630), but it seems to have been lost. That definitely should be fixed. The documentation is now in doc/transcode.rdoc.
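
The no-op can be observed directly with String#valid_encoding?, and String#scrub is one way to repair the result when replacing the bad bytes is acceptable. A small sketch, assuming the same-encoding behavior described in this issue (the variable names are illustrative only):

```ruby
bad = "abc\x99".dup.force_encoding('UTF-8')

# Source and destination are both UTF-8, so no transcoding (and no
# validation) takes place; the invalid byte is passed through.
same = bad.encode('UTF-8', 'UTF-8')
same_valid = same.valid_encoding?

# String#scrub replaces invalid byte sequences, here with "?".
repaired = same.scrub('?')
repaired_valid = repaired.valid_encoding?
```

Checking `valid_encoding?` on the returned string is therefore the reliable test after a same-encoding call; the absence of an exception says nothing about validity.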

Updated by duerst (Martin Dürst) over 1 year ago

  • Status changed from Open to Closed

I fixed the documentation (which was moved to doc/string/encode.rdoc by @nobu (Nobuyoshi Nakada) in 468ce1488d) in 11f28f3268. I think that this issue can therefore be closed.
