Bug #18995

IO#set_encoding sometimes sets an IO's internal encoding to the default external encoding

Added by javanthropus (Jeremy Bopp) over 1 year ago. Updated 25 days ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
[ruby-core:109842]

Description

This script demonstrates the behavior:

def show(io)
  printf(
    "external encoding: %-25p  internal encoding: %-25p\n",
    io.external_encoding,
    io.internal_encoding
  )
end

Encoding.default_external = 'iso-8859-1'
Encoding.default_internal = 'iso-8859-2'

File.open('/dev/null') do |f|
  f.set_encoding('utf-8', nil)
  show(f)                             # f.internal_encoding is iso-8859-2, as expected

  f.set_encoding('utf-8', 'invalid')
  show(f)                             # f.internal_encoding is now iso-8859-1!

  Encoding.default_external = 'iso-8859-3'
  Encoding.default_internal = 'iso-8859-4'
  show(f)                             # f.internal_encoding is now iso-8859-3!
end

In the first case, the IO's internal encoding is set to the current value of Encoding.default_internal. In the second case, where the internal encoding name passed to #set_encoding is invalid, the IO's internal encoding is set to Encoding.default_external instead. The third case is more interesting: it shows that the IO's internal encoding is actually tracking the current value of Encoding.default_external. The value wasn't just copied when #set_encoding was called; it changes whenever Encoding.default_external changes.

What should the correct behavior be?

Updated by javanthropus (Jeremy Bopp) over 1 year ago

Can anyone confirm that this is a bug and not a misunderstanding? It looks like the changes to fix this will require a fair bit of refactoring, and there don't yet appear to be any tests around the various cases for arguments to IO#set_encoding where IO#internal_encoding and IO#external_encoding are checked. I found tests around various ways of opening files and pipes with encoding arguments which do check the resulting internal and external encodings of the IO object, but none of those test these corner cases.
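As a possible starting point for such tests, here is a minimal sketch that runs the three cases from the script above and returns what the IO reports at each step. The helper name observed_encodings and the hash keys are hypothetical, not part of Ruby or its test suite. It uses File::NULL in place of '/dev/null', silences warnings while it runs, and restores the process-wide defaults afterwards.

```ruby
# Hypothetical helper (name and keys are illustrative only): records the
# [external, internal] encodings an IO reports after each step of the script.
def observed_encodings
  saved_external = Encoding.default_external
  saved_internal = Encoding.default_internal
  saved_verbose  = $VERBOSE
  $VERBOSE = nil # silence warnings about changing defaults and unknown names

  results = {}
  Encoding.default_external = 'iso-8859-1'
  Encoding.default_internal = 'iso-8859-2'

  File.open(File::NULL) do |f|
    # Case 1: nil internal encoding
    f.set_encoding('utf-8', nil)
    results[:nil_internal] = [f.external_encoding, f.internal_encoding]

    # Case 2: unknown internal encoding name
    f.set_encoding('utf-8', 'invalid')
    results[:invalid_internal] = [f.external_encoding, f.internal_encoding]

    # Case 3: defaults changed after #set_encoding was called
    Encoding.default_external = 'iso-8859-3'
    Encoding.default_internal = 'iso-8859-4'
    results[:defaults_changed] = [f.external_encoding, f.internal_encoding]
  end

  results
ensure
  # Put the process-wide settings back so other code is unaffected.
  Encoding.default_external = saved_external
  Encoding.default_internal = saved_internal
  $VERBOSE = saved_verbose
end

observed_encodings.each do |step, (external, internal)|
  puts format('%-18s external: %-12s internal: %s', step, external, internal)
end
```

A test could then compare each returned pair against whatever behavior is agreed to be correct, which is the open question in this report.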

Updated by javanthropus (Jeremy Bopp) 25 days ago · Edited

@jeremyevans0 (Jeremy Evans), did you ever take a look at this issue when I referenced it in #18899? The behavior is unchanged in Ruby 3.3.

The script above prints the following:

external encoding: #<Encoding:UTF-8>          internal encoding: #<Encoding:ISO-8859-2>   
external encoding: #<Encoding:UTF-8>          internal encoding: #<Encoding:ISO-8859-1>   
external encoding: #<Encoding:UTF-8>          internal encoding: #<Encoding:ISO-8859-3>

I expected it to print this:

external encoding: #<Encoding:UTF-8>          internal encoding: #<Encoding:ISO-8859-2>
external encoding: #<Encoding:UTF-8>          internal encoding: #<Encoding:ISO-8859-2>
external encoding: #<Encoding:UTF-8>          internal encoding: #<Encoding:ISO-8859-4>