
Bug #18898

closed

IO#set_encoding with invalid arguments leads to a segfault

Added by javanthropus (Jeremy Bopp) almost 2 years ago. Updated almost 2 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
[ruby-core:109150]

Description

Save the following to a file and run it:

#!/usr/bin/env ruby

Encoding.default_external = 'utf-8'
File.open(__FILE__) do |f|
  f.set_encoding('utf-8', 'invalid')

  printf(
    "default external: %p\ndefault internal: %p\nexternal:         %p\ninternal:         %p\n\n",
    Encoding.default_external,
    Encoding.default_internal,
    f.external_encoding,
    f.internal_encoding
  )

  f.read
end

The above script results in a segfault at f.read. This seems to happen because the call to #set_encoding makes the IO object's internal encoding follow Encoding.default_external, while also setting the IO object's external encoding to match. Obviously there shouldn't be a segfault, but I actually expected the IO object's internal encoding to be set to nil, because an invalid encoding was specified for it.

I was able to reproduce this on all versions of Ruby from 2.7.0 to 3.0.2.
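On affected versions, one defensive option is to resolve encoding names yourself before handing them to IO#set_encoding, so that an unknown name never reaches it. The sketch below is only an illustration: strict_set_encoding is a hypothetical helper, not part of Ruby. It relies on Encoding.find, which raises ArgumentError for unknown names such as 'invalid', and it assumes the external encoding is always given.

```ruby
# Hypothetical helper (not part of Ruby): resolve both names eagerly so that
# an unknown encoding raises ArgumentError instead of reaching IO#set_encoding.
# Assumes `ext` is always given; `int` may be nil.
def strict_set_encoding(io, ext, int = nil)
  external = Encoding.find(ext)                   # raises ArgumentError for unknown names
  internal = int.nil? ? nil : Encoding.find(int)  # same check for the internal name
  io.set_encoding(external, internal)
end

File.open(File::NULL) do |f|
  begin
    strict_set_encoding(f, 'utf-8', 'invalid')
  rescue ArgumentError => e
    puts "rejected: #{e.message}"
  end
end
```

Failing loudly here is a design choice for the sketch; it trades the silent fallback for an explicit error at the call site.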

#1

Updated by javanthropus (Jeremy Bopp) almost 2 years ago

  • Description updated
#2

Updated by nobu (Nobuyoshi Nakada) almost 2 years ago

  • Status changed from Open to Closed

Applied in changeset git|5ef3c7ea2d1968c87f361b6615699b92cc6e5a9a.


[Bug #18898] Fallback invalid external encoding to the default

Updated by javanthropus (Jeremy Bopp) almost 2 years ago

Thank you for working on this. While the patch prevents the crash, it does not address the other odd behavior that was reported, namely that the internal encoding of the IO instance is set to the default external encoding. Shouldn't it set the internal encoding to either nil or Encoding.default_internal?
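If an invalid internal encoding is meant to behave like nil, a caller can approximate that on current versions by substituting nil before the call. A minimal sketch assuming that interpretation; lenient_set_encoding and known_encoding? are hypothetical helpers, not Ruby APIs, and only the internal-encoding argument is guarded:

```ruby
# Hypothetical helper: true if Encoding.find recognizes the name.
def known_encoding?(name)
  Encoding.find(name)
  true
rescue ArgumentError
  false
end

# Hypothetical helper: replace an unknown internal-encoding name with nil, so
# IO#set_encoding takes the same path it uses when nil is passed explicitly.
# The external name is passed through unchecked.
def lenient_set_encoding(io, ext, int)
  int = nil unless int.nil? || known_encoding?(int)
  io.set_encoding(ext, int)
end

File.open(File::NULL) do |f|
  lenient_set_encoding(f, 'utf-8', 'invalid')
  p [f.external_encoding, f.internal_encoding]
end
```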

Updated by javanthropus (Jeremy Bopp) almost 2 years ago

It looks like the internal encoding should be set to Encoding.default_internal when an invalid encoding name is given as the second argument. As far as I can tell, that is what happens when nil is given as the second argument instead. This script demonstrates the difference in behavior:

def show(io)
  printf(
    "external encoding: %-25p  internal encoding: %-25p\n",
    io.external_encoding,
    io.internal_encoding
  )
end

Encoding.default_external = 'iso-8859-1'
Encoding.default_internal = 'iso-8859-2'

File.open('/dev/null') do |f|
  f.set_encoding('utf-8', nil)
  show(f)

  f.set_encoding('utf-8', 'invalid')
  show(f)

  Encoding.default_external = 'iso-8859-3'
  Encoding.default_internal = 'iso-8859-4'
  show(f)
end

In the 1st case, we see that the IO's internal encoding is set to the current value of Encoding.default_internal. In the 2nd case, the IO's internal encoding is set to Encoding.default_external instead. The 3rd case is more interesting because it shows that the IO's internal encoding actually follows the current setting of Encoding.default_external: it isn't just copied when #set_encoding is called, but changes whenever Encoding.default_external changes.
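For comparison, when both names are valid, the encodings are stored on the IO and should stay put when the process-wide defaults change afterwards, which is exactly what the 3rd case above departs from. A sketch under that assumption; encodings_of and snapshot_across_default_change are hypothetical helpers, and File::NULL stands in for '/dev/null' so the script is portable:

```ruby
# Record an IO's current [external, internal] encodings.
def encodings_of(io)
  [io.external_encoding, io.internal_encoding]
end

# Hypothetical helper: set explicit encodings, then change the process-wide
# defaults and return the IO's encodings from before and after the change.
# The original defaults are restored afterwards.
def snapshot_across_default_change(ext, int)
  saved_ext = Encoding.default_external
  saved_int = Encoding.default_internal
  File.open(File::NULL) do |f|
    f.set_encoding(ext, int)
    before = encodings_of(f)
    Encoding.default_external = 'iso-8859-3'
    Encoding.default_internal = 'iso-8859-4'
    [before, encodings_of(f)]
  end
ensure
  Encoding.default_external = saved_ext
  Encoding.default_internal = saved_int
end

before, after = snapshot_across_default_change('utf-8', 'iso-8859-2')
puts "before: #{before.inspect}"
puts "after:  #{after.inspect}"
```

Running the same helper with 'invalid' as the second argument on an affected version would be a way to observe the "following" behavior described above, but that call is not part of this sketch.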

I already opened a related issue for inconsistent handling of arguments for IO#set_encoding (#18899). Should I move this facet of the problem to that issue instead? Should I open a separate issue altogether?
