Bug #18898
Closed: IO#set_encoding with invalid arguments leads to a segfault
Description
Save the following to a file and run it:
#!/usr/bin/env ruby
Encoding.default_external = 'utf-8'
File.open(__FILE__) do |f|
  f.set_encoding('utf-8', 'invalid')
  printf(
    "default external: %p\ndefault internal: %p\nexternal:         %p\ninternal:         %p\n\n",
    Encoding.default_external,
    Encoding.default_internal,
    f.external_encoding,
    f.internal_encoding
  )
  f.read
end
The above script segfaults at f.read. This seems to happen because the call to #set_encoding makes the IO object's internal encoding follow Encoding.default_external while also setting its external encoding to match. Obviously there shouldn't be a segfault, but I also expected the IO object's internal encoding to be set to nil, since an invalid encoding was specified for it.
I was able to reproduce this on all versions of Ruby from 2.7.0 to 3.0.2.
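On an affected Ruby, one defensive workaround is to resolve encoding names yourself before calling #set_encoding, so an unknown name never reaches it. The sketch below is only an illustration under stated assumptions: `resolve_encoding` and `safe_set_encoding` are hypothetical helpers (not part of Ruby), an unknown internal name is treated like nil (the behavior this report expected), and an unknown external name raises instead of being passed through.

```ruby
# Hypothetical helpers, not part of Ruby's API. They resolve encoding
# names up front so an unknown name is never handed to IO#set_encoding.

# Returns an Encoding, or nil for nil or an unknown name.
def resolve_encoding(name)
  return nil if name.nil?
  Encoding.find(name) # accepts a String or Encoding; raises ArgumentError for unknown names
rescue ArgumentError
  nil
end

# Assumption: an unknown internal encoding is treated like nil, while an
# unknown (or missing) external encoding is reported as an error.
def safe_set_encoding(io, ext, int = nil)
  ext_enc = resolve_encoding(ext)
  raise ArgumentError, "unknown or missing external encoding: #{ext.inspect}" if ext_enc.nil?

  io.set_encoding(ext_enc, resolve_encoding(int))
end

# Usage: the same calls as the reproduction script, against File::NULL.
File.open(File::NULL) do |f|
  safe_set_encoding(f, 'utf-8', 'invalid') # 'invalid' is dropped before the call
  f.read
end
```

Raising for a bad external name is a design choice of this sketch; it surfaces typos early rather than silently falling back to a default.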
        
Updated by javanthropus (Jeremy Bopp) over 3 years ago
- Description updated
        
Updated by nobu (Nobuyoshi Nakada) over 3 years ago
- Status changed from Open to Closed
Applied in changeset git|5ef3c7ea2d1968c87f361b6615699b92cc6e5a9a.
[Bug #18898] Fallback invalid external encoding to the default
        
Updated by javanthropus (Jeremy Bopp) over 3 years ago
Thank you for working on this. While the patch prevents the crash, it does not address the other odd behavior I reported: the IO instance's internal encoding is set to the default external encoding. Shouldn't it be set to either nil or Encoding.default_internal?
        
Updated by javanthropus (Jeremy Bopp) over 3 years ago
It looks like the internal encoding should be set to Encoding.default_internal when an invalid encoding name is given as the second argument; as far as I can tell, that is what happens when nil is given as the second argument instead. This script demonstrates the difference in behavior:
def show(io)
  printf(
    "external encoding: %-25p  internal encoding: %-25p\n",
    io.external_encoding,
    io.internal_encoding
  )
end
Encoding.default_external = 'iso-8859-1'
Encoding.default_internal = 'iso-8859-2'
File.open('/dev/null') do |f|
  f.set_encoding('utf-8', nil)
  show(f)
  f.set_encoding('utf-8', 'invalid')
  show(f)
  Encoding.default_external = 'iso-8859-3'
  Encoding.default_internal = 'iso-8859-4'
  show(f)
end
In the 1st case, the IO's internal encoding is set to the current value of Encoding.default_internal. In the 2nd case, it is set to Encoding.default_external instead. The 3rd case is more interesting because it shows that the IO's internal encoding actually follows the current value of Encoding.default_external: it was not simply copied when #set_encoding was called, and it changes whenever Encoding.default_external changes.
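To make the expected outcome concrete, here is a minimal sketch that models what the IO should report after #set_encoding, not how Ruby implements it. It assumes that an unknown external name falls back to Encoding.default_external (as the title of the applied changeset describes) and that an unknown internal name behaves like nil (as proposed above). `lookup` and `expected_encodings` are hypothetical names, and other special cases of IO#set_encoding are ignored.

```ruby
# Hypothetical model of the semantics discussed in this thread; it is NOT
# Ruby's implementation and ignores other special cases of IO#set_encoding.

# Returns an Encoding, or nil for nil or an unknown name.
def lookup(name)
  return nil if name.nil?
  Encoding.find(name)
rescue ArgumentError
  nil
end

# Computes [external, internal] as the proposal expects them after
# io.set_encoding(ext, int). The defaults are read once, at call time.
def expected_encodings(ext, int,
                       default_external: Encoding.default_external,
                       default_internal: Encoding.default_internal)
  # Assumption: unknown external name -> default external encoding.
  external = lookup(ext) || default_external
  # Proposal: unknown internal name behaves like nil -> default internal.
  internal = lookup(int) || default_internal
  # Assumption: identical encodings mean no transcoding, reported as nil.
  internal = nil if internal == external
  [external, internal]
end

# The 2nd case of the script above, under the proposed semantics:
# external UTF-8, internal ISO-8859-2 (the same as passing nil).
p expected_encodings('utf-8', 'invalid',
                     default_external: Encoding::ISO_8859_1,
                     default_internal: Encoding::ISO_8859_2)
```

Because the defaults are read once, the model describes a snapshot; it deliberately does not reproduce an internal encoding that keeps following later changes to Encoding.default_external, which is what the 3rd case exposes.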
I already opened a related issue for inconsistent handling of arguments for IO#set_encoding (#18899).  Should I move this facet of the problem to that issue instead?  Should I open a separate issue altogether?