Bug #18898
IO#set_encoding with invalid arguments leads to a segfault
Status: Closed
Description
Save the following to a file and run it:
#!/usr/bin/env ruby

Encoding.default_external = 'utf-8'

File.open(__FILE__) do |f|
  f.set_encoding('utf-8', 'invalid')
  printf(
    "default external: %p\ndefault internal: %p\nexternal: %p\ninternal: %p\n\n",
    Encoding.default_external,
    Encoding.default_internal,
    f.external_encoding,
    f.internal_encoding
  )
  f.read
end
The above script will result in a segfault at f.read. This seems to happen because the call to #set_encoding results in the internal encoding of the IO object being set to follow Encoding.default_external, while also setting the external encoding of the IO object to match. Obviously there shouldn't be a segfault, but I actually expected the IO object's internal encoding to be set to nil due to the invalid encoding being specified for it.
I was able to reproduce this on all versions of Ruby from 2.7.0 to 3.0.2.
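Until the underlying bug is fixed, a caller-side guard can keep unvalidated names away from #set_encoding entirely. This is only a sketch of a workaround, not part of any Ruby API; the helper name valid_encoding_name? is mine. It relies on Encoding.find, which raises ArgumentError for unknown encoding names:

```ruby
# Hypothetical guard: validate encoding names before calling IO#set_encoding,
# so an invalid name is rejected up front instead of being passed through.
def valid_encoding_name?(name)
  return true if name.nil? # nil is a legal second argument to #set_encoding
  Encoding.find(name)      # raises ArgumentError for unknown names
  true
rescue ArgumentError
  false
end

File.open(__FILE__) do |f|
  ext, int = 'utf-8', 'invalid'
  if valid_encoding_name?(ext) && valid_encoding_name?(int)
    f.set_encoding(ext, int)
  else
    warn 'skipping set_encoding: invalid encoding name'
  end
  f.read # encodings were left at their defaults, so this is safe
end
```

With the invalid second argument above, the guard skips the #set_encoding call and the read proceeds normally.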
Updated by javanthropus (Jeremy Bopp) almost 2 years ago
- Description updated (diff)
Updated by nobu (Nobuyoshi Nakada) almost 2 years ago
- Status changed from Open to Closed
Applied in changeset git|5ef3c7ea2d1968c87f361b6615699b92cc6e5a9a.
[Bug #18898] Fallback invalid external encoding to the default
Updated by javanthropus (Jeremy Bopp) almost 2 years ago
Thank you for working on this. While the patch prevents the crash, it does not address the other odd behavior that was reported, namely that the internal encoding of the IO instance is set to the default external encoding. Shouldn't it set the internal encoding to either nil or Encoding.default_internal?
Updated by javanthropus (Jeremy Bopp) almost 2 years ago
It looks like the internal encoding should be set to Encoding.default_internal when an invalid encoding name is given as the second argument. As far as I can tell, that would make it equivalent to giving nil as the second argument. This script demonstrates the difference in behavior:
def show(io)
  printf(
    "external encoding: %-25p internal encoding: %-25p\n",
    io.external_encoding,
    io.internal_encoding
  )
end

Encoding.default_external = 'iso-8859-1'
Encoding.default_internal = 'iso-8859-2'

File.open('/dev/null') do |f|
  f.set_encoding('utf-8', nil)
  show(f)

  f.set_encoding('utf-8', 'invalid')
  show(f)

  Encoding.default_external = 'iso-8859-3'
  Encoding.default_internal = 'iso-8859-4'
  show(f)
end
In the 1st case, we see that the IO's internal encoding is set to the current setting of Encoding.default_internal. In the 2nd case, the IO's internal encoding is set to Encoding.default_external instead. The 3rd case is more interesting because it shows that the IO's internal encoding is actually following the current setting of Encoding.default_external: it didn't just copy the value when #set_encoding was called; it changes whenever Encoding.default_external changes.
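As a point of comparison, the public API for resolving an encoding name, Encoding.find, raises ArgumentError for an unknown name rather than silently falling back to anything, which is part of why the fallback behavior above is surprising. A minimal demonstration (whether #set_encoding uses the same lookup internally is an assumption on my part):

```ruby
# Encoding.find resolves a name to an Encoding object and raises for unknown names.
p Encoding.find('utf-8') # a real name resolves to its Encoding

begin
  Encoding.find('invalid')
rescue ArgumentError => e
  puts "lookup rejected: #{e.message}"
end
```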
I already opened a related issue for inconsistent handling of arguments for IO#set_encoding (#18899). Should I move this facet of the problem to that issue instead, or should I open a separate issue altogether?