IO#set_encoding sometimes sets an IO's internal encoding to the default external encoding
This script demonstrates the behavior:
```ruby
def show(io)
  printf(
    "external encoding: %-25p internal encoding: %-25p\n",
    io.external_encoding,
    io.internal_encoding
  )
end

Encoding.default_external = 'iso-8859-1'
Encoding.default_internal = 'iso-8859-2'

File.open('/dev/null') do |f|
  f.set_encoding('utf-8', nil)
  show(f)  # f.internal_encoding is iso-8859-2, as expected

  f.set_encoding('utf-8', 'invalid')
  show(f)  # f.internal_encoding is now iso-8859-1!

  Encoding.default_external = 'iso-8859-3'
  Encoding.default_internal = 'iso-8859-4'
  show(f)  # f.internal_encoding is now iso-8859-3!
end
```
In the first case, the IO's internal encoding is set to the current value of Encoding.default_internal, as expected. In the second case, the IO's internal encoding is instead set to Encoding.default_external. The third case is more interesting because it shows that the IO's internal encoding actually follows the current value of Encoding.default_external: it wasn't just copied when #set_encoding was called; it changes whenever Encoding.default_external changes.
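For contrast, here is a minimal sketch of the unambiguous case: when both encodings are passed explicitly to #set_encoding, the defaults are not consulted at all (File::NULL is used as a portable stand-in for /dev/null):

```ruby
# When both arguments are explicit, valid encoding names,
# #set_encoding ignores Encoding.default_external/default_internal.
ext = int = nil
File.open(File::NULL) do |f|
  f.set_encoding('utf-8', 'iso-8859-2')
  ext = f.external_encoding  # => #<Encoding:UTF-8>
  int = f.internal_encoding  # => #<Encoding:ISO-8859-2>
end
```

Only the nil and invalid internal-encoding arguments shown above trigger the fallback behavior in question.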
What should the correct behavior be?
Updated by javanthropus (Jeremy Bopp) about 1 year ago
Can anyone confirm that this is a bug and not a misunderstanding? It looks like the changes to fix this will require a fair bit of refactoring, and there don't yet appear to be any tests covering the various cases for arguments to IO#set_encoding. I found tests around various ways of opening files and pipes with encoding arguments which do check the resulting internal and external encodings of the IO object, but none of them exercise these corner cases.
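As a rough illustration of the kind of existing coverage described (a plain standalone script, not MRI's actual test suite), opening with a combined "external:internal" encoding string sets both encodings deterministically, independent of the defaults:

```ruby
# Opening with an explicit "external:internal" encoding string;
# the resulting encodings do not depend on Encoding.default_external
# or Encoding.default_internal. File::NULL stands in for /dev/null.
results = File.open(File::NULL, 'r', encoding: 'utf-8:iso-8859-2') do |f|
  [f.external_encoding, f.internal_encoding]
end
p results # prints [#<Encoding:UTF-8>, #<Encoding:ISO-8859-2>]
```

Tests for the corner cases in this report would instead call #set_encoding on an already-open IO with a nil or invalid internal encoding and then assert which default, if any, the internal encoding should follow.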