Bug #21280
openStringIO#set_encoding warns when backed by chilled string literal
Description
StringIO#set_encoding
changes the underlying string encoding if the string is not frozen, but does not change the underlying string encoding if the string is frozen. In Ruby 3.4, this results in a warning for chilled literal strings:
$ ruby34 -w -r stringio -e "StringIO.new('').set_encoding('binary')"
-e:1: warning: literal string will be frozen in the future (run with --debug-frozen-string-literal for more information)
I believe Ruby should emit this warning only for cases that will break when string literals are frozen. This is not one of those cases, so Ruby should not emit the warning. To avoid emitting the warning, I think StringIO#set_encoding
should not set the encoding on the underlying string for chilled literal strings. I submitted a pull request to avoid changing the encoding of the underlying string: https://github.com/ruby/stringio/pull/128
However, @rhenium (Kazuki Yamaguchi) said that he thought the encoding of the underlying string should still be changed, and the warning should be emitted (https://github.com/ruby/stringio/pull/128#issuecomment-2818362875).
For some history, before Ruby 2.3, StringIO#set_encoding
used to always set the encoding of the underlying string. This was changed in #11827, when the encoding was not set on the underlying string if the string was frozen (3e1c01ae463a8c9d8bbe9050251a2538ddb0292f).
In https://bugs.ruby-lang.org/issues/11827#note-3, @nurse wrote:
Away from the case and thinking ideal behavior, StringIO should be a view of given source string and set_encoding shouldn't change source encoding.
But I'm not sure that it is worth breaking the compatibility.
I think this means that ideally, absent backwards compatibility issues, StringIO#set_encoding should never change the underlying string encoding.
In https://bugs.ruby-lang.org/issues/11827#note-4, @shugo (Shugo Maeda) gave an example from @nobu (Nobuyoshi Nakada) that open-uri depends on the current behavior:
enc = Encoding::ASCII_8BIT unless enc
if self.respond_to? :force_encoding
self.force_encoding(enc)
elsif self.respond_to? :string
self.string.force_encoding(enc)
else # Tempfile
self.set_encoding enc
end
end
However, as StringIO#string
is defined, this will call self.string.force_encoding(enc)
and not self.set_encoding enc
, so I'm not sure why a change to String#set_encoding
would affect the behavior of this example.
@rhenium (Kazuki Yamaguchi) pointed out that this issue affects StringIO#binmode
and StringIO#set_encoding_by_bom
as well as StringIO#encoding
.
How do we want to handle this case? Should this result in a warning (current behavior), or is it safe to avoid changing the encoding of the underlying string for chilled strings (as is done for frozen strings)?
Updated by byroot (Jean Boussier) 3 days ago
Should this result in a warning (current behavior), or is it safe to avoid changing the encoding of the underlying string for chilled strings (as is done for frozen strings)?
I think the current behavior is correct in the sense that "chilled strings" are mutable strings, hence should behave just like a mutable string. In this case it means changing the encoding.
Chilled strings should also emit a warning when they are mutated to signal that the behavior will change if --enable-frozen-string-literal
is turned on, which it does right now.
Based on that I think the current behavior is correct, this warning is correctly warning you action is needed here.
Updated by jeremyevans0 (Jeremy Evans) 3 days ago
byroot (Jean Boussier) wrote in #note-1:
Chilled strings should also emit a warning when they are mutated to signal that the behavior will change if
--enable-frozen-string-literal
is turned on, which it does right now.
But behavior in this case will not change if --enable-frozen-string-literal
is turned on, because StringIO#set_encoding
does not set the encoding of the underlying string for frozen strings.
Based on that I think the current behavior is correct, this warning is correctly warning you action is needed here.
But no action is needed. Things will automatically work when strings move from chilled by default to frozen by default, because StringIO#set_encoding
will not try to change the encoding of the underlying string in that case.
Updated by byroot (Jean Boussier) 3 days ago
ยท Edited
But behavior in this case will not change
Maybe I'm missing something, but I think the behavior will change:
>> StringIO.new("").set_encoding(Encoding::BINARY).string.encoding
(irb):7: warning: literal string will be frozen in the future (run with --debug-frozen-string-literal for more information)
=> #<Encoding:BINARY (ASCII-8BIT)>
>> StringIO.new(+"").set_encoding(Encoding::BINARY).string.encoding
=> #<Encoding:BINARY (ASCII-8BIT)>
>> StringIO.new("".freeze).set_encoding(Encoding::BINARY).string.encoding
=> #<Encoding:UTF-8>
Updated by jeremyevans0 (Jeremy Evans) 3 days ago
byroot (Jean Boussier) wrote in #note-3:
But behavior in this case will not change
Maybe I'm missing something, but I think the behavior will change:
>> StringIO.new("").set_encoding(Encoding::BINARY).string.encoding (irb):7: warning: literal string will be frozen in the future (run with --debug-frozen-string-literal for more information) => #<Encoding:BINARY (ASCII-8BIT)> >> StringIO.new(+"").set_encoding(Encoding::BINARY).string.encoding => #<Encoding:BINARY (ASCII-8BIT)> >> StringIO.new("".freeze).set_encoding(Encoding::BINARY).string.encoding => #<Encoding:UTF-8>
The full sentence was:
But behavior in this case will not change if
--enable-frozen-string-literal
is turned on, because StringIO#set_encoding does not set the encoding of the underlying string for frozen strings.
With the patch, StringIO#set_encoding
treats a chilled string as a frozen string, so behavior when --enable-frozen-string-literal
is turned on does not change. The present situation is that --enable-frozen-string-literal
changes behavior of StringIO#set_encoding
.
Updated by byroot (Jean Boussier) 3 days ago
Perhaps it's the jetlag talking, but I think the current behavior is desirable.
Chilled string as designed should behave like mutable strings, and warn when a frozen one would have behaved differently. Isn't that the current behavior?
Updated by jeremyevans0 (Jeremy Evans) 3 days ago
byroot (Jean Boussier) wrote in #note-5:
Chilled string as designed should behave like mutable strings, and warn when a frozen one would have behaved differently. Isn't that the current behavior?
In this case, a frozen string would not behave differently, since StringIO#set_encoding
treats frozen strings differently than it does unfrozen strings.
The purpose of chilled strings is to warn users when code will break when the strings are frozen. Since code will not break in this case, issuing a warning seems like a bad idea, as it will prompt code changes when none are actually needed.
Updated by byroot (Jean Boussier) 3 days ago
Since code will not break in this case
That is debatable. It won't raise an exception yes, but the behavior will change as you can see in the IRB session I shared.