<p>Ruby Issue Tracking System: Ruby master - Bug #18899: Inconsistent argument handling in IO#set_encoding (https://bugs.ruby-lang.org/issues/18899)</p>
<p><strong>javanthropus (Jeremy Bopp)</strong>, 2022-07-06T13:53:23Z</p>
<ul><li><strong>Description</strong> updated</li></ul>
<p><strong>javanthropus (Jeremy Bopp)</strong>, 2022-08-21T14:24:45Z</p>
<p>Can anyone confirm whether this is a bug or intended behavior? I've looked at the code that implements this, and there are two largely independent code paths: one for the single string argument case and one for the multiple argument case. If this is confirmed to be a bug, I would like to write a patch to unify the behavior.</p>
<p><strong>jeremyevans0 (Jeremy Evans)</strong>, 2022-08-23T22:20:55Z</p>
<p>I think it is a bug. I submitted a pull request to fix it: <a href="https://github.com/ruby/ruby/pull/6280" class="external">https://github.com/ruby/ruby/pull/6280</a>. I'm not sure the approach I took is the best one, though.</p>
<p><strong>javanthropus (Jeremy Bopp)</strong>, 2022-08-26T13:01:18Z</p>
<p>I ran my test against your branch, and it addresses this issue. I hope it can be incorporated soon. Thanks!</p>
<p><strong>naruse (Yui NARUSE)</strong>, 2022-11-21T12:22:18Z</p>
<p>I think your example needs to be as follows:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="c1">#!/usr/bin/env ruby</span>
<span class="k">def</span> <span class="nf">show</span><span class="p">(</span><span class="n">io</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span>
<span class="nb">printf</span><span class="p">(</span>
<span class="s2">"args: %-50s external encoding: %-25s internal encoding: %-25s</span><span class="se">\n</span><span class="s2">"</span><span class="p">,</span>
<span class="n">args</span><span class="p">.</span><span class="nf">inspect</span><span class="p">,</span>
<span class="n">io</span><span class="p">.</span><span class="nf">external_encoding</span><span class="p">.</span><span class="nf">inspect</span><span class="p">,</span>
<span class="n">io</span><span class="p">.</span><span class="nf">internal_encoding</span><span class="p">.</span><span class="nf">inspect</span>
<span class="p">)</span>
<span class="k">end</span>
<span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="s1">'/dev/null'</span><span class="p">,</span> <span class="s1">'r:binary:utf-8'</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">f</span><span class="o">|</span>
<span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'r:binary:utf-8'</span><span class="p">]</span>
<span class="n">show</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span>
<span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'binary:utf-8'</span><span class="p">]</span>
<span class="n">f</span><span class="p">.</span><span class="nf">set_encoding</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span>
<span class="n">show</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span>
<span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'binary'</span><span class="p">,</span> <span class="s1">'utf-8'</span><span class="p">]</span>
<span class="n">f</span><span class="p">.</span><span class="nf">set_encoding</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span>
<span class="n">show</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span>
<span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="no">Encoding</span><span class="p">.</span><span class="nf">find</span><span class="p">(</span><span class="s1">'binary'</span><span class="p">),</span> <span class="no">Encoding</span><span class="p">.</span><span class="nf">find</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">)]</span>
<span class="n">f</span><span class="p">.</span><span class="nf">set_encoding</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span>
<span class="n">show</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span>
<span class="k">end</span>
</code></pre>
<p>The result will be</p>
<pre><code>args: ["r:binary:utf-8"] external encoding: #<Encoding:ASCII-8BIT> internal encoding: nil
args: ["binary:utf-8"] external encoding: #<Encoding:ASCII-8BIT> internal encoding: nil
args: ["binary", "utf-8"] external encoding: #<Encoding:ASCII-8BIT> internal encoding: #<Encoding:UTF-8>
args: [#<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>] external encoding: #<Encoding:ASCII-8BIT> internal encoding: #<Encoding:UTF-8>
</code></pre>
<p><strong>javanthropus (Jeremy Bopp)</strong>, 2022-11-21T13:53:26Z</p>
<p>Thank you for your response. How do the changes to the example make a difference? The results with the original example are:</p>
<pre><code>args: ["binary:utf-8"] external encoding: #<Encoding:ASCII-8BIT> internal encoding: nil
args: ["binary", "utf-8"] external encoding: #<Encoding:ASCII-8BIT> internal encoding: #<Encoding:UTF-8>
args: [#<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>] external encoding: #<Encoding:ASCII-8BIT> internal encoding: #<Encoding:UTF-8>
</code></pre>
<p>Unless I'm mistaken, these are exactly the same as the last three lines of the modified example's output. The question remains why the single string argument case results in a <code>nil</code> internal encoding while the two-argument cases do not.</p>
<p>Before investigating this, I thought the logic would first split <code>"binary:utf-8"</code> into <code>"binary"</code> and <code>"utf-8"</code> and then proceed as in the two-string-argument case. In other words, I expected every case to set the internal encoding to the same value, either <code>nil</code> or <code>Encoding::UTF_8</code>.</p>
<p><strong>jeremyevans0 (Jeremy Evans)</strong>, 2022-11-25T17:55:51Z</p>
<p>After more research, it appears the current behavior is expected. Parsing a single string with an embedded colon is already handled correctly. However, if the external encoding is binary (ASCII-8BIT), the internal encoding is deliberately set to <code>nil</code>:</p>
<pre><code class="c syntaxhl" data-language="c"><span class="c1">// in rb_io_ext_int_to_encs</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ext</span> <span class="o">==</span> <span class="n">rb_ascii8bit_encoding</span><span class="p">())</span> <span class="p">{</span>
<span class="cm">/* If external is ASCII-8BIT, no transcoding */</span>
<span class="n">intern</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
</code></pre>
<p>Basically, the <code>'binary:utf-8'</code> encoding doesn't make sense. Two encodings are given in order to transcode from the external encoding to the internal one, and there is no transcoding when the external encoding is binary. If you want the data you read to be UTF-8, just use <code>'utf-8'</code>.</p>
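<p>To make the rule concrete, here is a minimal pure-Ruby model of the special case quoted above. The helper name <code>effective_internal_encoding</code> is hypothetical, and the model deliberately ignores <code>Encoding.default_internal</code>, BOM handling, and the other branches of the real C function; it is a sketch, not Ruby's implementation. The last part shows the suggested <code>'utf-8'</code> alternative on a real IO.</p>

```ruby
# Hypothetical pure-Ruby model of the special case quoted above from
# rb_io_ext_int_to_encs. Simplified: it ignores Encoding.default_internal,
# BOM handling, and the other branches of the real C function.
def effective_internal_encoding(external, internal)
  ext = Encoding.find(external.to_s)
  int = internal.nil? ? nil : Encoding.find(internal.to_s)
  # A binary (ASCII-8BIT) external encoding means "no transcoding",
  # so any requested internal encoding is dropped.
  ext == Encoding::BINARY ? nil : int
end

# In this model, a single 'external:internal' string is split on the colon first.
ext_name, int_name = "binary:utf-8".split(":", 2)
p effective_internal_encoding(ext_name, int_name) # prints nil
p effective_internal_encoding("euc-jp", "utf-8")  # prints #<Encoding:UTF-8>

# The suggested alternative: make UTF-8 the external encoding, so data
# read from the IO is tagged as UTF-8 without any transcoding.
File.open(File::NULL, "r") do |f|
  f.set_encoding("utf-8")
  p f.external_encoding # prints #<Encoding:UTF-8>
end
```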
<p>That still leaves inconsistent behavior between <code>'binary:utf-8'</code> and <code>'binary', 'utf-8'</code>, so I propose making the <code>'binary', 'utf-8'</code> behavior match <code>'binary:utf-8'</code>. I updated my pull request to do that: <a href="https://github.com/ruby/ruby/pull/6280" class="external">https://github.com/ruby/ruby/pull/6280</a></p>
<p>An alternative approach would be to remove the code above that treats a binary external encoding specially.</p>
<p><strong>Eregon (Benoit Daloze)</strong>, 2022-11-26T12:44:15Z</p>
<p>I've looked at <code>IO#set_encoding</code> recently, and it's such an unreadable mess that I think nobody would be able to explain its full semantics.<br>
So anything to simplify it would IMHO be welcome.<br>
I think <code>IO#set_encoding</code> should simply set the internal/external encodings for that IO, with no special cases and not caring about the default external/internal encodings.<br>
If some cases don't make any sense, they should raise an exception.</p>
<p><strong>javanthropus (Jeremy Bopp)</strong>, 2022-11-26T23:20:21Z</p>
<p>Please also see <a class="issue tracker-1 status-1 priority-4 priority-default" title="Bug: IO#set_encoding sometimes set an IO's internal encoding to the default external encoding (Open)" href="https://bugs.ruby-lang.org/issues/18995">#18995</a> for another example of the intricate implementation behaving unexpectedly. During my own investigation, I discovered that using <code>"-"</code> as the internal encoding name is silently ignored. According to comments in the code, <code>"-"</code> indicates no conversion, but this is completely undocumented for the method. If you use <code>"-"</code> as the external encoding name, you get divergent behavior similar to this issue depending on whether you pass <code>"-:utf-8"</code> or <code>"-", "utf-8"</code>.</p>
<p><strong>Dan0042 (Daniel DeLorme)</strong>, 2022-11-30T18:21:29Z</p>
<ul></ul><p>Naively, I would have expected "binary:utf-8" to take arbitrary input and force the encoding to UTF-8, and "utf-8:utf-8" to read and validate the input as UTF-8.<br>
Neither does what I expected. <code>¯\_(ツ)_/¯</code></p>
<p><strong>jeremyevans (Jeremy Evans)</strong>, 2023-01-02T03:04:54Z</p>
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul><p>Applied in changeset <a class="changeset" title="Make IO#set_encoding with binary external encoding use nil internal encoding This was already th..." href="https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/0903a251796c2b4086804a94420c231c04e3cea1">git|0903a251796c2b4086804a94420c231c04e3cea1</a>.</p>
<hr>
<p>Make IO#set_encoding with binary external encoding use nil internal encoding</p>
<p>This was already the behavior when a single <code>'external:internal'</code><br>
encoding specifier string was passed. This makes the behavior<br>
consistent for the case where separate external and internal<br>
encoding specifiers are provided.</p>
<p>While here, fix the IO#set_encoding method documentation to<br>
state that either the first or second argument can be a string<br>
with an encoding name, and describe the behavior when the<br>
external encoding is binary.</p>
<p>Fixes [Bug <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: Inconsistent argument handling in IO#set_encoding (Closed)" href="https://bugs.ruby-lang.org/issues/18899">#18899</a>]</p>
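<p>One way to check the behavior described in this commit message is to compare the three call forms from the examples above on the same file. This sketch uses <code>File::NULL</code> rather than the literal <code>/dev/null</code> path so it is portable, and the helper name <code>encodings_after</code> is hypothetical. On a Ruby build that includes this changeset, the string forms should both report a <code>nil</code> internal encoding; on older builds, the two-argument forms reported UTF-8, as in the outputs quoted earlier in this thread.</p>

```ruby
# Hypothetical helper: open the null device, apply set_encoding with the
# given arguments, and return [external_encoding, internal_encoding].
def encodings_after(*args)
  File.open(File::NULL, "r") do |f|
    f.set_encoding(*args)
    [f.external_encoding, f.internal_encoding]
  end
end

[
  ["binary:utf-8"],                    # single 'external:internal' string
  ["binary", "utf-8"],                 # separate encoding-name strings
  [Encoding::BINARY, Encoding::UTF_8], # separate Encoding objects
].each do |args|
  ext, int = encodings_after(*args)
  printf("%-40s external: %-30s internal: %s\n", args.inspect, ext.inspect, int.inspect)
end
```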