https://bugs.ruby-lang.org/https://bugs.ruby-lang.org/favicon.ico?17113305112022-02-08T09:21:17ZRuby Issue Tracking SystemRuby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=964202022-02-08T09:21:17Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul></ul><p>Well, it's actually not just binary. Binary would mean that none of the bytes have any 'meaning' (i.e. characters) assigned to them. But ASCII-8BIT actually has character 'meaning' assigned to the ASCII range.<br>
You can for example do the following:</p>
<pre><code class="Ruby syntaxhl" data-language="Ruby"><span class="n">u</span> <span class="o">=</span> <span class="p">(</span><span class="n">b</span> <span class="o">=</span> <span class="s2">"abcde"</span><span class="p">.</span><span class="nf">force_encoding</span><span class="p">(</span><span class="s1">'ASCII-8BIT'</span><span class="p">)).</span><span class="nf">encode</span><span class="p">(</span><span class="s1">'UTF-8'</span><span class="p">)</span>
</code></pre>
<p>This gives you the string "abcde" with the encoding UTF-8. This shows that the lower 7 bits are interpreted the same as US-ASCII. The range with the 8th bit set, on the other hand, is just binary values, so</p>
<pre><code class="Ruby syntaxhl" data-language="Ruby"><span class="s2">"</span><span class="se">\xCD</span><span class="s2">"</span><span class="p">.</span><span class="nf">force_encoding</span><span class="p">(</span><span class="s1">'ASCII-8BIT'</span><span class="p">).</span><span class="nf">encode</span><span class="p">(</span><span class="s1">'UTF-8'</span><span class="p">)</span>
</code></pre>
<p>produces this error:</p>
<pre><code>Encoding::UndefinedConversionError ("\xCD" from ASCII-8BIT to UTF-8)
</code></pre>
<p>I choose UTF-8 as the target encoding because that contains all of Unicode, so the error cannot be because the source character doesn't exist in the target encoding.</p>
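<p>Both behaviours can be reproduced side by side with a small sketch. The helper name <code>binary_to_utf8</code> is hypothetical and only serves this illustration; it returns the exception class instead of raising, so that both cases can be printed.</p>

```ruby
# Hypothetical helper (not part of Ruby or of this thread): transcode a
# binary (ASCII-8BIT) copy of the given string to UTF-8. If the conversion
# is undefined, return the exception class instead of raising.
def binary_to_utf8(bytes)
  bytes.b.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  e.class
end

# Bytes below 128 are treated as US-ASCII, so the conversion succeeds.
converted = binary_to_utf8("abcde")
puts converted           # abcde
puts converted.encoding  # UTF-8

# A byte with the 8th bit set has no character assigned, so it cannot be converted.
p binary_to_utf8("\xCD") # Encoding::UndefinedConversionError
```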
<p>So there's indeed some complexity here, but it's not exactly what you think.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=964212022-02-08T09:28:58Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/50">@duerst (Martin Dürst)</a> I'm aware of this, but I don't quite see how it's a concern. It's a fairly subtle behavior, and I doubt the <code>ASCII-8BIT</code> name particularly reveals it.</p>
<p>Also nitpick, but a better example would be:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="s2">"</span><span class="se">\xC3\xA9</span><span class="s2">"</span><span class="p">.</span><span class="nf">b</span><span class="p">.</span><span class="nf">encode</span><span class="p">(</span><span class="no">Encoding</span><span class="o">::</span><span class="no">UTF_8</span><span class="p">)</span> <span class="c1"># => Encoding::UndefinedConversionError</span>
</code></pre>
<p>Since it's valid UTF-8.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=964222022-02-08T09:41:23Znaruse (Yui NARUSE)naruse@airemix.jp
<ul></ul><p>duerst (Martin Dürst) wrote in <a href="#note-1">#note-1</a>:</p>
<blockquote>
<p>Well, it's actually not just binary. Binary would mean that none of the bytes have any 'meaning' (i.e. characters) assigned to them. But ASCII-8BIT actually has character 'meaning' assigned to the ASCII range.</p>
</blockquote>
<p>I agree with the principle.<br>
But we should read this proposal as asking: "The ASCII range of binary data in the real world is usually ASCII, so why give it a complex name like ASCII-8BIT?"</p>
<p>I think the name of the encoding is a communication tool. We should compare pros and cons between ASCII-8BIT and BINARY.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=964232022-02-08T10:15:10ZEregon (Benoit Daloze)
<ul></ul><p>+1000 for this, I think ASCII-8BIT is always extremely confusing, and BINARY is much more revealing (= we don't know what the actual encoding is, or it might be binary data and not text).<br>
I've seen many Ruby users confused by this.<br>
I'm not sure why I never thought to propose it here TBH.</p>
<p>I've literally never used the <code>Encoding::ASCII_8BIT</code> form in code (and rarely if ever seen it) but <code>Encoding::BINARY</code> many times.</p>
<p>The property that bytes < 128 are interpreted as US-ASCII is nothing special, every <code>Encoding#ascii_compatible?</code> behaves like that.<br>
And almost all non-dummy Ruby encodings are <code>#ascii_compatible?</code>, the only two exceptions are UTF-16/32 (both LE/BE).</p>
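<p>This can be checked from Ruby itself. The following is a minimal sketch; the method name <code>ascii_incompatible_encodings</code> is made up for this illustration, and the exact list it returns may depend on the Ruby implementation and version.</p>

```ruby
# Hypothetical helper: names of all non-dummy encodings that are NOT
# ASCII-compatible on the running Ruby.
def ascii_incompatible_encodings
  Encoding.list.reject(&:dummy?).reject(&:ascii_compatible?).map(&:name)
end

# The binary encoding is ASCII-compatible like most other encodings.
puts Encoding::BINARY.ascii_compatible?            # true

# BINARY and ASCII_8BIT are two constants for the same Encoding object.
puts Encoding::BINARY.equal?(Encoding::ASCII_8BIT) # true

# Print the exceptions (the UTF-16/UTF-32 family is expected here).
p ascii_incompatible_encodings
```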
<p>Two things particularly confusing about the name ASCII-8BIT:</p>
<ul>
<li>It's completely unclear it might mean binary data or unknown encoding</li>
<li>ISO-8859-* and many other encodings are 8-bit ascii-compatible encodings. Yet ASCII-8BIT which name seems to imply something close is nothing like that (the 8th bit is undefined, uninterpreted but valid).</li>
</ul>
<p>(FWIW JCodings, the Java library for Ruby encodings has ASCIIEncoding.INSTANCE for BINARY, that's even worse as it's even more confusing with US-ASCII, I've been thinking how to fix that in JCodings in a compatible way)</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=964242022-02-08T10:21:10ZEregon (Benoit Daloze)
<ul></ul><p>BTW Python has the "bytes" type, which behaves very similarly to Ruby's BINARY encoding (it's a separate type in Python rather than an encoding, but that's a detail).<br>
e.g.</p>
<pre><code>>>> bytes("abcdé", 'utf-8')
b'abcd\xc3\xa9'
</code></pre>
<p>That's also a more telling name than ASCII-8BIT.<br>
BINARY is better for Ruby because it's already an established name for it.</p>
<p>There is also already <code>String#b</code> for binary, it's not <code>String#a</code> or so.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=964382022-02-09T09:51:42Znaruse (Yui NARUSE)naruse@airemix.jp
<ul></ul><p>The name <code>ASCII-8BIT</code> reflects how deeply we considered what "binary" is. Ruby 1.9's encoding system is a series of inventions. Ruby invented some ideas: ASCII COMPATIBLE and ASCII-8BIT.</p>
<blockquote>
<p>Two things particularly confusing about the name ASCII-8BIT:</p>
<ul>
<li>It's completely unclear it might mean binary data or unknown encoding</li>
<li>ISO-8859-* and many other encodings are 8-bit ascii-compatible encodings. Yet ASCII-8BIT which name seems to imply something close is nothing like that (the 8th bit is undefined, uninterpreted but valid).</li>
</ul>
</blockquote>
<p>Your two questions raise very good points. The answer to them is tightly coupled with the name <code>ASCII-8BIT</code>.</p>
<blockquote>
<ul>
<li>It's completely unclear it might mean binary data or unknown encoding</li>
</ul>
</blockquote>
<p>I want to ask how often you can actually distinguish them. Ruby's assumption is that developers cannot distinguish them in normal use cases. If so, Ruby may not provide two objects. If Ruby provides only one object for them, developers don't need to clarify which one they mean.</p>
<blockquote>
<p>ISO-8859-* and many other encodings are 8-bit ascii-compatible encodings. Yet ASCII-8BIT which name seems to imply something close is nothing like that (the 8th bit is undefined, uninterpreted but valid).</p>
</blockquote>
<p>This is a very good question. Ruby's answer is "yes, ASCII-8BIT is similar to ISO-8859-*". As you say, an ASCII-8BIT string's 8-bit range is undefined. But Ruby doesn't mind that. Such data is sometimes encountered in the real world.</p>
<p>For example, the charset of an HTTP header is usually ISO-8859-1. Many languages have struggled with how to handle these octets. Python and .NET handle them as binary, which prevents developers from using powerful String methods on such data. Ruby handles them as ASCII-8BIT. Ruby's insight is that the binary data Ruby handles usually consists of such octets. The name <code>ASCII-8BIT</code> reflects that insight.</p>
<p>Therefore the conclusion for your question is that they are just what the real world is. The name just reflects that.</p>
<p>Anyway Rails programmers don't need such understanding usually. If renaming cares people who just hit the surface of this chaos, it might be worth considered, though changing encoding.name may hit the compatibility issue.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=964432022-02-09T16:52:29Ztenderlovemaking (Aaron Patterson)tenderlove@ruby-lang.org
<ul></ul><p>First, I agree with this proposal. Second, I think this example should raise an exception:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="n">u</span> <span class="o">=</span> <span class="p">(</span><span class="n">b</span> <span class="o">=</span> <span class="s2">"abcde"</span><span class="p">.</span><span class="nf">force_encoding</span><span class="p">(</span><span class="s1">'ASCII-8BIT'</span><span class="p">)).</span><span class="nf">encode</span><span class="p">(</span><span class="s1">'UTF-8'</span><span class="p">)</span>
</code></pre>
<p>But I can open a different ticket for that. The point I actually want to make is that I've never seen this use case in the wild. 100% of the cases I've seen for <code>force_encoding('ASCII-8BIT')</code> are when the developer knows the string is binary (or unknown) data and they want to treat it as binary / unknown data <em>not</em> as "might be US-ASCII sometimes". The name "binary" would more accurately reflect real world usage IMO.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=964442022-02-09T17:34:28ZEregon (Benoit Daloze)
<ul></ul><p>naruse (Yui NARUSE) wrote in <a href="#note-6">#note-6</a>:</p>
<blockquote>
<p>I want to ask how often you can actually distinguish them.</p>
</blockquote>
<p>I think in many cases it is possible to distinguish.<br>
For instance, an HTTP header might initially be in the binary encoding and mean "unknown encoding" (can often find the real encoding through <code>Content-Type</code>'s charset, but not always and could be invalid)<br>
Another example is <code>socket.read(N)</code> which might be actual binary data (e.g. for a binary protocol), or text and the actual encoding depends then on what's communicated on that socket.</p>
<p>And I would think most Ruby programs need to handle the binary encoding somehow, and can only leave a String as binary if it's only bytes < 128, otherwise things break.</p>
<blockquote>
<p>If so, Ruby may not provide two objects.</p>
</blockquote>
<p>I don't think two different "binary" Encodings are useful, one seems enough in practice and can be used for both meanings, which are very close (as a binary byte array, or a marker for unknown encoding).</p>
<blockquote>
<p>This is a very good question. Ruby's answer is "yes, ASCII-8BIT is similar to ISO-8859-*". As you say, an ASCII-8BIT string's 8-bit range is undefined. But Ruby doesn't mind that. Such data is sometimes encountered in the real world.</p>
</blockquote>
<p>I think such situations need to be handled somehow and given a real encoding.<br>
"ASCII-8BIT" feels confusing because there is no such thing as a "8th" bit of ASCII, without a more specific encoding which defines that.<br>
So it really means unknown, and "ASCII-8BIT" seems far from "unknown encoding".</p>
<p>Also "ASCII-8BIT" sounds clearly wrong if it's actual binary data (which might not use any ASCII concept at all).<br>
The behavior that this pseudo-encoding is ASCII compatible and e.g. shows byte 65 as <code>A</code> is fine, after all hexdump utilities typically do the same for bytes < 128 and it's helpful if there is ASCII text in the middle of binary data.</p>
<blockquote>
<p>Anyway Rails programmers don't need such understanding usually. If renaming cares people who just hit the surface of this chaos, it might be worth considered, though changing encoding.name may hit the compatibility issue.</p>
</blockquote>
<p>Not just Rails programmers, I think most Ruby programmers are confused when they see ASCII-8BIT, and not only the first time.<br>
I believe renaming to BINARY would help them understand the meaning much better.</p>
<p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/73">@tenderlovemaking (Aaron Patterson)</a> One issue is e.g. error messages in CRuby are encoded in the binary encoding (probably for the legacy reason of using <code>rb_str_new()</code>), and so that would be I think a wide-reaching change with a high chance of causing real compatibility issues, it seems too incompatible to me.<br>
As an example, the encoding negotiation rules (e.g. for concatenation) in Ruby are all based around whether one side is <code>#ascii_only?</code> and if yes then just use the other side's encoding. Preventing to e.g. concat with a ASCII-only binary string would break lots of programs.<br>
Anyway, I think that's a separate issue indeed.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=964452022-02-09T17:49:29Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul></ul><p>I'm also in favor of renaming <code>ASCII-8BIT</code> to <code>BINARY</code>, but I don't have strong feelings about it. I'm strongly against breaking <code>String#encode</code> for binary strings.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=964482022-02-09T23:35:30Ztenderlovemaking (Aaron Patterson)tenderlove@ruby-lang.org
<ul></ul><p>jeremyevans0 (Jeremy Evans) wrote in <a href="#note-9">#note-9</a>:</p>
<blockquote>
<p>I'm also in favor of renaming <code>ASCII-8BIT</code> to <code>BINARY</code>, but I don't have strong feelings about it. I'm strongly against breaking <code>String#encode</code> for binary strings.</p>
</blockquote>
<p>Ya, sorry, I should be more clear. I think concatenation shouldn't try to guess at the encoding. If the user calls "encode" then it seems fine.</p>
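<p>For readers following along, the concatenation behaviour under discussion can be observed with a short sketch. The helper <code>concat_result</code> is hypothetical; it reports either the encoding of the concatenated string or the exception class raised.</p>

```ruby
# Hypothetical helper: concatenate two strings and report the resulting
# encoding, or the exception class if Ruby refuses to combine them.
def concat_result(left, right)
  (left + right).encoding
rescue Encoding::CompatibilityError => e
  e.class
end

utf8 = "\u00e9" # "é" as a UTF-8 string

# An ASCII-only binary string is silently combined with the UTF-8 string.
p concat_result(utf8, "abc".b)

# A binary string containing a byte >= 128 cannot be combined with non-ASCII UTF-8.
p concat_result(utf8, "\xFF".b)
```

<p>On current CRuby the first call reports UTF-8 and the second reports <code>Encoding::CompatibilityError</code>, which is the "works until the right input arrives" situation described in this comment.</p>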
<p>Eregon (Benoit Daloze) wrote in <a href="#note-8">#note-8</a>:</p>
<blockquote>
<p>As an example, the encoding negotiation rules (e.g. for concatenation) in Ruby are all based around whether one side is <code>#ascii_only?</code> and if yes then just use the other side's encoding. Preventing to e.g. concat with a ASCII-only binary string would break lots of programs.<br>
Anyway, I think that's a separate issue indeed.</p>
</blockquote>
<p>Yes, this is the issue I have. IME the code is already broken, it just hasn't had the right input to break it yet (where would the binary string come from other than an external location?). Regardless, I made a ticket here: <a href="https://bugs.ruby-lang.org/issues/18579" class="external">https://bugs.ruby-lang.org/issues/18579</a> 😄</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=964612022-02-10T07:53:35Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul></ul><p>Eregon (Benoit Daloze) wrote in <a href="#note-4">#note-4</a>:</p>
<blockquote>
<p>The property that bytes < 128 are interpreted as US-ASCII is nothing special, every <code>Encoding#ascii_compatible?</code> behaves like that.<br>
And almost all non-dummy Ruby encodings are <code>#ascii_compatible?</code>, the only two exceptions are UTF-16/32 (both LE/BE).</p>
<p>Two things particularly confusing about the name ASCII-8BIT:</p>
<ul>
<li>It's completely unclear it might mean binary data or unknown encoding</li>
</ul>
</blockquote>
<p>Well, binary data can be character data with unknown encoding (or with encoding not yet set), or it can be truly binary data (e.g. as in a .jpg file or .zip file,...).</p>
<blockquote>
<ul>
<li>ISO-8859-* and many other encodings are 8-bit ascii-compatible encodings. Yet ASCII-8BIT which name seems to imply something close is nothing like that (the 8th bit is undefined, uninterpreted but valid).</li>
</ul>
</blockquote>
<p>ASCII-8BIT is an 8-bit ascii-compatible encoding, isn't it?</p>
<p>I think the idea of ASCII-8BIT goes back to the fact that in Ruby, many encodings can be used for source code, and as long as you only use ASCII in the code, it doesn't actually matter. That's to a large extent how Ruby 1.8 operated, and that was carried over into Ruby 1.9.</p>
<p>Now that the default source encoding is UTF-8, and we have an encoding pragma for source files in other encodings, and so on, "something where we know ASCII is ASCII, but we are not sure about the upper half of the byte values" may be quite a bit less important.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=964622022-02-10T09:11:22Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><blockquote>
<p>though changing encoding.name may hit the compatibility issue.</p>
</blockquote>
<p>I personally don't think it's much of a concern, but if it is, then a possible alternative would be to only change <code>Encoding::ASCII_8BIT.inspect</code> so that it shows up as <code>BINARY</code> in <code>EncodingError</code> and such, but that <code>Encoding::ASCII_8BIT.name</code> is unchanged.</p>
<p>Unless people think this would be even more confusing.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=964652022-02-10T14:15:32ZEregon (Benoit Daloze)
<ul></ul><p>byroot (Jean Boussier) wrote in <a href="#note-12">#note-12</a>:</p>
<blockquote>
<blockquote>
<p>though changing encoding.name may hit the compatibility issue.</p>
</blockquote>
<p>I personally don't think it's much of a concern</p>
</blockquote>
<p>I agree, this sounds very unlikely to cause compatibility issues, and if it does it would be extremely rare.<br>
I believe the vast majority of programs simply don't rely on <code>Encoding#name</code> values.<br>
(and of course <code>Encoding.find(name)</code> would still work for both <code>"binary"</code> & <code>"ascii-8bit"</code>)</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=965302022-02-17T09:14:53Zmatz (Yukihiro Matsumoto)matz@ruby.or.jp
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Rejected</i></li></ul><p>I don't object to the proposal itself. But as <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/17">@ko1 (Koichi Sasada)</a> searched, there are so many gems that compare <code>Encoding#name</code> and <code>ASCII-8BIT</code>.<br>
So I don't accept the proposal for the sake of compatibility.</p>
<p>Matz.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=965312022-02-17T09:16:43Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><p>Can I make a counter proposal?</p>
<p>We could keep <code>Encoding#name</code> as <code>"ASCII-8BIT"</code>, but change <code>Encoding#inspect</code> and make sure <code>EncodingError</code> uses the <code>BINARY</code> name in its error messages.</p>
<p>What do you think?</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=965322022-02-17T09:24:07Zmatz (Yukihiro Matsumoto)matz@ruby.or.jp
<ul></ul><p>Does this counter-proposal solve the original problem?<br>
It seems it introduces another inconsistency (and possible confusion).</p>
<p>Matz.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=965332022-02-17T09:27:10Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><blockquote>
<p>Does this counter-proposal solve the original problem?</p>
</blockquote>
<p>I believe so because the main way users are exposed to <code>ASCII-8BIT</code> is through <code>EncodingError</code>.</p>
<blockquote>
<p>It seems it introduces another inconsistency (and possible confusion).</p>
</blockquote>
<p>Indeed, my personal belief is that <code>Encoding#name</code> is both an advanced API and one that you don't really want to use. So I think the few users that would encounter this inconsistency would have the background to not be tricked by it.</p>
<p>But ultimately this is your call.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=965452022-02-17T13:30:12ZEregon (Benoit Daloze)
<ul></ul><p>Link to the gem-codesearch results from <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/17">@ko1 (Koichi Sasada)</a>: <a href="https://hackmd.io/koJLPz4eRXKzaaDvVqji7w#Feature-18576-Rename-ASCII-8BIT-encoding-to-BINARY-byroot" class="external">https://hackmd.io/koJLPz4eRXKzaaDvVqji7w#Feature-18576-Rename-ASCII-8BIT-encoding-to-BINARY-byroot</a></p>
<p>These seem to be very few usages, and IMHO such gems should be fixed (if they are still used, which is probably not the case for most).<br>
It's only 71 gems: <a href="https://gist.github.com/eregon/2b5de829d9aeb8b91b551fa05677b4db#file-gem-names" class="external">https://gist.github.com/eregon/2b5de829d9aeb8b91b551fa05677b4db#file-gem-names</a></p>
<p><code>str.encoding.name == "ASCII-8BIT"</code> is also needlessly slow and brittle.</p>
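<p>A minimal sketch of the alternative: compare the <code>Encoding</code> object itself rather than its name. The predicate name <code>binary?</code> is hypothetical and only used for this illustration.</p>

```ruby
# Hypothetical predicate: check for the binary encoding by comparing
# Encoding objects, which does not depend on what Encoding#name returns.
def binary?(str)
  str.encoding == Encoding::BINARY
end

puts binary?("\xFF".b)  # true
puts binary?("\u00e9")  # false (a UTF-8 string)

# Lookup by name accepts both spellings and returns the same object.
puts Encoding.find("binary").equal?(Encoding.find("ascii-8bit")) # true
```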
<p>It seems many matches are about old versions of rack/lint.rb and that's already fixed since <a href="https://github.com/rack/rack/pull/982" class="external">https://github.com/rack/rack/pull/982</a>.<br>
nokogiri still uses it but that could be easily fixed: <a href="https://github.com/sparklemotion/nokogiri/blob/e324a91477fe3b199c95b52c3985647dd2aeb847/lib/nokogiri/html5/document.rb#L33" class="external">https://github.com/sparklemotion/nokogiri/blob/e324a91477fe3b199c95b52c3985647dd2aeb847/lib/nokogiri/html5/document.rb#L33</a></p>
<p>IMHO from a compatibility perspective it would be fair enough to change the Encoding#name too.<br>
But I guess others will disagree, so I believe @byroot's proposal is still a big step forward (i.e. adding <code>def Encoding::BINARY.name; 'ASCII-8BIT'; end</code> or so for compatibility).</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=965482022-02-17T13:58:45Zmatz (Yukihiro Matsumoto)matz@ruby.or.jp
<ul><li><strong>Status</strong> changed from <i>Rejected</i> to <i>Open</i></li></ul><p>Making <code>Encoding#name</code> return a name different from the encoding name is unacceptable.<br>
Besides that, in general, compatibility issues are hard to estimate beforehand, so we tend to be very conservative.<br>
If you (or someone) estimate the compatibility issue is minimal, and want to experiment to see if it's true during pre-release, I'd say go.<br>
Will you?</p>
<p>Matz.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=965492022-02-17T14:00:34Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><blockquote>
<p>Will you?</p>
</blockquote>
<p>I'd like to champion this. I already started opening pull requests on the affected gems.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=965522022-02-17T15:34:51Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><p>Ok, so I went over all 71 matches after filtering vendored code: <a href="https://gist.github.com/casperisfine/5a26c7b85f7d15c4acd63d62d67eafbb" class="external">https://gist.github.com/casperisfine/5a26c7b85f7d15c4acd63d62d67eafbb</a></p>
<p>I opened 31 pull requests, all of which were trivial changes from <code>str.encoding.name == ""</code> to <code>str.encoding == Encoding::BINARY</code>, with the notable exception of <code>vcr</code>, because it stores the encoding names in files.</p>
<p>The vast majority of the matches are abandoned gems with no update since 2013 or earlier (I still opened PRs when I could). Some are even just old versions of <code>rack</code> republished under another name.</p>
<p>The few high-profile gems impacted are:</p>
<ul>
<li>Nokogiri: patch sent</li>
<li>VCR: patch sent</li>
<li>mongo: patch sent</li>
</ul>
<p>That being said, it's impossible to measure how much proprietary code may use the same pattern.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=965842022-02-19T10:59:23Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><p>I prepared the patch for this: <a href="https://github.com/ruby/ruby/pull/5571" class="external">https://github.com/ruby/ruby/pull/5571</a></p>
<p>If there are no objections, I'd like to merge it so it's part of the upcoming 3.2.0-preview1.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=965982022-02-21T08:23:13Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/13">@matz (Yukihiro Matsumoto)</a> could you confirm you are OK to merge the <code>ASCII-8BIT -> BINARY</code> rename for 3.2.0-preview1?</p>
<p>I think the earlier this happens, the more likely it is to go well. So far all the PRs I made in gems were received very positively.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=968932022-03-17T09:03:44Zmatz (Yukihiro Matsumoto)matz@ruby.or.jp
<ul></ul><p>The risk of compatibility has been reduced thanks to @byroot's effort, but probably there still are many applications potentially affected by the change. Considering the benefit (of being slightly more descriptive) and risk (of incompatibility), I don't think it pays.</p>
<p>Matz.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=968942022-03-17T11:08:09ZEregon (Benoit Daloze)
<ul></ul><p>I think it's worth changing, the current name is confusing to most Ruby users, and there were only 71 gems out of 170000+ gems, and those gems were patched.<br>
It seems equally unlikely that many applications would depend on <code>enc.name == "ASCII-8BIT"</code>, and that those applications would update to latest Ruby.<br>
If we don't change it now, we will probably never change it and stay forever with that confusing name, that seems really bad for future Ruby.</p>
<p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/13">@matz (Yukihiro Matsumoto)</a> How about we try it (as experimental or so) before the preview, and based on feedback keep it or revert it?<br>
From your comment in #19 I thought that's what you offered.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=969062022-03-17T15:06:32Zlarskanis (Lars Kanis)
<ul></ul><p>Having solved a lot of encoding issues for co-workers, especially on Windows, I'm with <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/772">@Eregon (Benoit Daloze)</a>. Since Ruby is the programmer's best friend, I think this minor incompatibility is worth trying out. At least compared to something like the <a href="https://github.com/ruby/ruby/commit/7c738ce5e649b82bdc1305d5c347e81886ee759a" class="external">removal of rb_cData</a>, which breaks lots of older gems just to clean up the C-API (after 2 years of deprecation warnings).</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1055352023-12-06T12:36:12ZEregon (Benoit Daloze)
<ul><li><strong>Target version</strong> set to <i>3.4</i></li></ul><p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/13">@matz (Yukihiro Matsumoto)</a> Could we try this again for 3.4, soon after the 3.3 release?</p>
<p>Then there is plenty of time to discover any issue related to it (probably very few as gems have been patched, and applications using encoding names instead of encoding constants are likely very old and unlikely to use a recent Ruby).</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1057622023-12-20T08:44:41Znaruse (Yui NARUSE)naruse@airemix.jp
<ul></ul><p>I strongly object to changing Encoding#name of the ASCII-8BIT encoding to "BINARY", for compatibility reasons.<br>
I don't want people to have to fix code that is running correctly now.</p>
<p>However, supporting people who are writing new code is reasonable.<br>
I agree to add more information in Encoding#inspect and error message.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1061832024-01-11T10:26:07ZEregon (Benoit Daloze)
<ul></ul><p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/5">@naruse (Yui NARUSE)</a> Do you have evidence of (latest release and not ancient) gems or applications comparing <code>encoding.name</code> to <code>'ASCII-8BIT'</code>?<br>
I think it's so obviously a bad idea to compare the encoding name as a String, AFAIK there was never a valid reason to use it (over <code>enc == Encoding::BINARY</code>, which works since Ruby 1.9) and it's inefficient, brittle and unnecessary.</p>
<p>FWIW <a href="https://github.com/search?q=%22name+%3D%3D+%27ASCII-8BIT%27%22&type=code&p=1" class="external">https://github.com/search?q=%22name+%3D%3D+%27ASCII-8BIT%27%22&type=code&p=1</a> shows very few matches and it's mostly copies of old VCR code.<br>
The chance of that code running on Ruby 3.4+ seems almost nonexistent, there would likely be many more serious compatibility issues with such old code (e.g. kwargs changes).<br>
And fixing it is really easy.</p>
<p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/13">@matz (Yukihiro Matsumoto)</a> Can we experiment for 3.4?<br>
If we have pushback based on actual code then let's go more conservative, but otherwise I think we should do the clean fix here.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1061852024-01-11T10:30:47ZEregon (Benoit Daloze)
<ul></ul><p>Also given the efforts of <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/7941">@byroot (Jean Boussier)</a> in <a href="https://bugs.ruby-lang.org/issues/18576#note-21" class="external">https://bugs.ruby-lang.org/issues/18576#note-21</a> and the offer from <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/13">@matz (Yukihiro Matsumoto)</a> in <a href="https://bugs.ruby-lang.org/issues/18576#note-19" class="external">https://bugs.ruby-lang.org/issues/18576#note-19</a>, I'd like to do exactly what matz said:</p>
<blockquote>
<p>If you (or someone) estimate the compatibility issue is minimal, and want to experiment to see if it's true during pre-release, I'd say go.</p>
</blockquote>
<p>I estimate it to be minimal.<br>
We can know from the experiment if it's true or not, there are more than 11 months before 3.4, so plenty of time to discover any potential issue with it.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1061862024-01-11T10:35:09Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><p>I would also like to try this again for 3.4, if we do it early, the potential remaining issue will have a chance to be noticed with the first preview release.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1062882024-01-17T08:26:19Znaruse (Yui NARUSE)naruse@airemix.jp
<ul></ul><p>Even if you "fix" gems, the number of affected gems suggests that a comparable number of private Rails applications are affected.<br>
Such incompatibility is not acceptable.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1062902024-01-17T08:36:33Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/5">@naruse (Yui NARUSE)</a> no one is denying that there is private code out there that will be broken by such a change. The question is how much and how hard it would be to detect and fix, and how much the change improves Ruby for its users.</p>
<p>We regularly make changes with much more breaking potential. So that alone isn't a reason to refuse the change in my opinion.</p>
<p>But if there is consensus that the cost/benefit isn't positive, then I'd like to propose again:</p>
<blockquote>
<p>We could keep Encoding#name as "ASCII-8BIT", but change Encoding#inspect and make sure EncodingError use the BINARY name in its error messages.</p>
</blockquote>
<p>But slightly modified:</p>
<p>I'd like to change <code>Encoding::BINARY.inspect</code> from <code>"#<Encoding:ASCII-8BIT>"</code> to <code>"#<Encoding:ASCII-8BIT (BINARY)>"</code>.</p>
<p>Would that be acceptable?</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1062912024-01-17T09:19:21Zzverok (Victor Shepelev)zverok.offline@gmail.com
<ul></ul><blockquote>
<p>Such incompatibility is not acceptable.</p>
</blockquote>
<p>In all honesty, a selective application of this dogma doesn’t always look justified.<br>
For better or worse, we break compatibility constantly.</p>
<p>One of the recent telling examples was the removal of <code>File.exists?</code> (an alias of <code>.exist?</code>), which, while "deprecated a long time ago," actually</p>
<ul>
<li>broke a lot of gems/other software (because even with the "typically we have bare words as predicates" rule, it was more natural for people to write <code>exists?</code>, so while it was available, a <em>lot</em> of code was using it);</li>
<li>improved absolutely nothing in Ruby’s friendliness and learnability save for "removed a reason to ask for <code>String#starts_with?</code> and similar methods" (while, say, Rails continues to prefer third-person verbs in its core extensions, like <code>String#starts_with?</code> or <code>Range#overlaps?</code>)</li>
</ul>
<p>OTOH, renaming the unfortunately named encoding:</p>
<ul>
<li>makes Ruby friendlier (as a mentor, I saw a <em>lot</em> of people confused with <code>ASCII-8BIT</code>),</li>
<li>breaks not a lot of code: while fixing gems wouldn't fix <em>all</em> of its usages, the (minuscule) amount of gems to fix gives a good estimation of how frequently this might be a problem,</li>
<li>breaks code that is mostly written in the "unexpected" way, so rethinking it might be a good idea anyway.</li>
</ul> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1063022024-01-18T00:58:07ZDan0042 (Daniel DeLorme)
<ul></ul><p>tenderlovemaking (Aaron Patterson) wrote in <a href="#note-7">#note-7</a>:</p>
<blockquote>
<p>I think this example should raise an exception:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="n">u</span> <span class="o">=</span> <span class="p">(</span><span class="n">b</span> <span class="o">=</span> <span class="s2">"abcde"</span><span class="p">.</span><span class="nf">force_encoding</span><span class="p">(</span><span class="s1">'ASCII-8BIT'</span><span class="p">)).</span><span class="nf">encode</span><span class="p">(</span><span class="s1">'UTF-8'</span><span class="p">)</span>
</code></pre>
</blockquote>
<p>I'm worried about the above misconception. No, this example shouldn't raise an exception, because being ascii-compatible is the entire reason there's "ASCII" in "ASCII-8BIT". If even <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/73">@tenderlovemaking (Aaron Patterson)</a> can have this misconception, I would wager it's a fairly common one. And if the encoding were renamed to "BINARY" it would further encourage the misconception. We'd wind up with a kind of Frankenstein encoding that pretends to be true binary by its name, but has the behavior of ascii-compatible encodings. Several people in this thread currently agree that the ascii-compatible behavior should not change, but if the name were changed I can easily predict some people will call for a change in behavior because the name "binary" has that overtone.</p>
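<p>For readers who want to check the ascii-compatible behavior described above, here is a minimal sketch. The two helper method names are made up for illustration; the calls themselves (<code>String#b</code>, <code>String#encode</code>, <code>Encoding#ascii_compatible?</code>) are standard Ruby.</p>

```ruby
# ASCII-8BIT treats bytes 0x00-0x7F as US-ASCII characters,
# so an ASCII-only binary string transcodes without trouble.
def ascii_range_to_utf8
  "abcde".b.encode("UTF-8") # String#b returns a copy tagged ASCII-8BIT
end

# Bytes with the 8th bit set carry no character meaning,
# so transcoding them raises instead of guessing.
def high_byte_to_utf8
  "\xCD".b.encode("UTF-8")
rescue Encoding::UndefinedConversionError => e
  e.class
end

puts Encoding::BINARY.ascii_compatible? # true
puts ascii_range_to_utf8.encoding       # UTF-8
puts high_byte_to_utf8                  # Encoding::UndefinedConversionError
```

<p>This mirrors the two cases from the start of the thread: the ASCII range converts, the high range does not.</p>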
<p>zverok (Victor Shepelev) wrote in <a href="#note-34">#note-34</a>:</p>
<blockquote>
<p>For better or worse, we break compatibility constantly.<br>
One of the recent telling examples was the removal of <code>File.exists?</code></p>
</blockquote>
<p>I won't say we can never break compatibility, but there's a very big qualitative difference here. If you run into <code>File.exists?</code>, the program simply crashes with NoMethodError. If you run into <code>enc.name == "ASCII-8BIT"</code> the return value changes from true to false; the program may crash later or not, the bug can remain undetected for a long time, there's a potential for corrupted data. This is 2-3 orders of magnitude harder to debug than NoMethodError. Even if not many people are affected by this, it's a very nasty kind of incompatibility.</p>
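<p>To make the silent-failure scenario concrete, here is a rough sketch of the two styles of check. Both helper names are hypothetical, not taken from any gem. The first depends on the exact string returned by <code>Encoding#name</code>; the second compares encoding objects and so would not be affected by a rename.</p>

```ruby
# Fragile: silently starts returning false if Encoding#name ever changes.
def binary_by_name?(str)
  str.encoding.name == "ASCII-8BIT"
end

# Sturdier: Encoding::BINARY and Encoding::ASCII_8BIT refer to the same object,
# so this check does not depend on how the encoding is spelled.
def binary?(str)
  str.encoding == Encoding::BINARY
end

puts binary?("abc".b)                        # true
puts binary?("abc".encode("UTF-8"))          # false
puts Encoding::BINARY.equal?(Encoding::ASCII_8BIT) # true
```

<p>Code written in the second style keeps working whichever name is chosen, which is presumably why the affected gems were easy to fix.</p>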
<p>byroot (Jean Boussier) wrote in <a href="#note-15">#note-15</a>:</p>
<blockquote>
<p>We could keep <code>Encoding#name</code> as <code>"ASCII-8BIT"</code>, but change <code>Encoding#inspect</code> and make sure <code>EncodingError</code> use the <code>BINARY</code> name in its error messages.</p>
</blockquote>
<p>I would really like that.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1063202024-01-18T15:19:11ZEregon (Benoit Daloze)
<ul></ul><p>I think everyone's opinion on the thread is pretty clear and not everyone agrees; that's fine.</p>
<p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/13">@matz (Yukihiro Matsumoto)</a> Could you decide whether it's OK to experiment with the Encoding#name changing to "BINARY" or not?<br>
If not, is @byroot's proposal in <a href="https://bugs.ruby-lang.org/issues/18576#note-33" class="external">https://bugs.ruby-lang.org/issues/18576#note-33</a> accepted?</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1063782024-01-21T09:46:03Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><blockquote>
<p>@byroot's proposal</p>
</blockquote>
<p>To clarify what I'm proposing if the rename is not acceptable is:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="o">>></span> <span class="no">Encoding</span><span class="o">::</span><span class="no">BINARY</span>
<span class="o">=></span> <span class="c1">#<Encoding:ASCII-8BIT></span>
</code></pre>
<p>becomes:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="o">>></span> <span class="no">Encoding</span><span class="o">::</span><span class="no">BINARY</span>
<span class="o">=></span> <span class="c1">#<Encoding:ASCII-8BIT (BINARY)></span>
</code></pre>
<p>And:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="o">>></span> <span class="s2">"fée"</span> <span class="o">+</span> <span class="s2">"fée"</span><span class="p">.</span><span class="nf">b</span>
<span class="p">(</span><span class="n">irb</span><span class="p">):</span><span class="mi">8</span><span class="ss">:in</span> <span class="sb">`+': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
</span></code></pre>
<p>Becomes:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="o">>></span> <span class="s2">"fée"</span> <span class="o">+</span> <span class="s2">"fée"</span><span class="p">.</span><span class="nf">b</span>
<span class="p">(</span><span class="n">irb</span><span class="p">):</span><span class="mi">8</span><span class="ss">:in</span> <span class="sb">`+': incompatible character encodings: UTF-8 and ASCII-8BIT (BINARY) (Encoding::CompatibilityError)
</span></code></pre> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1063832024-01-22T10:15:19ZEregon (Benoit Daloze)
<ul></ul><p>I think for that last example, omitting <code>ASCII-8BIT</code> would be much clearer; also, two sets of parens seem too much.<br>
So:</p>
<pre><code>(irb):8:in `+': incompatible character encodings: UTF-8 and BINARY (Encoding::CompatibilityError)
</code></pre>
<p>Otherwise we would likely still have the confusion that "ASCII" is not compatible with UTF-8 (which is untrue of course).</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1064162024-01-24T06:47:13Zshyouhei (Shyouhei Urabe)shyouhei@ruby-lang.org
<ul></ul><p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/5">@naruse (Yui NARUSE)</a> is actually positive for changing error messages (see <a href="#note-28">#note-28</a>). I guess everybody here is agreeing to @byroot's list of proposed changes in <a href="#note-37">#note-37</a> (except wording)?</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1067572024-02-14T09:32:20Znaruse (Yui NARUSE)naruse@airemix.jp
<ul></ul><p>byroot (Jean Boussier) wrote in <a href="#note-33">#note-33</a>:</p>
<blockquote>
<p>I'd like to change <code>Encoding::BINARY.inspect</code> from <code>"#<Encoding:ASCII-8BIT>"</code> to <code>"#<Encoding:ASCII-8BIT (BINARY)>"</code>.</p>
<p>Would that be acceptable?</p>
</blockquote>
<p>I agree with the idea.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1068712024-02-19T12:38:44Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><p>Proposed patch: <a href="https://github.com/ruby/ruby/pull/10018" class="external">https://github.com/ruby/ruby/pull/10018</a></p>
<p>I used my initial suggestion: <code>ASCII-8BIT (BINARY)</code>, but if the parentheses are deemed too much, I'm happy to adjust.</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1068872024-02-19T23:02:55ZDan0042 (Daniel DeLorme)
<ul></ul><p>I've come to realize something; when an ASCII-8BIT string contains only ascii characters, it behaves exactly like a US-ASCII string and in such a case it feels unnatural to call it "binary" (at least for me). But as soon as there is a non-ascii byte, it becomes incompatible with every other encoding and then truly deserves to be called BINARY. And that's when encoding errors occur. So in error messages, "BINARY" makes perfect sense to me since the error occurs due to the string being in "binary" state rather than "ascii-only" state. The distinction may be irrelevant to others but at least it has helped me put into words and understand why it felt so uncomfortable to change the name to "BINARY". Just my 2¢</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1068892024-02-19T23:23:56Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul></ul><p>What about</p>
<pre><code>>> "fée" + "fée".b
(irb):8:in `+': incompatible character encodings: UTF-8 and BINARY (ASCII-8BIT) (Encoding::CompatibilityError)
</code></pre>
<p>This still leaves "ASCII-8BIT" in (because I think it's important to help people understand that BINARY and ASCII-8BIT are the same).</p>
<p>[It also keeps the wart of consecutive parentheticals, but that can be dealt with separately.]</p> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1069032024-02-20T07:51:06Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><pre><code>>> "fée" + "fée".b
(irb):8:in `+': incompatible character encodings: UTF-8 and BINARY (ASCII-8BIT) (Encoding::CompatibilityError)
</code></pre>
<p>I don't mind <code>BINARY</code> being first or last. I'll adjust my PR.</p>
<p>As for the consecutive parentheses, what about:</p>
<pre><code>>> "fée" + "fée".b
(irb):8:in `+': incompatible character encodings: UTF-8 and BINARY / ASCII-8BIT (Encoding::CompatibilityError)
</code></pre> Ruby master - Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`https://bugs.ruby-lang.org/issues/18576?journal_id=1069102024-02-20T11:16:04ZEregon (Benoit Daloze)
<ul></ul><p><code>BINARY (ASCII-8BIT)</code> seems a good compromise.</p>
<p>The <code>/</code> seems potentially confusing for:<br>
<code>incompatible character encodings: BINARY / ASCII-8BIT and EUC-JP (Encoding::CompatibilityError)</code>.<br>
So I think using parentheses is OK and clearer than <code>/</code>.</p>
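<p>For reference, the error whose wording is being discussed can be reproduced with a small sketch like the one below. The helper name is made up for illustration, and the exact message text is deliberately not shown because it depends on the Ruby version. It also shows the point made earlier in the thread that an ASCII-only binary string stays compatible with other encodings.</p>

```ruby
# Returns the encoding of the concatenation, or the error class if the
# two strings cannot be combined.
def concat_result(left, right)
  (left + right).encoding
rescue Encoding::CompatibilityError => e
  e.class
end

utf8 = "fée".encode("UTF-8")

puts concat_result(utf8, "abc".b) # UTF-8: the binary side is ASCII-only
puts concat_result(utf8, "fée".b) # Encoding::CompatibilityError
puts Encoding.compatible?(utf8, "fée".b).inspect # nil
```

<p>Only the second call reaches the error message, so that is the only situation in which users would see whichever spelling is picked.</p>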