<p>Ruby Issue Tracking System: Ruby master - Feature #8709: Dir.glob should return sorted file list (<a href="https://bugs.ruby-lang.org/issues/8709">https://bugs.ruby-lang.org/issues/8709</a>)</p>
<p><strong>Anonymous</strong>, 2013-07-31T02:49:27Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=40767">journal 40767</a>)</p>
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Rejected</i></li></ul><p><code>Dir.glob</code> is documented to return filenames in filesystem order:</p>
<blockquote>
<p>Note that case sensitivity depends on your system, <em>as does the order in which the results are returned.</em></p>
</blockquote>
<p><strong>bmwiedemann (Bernhard M. Wiedemann)</strong>, 2020-01-11T06:28:21Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83770">journal 83770</a>)</p>
<ul><li><strong>Status</strong> changed from <i>Rejected</i> to <i>Open</i></li></ul><p>There are two problems with unsorted glob:</p>
<ol>
<li>
<p>it is different from glob in C, bash and Perl, which all sort by default. Even GNU make finally switched back to a sorted wildcard/glob (<a href="https://savannah.gnu.org/bugs/index.php?52076" class="external">https://savannah.gnu.org/bugs/index.php?52076</a>).</p>
</li>
<li>
<p>it causes problems for reproducible builds, so developers have to patch an endless number of callers, such as <a href="https://github.com/sass/sassc-ruby/pull/178" class="external">https://github.com/sass/sassc-ruby/pull/178</a>, to be able to get identical build results on identical OSes on different machines.</p>
</li>
</ol>
<p><strong>hsbt (Hiroshi SHIBATA)</strong> &lt;hsbt@ruby-lang.org&gt;, 2020-01-11T11:18:17Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83772">journal 83772</a>)</p>
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Rejected</i></li><li><strong>Backport</strong> deleted (<del><i>1.9.3: UNKNOWN, 2.0.0: UNKNOWN</i></del>)</li></ul><p>Do not update the <code>status</code> without a maintainer's decision.</p>
<p><strong>Eregon (Benoit Daloze)</strong>, 2020-01-11T11:27:54Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83773">journal 83773</a>)</p>
<ul><li><strong>Tracker</strong> changed from <i>Bug</i> to <i>Feature</i></li><li><strong>Status</strong> changed from <i>Rejected</i> to <i>Open</i></li><li><strong>ruby -v</strong> deleted (<del><i>ruby 1.9.3p429 (2013-05-15) [x86_64-linux] Brightbox</i></del>)</li></ul><p>I agree always sorting the result of <code>Dir.glob</code> makes sense.<br>
Non-determinism caused by Dir.glob is very annoying and IMHO doesn't feel like Ruby.<br>
I would also expect sorting to have low overhead compared to the syscalls, so performance-wise I don't think it's a big hit.</p>
<p>FWIW, TruffleRuby returns sorted results for <code>Dir.glob</code> since 2016.</p>
<p>hsbt (Hiroshi SHIBATA) wrote:</p>
<blockquote>
<p>Do not update the <code>status</code> without a maintainer's decision.</p>
</blockquote>
<p>How should we rediscuss this, then?<br>
The fact that the documentation mentions the current behavior doesn't mean we can never change it.<br>
I'll reopen this as a Feature.</p>
<p><strong>hsbt (Hiroshi SHIBATA)</strong> &lt;hsbt@ruby-lang.org&gt;, 2020-01-11T11:32:00Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83774">journal 83774</a>)</p>
<ul></ul><p>I have no opinion about this feature.</p>
<p><strong>Eregon (Benoit Daloze)</strong>, 2020-01-11T11:33:35Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83776">journal 83776</a>)</p>
<ul></ul><p>Here are some benchmark results in the ruby repository:</p>
<pre><code>$ ruby -e 'p Dir["**/*"].size'
12171
$ ruby -rbenchmark -e '10.times { p Benchmark.realtime { Dir["**/*"] } }'
0.017877419999422273
0.015390422999189468
0.015255956001055893
0.015021605999208987
0.015777969998453045
0.015484851002838695
0.016179073001694633
0.015210424000542844
0.015358253996964777
0.014319942998554325
$ ruby -rbenchmark -e '10.times { p Benchmark.realtime { Dir["**/*"].sort } }'
0.017600111998035572
0.017109740001615137
0.017832364999776473
0.01726310600133729
0.018130796997866128
0.01659841600121581
0.018173008000303525
0.017528833999676863
0.017515739000373287
0.01770434499849216
</code></pre>
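<p>On Ruby 3.0 and later, where the <code>sort:</code> keyword accepted in this issue eventually shipped, a comparable measurement can be taken without a separate <code>Array#sort</code> call. The following is a minimal sketch, not the script used above: it times with the monotonic clock via <code>Process.clock_gettime</code> instead of <code>Benchmark.realtime</code>, the helper names <code>mean_glob_time</code> and <code>glob_sort_overhead</code> are hypothetical, and results will vary with machine, filesystem and directory size.</p>

```ruby
# Hypothetical helper: mean wall-clock seconds per Dir.glob call,
# measured with the monotonic clock. Extra keywords (such as
# sort: false) go straight to Dir.glob; sort: needs Ruby 3.0+.
def mean_glob_time(pattern, base: ".", runs: 10, **glob_opts)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times { Dir.glob(pattern, base: base, **glob_opts) }
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) / runs
end

# Hypothetical helper: extra cost of the default (sorted) glob over
# sort: false, in percent, for every entry below `base`.
def glob_sort_overhead(base, pattern: "**/*", runs: 10)
  unsorted = mean_glob_time(pattern, base: base, runs: runs, sort: false)
  sorted = mean_glob_time(pattern, base: base, runs: runs)
  (sorted / unsorted - 1) * 100
end
```

<p>For example, <code>printf("%.1f%%\n", glob_sort_overhead("."))</code> run from the root of a Ruby checkout repeats the comparison above; a warm-up call beforehand may reduce noise from the filesystem cache.</p>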
<p>So sorting is a bit slower, but we can likely optimize it further if desired.</p>
<p><strong>Eregon (Benoit Daloze)</strong>, 2020-01-11T11:35:26Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83779">journal 83779</a>)</p>
<ul></ul><p>I added this issue to the next meeting's agenda:<br>
<a href="https://bugs.ruby-lang.org/issues/16454" class="external">https://bugs.ruby-lang.org/issues/16454</a></p>
<p><strong>bmwiedemann (Bernhard M. Wiedemann)</strong>, 2020-01-11T20:48:05Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83786">journal 83786</a>)</p>
<ul></ul><p>The benchmark numbers above show a difference of about 12%.</p>
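<p>The 12% figure can be checked from the raw timings posted above. This sketch simply compares the means of the two sets of ten runs; the helper name <code>percent_slower</code> is hypothetical.</p>

```ruby
# Seconds per Dir["**/*"] call, copied from the benchmark output above.
UNSORTED = [0.017877419999422273, 0.015390422999189468, 0.015255956001055893,
            0.015021605999208987, 0.015777969998453045, 0.015484851002838695,
            0.016179073001694633, 0.015210424000542844, 0.015358253996964777,
            0.014319942998554325].freeze
SORTED = [0.017600111998035572, 0.017109740001615137, 0.017832364999776473,
          0.01726310600133729, 0.018130796997866128, 0.01659841600121581,
          0.018173008000303525, 0.017528833999676863, 0.017515739000373287,
          0.01770434499849216].freeze

def mean(samples)
  samples.sum / samples.size
end

# Hypothetical helper: how much slower `candidate` is than `baseline`, in percent.
def percent_slower(baseline, candidate)
  (mean(candidate) / mean(baseline) - 1) * 100
end

printf("%.1f%%\n", percent_slower(UNSORTED, SORTED)) # prints 12.6%
```

<p>The first unsorted run is noticeably slower than the rest, so a median would be a reasonable alternative to the mean; either way the overhead lands in the same range.</p>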
<p>That is probably the worst case, because globs usually return fewer entries (though for some strange reason I get a 20% difference on a directory with 200 entries).</p>
<p>Also, some processing is usually performed on the returned files, and that takes much longer than the sorting.</p>
<p><strong>byroot (Jean Boussier)</strong> &lt;byroot@ruby-lang.org&gt;, 2020-01-13T09:28:30Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83820">journal 83820</a>)</p>
<ul></ul><p>For what it's worth, I also think it should return a sorted array, because:</p>
<ul>
<li>Pretty much every Rubyist I know has been bitten by this at least once.</li>
<li>Many experienced Rubyists end up always writing <code>Dir[pattern].sort</code>
</li>
<li>It's particularly prevalent because the "develop on OSX, deploy on Linux" combo is very popular.</li>
</ul>
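<p>The defensive <code>Dir[pattern].sort</code> habit can be wrapped in a small helper. This is a minimal sketch with a hypothetical name, <code>portable_glob</code>; because <code>Array#sort</code> on strings compares them byte by byte, the order no longer depends on the OS, the filesystem or the Ruby version.</p>

```ruby
# Hypothetical helper: list matches in a deterministic order on any
# platform and any Ruby version. String#<=> compares bytes, so the
# order does not depend on the locale or on readdir(3).
def portable_glob(pattern, base: ".")
  Dir.glob(pattern, base: base).sort
end
```

<p>Note that this sorts across brace alternatives too, so for a pattern such as <code>"{b,a}*"</code> it returns the <code>a</code> matches first, unlike the brace-preserving order discussed below.</p>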
<p>If the performance impact is a concern, I think an extra keyword argument could be added: <code>glob(pattern, [flags], [base: path], [sort: true])</code>. That way, you can avoid the performance cost when you know you don't need sorting.</p>
<p><strong>deivid (David Rodríguez)</strong>, 2020-01-13T10:04:41Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83821">journal 83821</a>)</p>
<ul></ul><p>I got bitten by this in the past too, when trying to reproduce order-dependent test failures (<a href="https://github.com/rubygems/rubygems/pull/2626#discussion_r254020218" class="external">https://github.com/rubygems/rubygems/pull/2626#discussion_r254020218</a>).</p>
<p><strong>jhawthorn (John Hawthorn)</strong>, 2020-01-13T18:49:44Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83826">journal 83826</a>)</p>
<ul></ul><p>One potential issue with this is that, although globs that scan directories (e.g. <code>Dir.glob("foo/*")</code>) return results in an inconsistent order, globs that use only brace expansion (e.g. <code>Dir.glob("foo/{a,b,c,d}")</code>) return values predictably, in the order listed.</p>
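<p>The brace-only case is easy to observe in Ruby itself. In this sketch a scratch directory is created with <code>Dir.mktmpdir</code>, and the hypothetical helper <code>brace_order_demo</code> returns both the brace-only glob result and what sorting that whole result afterwards would produce.</p>

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical demo: returns [brace-only glob result, same result sorted].
def brace_order_demo
  Dir.mktmpdir do |dir|
    %w[a b c d].each { |name| FileUtils.touch(File.join(dir, name)) }
    listed = Dir.glob("{d,b,c,a}", base: dir) # alternatives keep their written order
    [listed, listed.sort]
  end
end

listed, resorted = brace_order_demo
p listed   # => ["d", "b", "c", "a"]
p resorted # => ["a", "b", "c", "d"]
```

<p>Code relying on the first order would break if sorting were applied to the full result rather than within each directory listing.</p>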
<p>Rails versions prior to 6.0 unfortunately relied on this behaviour (6.0+ in <em>most</em> cases doesn't, and sorts manually instead). It probably shouldn't have relied on it, but it did, and I fear other libraries or tools may have done the same.</p>
<p>We could possibly work around that by sorting the entries of each directory as they are read, rather than sorting the full result, but that's more complicated to implement and would be hard to document as an exact behaviour developers can expect or rely on.</p>
<p><strong>naruse (Yui NARUSE)</strong> &lt;naruse@airemix.jp&gt;, 2020-01-14T06:27:56Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83839">journal 83839</a>)</p>
<ul></ul><blockquote>
<p>the Principle of Least Astonishment.</p>
</blockquote>
<p>You shouldn't appeal to "the Principle of Least Astonishment".<br>
Instead, you need to explain, without that term, why the current behavior is bad and needs to change.</p>
<p>For example:<br>
The result of Dir.glob depends on the OS and filesystem, and people often mistakenly write code that depends on their local environment.<br>
People should write portable code carefully, but could we provide a guard that protects them from such pitfalls?<br>
Many people write specs that compare the result of Dir.glob with an expected array, and those specs fail.<br>
If Dir.glob sorted the result, people could avoid these pitfalls and reduce the cost of writing such specs.</p>
<p><strong>mame (Yusuke Endoh)</strong> &lt;mame@ruby-lang.org&gt;, 2020-01-14T07:27:40Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83843">journal 83843</a>)</p>
<ul></ul><p>Hi <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/11657">@jhawthorn (John Hawthorn)</a>, I'm unsure whether you agree with the proposal or not. Do you mean sorting the result may break Rails? Or not sorting the result may do so, i.e., are you against the change?</p>
<p><strong>Eregon (Benoit Daloze)</strong>, 2020-01-14T10:40:54Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83855">journal 83855</a>)</p>
<ul></ul><p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/11657">@jhawthorn (John Hawthorn)</a> Good point, I forgot to mention this.</p>
<p>To be correct, the sorting must respect the explicit order of <code>{...,...}</code>; conceptually, it is the same as sorting just after readdir(3), not sorting the full result.<br>
That's also likely more efficient, since it sorts smaller arrays.<br>
ruby/spec already captures this: 3 specs fail if sorting is done on the returned array instead of per directory.</p>
<p><strong>Eregon (Benoit Daloze)</strong>, 2020-01-14T10:44:37Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83856">journal 83856</a>)</p>
<ul></ul><p>Even C's glob(3) is sorted (by default), as <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/11544">@bmwiedemann (Bernhard M. Wiedemann)</a> said:</p>
<pre><code>$ man 3 glob
...
GLOB_NOSORT
Don't sort the returned pathnames. The only reason to do this is to save processing time. By default, the returned pathnames are sorted.
</code></pre>
<p><strong>nobu (Nobuyoshi Nakada)</strong> &lt;nobu@ruby-lang.org&gt;, 2020-01-16T05:22:01Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83894">journal 83894</a>)</p>
<ul></ul><p>I'm in favor of adding a <code>NOSORT</code> option to the second argument.</p>
<p><strong>matz (Yukihiro Matsumoto)</strong> &lt;matz@ruby.or.jp&gt;, 2020-01-16T06:03:27Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83900">journal 83900</a>)</p>
<ul></ul><p>Accepted. We will add <code>sort: false</code> keyword option to disable sorting.</p>
<p>Matz.</p>
<p><strong>Dan0042 (Daniel DeLorme)</strong>, 2020-01-16T15:12:24Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83929">journal 83929</a>)</p>
<ul></ul><p>It's good to sort the result of <code>Dir["*"]</code>, but, as jhawthorn pointed out, the brace expansion <em>must</em> keep the same order. I have code that depends on this, and I'm sure many others also have code that depends on it, since it's the behavior found in the shell:</p>
<pre><code>$ touch a2 a1 a0 b2 b1 b0
$ echo {a,b}?
a0 a1 a2 b0 b1 b2
$ echo {b,a}?
b0 b1 b2 a0 a1 a2
</code></pre>
<p><strong>nobu (Nobuyoshi Nakada)</strong> &lt;nobu@ruby-lang.org&gt;, 2020-01-19T06:54:20Z (<a href="https://bugs.ruby-lang.org/issues/8709?journal_id=83969">journal 83969</a>)</p>
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul><p>Applied in changeset <a class="changeset" title="Sort globbed results by default [Feature #8709] Sort the results which matched single wildcard o..." href="https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/2f1081a451f21ca017cc9fdc585883e5c6ebf618">git|2f1081a451f21ca017cc9fdc585883e5c6ebf618</a>.</p>
<hr>
<p>Sort globbed results by default [Feature <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: Dir.glob should return sorted file list (Closed)" href="https://bugs.ruby-lang.org/issues/8709">#8709</a>]</p>
<p>Sort the results which matched single wildcard or character set in<br>
binary ascending order, unless <code>sort: false</code> is given. The order<br>
of an Array of pattern strings and braces are not affected.</p>
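<p>The rule in this commit message can be illustrated with a simplified, single-directory sketch in plain Ruby. It is not Ruby's implementation (the real one is written in C, in <code>dir.c</code>). The hypothetical helper <code>expand_braces</code> handles only non-nested braces, and the hypothetical <code>glob_one_dir</code> matches names with <code>File.fnmatch</code>. Each brace alternative is tried in the order written, while the names it matches come from a directory listing sorted in binary ascending order, which corresponds to the "sort just after readdir(3)" behaviour discussed above.</p>

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: expand {x,y,...} groups left to right, keeping the
# alternatives in the order they are written. Nested braces and escaped
# commas are not supported in this sketch.
def expand_braces(pattern)
  group = pattern.match(/\{([^{}]*)\}/)
  return [pattern] unless group

  group[1].split(",", -1).flat_map do |alt|
    expand_braces(group.pre_match + alt + group.post_match)
  end
end

# Hypothetical single-directory glob: brace order is preserved, while
# entries matched by wildcards come out in binary ascending order.
def glob_one_dir(dir, pattern)
  names = Dir.children(dir).sort # String#<=> compares bytes
  expand_braces(pattern).flat_map do |alt|
    names.select { |name| File.fnmatch(alt, name) }
  end
end

# Same files as the shell example earlier in the thread.
Dir.mktmpdir do |dir|
  %w[a2 a1 a0 b2 b1 b0].each { |name| FileUtils.touch(File.join(dir, name)) }
  p glob_one_dir(dir, "{a,b}?") # => ["a0", "a1", "a2", "b0", "b1", "b2"]
  p glob_one_dir(dir, "{b,a}?") # => ["b0", "b1", "b2", "a0", "a1", "a2"]
end
```

<p>On Ruby 3.0 and later, <code>Dir.glob("{b,a}?", base: dir)</code> should produce the same order as the second call, matching the shell behaviour shown earlier.</p>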