Bug #4343

Dir.glob does match files without extension

Added by Vit Ondruch about 4 years ago. Updated almost 4 years ago.

[ruby-core:34970]
Status: Rejected
Priority: Normal
Assignee: -
ruby -v: ruby 1.9.3dev (2011-01-28) [i386-mingw32]
Backport:

Description

=begin
C:\temp\pat>dir bla.*
Volume in drive C is Windows7_x64_OS.
Volume serial number is 2C6E-5F69.

Directory of C:\temp\pat

29.01.2011 15:37 0 bla
29.01.2011 15:37 0 bla.rb
Files: 2, Bytes: 0
Directories: 0, Free bytes: 21 453 963 264

C:\temp\pat>ruby -e "p Dir.glob('bla.*')"
["bla.rb"]
=end
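The report can be reproduced portably with a short script. This is a minimal sketch, assuming Ruby 2.5 or later for the `base:` keyword of `Dir.glob` (the report itself used 1.9.3); `glob_in_fresh_dir` is a hypothetical helper name. Per the output above, the result for "bla.*" is `["bla.rb"]` on Windows as well as on Unix.

```ruby
require "tmpdir"

# Hypothetical helper: create empty files with the given names in a fresh
# temporary directory, then glob relative to that directory.
def glob_in_fresh_dir(pattern, names)
  Dir.mktmpdir do |dir|
    names.each { |name| File.write(File.join(dir, name), "") }
    # base: (Ruby 2.5+) makes the results relative to dir.
    Dir.glob(pattern, base: dir).sort
  end
end

p glob_in_fresh_dir("bla.*", %w[bla bla.rb])  # → ["bla.rb"]
```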

History

#1 Updated by Nobuyoshi Nakada about 4 years ago

=begin
Yes, this is the expected result, as you specified the pattern with an extension.

=end

#2 Updated by Nobuyoshi Nakada about 4 years ago

  • Status changed from Open to Feedback

=begin

=end

#3 Updated by Vit Ondruch about 4 years ago

=begin
Well, by showing the "dir" output I tried to point out that it is not so expected.

By the way, here are some other not-so-logical outputs:

c:\temp>ruby -e "p Dir.glob('bla.{,*}')"
["bla.", "bla.rb"]

c:\temp>ruby -e "p Dir.glob('bla')"
["bla"]

Why is "bla." listed in the first case instead of just "bla"?
=end
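To separate pattern matching from filesystem lookups, the brace pattern can be checked with `File.fnmatch?`, which compares strings and never touches the disk. A sketch, assuming Ruby 2.0 or later, where the `File::FNM_EXTGLOB` flag enables brace expansion in `File.fnmatch?`:

```ruby
# No filesystem access here: File.fnmatch? only compares strings.
# With File::FNM_EXTGLOB (Ruby 2.0+), "bla.{,*}" expands to "bla." and "bla.*".
pattern = "bla.{,*}"

%w[bla bla. bla.rb].each do |name|
  matched = File.fnmatch?(pattern, name, File::FNM_EXTGLOB)
  puts "#{name.inspect} => #{matched}"
end
# "bla" => false
# "bla." => true
# "bla.rb" => true
```

At the pattern level "bla" matches neither alternative. The "bla." entry in the Windows output is presumably the literal alternative "bla." being found on disk, which fits the explanation that "bla." and "bla" name the same file on NTFS.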

#4 Updated by Nobuyoshi Nakada about 4 years ago

=begin

> Why is "bla." listed in the first case instead of just "bla"?

Because you gave "bla.". Dir.glob respects the given pattern as much as possible.
And "bla." and "bla" are the same on NTFS.
=end

#5 Updated by Vit Ondruch about 4 years ago

=begin
If they are the same, then my original scenario should also list "bla"; otherwise the behavior is not consistent.
=end

#6 Updated by Jeremy Bopp about 4 years ago

=begin
On 01/29/2011 10:19 AM, Nobuyoshi Nakada wrote:

> Issue #4343 has been updated by Nobuyoshi Nakada.
>
>> Why is "bla." listed in the first case instead of just "bla"?
>
> Because you gave "bla.". Dir.glob respects the given pattern as much as possible.
> And "bla." and "bla" are the same on NTFS.

Wouldn't that fact imply that "bla.*" should also match "bla" in that
case? I think that is where the confusion lies. Some globs, such as
"bla.*", won't match "bla" while others, such as "bla.", will match.

I would argue that for compatibility reasons, globbing on "bla." should
not match "bla" even on Windows. Globbing under the Windows cmd shell is
different from globbing under Unix shells, but Ruby should enforce a single
globbing strategy on all supported platforms for consistency.

-Jeremy

=end

#7 Updated by Jeremy Bopp about 4 years ago

=begin
On 01/29/2011 10:33 AM, Vít Ondruch wrote:

> Well, Ruby should enforce as much as it can, but you should remember that
> while you can do this on Linux:
>
> vita@vita-desktop:~$ echo something > bla
> vita@vita-desktop:~$ echo something > bla.
> vita@vita-desktop:~$ ls bla*
> bla bla.
>
> the Windows version will give a different result no matter what:
>
> C:\temp>echo something > bla
>
> C:\temp>echo something > bla.
>
> C:\temp>dir bla*
> Volume in drive C is Windows7_x64_OS.
> Volume serial number is 2C6E-5F69.
>
> Directory of C:\temp
>
> 29.01.2011 17:31 12 bla
> Files: 1, Bytes: 12
> Directories: 0, Free bytes: 21 459 759 104
>
> C:\temp>dir bla*.*
> Volume in drive C is Windows7_x64_OS.
> Volume serial number is 2C6E-5F69.
>
> Directory of C:\temp
>
> 29.01.2011 17:31 12 bla
> Files: 1, Bytes: 12
> Directories: 0, Free bytes: 21 459 759 104

My ultimate argument is that Ruby should support only Unix-style
globbing, except in cases where a cross-platform extension to globbing
is implemented. Globbing under the Windows cmd shell is slightly
broken, IMHO. I agree that Windows users expect globbing to be
implemented like that, but having slightly different globbing
implementations in Ruby makes it much more difficult than it needs to be
to write cross-platform Ruby scripts.

I could possibly see adding a new flag for globbing that enables
Windows-style globbing for any platform, but that should never be
enabled by default. Most scripts using globbing probably expect
Unix-style globbing since that is basically what we have now. Enabling
Windows-style globbing by default would likely introduce subtle defects
in those scripts.

-Jeremy

=end
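The opt-in flag suggested above could be approximated in user code without changing `Dir.glob`. The helper below, `glob_windows_style`, is hypothetical and not part of Ruby; it only emulates the one cmd.exe rule discussed in this thread (a trailing ".*" also matching names with no dot), ignores any other cmd.exe matching rules, and assumes Ruby 2.5+ for `base:` and `String#delete_suffix`.

```ruby
require "tmpdir"

# Hypothetical opt-in helper, NOT part of Ruby: emulates the cmd.exe rule
# that "name.*" also matches an extensionless "name". Dir.glob itself is
# untouched, so code that calls it keeps Ruby's default semantics.
def glob_windows_style(pattern, base: Dir.pwd)
  results = Dir.glob(pattern, base: base)
  if pattern.end_with?(".*")
    stem = pattern.delete_suffix(".*")
    # Entries matching the stem whose basename contains no dot, e.g. "bla".
    extensionless = Dir.glob(stem, base: base).reject do |name|
      File.basename(name).include?(".")
    end
    results |= extensionless
  end
  results.sort
end

Dir.mktmpdir do |dir|
  %w[bla bla.rb other].each { |name| File.write(File.join(dir, name), "") }
  p glob_windows_style("bla.*", base: dir)  # → ["bla", "bla.rb"]
  p Dir.glob("bla.*", base: dir)            # → ["bla.rb"]
end
```

Keeping the extra behavior in a separate method, rather than a platform-dependent default, mirrors the argument that Windows-style matching should never be enabled implicitly.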

#8 Updated by Loren Segal about 4 years ago

=begin

On 1/29/2011 12:09 PM, Vít Ondruch wrote:

> Well, glob should follow Windows conventions on Windows and Unix
> conventions on Unix. It is as if you insisted that file creation
> has to behave the same way on Unix as on Windows. See my previous
> example.

As pointed out by Jeremy, there are obvious cross-platform compatibility
issues with supporting different globbing formats on different
platforms. It makes much more sense to pick one style and stick with it.
Consider the following naively implemented script in UNIX:

  # directory contents are: README, README.ext, README.jp
  # we want to delete everything but README
  Dir.glob("README.*").each {|f| File.unlink(f) }

This script would fail on Windows if README matched; namely, it would
violate the reasonable expectation that holds in every other environment.
You don't want scripts to work on one platform and then fail miserably
in another.

I think offering an optional argument to glob to turn on Windows style
globbing would be a reasonable compromise, but I don't think Dir.glob
should have Windows behaviour by default, even in Windows.

-Loren

=end
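A defensive variant of the cleanup script need not rely on any platform's glob semantics for the file it must keep. This is a sketch under stated assumptions: `delete_readme_variants` is a hypothetical name, Ruby 2.5+ is assumed for `base:` and `Dir.children`, and the extra `File.fnmatch?` check is redundant with Ruby's current `Dir.glob`; it only documents intent.

```ruby
require "tmpdir"

# Hypothetical defensive variant of the cleanup script: each candidate is
# re-checked, so the file to keep is never unlinked even if some platform's
# globbing were looser than Ruby's.
def delete_readme_variants(dir, keep: "README")
  Dir.glob("README.*", base: dir).each do |name|
    next if name == keep                         # explicit guard for the keeper
    next unless File.fnmatch?("README.*", name)  # redundant with Ruby's glob today
    File.unlink(File.join(dir, name))
  end
end

Dir.mktmpdir do |dir|
  %w[README README.ext README.jp].each { |name| File.write(File.join(dir, name), "") }
  delete_readme_variants(dir)
  p Dir.children(dir)  # → ["README"]
end
```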

#9 Updated by Jeremy Bopp about 4 years ago

=begin
On 01/29/2011 11:09 AM, Vít Ondruch wrote:

> Well, glob should follow Windows conventions on Windows and Unix
> conventions on Unix. It is as if you insisted that file creation has
> to behave the same way on Unix as on Windows. See my previous example.

Where possible, Ruby scripts should see Ruby as the platform, not
Linux, not OSX, and not Windows. Obviously, there will be times when
Ruby can't cover the gaps between systems or where people want/need to
do something in a platform specific way (play with the Windows registry
much?), and Ruby should support addressing those needs. However,
globbing is one of those cases where Ruby can offer a consistent
implementation with relative ease.

> Another example: Dir.glob 'bla*' ultimately means something different on
> Unix and Windows. I am not sure how you would like to work around this...

Again, my argument is that Windows globbing should be disregarded
entirely within Ruby. Windows globbing is implemented in the cmd shell,
and Ruby has no obligation to emulate that. The only reason there is
confusion here is because the Windows implementation is just close
enough to the Ruby implementation to fool people into thinking that the
two are actually or should be equivalent when they are definitely and
sometimes dangerously not.

As a writer of cross-platform Ruby scripts, I absolutely hate it when I
have to go out of my way to handle equivalent tasks in platform specific
ways. It increases my support and testing burden considerably, so I
much prefer that my platform (Ruby in this case) handles the differences
for me whenever feasible. As far as globbing goes, a consistent
implementation allows me to focus more on my application logic than
coding platform specific guards that rework my globs for every platform
that has a variance.

I guess I could summarize all of this by saying that Ruby is not the cmd
shell, so don't try to treat it as such. :-)

-Jeremy

=end

#10 Updated by Vit Ondruch about 4 years ago

=begin
OK, the confusion comes from the differences between platforms. Let's take a file named "foo.bar".

On Windows, the filename consists of two parts: the name "foo" and the extension "bar", traditionally separated by a dot.

On Unix, there is just the filename "foo.bar": one string, with no notion of an extension.

Ruby should choose either to treat the filename as a whole string, i.e. the Unix way, or to support platform-specific behavior. Currently, it is a mixture:

ruby -e "p Dir.glob('bla.*')"
["bla.rb"]

behaves differently from:

ruby -e "p Dir.glob('bla.{,*}')"
["bla.", "bla.rb"]

In the first case, Ruby treats the glob pattern as a plain string, i.e. the Unix way, knowing nothing about extensions, while in the second case it suddenly seems to know what extensions are.
=end
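Ruby's `File` API does have a notion of an extension, even though glob patterns are compared purely as strings. A small illustration using only standard `File` methods:

```ruby
# File.extname / File.basename split a name at its last dot.
name = "foo.bar"
p File.extname(name)         # → ".bar"
p File.basename(name, ".*")  # → "foo"
p File.extname("bla")        # → ""
```

By contrast, the glob pattern "bla.*" succeeds only when a literal dot is present in the name, which is why "bla" is not matched.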

#11 Updated by Heesob Park about 4 years ago

=begin
Here are some more results of Dir.glob on Windows.

irb(main):001:0> Dir.glob('bla*')
=> ["bla", "bla.rb"]
irb(main):002:0> Dir.glob('bla.rb')
=> ["bla.rb"]
irb(main):003:0> Dir.glob('bla.rb...........')
=> ["bla.rb..........."]
irb(main):004:0> Dir.glob('bla.rb ')
=> ["bla.rb "]
irb(main):005:0> Dir.glob('bla.rb>>>>>>>>>>>')
=> ["bla.rb>>>>>>>>>>>"]
irb(main):006:0> Dir.glob('bla.rb<<<<<<<<<<<')
=> ["bla.rb<<<<<<<<<<<"]
irb(main):007:0> Dir.glob('bla.rb>>>>.......')
=> ["bla.rb>>>>......."]
irb(main):008:0> Dir.glob('bla.rb ....')
=> ["bla.rb ...."]
irb(main):009:0> Dir.glob('bla.rb<< ....')
=> ["bla.rb<< ...."]

=end
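These results suggest that a pattern without wildcard characters is not compared against directory entries at all: the path is checked for existence and the pattern is returned exactly as typed, while Windows itself resolves names such as "bla.rb..........." to the existing file (Win32 path normalization discards trailing dots and spaces, and it apparently resolves the '<' and '>' variants too). The sketch below models that reading; `literal_glob` is hypothetical and reflects an assumption about the behavior, not Ruby's implementation, and it ignores edge cases such as broken symlinks.

```ruby
require "tmpdir"

# Simplified model (an assumption, not Ruby's source): a wildcard-free
# pattern is only checked for existence and returned exactly as written.
# Broken symlinks and other edge cases are ignored.
def literal_glob(pattern, base)
  File.exist?(File.join(base, pattern)) ? [pattern] : []
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "bla.rb"), "")
  p literal_glob("bla.rb", dir)    # → ["bla.rb"]
  p Dir.glob("bla.rb", base: dir)  # → ["bla.rb"]
  # Platform-dependent: Win32 path normalization drops trailing dots, so this
  # name resolves to "bla.rb" on Windows but not on Linux.
  p literal_glob("bla.rb...", dir)
end
```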

#12 Updated by mathew murphy about 4 years ago

=begin
On Sat, Jan 29, 2011 at 11:49, Jeremy Bopp jeremy@bopp.net wrote:

> Where possible, Ruby scripts should see Ruby as the platform, not
> Linux, not OSX, and not Windows.

If that's true, are Ruby filenames case-sensitive or not? And are they
case-preserving or not?

mathew
--
URL:http://www.pobox.com/~meta/

=end

#13 Updated by Jeremy Bopp about 4 years ago

=begin
On 01/30/2011 06:30 PM, mathew wrote:

> On Sat, Jan 29, 2011 at 11:49, Jeremy Bopp jeremy@bopp.net wrote:
>
>> Where possible, Ruby scripts should see Ruby as the platform, not
>> Linux, not OSX, and not Windows.
>
> If that's true, are Ruby filenames case-sensitive or not? And are they
> case-preserving or not?

The handling of file names is dependent on the underlying filesystem.
That means that file names are case-insensitive and case-preserving by
default on FAT and NTFS (used by Windows) and on HFSX (used by default
for OSX, I think). Keep in mind though that it's possible to load NTFS
in a case-sensitive mode by tweaking a registry key somewhere, and I've
heard you can do something similar for HFSX. Therefore, you (and Ruby
for that matter) can't simply assume on any given platform that you'll
have one behavior or another. :-)

-Jeremy

=end
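Since the behavior cannot be assumed per platform, a script that cares can probe the actual directory at runtime. This is a common heuristic rather than a Ruby API, sketched under the assumption that the directory is writable and does not already contain a file named "caseprobe.tmp"; `case_sensitive_fs?` is a hypothetical name.

```ruby
require "tmpdir"

# Hypothetical runtime probe (a heuristic, not a Ruby API): write a file with
# a mixed-case name, then ask whether the lower-case spelling resolves to it.
def case_sensitive_fs?(dir)
  probe = File.join(dir, "CaseProbe.tmp")
  File.write(probe, "")
  !File.exist?(File.join(dir, "caseprobe.tmp"))
ensure
  # Clean up the probe file whatever the outcome.
  File.delete(probe) if probe && File.exist?(probe)
end

Dir.mktmpdir { |dir| p case_sensitive_fs?(dir) }
```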

#14 Updated by mathew murphy about 4 years ago

=begin
On Sun, Jan 30, 2011 at 23:07, Jeremy Bopp jeremy@bopp.net wrote:

> The handling of file names is dependent on the underlying filesystem.
> That means that file names are case-insensitive and case-preserving by
> default on FAT and NTFS (used by Windows) and on HFSX (used by default
> for OSX, I think).

So if "it depends on the OS and filesystem" is the right answer for
case sensitivity, why isn't it the right answer for how file globs
work?

mathew
--
URL:http://www.pobox.com/~meta/

=end
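Ruby's own matching already exposes this OS dependence for case. `File.fnmatch?` is case-sensitive by default, `File::FNM_CASEFOLD` makes it case-insensitive, and `File::FNM_SYSCASE` is intended to follow the system's convention (case-insensitive on Windows); Ruby's documentation for `Dir.glob` likewise notes that its case sensitivity depends on the system. A small illustration; the output of the last line depends on the platform:

```ruby
# Default flags: case-sensitive on every platform.
p File.fnmatch?("readme*", "README.txt")                      # → false
# FNM_CASEFOLD: explicitly case-insensitive.
p File.fnmatch?("readme*", "README.txt", File::FNM_CASEFOLD)  # → true
# FNM_SYSCASE: follows the platform convention, so this varies.
p File.fnmatch?("readme*", "README.txt", File::FNM_SYSCASE)
```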

#15 Updated by Jeremy Bopp about 4 years ago

=begin
On 02/03/2011 09:38 PM, mathew wrote:

> So if "it depends on the OS and filesystem" is the right answer for
> case sensitivity, why isn't it the right answer for how file globs
> work?

Globs aren't implemented by the OS or the filesystem. They're
implemented by applications that run upon them. Ruby has one
implementation of globbing, and cmd has another one.

Ruby can't do anything about the filesystem implementation. There is no
way to completely hide the differences between case-sensitive and
case-insensitive filesystems. As a result, you as a programmer are
forced to be aware of those potential issues if you want to write a
cross-platform application no matter what tools you use.

-Jeremy

=end

#16 Updated by mathew murphy about 4 years ago

=begin
On Thu, Feb 3, 2011 at 22:29, Jeremy Bopp jeremy@bopp.net wrote:

> Globs aren't implemented by the OS or the filesystem.

Wrong.

% man -s3 glob

GLOB(3) Linux Programmer's Manual GLOB(3)

NAME
glob, globfree - find pathnames matching a pattern, free memory from
glob()

SYNOPSIS
#include

    int glob(const char *pattern, int flags,
             int (*errfunc) (const char *epath, int eerrno),
             glob_t *pglob);
    void globfree(glob_t *pglob);

[...]

And on Windows:
http://msdn.microsoft.com/en-us/library/8bch7bkk.aspx

mathew
--
URL:http://www.pobox.com/~meta/

=end

#17 Updated by Jeremy Bopp about 4 years ago

=begin
On 02/04/2011 10:33 AM, mathew wrote:

> On Thu, Feb 3, 2011 at 22:29, Jeremy Bopp jeremy@bopp.net wrote:
>
>> Globs aren't implemented by the OS or the filesystem.
>
> Wrong.
>
> % man -s3 glob
>
> GLOB(3) Linux Programmer's Manual GLOB(3)
>
> NAME
> glob, globfree - find pathnames matching a pattern, free memory from
> glob()
>
> SYNOPSIS
> #include
>
>    int glob(const char *pattern, int flags,
>             int (*errfunc) (const char *epath, int eerrno),
>             glob_t *pglob);
>    void globfree(glob_t *pglob);
>
> [...]

That implementation is provided by libc:

$ dpkg -S glob.h
libc6-dev: /usr/include/glob.h

As a result, we are free to ignore it, and in fact Ruby must ignore it
because Ruby has some extensions not supported by this function. We
don't have such an option when it comes to the case-sensitivity of
filesystems. Feel free to try to prove me wrong on that point though.

> And on Windows:
> http://msdn.microsoft.com/en-us/library/8bch7bkk.aspx

The same is true on Windows. The implementation of glob functionality
is something that the programming environment is able to ignore simply
by implementing its own. Furthermore, what you linked does not appear
to be a generic globbing function. I think it will only work for
expanding program arguments, so you probably can't use it within your
program to scan directory contents using a glob.

The question that the Ruby community has to answer is how it wants to
handle something like globbing where canonical tools included with one
platform behave differently than equivalent tools on another one do.

My argument is that where possible, such as with globbing, Ruby should
pick a single implementation to be the default on all platforms. It
doesn't matter what the default is as long as it's functional and as
consistently implemented as possible on the platform. This aids writing
cross-platform scripts since there are fewer gotchas when porting the
scripts. If Ruby chose instead to implement platform specific quirks by
default, the script writer would have yet another set of things for
which he/she must account during development and testing.

Now, if there was enough desire, it should be possible to implement
platform-specific quirks; however, their use should be completely
optional and never the default.

-Jeremy

=end
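Two examples of such Ruby extensions are recursive `**/` matching and `{a,b}` alternation; POSIX glob(3) defines neither, although glibc and the BSDs offer braces through the non-standard `GLOB_BRACE` flag. A sketch assuming Ruby 2.5+ for `base:`; `ruby_glob_extensions_demo` is a hypothetical name.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical demo of two Ruby glob features that POSIX glob(3) lacks:
# "**/" recursion and {a,b} alternation.
def ruby_glob_extensions_demo
  Dir.mktmpdir do |dir|
    FileUtils.mkdir_p(File.join(dir, "lib", "sub"))
    %w[a.rb lib/b.rb lib/sub/c.rb lib/d.txt].each do |rel|
      File.write(File.join(dir, rel), "")
    end
    [
      Dir.glob("**/*.rb", base: dir).sort,        # recursive descent
      Dir.glob("lib/*.{rb,txt}", base: dir).sort  # brace alternation
    ]
  end
end

recursive, braces = ruby_glob_extensions_demo
p recursive  # → ["a.rb", "lib/b.rb", "lib/sub/c.rb"]
p braces     # → ["lib/b.rb", "lib/d.txt"]
```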

#18 Updated by mathew murphy about 4 years ago

=begin
On Fri, Feb 4, 2011 at 12:11, Jeremy Bopp jeremy@bopp.net wrote:

> On 02/04/2011 10:33 AM, mathew wrote:
>
>> % man -s3 glob
>> [...]
> That implementation is provided by libc:

And libc is part of Unix. The glob() library call is in POSIX.

> As a result, we are free to ignore it, and in fact Ruby must ignore it
> because Ruby has some extensions not supported by this function.

Just because you implement extensions doesn't mean you need to be
gratuitously incompatible with the POSIX interface.

> We don't have such an option when it comes to the case-sensitivity of
> filesystems.

Sure we do. Many software packages implement workarounds for
filesystem case-sensitivity, length limits, character set limits, and
other OS-dependent behavior. To pick three random examples:

http://oreilly.com/catalog/samba/chapter/book/ch05_04.html
http://netatalk.sourceforge.net/2.1/htmldocs/AppleVolumes.default.5.html
http://commons.apache.org/io/apidocs/org/apache/commons/io/comparator/NameFileComparator.html

> The same is true on Windows. The implementation of glob functionality
> is something that the programming environment is able to ignore simply
> by implementing its own.

By the same argument, we can ignore anything and implement it all
again. But people tend to hate languages that do that, like Java.

> Furthermore, what you linked does not appear
> to be a generic globbing function. I think it will only work for
> expanding program arguments, so you probably can't use it within your
> program to scan directory contents using a glob.

I didn't claim otherwise. There are other functions for scanning
directory contents, and they also provide globbing at the OS level.
http://msdn.microsoft.com/en-us/library/aa364418(v=vs.85).aspx

> The question that the Ruby community has to answer is how it wants to
> handle something like globbing where canonical tools included with one
> platform behave differently than equivalent tools on another one do.
>
> My argument is that where possible, such as with globbing, Ruby should
> pick a single implementation to be the default on all platforms.

Right. It's the same issue that surrounds many other things, including
processes, threads, locales and internationalization, text encodings,
and so on. That was why I brought up filesystem case-sensitivity as an
example.

There are two basic approaches languages take. The one you seem to be
advocating for is the Java approach, where you try and come up with a
single API that you can implement on every platform. The resulting API
isn't the same as the native OS one, but its behavior is the same
everywhere. Hence Java has its own locales and ignores the OS ones; it
has its own file APIs that are more restricted than the OS ones; and
so on. It limits what you can do in order to ensure portability.

The other approach is the one Ruby generally seems to take, where you
provide the native APIs, and leave it up to the developer to deal with
any platform specific behavior. Hence Ruby provides dbm, etc, fcntl,
openssl, syslog, readline, winole, win32api, and so on, all of which
can make your code non-portable.

Personally, I prefer the Ruby approach. Frankly, I don't care if
Ruby's behavior is different on Windows, because none of my Ruby code
needs to run on Windows. What I do care about is having APIs which
conform to POSIX and let me do all the Unix things I need to do, like
writing to syslog.

> It doesn't matter what the default is as long as it's functional and as
> consistently implemented as possible on the platform.

Oh, but it does matter. When I run a Ruby script and pass it some
fileglobs, I expect it to expand those fileglobs in the normal manner
every other program on my OS does. If it doesn't do that, that will be
a big surprise.

Your "it doesn't matter if it's not the same as the OS, as long as
it's consistent everywhere" is exactly why most people hate Java
applications.

> If Ruby chose instead to implement platform specific quirks by
> default, the script writer would have yet another set of things for
> which he/she must account during development and testing.

That ship sailed a long time ago. Ruby implements all kinds of
platform-specific quirks; library-version-specific quirks, even.
(readline, dbm)

If you want guaranteed uniformity across platforms, you're probably
using the wrong language. You might have luck with JRuby, certainly
the JVM is going to be your best bet for a runtime.

Meanwhile, I don't see any other people eager for MRI to be more like
the JVM. In fact, the trend seems to be the opposite--one of the big
popular features added to Ruby 1.9 was native threads instead of
cross-platform Ruby threads.

mathew
--
URL:http://www.pobox.com/~meta/

=end

#19 Updated by Nobuyoshi Nakada about 4 years ago

  • Category set to core
  • Status changed from Feedback to Rejected

=begin
File.fnmatch?("bla.*", "bla") returns false, so "bla" cannot be
returned by Dir.glob("bla.*").

=end
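The reasoning behind the rejection can be checked directly: for a simple wildcard pattern such as "bla.*", the names `Dir.glob` returns should be exactly the directory entries that `File.fnmatch?` accepts. A sketch assuming Ruby 2.5+ and lowercase file names (on Windows `Dir.glob` matches case-insensitively while plain `File.fnmatch?` does not); `glob_agrees_with_fnmatch?` is a hypothetical helper.

```ruby
require "tmpdir"

# Hypothetical check of the invariant behind the rejection: for a wildcard
# pattern, Dir.glob should return exactly the entries File.fnmatch? accepts.
def glob_agrees_with_fnmatch?(pattern, names)
  Dir.mktmpdir do |dir|
    names.each { |name| File.write(File.join(dir, name), "") }
    globbed = Dir.glob(pattern, base: dir).sort
    matched = Dir.children(dir).select { |name| File.fnmatch?(pattern, name) }.sort
    globbed == matched
  end
end

p glob_agrees_with_fnmatch?("bla.*", %w[bla bla.rb])  # → true
```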

#20 Updated by Jeremy Bopp about 4 years ago

=begin
mathew,

First of all, let me assure you that I'm well aware of engineering
trade-offs. I suggested that consistency be maintained where feasible, not
always and at the expense of sanity. I also suggested that extensions
would be useful if there was demand, such as adding support for
Windows-style globbing. I don't want Ruby put into a straitjacket,
but I also don't want to make it needlessly difficult for Ruby
developers to get their work done.

Your examples of managing filesystem case-sensitivity all have major
interoperability shortcomings with respect to non-Ruby programs when used
only within Ruby itself. Two of your solutions are used by file servers that
present a single filesystem-level interface to all programs. They can
get away with the hacks they perform because all programs use the hacked
side of the interface. Going behind the file server's back can cause
trouble actually. The third solution only provides a sorting mechanism
for file lists and does not describe a way to store and retrieve file
data given a file path.

If Ruby tried to enforce a lowest common denominator filesystem approach
of any kind, Ruby programs would appear broken to the users who need
interoperability with non-Ruby programs. Few developers would use it as
a result, and you'll note that not even Java attempts to do such a
foolish thing.

You'll likely argue that Ruby having its own globbing solution appears
similarly broken. My counter is that end users do not use globbing
enough within Ruby to warrant the extra support load on Ruby developers
who would need to be cognizant of the differences for each platform.
Users' use of globbing will usually be limited to the command line where
globbing is already handled in a platform-specific way outside of the
Ruby script. I further contend that developers are more than capable of
learning Ruby's implementation and tend to appreciate the fact that they
don't have to worry about differences in implementation per platform.

Consider yourself lucky that you only need to support a single platform
with your scripts. I'm pleased to escape the old days of non-portable C
code and endless #ifdefs. If you think I'm misguided or an aberration,
ponder the reasons for the existence of Gnulib, APR, libiberty, and
similar abstraction libraries.

-Jeremy

=end
