Bug #14456

closed

Dir.glob with FNM_CASEFOLD gives ArgumentError: invalid byte sequence in UTF-8

Added by Gondolin (Damien Robert) over 3 years ago. Updated about 1 month ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
[ruby-core:85448]

Description

With Ruby 2.5.0p0, in a folder that contains a file whose name is encoded in Latin-1 (ISO-8859-1), I get the following error:

Dir.glob('*a', File::FNM_CASEFOLD)
ArgumentError: invalid byte sequence in UTF-8

Note that Dir.glob('*', File::FNM_CASEFOLD), Dir.glob('a*', File::FNM_CASEFOLD) and Dir.glob('*a') work, so it is a bit strange that
Dir.glob('*a', File::FNM_CASEFOLD) does not.
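The report can be turned into a small self-contained script. This is a sketch under stated assumptions, not the reporter's own test: the file name caféa and the method probe_casefold_glob are hypothetical, and the script assumes a Unix-like filesystem that stores file-name bytes verbatim (for example Linux ext4; macOS may reject or convert such a name). On an affected Ruby such as 2.5.0, the last probe should yield the "invalid byte sequence in UTF-8" message; on a Ruby that includes the fix, it should simply count the match.

```ruby
require 'tmpdir'

# Hypothetical test file: "caféa" re-encoded to ISO-8859-1 (Latin-1), so the
# "é" becomes the single byte 0xE9, which is not valid UTF-8 on its own.
LATIN1_NAME = 'caféa'.encode(Encoding::ISO_8859_1)

# The calls from the report, as [pattern, flags] pairs.
PROBES = [
  ['*',  File::FNM_CASEFOLD],
  ['a*', File::FNM_CASEFOLD],
  ['*a', 0],
  ['*a', File::FNM_CASEFOLD]
].freeze

# Runs every probe inside a scratch directory that holds only the Latin-1
# file. Returns { [pattern, flags] => number of matches or error message }.
def probe_casefold_glob
  Dir.mktmpdir do |dir|
    Dir.chdir(dir) do
      # Assumes the filesystem accepts non-UTF-8 bytes in file names.
      File.write(LATIN1_NAME, '')
      PROBES.each_with_object({}) do |(pattern, flags), results|
        results[[pattern, flags]] =
          begin
            Dir.glob(pattern, flags).size
          rescue ArgumentError => e
            e.message # affected Rubies: "invalid byte sequence in UTF-8"
          end
      end
    end
  end
end

probe_casefold_glob.each do |(pattern, flags), outcome|
  puts "Dir.glob(#{pattern.inspect}, #{flags}) => #{outcome.inspect}"
end
```

Rescuing ArgumentError per probe lets the script report every call from the issue in one run instead of stopping at the first failure.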


Related issues

Has duplicate Ruby master - Bug #14455: Dir.glob with FNM_CASEFOLD gives ArgumentError: invalid byte sequence in UTF-8 (Closed)

Updated by Gondolin (Damien Robert) over 3 years ago

Apologies for the spam: my browser submitted this bug several times, and I don't know how to erase the other bug reports.

Updated by shevegen (Robert A. Heiler) over 3 years ago

> I don't know how to erase the other bug reports.

Not sure if it can be removed, but I think you can change the status
to "closed" (or someone from the Ruby core team probably could).

Regarding the bug report itself, just out of curiosity: can you avoid
the UTF-8 problem by switching to another encoding before calling
Dir.glob? For example, I mostly use ISO-8859-1 because of German
umlauts. The terminal I use these days (usually mate-terminal, which
is based on VTE) also has an option to use either UTF-8 or the
current active locale (ISO-8859-1) as its locale. When I use
ISO-8859-1 for both the encoding and the locale setting, I almost
never get invalid byte sequence errors, provided of course that any
external input I read uses the same encoding. File.read() accepts an
:encoding option; perhaps Dir.glob() could also benefit from an
options hash with an :encoding key, but perhaps I am digressing.
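Dir.glob has no :encoding option, but Dir.children (like Dir.entries) already accepts an encoding: keyword, which gives a rough way to try the idea above. The following is an illustration, not a proposed API: latin1_entries_matching and demo_latin1_entries are hypothetical helpers, and the sketch assumes the directory holds only Latin-1 names (a genuine UTF-8 name would be misread as Latin-1 mojibake) on a filesystem that stores raw name bytes. It tags the names as ISO-8859-1, transcodes them to valid UTF-8, and then filters them with File.fnmatch and File::FNM_CASEFOLD.

```ruby
require 'tmpdir'

# Hypothetical helper: read the names in `dir` as ISO-8859-1 (assumes every
# name there really is Latin-1), transcode them to valid UTF-8, and keep the
# ones matching `pattern` case-insensitively. File.fnmatch documents
# File::FNM_CASEFOLD as supported, whereas Dir.glob documents it as ignored.
def latin1_entries_matching(dir, pattern)
  Dir.children(dir, encoding: Encoding::ISO_8859_1)
     .map { |name| name.encode(Encoding::UTF_8) }
     .select { |name| File.fnmatch(pattern, name, File::FNM_CASEFOLD) }
end

# Scratch directory with one Latin-1 file name ("caféa", é stored as 0xE9).
# Assumes a filesystem that stores raw bytes in names (e.g. Linux ext4).
def demo_latin1_entries
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, 'caféa'.encode(Encoding::ISO_8859_1)), '')
    latin1_entries_matching(dir, '*A')
  end
end

p demo_latin1_entries
```

Because the names are valid UTF-8 by the time they reach File.fnmatch, the case-insensitive comparison never sees an invalid byte sequence.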

Updated by duerst (Martin Dürst) over 3 years ago

  • Has duplicate Bug #14455: Dir.glob with FNM_CASEFOLD gives ArgumentError: invalid byte sequence in UTF-8 added

Updated by Gondolin (Damien Robert) over 3 years ago

shevegen (Robert A. Heiler) wrote:

> Not sure if it can be removed, but I think you can change the status
> to "closed" (or someone from the ruby core team probably could).

Someone did, thanks!

> To the bug report, just out of curiosity, can you avoid the UTF
> problem if you change to use another encoding before calling
> Dir.glob?

Yes: Dir.glob('*a'.encode!('ISO-8859-1'), File::FNM_CASEFOLD) works.
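The workaround above can be wrapped in a small helper. This is a sketch, not an official API: latin1_casefold_glob and demo_latin1_workaround are hypothetical names, the scratch file is made up, and it assumes a filesystem that stores raw file-name bytes (e.g. Linux). String#encode raises Encoding::UndefinedConversionError if the pattern contains characters outside ISO-8859-1. The sketch uses the non-bang encode, so it also works when the pattern string is frozen.

```ruby
require 'tmpdir'

# Hypothetical helper: transcode the pattern to ISO-8859-1 before globbing,
# as in the reported workaround. Every byte sequence is valid ISO-8859-1,
# which is presumably why the case-folded match no longer raises.
def latin1_casefold_glob(pattern)
  Dir.glob(pattern.encode(Encoding::ISO_8859_1), File::FNM_CASEFOLD)
end

# Creates a scratch directory holding one Latin-1 file name ("caféa" with
# the é stored as the single byte 0xE9) and globs it with the helper.
# Assumes a filesystem that stores raw bytes in names (e.g. Linux ext4).
def demo_latin1_workaround
  Dir.mktmpdir do |dir|
    Dir.chdir(dir) do
      File.write('caféa'.encode(Encoding::ISO_8859_1), '')
      latin1_casefold_glob('*a')
    end
  end
end

# Print raw bytes, since the returned names are not valid UTF-8.
p demo_latin1_workaround.map(&:b)
```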

Updated by Gondolin (Damien Robert) over 3 years ago

  • Description updated

Updated by jeremyevans0 (Jeremy Evans) about 2 months ago

This is still an issue in the master branch. The problem can be solved by actually ignoring File::FNM_CASEFOLD, as the Dir.glob documentation already says it does. I've submitted a pull request that fixes this: https://github.com/ruby/ruby/pull/4583

Updated by jeremyevans (Jeremy Evans) about 1 month ago

  • Status changed from Open to Closed

Applied in changeset git|a2592702ae4c18662a162805aa06d88046742f05.


Actually ignore FNM_CASEFOLD flag in Dir.glob

This was already documented as being ignored, but it wasn't being
ignored, causing an issue in a particular case where a UTF-8
pattern was provided and a filename was tested that wasn't valid
UTF-8.

Fixes [Bug #14456]
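With this change, File::FNM_CASEFOLD has no effect on Dir.glob, as documented. On a case-sensitive filesystem, one way to still get case-insensitive matching is to spell the alternatives out as bracket expressions in the pattern. The helpers casefold_pattern and demo_casefold_pattern are hypothetical and deliberately naive: they only fold ASCII letters and assume the pattern contains no existing [...] classes or backslash escapes.

```ruby
require 'tmpdir'

# Hypothetical helper: turn "*.txt" into "*.[tT][xX][tT]" so that Dir.glob
# matches regardless of case without relying on File::FNM_CASEFOLD.
# Assumes the pattern has no existing [...] classes or backslash escapes,
# and only folds ASCII letters.
def casefold_pattern(pattern)
  pattern.gsub(/[A-Za-z]/) { |c| "[#{c.downcase}#{c.upcase}]" }
end

# Globs a scratch directory containing "notes.A" and "readme.txt" with the
# case-folded version of "*.a".
def demo_casefold_pattern
  Dir.mktmpdir do |dir|
    %w[notes.A readme.txt].each { |name| File.write(File.join(dir, name), '') }
    Dir.glob(casefold_pattern('*.a'), base: dir)
  end
end

p casefold_pattern('*.a') # => "*.[aA]"
p demo_casefold_pattern
```

Bracket expressions are plain glob syntax, so this behaves the same whether or not the running Ruby includes the fix.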
