Bug #7267

Dir.glob on Mac OS X returns unexpected string encodings for unicode file names

Added by kennygrant (Kenny Grant) about 12 years ago. Updated over 11 years ago.

Status: Closed
Target version:
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.4.0]
Backport:

[ruby-core:48745]

Description

Tested on Ruby 1.9.3-p194 and ruby-2.0.0-preview1 on Mac OS X 10.7.5.

When calling file system methods in Ruby on Mac OS X, the resulting file name cannot be manipulated as a normal UTF-8 string, even though it reports its encoding as UTF-8. It appears to be a UTF-8-MAC string, even when the default encoding is set to UTF-8. This is confusing because the string behaves normally except for non-ASCII Unicode characters, which appear to be decomposed. As a result, a regexp containing precomposed UTF-8 characters will not match the string unless it is first converted from UTF-8-MAC. I would expect the string encoding to be plain UTF-8, or at least for the string to report that it is not a normal UTF-8 string if it has to be UTF-8-MAC for some reason.
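The mismatch described above can be reproduced without touching the file system. The following is only a sketch: it assumes the names returned on Mac OS X use the decomposed form ("e" followed by U+0301), and the constant and method names are made up for illustration.

```ruby
# Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
COMPOSED = "Test\u00E9.txt"
# Decomposed form: "e" followed by U+0301 COMBINING ACUTE ACCENT.
# Assumption: this is the shape of the name the file system hands back.
DECOMPOSED = "Teste\u0301.txt"

# Hypothetical helper mirroring the gsub call in the report.
def replace_accent(name)
  name.gsub(/\u00E9/, 'TEST')
end

# Both strings carry the same encoding tag...
puts COMPOSED.encoding
puts DECOMPOSED.encoding

# ...but they are different code point sequences.
puts COMPOSED == DECOMPOSED      # => false

puts replace_accent(COMPOSED)    # the precomposed character is replaced
puts replace_accent(DECOMPOSED)  # no match, the string comes back unchanged
```

Both strings display identically in most terminals, which is why the failure is hard to spot from the printed output alone.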

Example, run with a file called Testé.txt in the same folder:

def transform_string s
  puts "Testing string #{s}"
  puts s.gsub(/é/,'TEST')
end

Dir.glob("./*.txt").each do |f|
  puts "Inline string works as expected"
  s = "./Testé.txt"
  puts transform_string s

  puts "File name from Dir.glob does not"
  puts transform_string f

  puts "Encoded file name works as expected, though it is reported as UTF-8, not UTF-8-MAC"
  f.encode!('UTF-8','UTF-8-MAC')
  puts transform_string f
end
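As a possible workaround, the names can be brought into precomposed form before matching. The sketch below shows two options under stated assumptions: the first is the UTF-8-MAC conversion used in the script above; the second relies on String#unicode_normalize, which, as far as I know, only exists in Ruby 2.2 and later and so is not available on the versions tested in this report. The helper names and the sample constant are hypothetical.

```ruby
# Assumption: a name in decomposed form, standing in for a Dir.glob result.
DECOMPOSED_NAME = "./Teste\u0301.txt"

# Option 1: reinterpret the bytes as UTF-8-MAC and transcode to UTF-8,
# as the test script does with encode!.
def nfc_via_utf8_mac(name)
  name.encode('UTF-8', 'UTF-8-MAC')
end

# Option 2: Unicode normalization to NFC (newer Rubies only).
def nfc_via_normalize(name)
  name.unicode_normalize(:nfc)
end

[nfc_via_utf8_mac(DECOMPOSED_NAME), nfc_via_normalize(DECOMPOSED_NAME)].each do |name|
  # After either conversion the precomposed pattern matches.
  puts name.gsub(/\u00E9/, 'TEST')
end
```

In a real script either helper could be applied to each result, for example Dir.glob("./*.txt").map { |f| nfc_via_normalize(f) }. The original file name should still be used for opening the file; the normalized copy is only for string matching.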


Files

test.rb (926 Bytes), Test script, kennygrant (Kenny Grant), 11/02/2012 07:54 PM
Testé.txt (21 Bytes), Test file with UTF-8 name, kennygrant (Kenny Grant), 11/02/2012 07:54 PM
results.txt (1.09 KB), kennygrant (Kenny Grant), 11/02/2012 08:32 PM
writer.rb (221 Bytes), kennygrant (Kenny Grant), 11/03/2012 07:50 AM

Related issues: 3 (0 open, 3 closed)

Related to Ruby master - Bug #2154: filesystem encoding of UNIX (Rejected, naruse (Yui NARUSE))
Related to Ruby master - Feature #7280: How to set filesystem encoding (Closed, naruse (Yui NARUSE))
Related to Ruby master - Feature #10084: Add Unicode String Normalization to String class (Closed, duerst (Martin Dürst))