Bug #9712
Status: Closed
Dir.entries replace Unicode character with questionmarks
Description
My testing is based on a computer with an English OS (codepage Windows-1252). The tests might yield different results if the Windows codepage is different, so please keep that in mind if you are unable to reproduce.
Given a folder named "Foo" which contains a sub-folder "てすと" ("\u3066\u3059\u3068"), Dir.entries("Foo") will return:
[".", "..", "???"]
The characters that don't fit my filesystem codepage are translated into question marks.
I would have expected the strings returned to be in some Unicode format.
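The question marks are what a lossy conversion from Unicode to a legacy codepage typically produces. The sketch below imitates that effect in pure Ruby with String#encode; it is only an illustration, not the code path Windows or Ruby actually takes, and lossy_name is a hypothetical helper introduced here.

```ruby
# Sketch: emulate a lossy conversion from Unicode to a legacy codepage.
# lossy_name is a hypothetical helper, not part of Ruby or the Windows API.
def lossy_name(name, codepage = "Windows-1252")
  # Characters with no mapping in the target codepage become "?".
  name.encode(codepage, invalid: :replace, undef: :replace, replace: "?")
end

name = "\u3066\u3059\u3068" # the sub-folder name; \u escapes always yield a UTF-8 string
puts lossy_name(name)                  # every character is replaced by "?"
puts lossy_name("caf\u00E9").encoding  # Windows-1252; this name converts without loss
```

Once the name has been reduced to question marks the original characters cannot be recovered, which is why the strings have to be requested in a Unicode encoding up front.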
Updated by usa (Usaku NAKAMURA) over 9 years ago
- Status changed from Open to Rejected
check Dir.entries('Foo', encoding: 'utf-8')
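A minimal sketch of that check, assuming a temporary directory stands in for "Foo" and that the platform allows the Unicode name to be created; list_entries is a hypothetical helper. What the default call returns depends on the Ruby version and the system codepage or locale.

```ruby
require "tmpdir"

# Hypothetical helper: list a directory with and without an explicit encoding.
def list_entries(dir)
  {
    default: Dir.entries(dir),                 # filesystem encoding
    utf8: Dir.entries(dir, encoding: "UTF-8")  # names requested as UTF-8
  }
end

Dir.mktmpdir("Foo") do |foo|
  Dir.mkdir(File.join(foo, "\u3066\u3059\u3068"))
  list_entries(foo).each do |label, names|
    # Show each name together with the encoding Ruby tagged it with.
    puts "#{label}: #{names.sort.map { |n| "#{n.inspect} (#{n.encoding})" }.join(', ')}"
  end
end
```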
Updated by thomthom (Thomas Thomassen) over 9 years ago
Usaku NAKAMURA wrote:
check Dir.entries('Foo', encoding: 'utf-8')
Ah, well that worked. I'd been referring to the Ruby 2.0.0 docs where this argument is missing:
http://www.ruby-doc.org/core-2.0/Dir.html#method-c-entries
But why is this needed?
On my machine it returns the strings by default in Windows-1252 - which is the same as Encoding.find('filesystem'). I guess it returns it based on that? But for Windows this is really awkward. Windows-1252 is the compatibility codepage - but the file system itself is perfectly capable of handling Unicode characters.
I see Ruby explicitly calls the A (ANSI) versions of the Windows file functions instead of declaring the UNICODE flag - this makes all system calls treat Ruby with compatibility handling.
The Windows file system isn't actually Windows-1252 encoded - or any other encoding ruby currently reports. It's all Unicode - I can use any character I like, so why isn't Ruby just returning result from file functions as Unicode?
Updated by zzak (Zak Scott) over 9 years ago
- Category changed from platform/windows to doc
- Status changed from Rejected to Assigned
- Assignee changed from windows to zzak (Zak Scott)
- Priority changed from 5 to Normal
Updated by thomthom (Thomas Thomassen) over 9 years ago
Additional info:
I've made the RB files have # encoding: UTF-8
and set -E UTF-8:UTF-8,
which from my understanding of the documentation should affect the encoding returned by Dir:
* call-seq:
* Encoding.default_internal -> enc
*
* Returns default internal encoding. Strings will be transcoded to the
* default internal encoding in the following places if the default internal
* encoding is not nil:
*
* * File names from Dir
But I'm not seeing this behaviour.
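One way to see which encodings are in play is a small diagnostic like the sketch below. encoding_report is a hypothetical helper, and the values it prints depend on the Ruby version, the platform, and any -E option passed on the command line.

```ruby
# Hypothetical diagnostic: report the encodings relevant to Dir results.
def encoding_report(dir = ".")
  {
    default_external: Encoding.default_external,
    default_internal: Encoding.default_internal, # nil unless set, e.g. via -E ex:in
    filesystem: Encoding.find("filesystem"),
    entry_encodings: Dir.entries(dir).map(&:encoding).uniq
  }
end

encoding_report.each { |key, value| puts "#{key}: #{value.inspect}" }
```

Running the script once plainly and once with -E UTF-8:UTF-8 makes it easy to compare entry_encodings against default_internal.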
Updated by naruse (Yui NARUSE) over 9 years ago
- Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN to 2.0.0: DONTNEED, 2.1: DONTNEED
Thomas Thomassen wrote:
Usaku NAKAMURA wrote:
check Dir.entries('Foo', encoding: 'utf-8')
Ah, well that worked. I'd been referring to the Ruby 2.0.0 docs where this argument is missing:
http://www.ruby-doc.org/core-2.0/Dir.html#method-c-entries
But why is this needed?
On my machine it returns the strings by default in Windows-1252 - which is the same as Encoding.find('filesystem'). I guess it returns it based on that?
yes.
But for Windows this is really awkward. Windows-1252 is the compatibility codepage - but the file system itself is perfectly capable of handling Unicode characters.
I see Ruby explicitly calls the A (ANSI) versions of the Windows file functions instead of declaring the UNICODE flag - this makes all system calls treat Ruby with compatibility handling.
The Windows file system isn't actually Windows-1252 encoded - or any other encoding ruby currently reports. It's all Unicode - I can use any character I like, so why isn't Ruby just returning result from file functions as Unicode?
- Ruby side: many parts of the Ruby implementation already use the W version of the API, but some parts do not. Therefore, for consistency, it is still ANSI based.
- User side: there is a lot of legacy code that assumes ANSI strings.
Ruby must migrate to Unicode some day in the future, but we haven't done so yet.
Updated by zzak (Zak Scott) about 9 years ago
- Status changed from Assigned to Closed
I think the current documentation explains this well, so closing.
/*
* call-seq:
* Dir.entries( dirname ) -> array
* Dir.entries( dirname, encoding: enc ) -> array
*
* Returns an array containing all of the filenames in the given
* directory. Will raise a <code>SystemCallError</code> if the named
* directory doesn't exist.
*
* The optional <i>enc</i> argument specifies the encoding of the directory.
* If not specified, the filesystem encoding is used.
*
* Dir.entries("testdir") #=> [".", "..", "config.h", "main.rb"]
*
*/
Updated by thomthom (Thomas Thomassen) about 9 years ago
As mentioned, the documentation for version 2.0 is missing this. SketchUp 2014 embeds Ruby 2.0, and developers usually refer to the documentation for the embedded version. Is there any chance of getting that documentation updated? It's causing a bit of confusion and some issues.
Zachary Scott wrote:
I think the current documentation explains this well, so closing.
Updated by zzak (Zak Scott) about 9 years ago
- Status changed from Closed to Assigned
- Backport changed from 2.0.0: DONTNEED, 2.1: DONTNEED to 2.0.0: REQUIRED, 2.1: DONTNEED
In that case, we need to ask for a backport of r42058; this must be done by usa, the maintainer of the 2.0 series.
Thank you for the report.
Updated by jeremyevans0 (Jeremy Evans) about 4 years ago
- Status changed from Assigned to Closed