Bug #9712

Dir.entries replaces Unicode characters with question marks

Added by thomthom (Thomas Thomassen) about 5 years ago. Updated over 1 year ago.

Status:
Assigned
Priority:
Normal
Target version:
-
ruby -v:
ruby 2.2.0dev (2014-04-07 trunk 45528) [i386-mswin32_100]
[ruby-core:61893]

Description

My test machine runs an English-language Windows with codepage Windows-1252. The tests might yield different results if the Windows codepage is different - so please keep that in mind if you are unable to reproduce.

Given a folder named "Foo" which contains a sub-folder "てすと" ("\u3066\u3059\u3068") Dir.entries("Foo") will return:
[".", "..", "???"]

Characters that don't fit my filesystem codepage are translated into question marks.

I would have expected the strings returned to be in some Unicode format.
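The symptom can be imitated in plain Ruby with a lossy transcode. This is only a sketch of the effect, not the code path Ruby or Windows actually takes; the helper name `to_codepage_lossy` is made up for illustration.

```ruby
# Hypothetical helper: mimics a lossy conversion of a file name to a legacy
# codepage. Characters the codepage cannot represent are replaced with "?",
# which is String#encode's default replacement for non-Unicode targets.
def to_codepage_lossy(name, codepage = "Windows-1252")
  name.encode(codepage, invalid: :replace, undef: :replace)
end

folder = "\u3066\u3059\u3068"   # the sub-folder name from the report
puts to_codepage_lossy(folder)  # prints "???"
```

Once the name has been reduced like this, the original characters cannot be recovered from the returned string.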

History

Updated by usa (Usaku NAKAMURA) about 5 years ago

  • Status changed from Open to Rejected

check Dir.entries('Foo', encoding: 'utf-8')
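A minimal way to try that suggestion, assuming a throwaway directory stands in for "Foo" (the helper name `utf8_entries` is hypothetical):

```ruby
require "tmpdir"

# Hypothetical helper: lists a directory with the entry names tagged as UTF-8
# by passing the encoding: option suggested above.
def utf8_entries(path)
  Dir.entries(path, encoding: "utf-8")
end

Dir.mktmpdir("Foo") do |foo|
  # Create the sub-folder from the report inside the temporary directory.
  Dir.mkdir(File.join(foo, "\u3066\u3059\u3068"))

  names = utf8_entries(foo)
  p names.map(&:encoding).uniq             # encodings attached to the names
  p names.include?("\u3066\u3059\u3068")   # whether the Unicode name survived
end
```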

Updated by thomthom (Thomas Thomassen) about 5 years ago

Usaku NAKAMURA wrote:

check Dir.entries('Foo', encoding: 'utf-8')

Ah, well that worked. I'd been referring to the Ruby 2.0.0 docs where this argument is missing:
http://www.ruby-doc.org/core-2.0/Dir.html#method-c-entries

But why is this needed?
On my machine it returns the strings by default in Windows-1252 - which is the same as Encoding.find('filesystem'). I guess it returns it based on that? But for Windows this is really awkward. Windows-1252 is the compatibility codepage - but the file system itself is perfectly capable of handling Unicode characters.

I see Ruby explicitly calls the W versions of the Windows file functions instead of declaring the UNICODE flag - this makes all system calls treat Ruby with compatibility handling.

The Windows file system isn't actually Windows-1252 encoded - or any other encoding ruby currently reports. It's all Unicode - I can use any character I like, so why isn't Ruby just returning result from file functions as Unicode?
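To see which encoding Ruby attaches by default on a given machine, something like the following could be used. It is a sketch only; `default_entry_encodings` is a hypothetical helper, and the values printed depend on the platform and locale.

```ruby
require "tmpdir"

# Hypothetical helper: reports the encodings Ruby attaches to entry names
# when no encoding: option is given.
def default_entry_encodings(path)
  Dir.entries(path).map(&:encoding).uniq
end

puts "filesystem encoding: #{Encoding.find('filesystem')}"

Dir.mktmpdir do |dir|
  # An ASCII-only file name keeps the comparison independent of the locale.
  File.write(File.join(dir, "main.rb"), "")
  p default_entry_encodings(dir)
end
```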

Updated by zzak (Zachary Scott) about 5 years ago

  • Category changed from platform/windows to doc
  • Status changed from Rejected to Assigned
  • Assignee changed from cruby-windows to zzak (Zachary Scott)
  • Priority changed from 5 to Normal

Updated by thomthom (Thomas Thomassen) about 5 years ago

Additional info:

I've given the RB files a # encoding: UTF-8 magic comment and set -E UTF-8:UTF-8, which, from my understanding of the documentation, should affect the encoding returned by Dir:

 * call-seq:
 *   Encoding.default_internal -> enc
 *
 * Returns default internal encoding.  Strings will be transcoded to the
 * default internal encoding in the following places if the default internal
 * encoding is not nil:
 *
 * * File names from Dir

But I'm not seeing this behaviour.
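One way to check the documented transcoding in isolation is sketched below. It assumes the source encoding is passed explicitly via encoding: so the result does not depend on the machine's filesystem encoding, and it uses ISO-8859-1 as the internal encoding purely to make the conversion visible. The helper name `entries_with_internal` is hypothetical, and this says nothing about how the Windows build in this report behaves.

```ruby
require "tmpdir"

# Hypothetical helper: lists entries while Encoding.default_internal is
# temporarily set, then restores the previous value.
def entries_with_internal(path, internal, source: "UTF-8")
  saved = Encoding.default_internal
  Encoding.default_internal = internal
  Dir.entries(path, encoding: source)
ensure
  Encoding.default_internal = saved
end

Dir.mktmpdir do |dir|
  Dir.mkdir(File.join(dir, "caf\u00e9"))
  names = entries_with_internal(dir, Encoding::ISO_8859_1)
  p names.map(&:encoding).uniq   # encodings of the returned names
end
```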

Updated by naruse (Yui NARUSE) about 5 years ago

  • Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN to 2.0.0: DONTNEED, 2.1: DONTNEED

Thomas Thomassen wrote:

Usaku NAKAMURA wrote:

check Dir.entries('Foo', encoding: 'utf-8')

Ah, well that worked. I'd been referring to the Ruby 2.0.0 docs where this argument is missing:
http://www.ruby-doc.org/core-2.0/Dir.html#method-c-entries

But why is this needed?
On my machine it returns the strings by default in Windows-1252 - which is the same as Encoding.find('filesystem'). I guess it returns it based on that?

yes.

But for Windows this is really awkward. Windows-1252 is the compatibility codepage - but the file system itself is perfectly capable of handling Unicode characters.

I see Ruby explicitly calls the W versions of the Windows file functions instead of declaring the UNICODE flag - this makes all system calls treat Ruby with compatibility handling.

The Windows file system isn't actually Windows-1252 encoded - or any other encoding ruby currently reports. It's all Unicode - I can use any character I like, so why isn't Ruby just returning result from file functions as Unicode?

  • Ruby side: many parts of the Ruby implementation already use the W version of the API, but some parts do not. Therefore, for consistency, it is still ANSI based.
  • User side: there is a lot of legacy code which assumes ANSI strings.

Ruby must migrate to Unicode some day, but we haven't done so yet.

Updated by zzak (Zachary Scott) almost 5 years ago

  • Status changed from Assigned to Closed

I think the current documentation explains this well, so closing.

/*
 *  call-seq:
 *     Dir.entries( dirname )                -> array
 *     Dir.entries( dirname, encoding: enc ) -> array
 *
 *  Returns an array containing all of the filenames in the given
 *  directory. Will raise a <code>SystemCallError</code> if the named
 *  directory doesn't exist.
 *
 *  The optional <i>enc</i> argument specifies the encoding of the directory.
 *  If not specified, the filesystem encoding is used.
 *
 *     Dir.entries("testdir")   #=> [".", "..", "config.h", "main.rb"]
 *
 */

Updated by thomthom (Thomas Thomassen) almost 5 years ago

As mentioned, the documentation for version 2.0 is missing this. SketchUp 2014 embeds Ruby 2.0, and developers usually refer to the documentation for the embedded version. Any chance of getting that documentation updated? It's causing a bit of confusion and issues.

Zachary Scott wrote:

I think the current documentation explains this well, so closing.


Updated by zzak (Zachary Scott) almost 5 years ago

  • Status changed from Closed to Assigned
  • Backport changed from 2.0.0: DONTNEED, 2.1: DONTNEED to 2.0.0: REQUIRED, 2.1: DONTNEED

In that case, we need to ask for a backport of r42058; this must be done by usa, the maintainer of the 2.0 series.

Thank you for the report.


Updated by naruse (Yui NARUSE) over 1 year ago

  • Target version deleted (2.2.0)
