Feature #12654

closed

On Windows use UTF-8 as filesystem encoding

Added by davispuh (Dāvis Mosāns) over 8 years ago. Updated about 4 years ago.

Status: Closed
Assignee:
Target version:
[ruby-core:76693]

Description

Windows (NTFS) supports Unicode, so paths and filenames can contain characters that the current ANSI/OEM codepage cannot encode.

See attached patch.


Files

Updated by nobu (Nobuyoshi Nakada) over 8 years ago

Try chcp.com 65001.

Updated by davispuh (Dāvis Mosāns) over 8 years ago

Nobuyoshi Nakada wrote:

Try chcp.com 65001.

That's not really needed. For example, File.read works with any console codepage, but Dir.entries and Dir.pwd work only for ANSI paths, no matter what the console's codepage is set to. The encodings in use are quite inconsistent, and IMO the best solution is to just use UTF-8 everywhere.

Anyway, this patch is for Ruby 3.
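
To check which encodings Ruby actually reports on a given machine, a minimal sketch along these lines may help. It relies only on `Encoding.find` with the special names "filesystem" and "locale", plus `Dir.pwd` and `Dir.entries`. The helper name `path_encoding_report` is hypothetical, and the values printed depend on the platform, the Ruby version, and the console settings.

```ruby
# Hypothetical helper: collects the encodings Ruby associates with paths.
def path_encoding_report(dir = ".")
  {
    filesystem: Encoding.find("filesystem"),        # encoding Ruby uses for paths
    locale: Encoding.find("locale"),                # encoding derived from the environment
    pwd: Dir.pwd.encoding,                          # encoding tag on the Dir.pwd string
    entries: Dir.entries(dir).map(&:encoding).uniq  # encoding tags on directory entries
  }
end

path_encoding_report.each { |name, value| puts "#{name}: #{value.inspect}" }
```

Running it once under the default console codepage and once after `chcp 65001` should show whether the reported encodings change with the console on that setup.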

Updated by nobu (Nobuyoshi Nakada) over 8 years ago

Dāvis Mosāns wrote:

That's not really needed. For example, File.read works with any console codepage, but Dir.entries and Dir.pwd work only for ANSI paths, no matter what the console's codepage is set to. The encodings in use are quite inconsistent, and IMO the best solution is to just use UTF-8 everywhere.

I think they should be the console's codepage (or "locale" encoding), not UTF-8 always.

Updated by davispuh (Dāvis Mosāns) over 8 years ago

Nobuyoshi Nakada wrote:

Dāvis Mosāns wrote:

That's not really needed. For example, File.read works with any console codepage, but Dir.entries and Dir.pwd work only for ANSI paths, no matter what the console's codepage is set to. The encodings in use are quite inconsistent, and IMO the best solution is to just use UTF-8 everywhere.

I think they should be the console's codepage (or "locale" encoding), not UTF-8 always.

I strongly disagree. WinAPI, PowerShell and cmd support Unicode independently of the active codepage, so you can navigate to paths that can't be represented in the active codepage. There's really no reason to impose such an arbitrary limitation. It would force everyone to use the UTF-8 codepage, because otherwise Ruby applications wouldn't be able to handle Unicode paths/filenames.

By default, cmd opens in the OEM codepage, and it has to be changed explicitly. Also, if another application starts the Ruby process with CREATE_NO_WINDOW passed to CreateProcess, Ruby will have the OEM codepage, and with DETACHED_PROCESS it will have the ANSI codepage; this isn't easily changeable by the parent process.

Codepages are a legacy thing and would only cause more problems and confusion. By using UTF-8 we get full Unicode support, and it doesn't matter what the active codepage is.
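
The representability problem can be reproduced without touching the filesystem. The following is a sketch, assuming Windows-1257 and Windows-1252 as stand-ins for whatever ANSI codepage happens to be active; the helper name `representable_in?` is hypothetical.

```ruby
# Hypothetical helper: true if `name` can be converted to `codepage` without loss.
def representable_in?(name, codepage)
  name.encode(codepage)
  true
rescue Encoding::UndefinedConversionError
  false
end

name = "D\u0101vis.txt" # "Dāvis.txt" as a UTF-8 string

puts representable_in?(name, "Windows-1257") # Baltic codepage, includes U+0101
puts representable_in?(name, "Windows-1252") # Western European codepage, lacks U+0101
puts representable_in?(name, "UTF-8")
```

A file with this name can exist on NTFS regardless of the codepage, but a path API that returns names in a codepage lacking the character has no faithful way to report it.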

Updated by usa (Usaku NAKAMURA) over 8 years ago

Premises:

  1. We won't introduce such a compatibility break until Ruby 3.
  2. In Ruby 3, on Windows, we're planning to use UTF-8 as the default locale.
  3. Ruby 3 will not force users to use UTF-8. Users will be able to choose the encoding they want to use.

The point of the issue is that users cannot choose the filesystem encoding.
If the filesystem encoding is fixed to UTF-8, it causes other (but similar) problems.

Using the locale as the filesystem encoding has an advantage:
users can change the locale with the -E option.
So I vote +1 for nobu's opinion.
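
As a sketch of the choice described here: the `-E` command-line option sets the default external encoding, and the same value can be set from code through `Encoding.default_external=`. Whether the filesystem encoding follows it depends on the platform and the Ruby version, so the example below only prints the values rather than assuming them.

```ruby
# Roughly what `ruby -E UTF-8 script.rb` selects for the default external encoding.
Encoding.default_external = Encoding::UTF_8

puts "external:   #{Encoding.default_external}"
puts "locale:     #{Encoding.find('locale')}"
puts "filesystem: #{Encoding.find('filesystem')}"
```

Comparing this output with a run started as `ruby -E Windows-1252 script.rb` (with the assignment removed) should show which of the three values the option affects on a given system.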

Updated by davispuh (Dāvis Mosāns) over 8 years ago

Usaku NAKAMURA wrote:

If the filesystem encoding is fixed to UTF-8, it causes other (but similar) problems.

UTF-8 can easily be converted to any other encoding, but the opposite isn't always true.

But yeah, I agree with the other points.

Updated by naruse (Yui NARUSE) almost 5 years ago

  • Target version set to 3.0
  • Assignee set to windows

Updated by nobu (Nobuyoshi Nakada) about 4 years ago

  • Status changed from Open to Closed

Applied in changeset git|5b98b2ce39ed979aec614365a2dc3e1c30052bca.


win32: Use UTF-8 as filesystem encoding [Feature #12654]

Co-Authored-By: Dāvis Mosāns
