Feature #20959: Add a way to get codepage of console.

Added by YO4 (Yoshinao Muramatsu) 12 months ago. Updated about 5 hours ago.

Status:
Assigned
Assignee:
Target version:
-
[ruby-core:120279]
Tags:

Description

Abstract

Add a way to retrieve the console code page.
On Windows, Encoding.find("locale") currently returns the console code page.
To prepare for future changes, code that needs the console code page should ask for "console" instead of "locale".

Background

On Windows, Encoding.find("locale") returns the console codepage.
This differs from what locale means in other environments, and the name does not seem to match what it returns.
If we later change the locale_encoding of the Windows port to the locale code page of the C runtime library, we will need a separate way to get the encoding of the console.
Strings received via pipe from cmd.exe or powershell are encoded in the console codepage.
This is necessary when communicating with other programs via pipes.
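
For reference, the encodings Ruby currently associates with its special names can be inspected as sketched below. "locale", "external" and "filesystem" are names that Encoding.find accepts today; the helper name special_encodings is made up for this illustration. The values depend on the platform and environment, so no output is shown.

```ruby
# Hypothetical helper (not part of Ruby): collect the encodings that
# Ruby currently resolves for its special encoding names.
def special_encodings
  %w[locale external filesystem].to_h do |name|
    [name, Encoding.find(name)]
  end
end

# Results vary by platform, console settings and environment variables.
special_encodings.each { |name, enc| puts "#{name}: #{enc.name}" }
```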

Proposal

Make Encoding.find("console") return the encoding that represents the console codepage.
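
A minimal sketch of how calling code might adopt the proposed name while still running on Rubies that lack it. It assumes that Encoding.find("console") raises ArgumentError where the name is unknown, as Encoding.find does for any unknown name; console_encoding is a hypothetical helper, not an existing API.

```ruby
# Hypothetical helper: prefer the "console" name proposed in this ticket
# and fall back to "locale" where that name is not recognized.
def console_encoding
  Encoding.find("console")   # proposed here; unknown names raise ArgumentError
rescue ArgumentError
  Encoding.find("locale")    # on Windows this currently reflects the console code page
end

puts console_encoding.name
```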

Background(continued)

Since Windows 10, UTF-8 support in the command-line environment seems to have been enhanced.

  • build 17134
    The UCRT supports a UTF-8 locale code page: setlocale(LC_CTYPE, ".utf8")
    Windows supports UTF-8 as the ANSI code page (experimental): "Beta: Use Unicode UTF-8 for worldwide language support"
  • build 18362
    A process code page can be set to UTF-8 via the manifest.
  • build 19041
    Time zone names are held internally as wchar_t, so the proper tzname can be obtained regardless of the Windows language setting.

This list is probably not exhaustive.
From these changes, I get the impression that the Microsoft team recommends the use of UTF-8. It also seems that Microsoft, which previously deprecated the ANSI versions of the API, now treats them as a valid alternative when used with UTF-8.
If it becomes common on Windows for libraries to respect the C runtime library locale, Ruby would do better to follow suit.

Of course, it would also be good to be able to get the console encoding by specifying "console".

Future plan

By reducing differences from other platforms, bugs and extra code are hoped to be reduced.

  • Do setlocale(LC_CTYPE, ".utf8"); in main.c (or refer to the LC_* environment variables).
  • Encoding.find("locale") returns the encoding of the C runtime library locale.

Since strings obtained from the Windows system cover the Unicode range, the APIs that return a fixed UTF-8 encoding remain unchanged.

Discussion

The code pages available from Windows also include ACP and OEMCP. Are these necessary?
Is it reasonable for Encoding.find("console") to return locale_encoding on other platforms?

Updated by nobu (Nobuyoshi Nakada) 12 months ago Actions #1 [ruby-core:120402]

  • Assignee set to windows
  • Target version set to 4.0

Updated by hsbt (Hiroshi SHIBATA) 11 months ago Actions #2

  • Tags set to windows

Updated by hsbt (Hiroshi SHIBATA) 11 months ago Actions #3

  • Tags changed from windows to win

Updated by hsbt (Hiroshi SHIBATA) 9 months ago Actions #4

  • Status changed from Open to Assigned

Updated by hsbt (Hiroshi SHIBATA) 29 days ago Actions #5

  • Target version deleted (4.0)

Updated by naruse (Yui NARUSE) about 10 hours ago Actions #6 [ruby-core:124143]

I think your idea is that Microsoft is pushing more for UTF-8 support in console apps using the Visual C++ runtime, and in the future, setting the locale to UTF-8 might become the usual way. If that happens, the console code page and locale code page could end up not matching, so adding Encoding.find("console") would probably be a good idea.

I totally agree with the first part—it's nice to see Ruby adapting to these updates.
Also, like you said, Windows has a few main encodings that apps deal with:

  • ACP
  • Console code page
  • Console output code page

And yes, Ruby's Encoding.find("locale") usually uses GetConsoleCP(), because the input encoding follows the console code page.

But if Ruby starts using setlocale(LC_CTYPE, ".utf8"), it would handle outside stuff with UTF-8 or the Wide APIs. In that case, do we really need the console code page as much? For example, right now, Ruby can already output Unicode strings to the console using WriteConsoleW, without worrying about the console output code page.

> The code page we can get from Windows also has ACP and OEMCP, but are these necessary?

Encoding.find("filesystem") returns OEMCP because filesystem (VFAT/FAT32) uses OEMCP.

> Is it reasonable to get locale_encoding if Encoding.find("console") is called on other platforms?

If Encoding.find("console") means the encoding of input from the console, then on Unix it would be the encoding of the terminal.
It is usually inherited by the application, but as far as I know, there is no way to get it from within the application.

Updated by YO4 (Yoshinao Muramatsu) about 5 hours ago Actions #7 [ruby-core:124155]

When I filed #20959, I had #20929 in mind.
Since Windows APIs return time zone names localized based on the user's language settings rather than the system's language settings, it was important to retrieve them using a language-independent encoding.

That issue has been resolved in a different form by #21144.
While setlocale(LC_CTYPE, ".utf8") seems like the desired direction, its necessity for me has decreased at this point.

On the other hand, cmd.exe's internal commands and PowerShell in its default state use the console's code page.
That is, the following idioms may exist:

dir | ruby -e "puts STDIN.read.force_encoding('locale')"

This will become an obstacle when changing what Encoding.find('locale') returns in the future. This is why we need Encoding.find('console') right now.
Depending on the pipe's counterpart, Encoding.default_external may be preferable in some cases, so both options are important.
On Windows, the C runtime locale for each process is currently inconsistent, so changing the locale encoding for this purpose does not seem useful at present.
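
To illustrate the idiom above, here is a minimal sketch of decoding bytes received over a pipe when the sender's code page is known. decode_piped is a hypothetical helper, and the sample bytes assume a Japanese console using code page 932 (Windows-31J in Ruby); in real use the encoding argument would come from Encoding.find('locale') today, or from Encoding.find('console') if this proposal is adopted.

```ruby
# Hypothetical helper: reinterpret raw pipe bytes in the given code page
# and transcode them to UTF-8 for further processing.
def decode_piped(raw, encoding)
  raw.dup.force_encoding(encoding).encode(Encoding::UTF_8)
end

# Sample bytes that a code page 932 console would emit for "テスト".
raw = "\x83\x65\x83\x58\x83\x67".b
puts decode_piped(raw, Encoding::Windows_31J)
```

Passing the encoding explicitly keeps the caller in control, which matters because, depending on the program at the other end of the pipe, Encoding.default_external may be the better choice.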

Regarding OEMCP, in environments where it matters, the console code page should already be OEMCP, so it may not need separate consideration.
On Windows, the "C" locale seems to use ACP by default. Of course, programs that use MBCS also rely on ACP.
Therefore, there appears to be demand for ACP. If there were a gem or FFI binding to obtain ACP, that demand might become clearer.
