Bug #18353

closed

Czech keyboard input encoding on Czech Windows

Added by koleq (Ondřej Kurz) 2 months ago. Updated about 2 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x64-mingw32]
[ruby-core:106191]

Description

Inputting Czech characters on Czech Windows does not work unless "text.force_encoding("CP852")" is used. I would expect this to work seamlessly, just like it does in Python.

This issue also does not happen in WSL (Windows Subsystem for Linux), where it just works without encoding issues.

To test, you can run this code, then copy "ěščřžýáíé" and paste it as input.
You will see that the first print works just fine, but the echoed input does not.

I do not know whether it is reproducible on other language versions of Windows.

Ruby

puts("ěščřžýáíé")
text = gets
# text.force_encoding("CP852") # this line fixes the input, but it is probably not the best solution if other Windows languages use another code page.
puts(text)

output:

ěščřžýáíé
ěščřžýáíé
����젡�

"text.encoding" returns "UTF-8"
"text.bytes.inspect" returns "[216, 231, 159, 253, 167, 236, 160, 161, 130, 10]"
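
The bytes above can be checked by hand. The following is a minimal sketch, assuming the console delivered the line in CP852 (the active code page reported later in this thread); it rebuilds the string from the reported bytes, shows that it is not valid UTF-8, and decodes it as CP852 instead. The variable names are illustrative only.

```ruby
# Bytes reported in this issue for the pasted line "ěščřžýáíé" plus newline.
bytes = [216, 231, 159, 253, 167, 236, 160, 161, 130, 10]
raw = bytes.pack("C*") # binary (ASCII-8BIT) string holding exactly those bytes

# What Ruby does in the report: the bytes are tagged as UTF-8 without conversion.
as_utf8 = raw.dup.force_encoding("UTF-8")
utf8_valid = as_utf8.valid_encoding?

# Assumption: the bytes are really CP852, so tag them as such and transcode.
decoded = raw.dup.force_encoding("CP852").encode("UTF-8")

puts "valid as UTF-8: #{utf8_valid}"
puts decoded
```

If the assumption holds, the decoded line matches the pasted text, which is consistent with the "force_encoding("CP852")" workaround mentioned in the description.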

Python 3

print("ěščřžýáíé")
text = input()
print(text)

output:

ěščřžýáíé
ěščřžýáíé
ěščřžýáíé

I don't know how to check the encoding or get the bytes in the current encoding in Python.

I was told on the Ruby Discord that my terminal is misconfigured, but that is not the case: it happens in multiple terminals, and I can't expect users to change their terminal settings.

Other languages like Python or C# do not seem to have this issue.

I wonder what Python does to get around encoding issues on Czech Windows.

Updated by nobu (Nobuyoshi Nakada) 2 months ago

Seems default external encoding doesn't match.
What does chcp.com command say?
And what does ruby -e 'p Encoding.default_encoding, Encoding.default_internal, Encoding.locale_charmap'?

Updated by koleq (Ondřej Kurz) 2 months ago

nobu (Nobuyoshi Nakada) wrote in #note-1:

Seems default external encoding doesn't match.
What does chcp.com command say?
And what does ruby -e 'p Encoding.default_encoding, Encoding.default_internal, Encoding.locale_charmap'?

At the time of writing I'm at work, but I also have Ruby here, so here are the results from my work PC.

H:\>chcp
Active code page: 852

H:\>ruby -e 'p Encoding.default_encoding, Encoding.default_internal, Encoding.locale_charmap'
-e:1:in `<main>': undefined method `default_encoding' for Encoding:Class (NoMethodError)
Did you mean?  default_internal

H:\>ruby -e 'p Encoding.default_internal, Encoding.locale_charmap'
nil
"CP852"

H:\>ruby -v
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x64-mingw32]

Updated by nobu (Nobuyoshi Nakada) 2 months ago

  • Status changed from Open to Feedback

koleq (Ondřej Kurz) wrote in #note-2:

H:\>ruby -e 'p Encoding.default_encoding, Encoding.default_internal, Encoding.locale_charmap'
-e:1:in `<main>': undefined method `default_encoding' for Encoding:Class (NoMethodError)

Sorry, it's a typo, should be Encoding.default_external.

And is the environment variable RUBYOPT set?

Updated by koleq (Ondřej Kurz) 2 months ago

nobu (Nobuyoshi Nakada) wrote in #note-3:

koleq (Ondřej Kurz) wrote in #note-2:

H:\>ruby -e 'p Encoding.default_encoding, Encoding.default_internal, Encoding.locale_charmap'
-e:1:in `<main>': undefined method `default_encoding' for Encoding:Class (NoMethodError)

Sorry, it's a typo, should be Encoding.default_external.

And is the environment variable RUBYOPT set?

H:\>ruby -e 'p Encoding.default_external, Encoding.default_internal, Encoding.locale_charmap'
#<Encoding:UTF-8>
nil
"CP852"

The RUBYOPT environment variable is not set, unless it was set by RubyInstaller for Windows; I checked my system and it does not seem to be set. I do not even know what it is, or what it should be.

Updated by nobu (Nobuyoshi Nakada) about 2 months ago

  • Status changed from Feedback to Closed

Applied in changeset git|37cd35aea8afa35476640e454eaf2c53150dc014.


[win32] Transcode input from console [Bug #18353]

On Windows, as the input from console is encoded in the active
code page, convert the input to the internal encoding.
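
For Ruby versions without this change, the same idea can be approximated in user code. This is only a sketch and not the code from the changeset (which lives in Ruby's Windows console layer): `transcode_console_input` is a hypothetical helper, and it assumes the locale encoding (as reported by `Encoding.locale_charmap` above) matches the console's active code page.

```ruby
# Hypothetical helper: reinterpret a console line in the console's encoding
# and convert it to the desired encoding. `from` defaults to the locale
# encoding, which is an assumption about what the console actually sends.
def transcode_console_input(line, from: Encoding.find("locale"), to: Encoding::UTF_8)
  line.dup
      .force_encoding(from)                            # retag the raw bytes
      .encode(to, invalid: :replace, undef: :replace)  # then actually convert
end

# Simulate the mis-tagged input from this report: CP852 bytes labeled UTF-8.
pasted = [216, 231, 159].pack("C*").force_encoding("UTF-8")
fixed = transcode_console_input(pasted, from: "CP852")
puts fixed
```

In a script this would be used as `text = transcode_console_input(gets)`. Defaulting to the locale encoding rather than hard-coding CP852 is meant to address the concern in the description that other Windows languages use other code pages.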
