Bug #18588
closed
ruby -e 'p gets' with Japanese characters gets additional invalid leading chars and raises Encoding::InvalidByteSequenceError
Description
When a line starting with a Japanese character is entered from the console, ruby almost always receives additional invalid leading characters.
Reproduce process
R:\ruby32\bin>ruby -e 'p gets'
あ
-e:1:in `gets': "\\xA0" on Windows-31J (Encoding::InvalidByteSequenceError)
from -e:1:in `gets'
from -e:1:in `<main>'
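The exception can be reproduced without a console by feeding Ruby the same bytes that this report later captures with `gets.b` for the input あ. The following is a minimal sketch, assuming the error arises when those bytes are transcoded from Windows-31J; it does not model what the console layer actually does.

```ruby
# Bytes captured with `gets.b` later in this report for the input "あ".
# The first two bytes (\xA0\xFF) are the unexpected leading bytes.
raw = "\xA0\xFF\x82\xA0\n".b

# Tag the bytes as Windows-31J without transcoding them.
tagged = raw.dup.force_encoding(Encoding::Windows_31J)

# The leading bytes make the string invalid in Windows-31J.
p tagged.valid_encoding?

# Transcoding the invalid string raises the error class seen in the report.
begin
  tagged.encode(Encoding::UTF_8)
rescue Encoding::InvalidByteSequenceError => e
  puts "#{e.class}: #{e.message}"
end
```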
expected result
R:\ruby32\bin>ruby -e 'p gets'
あ
"あ"
your ruby version (ruby -v)
R:\ruby32\bin>ruby -v
ruby 3.2.0dev (2022-02-16T08:57:04Z master 00c7a0d491) [x64-mswin64_140]
R:\ruby32\bin>ver
Microsoft Windows [Version 10.0.19043.1526]
other observations
environment
- In a command prompt window with Legacy Console mode, this issue does NOT occur.
- On Windows Terminal, this issue occurs.
- On Windows Sandbox (Japanese locale), this issue occurs.
- RubyInstaller binaries have the same issue.
C:\src\git>ruby -v
ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x64-mingw-ucrt]
C:\src\git>ruby -Eutf-8 -e 'p gets'
あ
-e:1:in `gets': "\\xA0" on Windows-31J (Encoding::InvalidByteSequenceError)
from -e:1:in `gets'
from -e:1:in `<main>'
A line starting with single-byte character(s) yields a valid value.
R:\ruby32\bin>ruby -e 'p gets'
:あ
":あ\n" # <= valid
external encoding affects the behavior
- With Windows-31J, a second Enter key press is required to complete the line input.
R:\ruby32\bin>ruby -EWindows-31J -e 'p gets'
あ
# <= Second enter key required
"\xA0\xFFあ\n" # <= \xA0\xFF is additional chars
character variations
R:\ruby32\bin>ruby -EWindows-31J -e 'p gets.b'
あ # <= \x{82A0}
"\xA0\xFF\x82\xA0\n"
R:\ruby32\bin>ruby -EWindows-31J -e 'p gets.b'
# <= \x{8140} fullwidth space
"@\x00\x81@\n"
R:\ruby32\bin>ruby -EWindows-31J -e 'p gets.b'
、 # <= \x{8141}
"A\x00\x81A\n"
R:\ruby32\bin>ruby -EWindows-31J -e 'p gets.b'
。 # <= \x{8142}
"B\x00\x81B\n"
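The four samples above share a pattern: the first unexpected byte equals the trail (second) byte of the character's Windows-31J encoding, the second unexpected byte is either \xFF or \x00, and the character itself then arrives intact. The sketch below only checks that pattern against the bytes quoted in this report; it is an observation about the data, not a statement about the root cause.

```ruby
# Output of `gets.b` as quoted in this report, keyed by the typed character.
observed = {
  "あ"      => "\xA0\xFF\x82\xA0\n".b,
  "\u3000" => "@\x00\x81@\n".b,   # fullwidth space
  "、"      => "A\x00\x81A\n".b,
  "。"      => "B\x00\x81B\n".b,
}

report = observed.map do |char, bytes|
  expected = char.encode(Encoding::Windows_31J).b  # correct 2-byte encoding
  {
    codepoint: format("U+%04X", char.ord),
    prefix: bytes[0, 2],                           # the two unexpected leading bytes
    intact: bytes[2..].chomp == expected,          # the character follows the prefix unchanged
    trail_repeated: bytes.getbyte(0) == expected.getbyte(1),
  }
end

report.each { |row| p row }
```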
sysread returns a valid value.
R:\ruby32\bin>ruby -e 'p STDIN.sysread(1024).force_encoding(Encoding::Windows_31J)'
あ
"\x{82A0}\r\n" # <= valid
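Based on this observation, a script could read console input through `sysread` instead of `gets`. The helper below is a hypothetical sketch (the name `read_line_via_sysread` and the 1024-byte buffer size are assumptions, the latter copied from the command above). It is only expected to help on Ruby 3.1/3.2dev, where `sysread` was observed to return valid bytes, and it assumes a single `sysread` call returns exactly one line.

```ruby
# Hypothetical helper: read one chunk of input with sysread, which bypasses
# the buffered, transcoding read path used by gets.
def read_line_via_sysread(io = STDIN, encoding = Encoding::Windows_31J)
  raw = io.sysread(1024)        # raw bytes as delivered by the OS
  raw.force_encoding(encoding)  # tag the bytes; no transcoding happens
  raw.chomp                     # drop the trailing "\r\n" or "\n"
end
```

Calling `read_line_via_sysread` with no arguments reads from STDIN and tags the result as Windows-31J. Any other IO object and encoding can be passed explicitly.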
STDIN.binmode cannot resolve this.
R:\ruby32\bin>ruby -e 'STDIN.binmode; p gets.force_encoding(Encoding::Windows_31J)'
あ
# <= Second enter key required
"\xA0\xFF\x{82A0}\r\r\n" # <= invalid
Ruby 3.0 and earlier versions behave differently. In particular, sysread returns an invalid value.
C:\src\git>ruby -v
ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x64-mingw32]
C:\src\git>ruby -Eutf-8 -e 'p gets'
あ
# <= Second enter key required
"\xA0\xFF\x82\xA0\n" # <= no exception occurs, but the value is invalid
C:\src\git>ruby -EWindows-31J -e 'p gets'
あ
# <= Second enter key required
"\xA0\xFFあ\n" # <= also invalid value
C:\src\git>ruby -e 'p STDIN.sysread(1024).force_encoding(Encoding::Windows_31J)'
あ
"\xA0\xFF\x{82A0}\r"
conclusion
- On ruby 3.1/3.2dev, gets returns an invalid value, whereas sysread returns a valid one.
- sysread returns a valid value on ruby 3.1/3.2dev, but an invalid value on 3.0.
- The fact that it works fine in the legacy console suggests that Windows has some issue, but the observations above suggest that ruby can handle it.