Bug #18588

closed

ruby -e 'p gets' with Japanese characters returns extra invalid leading bytes and raises Encoding::InvalidByteSequenceError

Added by YO4 (Yoshinao Muramatsu) about 2 years ago. Updated about 2 years ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-dev:51165]

Description

When a line starting with a Japanese character is entered from the console, Ruby almost always receives additional invalid leading characters.

Steps to reproduce

R:\ruby32\bin>ruby -e 'p gets'
あ
-e:1:in `gets': "\\xA0" on Windows-31J (Encoding::InvalidByteSequenceError)
        from -e:1:in `gets'
        from -e:1:in `<main>'
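The exception can be reproduced without the console by transcoding the bytes directly. A minimal sketch, assuming the console delivers Windows-31J bytes (0x82 0xA0 is "あ" in Windows-31J); `utf8_or_error` is a hypothetical helper, not part of Ruby:

```ruby
# Transcode raw console bytes from Windows-31J to UTF-8. The exception is
# returned instead of raised so the valid and invalid cases can be compared.
def utf8_or_error(raw)
  raw.b.force_encoding(Encoding::Windows_31J).encode(Encoding::UTF_8)
rescue Encoding::InvalidByteSequenceError => e
  e
end

# Bytes the console should deliver for "あ" + Enter.
p utf8_or_error("\x82\xA0\n".b)

# Bytes gets actually returned in this report, with the extra "\xA0\xFF" prefix.
err = utf8_or_error("\xA0\xFF\x82\xA0\n".b)
p err.class
p err.error_bytes
```

The first call yields "あ\n"; the second fails with Encoding::InvalidByteSequenceError on the leading "\xA0", the same error `gets` raises above.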

Expected result

R:\ruby32\bin>ruby -e 'p gets'
あ
"あ\n"

Ruby version (ruby -v)

R:\ruby32\bin>ruby -v
ruby 3.2.0dev (2022-02-16T08:57:04Z master 00c7a0d491) [x64-mswin64_140]

R:\ruby32\bin>ver

Microsoft Windows [Version 10.0.19043.1526]

Other observations

Environment

  • In a Command Prompt window with Legacy Console mode enabled, the issue does NOT occur.
  • In Windows Terminal, the issue occurs.
  • In Windows Sandbox (Japanese locale), the issue occurs.
  • RubyInstaller binaries have the same issue:
C:\src\git>ruby -v
ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x64-mingw-ucrt]

C:\src\git>ruby -Eutf-8 -e 'p gets'
あ
-e:1:in `gets': "\\xA0" on Windows-31J (Encoding::InvalidByteSequenceError)
        from -e:1:in `gets'
        from -e:1:in `<main>'

A line starting with one or more single-byte characters yields a valid value.

R:\ruby32\bin>ruby -e 'p gets'
:あ
":あ\n"  # <= valid

External encoding affects the behavior

  • With Windows-31J, a second Enter key press is required to finish line input.
R:\ruby32\bin>ruby -EWindows-31J -e 'p gets'
あ
   # <= Second enter key required
"\xA0\xFFあ\n" # <= \xA0\xFF are the extra bytes

Character variations

R:\ruby32\bin>ruby -EWindows-31J -e 'p gets.b'
あ  # <= \x{82A0}

"\xA0\xFF\x82\xA0\n"

R:\ruby32\bin>ruby -EWindows-31J -e 'p gets.b'
   # <= \x{8140} fullwidth space

"@\x00\x81@\n"

R:\ruby32\bin>ruby -EWindows-31J -e 'p gets.b'
、  # <= \x{8141}

"A\x00\x81A\n"

R:\ruby32\bin>ruby -EWindows-31J -e 'p gets.b'
。  # <= \x{8142}

"B\x00\x81B\n"

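Comparing the extra bytes with each character's own bytes shows a pattern in the four samples above: the two extra bytes are the character's trail byte, followed by 0xFF when that byte is 0x80 or above and 0x00 otherwise. The sketch below only checks this against the reported data; reading it as a trail byte sign-extended into a 16-bit little-endian value is a hypothesis, not a confirmed cause, and `extra_prefix` / `sign_extended_trail` are hypothetical helper names.

```ruby
# Samples from the report: bytes of the typed character => bytes returned by gets.b
SAMPLES = {
  "\x82\xA0".b => "\xA0\xFF\x82\xA0\n".b, # あ
  "\x81\x40".b => "@\x00\x81@\n".b,       # fullwidth space (U+3000)
  "\x81\x41".b => "A\x00\x81A\n".b,       # 、
  "\x81\x42".b => "B\x00\x81B\n".b        # 。
}.freeze

# Bytes that appear in front of the real character.
def extra_prefix(char_bytes, got)
  idx = got.index(char_bytes) or raise ArgumentError, "character not found"
  got.byteslice(0, idx)
end

# Hypothesis only: the trail byte read as a signed char and stored as a
# 16-bit little-endian integer (0xA0 -> -96 -> "\xA0\xFF", 0x40 -> "@\x00").
def sign_extended_trail(char_bytes)
  trail = char_bytes.getbyte(1)
  signed = trail >= 0x80 ? trail - 0x100 : trail
  [signed].pack("s<")
end

SAMPLES.each do |char, got|
  prefix = extra_prefix(char, got)
  p [char, prefix, prefix == sign_extended_trail(char)]
end
```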
sysread returns a valid value.

R:\ruby32\bin>ruby -e 'p STDIN.sysread(1024).force_encoding(Encoding::Windows_31J)'
あ
"\x{82A0}\r\n" # <= valid

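Since `sysread` returned valid bytes on Ruby 3.1/3.2dev (but not on 3.0, as shown further below), a possible interim workaround is to read raw bytes and transcode them explicitly. A minimal sketch, assuming the console sends Windows-31J and one `sysread` call returns a whole line; `read_console_line` is a hypothetical helper and the 1024-byte limit is arbitrary:

```ruby
# Read raw bytes with IO#sysread (no encoding conversion by Ruby), declare
# them as Windows-31J, then transcode to UTF-8. universal_newline: true also
# turns the "\r\n" seen in the sysread output into "\n".
def read_console_line(io = $stdin, enc = Encoding::Windows_31J)
  raw = io.sysread(1024)
  raw.force_encoding(enc).encode(Encoding::UTF_8, universal_newline: true)
end

# Only try this interactively, since the bug concerns console input.
p read_console_line if $stdin.tty?
```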
STDIN.binmode does not resolve the gets issue.

R:\ruby32\bin>ruby -e 'STDIN.binmode; p gets.force_encoding(Encoding::Windows_31J)'
あ
   # <= Second enter key required
"\xA0\xFF\x{82A0}\r\r\n" # <= invalid

Ruby 3.0 and earlier versions behave differently; in particular, sysread returns an invalid value.

C:\src\git>ruby -v
ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x64-mingw32]

C:\src\git>ruby -Eutf-8 -e 'p gets'
あ
   # <= Second enter key required
"\xA0\xFF\x82\xA0\n"  # <= no exception occurs, but the value is invalid
C:\src\git>ruby -EWindows-31J -e 'p gets'
あ
   # <= Second enter key required
"\xA0\xFFあ\n"  # <= also invalid value
C:\src\git>ruby -e 'p STDIN.sysread(1024).force_encoding(Encoding::Windows_31J)'
あ
"\xA0\xFF\x{82A0}\r"

Conclusion

  1. In Ruby 3.1/3.2dev, gets returns an invalid value, while sysread returns a valid one.
  2. In Ruby 3.1/3.2dev, sysread returns a valid value, while in Ruby 3.0 sysread returns an invalid one.
  3. The fact that it works fine in the legacy console suggests an issue on the Windows side, but points 1 and 2 suggest that Ruby can handle it.