Bug #4097

Unexpected result of STDIN.read on Windows

Added by Heesob Park over 3 years ago. Updated over 2 years ago.

[ruby-core:33460]
Status: Third Party's Issue
Priority: Normal
Assignee: Masaya Tarui
Category: core
Target version: 1.9.3
ruby -v: ruby 1.9.3dev (2010-11-28 trunk 29965) [i386-mswin32_90]
Backport:

Description

=begin
On Ruby 1.9.x, in case of non-ASCII input, STDIN.read(n) returns a string with garbage bytes appended.

C:\work>ruby -ve 'a=STDIN.read(10);p a;p a.length'
ruby 1.9.3dev (2010-11-28 trunk 29965) [i386-mswin32_90]
가나다라abcd
"\xB0\xA1\xB3\xAA\xB4\xD9\xB6\xF3ab\x00\x00\xB8t"
14

On the other hand, Ruby 1.8.6 works fine.

C:\work>ruby -ve 'a=STDIN.read(10);p a;p a.length'
ruby 1.8.6 (2010-02-04 patchlevel 398) [i386-mingw32]
가나다라abcd
"\260\241\263\252\264\331\266\363ab"
10
=end
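
For reference, the 10 bytes that Ruby 1.8.6 returns above are the CP949 bytes of the four Hangul characters followed by "ab". The following is a minimal sketch for checking this without a console. It assumes the console was using CP949 (the usual code page on Korean Windows), which the report does not state explicitly, and the variable names are illustrative only.

```ruby
# encoding: utf-8
# Assumption: the console code page was CP949 (not stated in the report).
typed = "가나다라abcd"

# Each Hangul syllable takes 2 bytes in CP949, so the first 10 bytes
# cover the 4 syllables plus "ab".
expected = typed.encode("CP949").force_encoding("BINARY")[0, 10]

p expected           # same bytes as the 1.8.6 result, in \x notation
p expected.bytesize  # 10
```

Anything beyond these 10 bytes in the 1.9.3dev result is the garbage this report is about.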

History

#1 Updated by Usaku NAKAMURA over 3 years ago

=begin
Hello,

In message "[Ruby 1.9-Bug#4097][Open] Unexpected result of STDIN.read on Windows"
    on Nov.29,2010 18:26:13, redmine@ruby-lang.org wrote:

> On Ruby 1.9.x, in case of non-ASCII input, STDIN.read(n) returns a string with garbage bytes appended.

What version of Windows do you use?
I guess you use the Korean version of 32-bit XP, don't you?

Tarui-san tested many cases on the Japanese version of 32-bit XP,
and has found that this seems to be a bug in Windows itself...

Regards,
--
U.Nakamura usa@garbagecollect.jp

=end

#2 Updated by Luis Lavena over 3 years ago

=begin
On Mon, Nov 29, 2010 at 8:44 AM, U.Nakamura usa@garbagecollect.jp wrote:

> Hello,
>
> In message "[Ruby 1.9-Bug#4097][Open] Unexpected result of STDIN.read on Windows"
>     on Nov.29,2010 18:26:13, redmine@ruby-lang.org wrote:
>
>> On Ruby 1.9.x, in case of non-ASCII input, STDIN.read(n) returns a string with garbage bytes appended.
>
> What version of Windows do you use?
> I guess you use the Korean version of 32-bit XP, don't you?
>
> Tarui-san tested many cases on the Japanese version of 32-bit XP,
> and has found that this seems to be a bug in Windows itself...

Perhaps it is related to the code page used to input those characters?

I noticed that accented characters do not work for built-in cmd.exe
operations under chcp 437 or 850, for example, but they work fine under
1252.

Unicode characters seem to work too under chcp 65001, but not with Ruby.

--
Luis Lavena
AREA 17
-
Perfection in design is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.
Antoine de Saint-Exupéry

=end

#3 Updated by Heesob Park over 3 years ago

=begin
Hi,

2010/11/29 U.Nakamura usa@garbagecollect.jp:

> Hello,
>
> In message "[Ruby 1.9-Bug#4097][Open] Unexpected result of STDIN.read on Windows"
>     on Nov.29,2010 18:26:13, redmine@ruby-lang.org wrote:
>
>> On Ruby 1.9.x, in case of non-ASCII input, STDIN.read(n) returns a string with garbage bytes appended.
>
> What version of Windows do you use?
> I guess you use the Korean version of 32-bit XP, don't you?

Yes, you are right.

> Tarui-san tested many cases on the Japanese version of 32-bit XP,
> and has found that this seems to be a bug in Windows itself...

I can see this bug on 32-bit XP and 2003.
On Windows 7, this bug does not appear.

Regards,
Park Heesob

=end

#4 Updated by Masaya Tarui over 3 years ago

=begin
Hello,

Windows XP seems to have a bug in its read functions when the console input is multibyte.
I found another issue that comes from the same Windows bug. :-(

Does anybody have a good idea for a workaround?

ruby -ve 'a=STDIN.read(6);p [a,a.length];a=STDIN.read(2);p [a,a.length];'
ruby 1.9.3dev (2010-11-30 trunk 29978) [i386-mswin32_100]
あいうえおaiueo
["\x82\xA0\x82\xA2\x82\xA4", 6]
["iu", 2]

> On Ruby 1.9.x, in case of non-ASCII input, STDIN.read(n) returns a string with garbage bytes appended.
>
> C:\work>ruby -ve 'a=STDIN.read(10);p a;p a.length'
> ruby 1.9.3dev (2010-11-28 trunk 29965) [i386-mswin32_90]
> 가나다라abcd
> "\xB0\xA1\xB3\xAA\xB4\xD9\xB6\xF3ab\x00\x00\xB8t"
> 14

Regards,
Masaya TARUI

=end

#5 Updated by Heesob Park over 3 years ago

=begin
Hi,

2010/11/30 Masaya TARUI tarui@prx.jp:

> Hello,
>
> Windows XP seems to have a bug in its read functions when the console input is multibyte.
> I found another issue that comes from the same Windows bug. :-(
>
> Does anybody have a good idea for a workaround?
>
> ruby -ve 'a=STDIN.read(6);p [a,a.length];a=STDIN.read(2);p [a,a.length];'
> ruby 1.9.3dev (2010-11-30 trunk 29978) [i386-mswin32_100]
> あいうえおaiueo
> ["\x82\xA0\x82\xA2\x82\xA4", 6]
> ["iu", 2]
>
>> On Ruby 1.9.x, in case of non-ASCII input, STDIN.read(n) returns a string with garbage bytes appended.
>>
>> C:\work>ruby -ve 'a=STDIN.read(10);p a;p a.length'
>> ruby 1.9.3dev (2010-11-28 trunk 29965) [i386-mswin32_90]
>> 가나다라abcd
>> "\xB0\xA1\xB3\xAA\xB4\xD9\xB6\xF3ab\x00\x00\xB8t"
>> 14

I found that ReadFile on a console handle reads data per character, not per byte.

Here is a workaround patch.

--- win32.c 2010-11-30 12:02:33.000000000 +0900
+++ win32.c.new 2010-11-30 12:01:46.000000000 +0900
@@ -5091,6 +5091,34 @@
         pol = &ol;
     }
 
+    if (is_console(_osfhnd(fd)) && len!=16384) {
+        int len2=0;
+        while(len2<len) {
+            if (!ReadFile((HANDLE)_osfhnd(fd), buf, 1, &read, pol)) {
+                err = GetLastError();
+                if (err != ERROR_IO_PENDING) {
+                    if (pol) CloseHandle(ol.hEvent);
+                    if (err == ERROR_ACCESS_DENIED)
+                        errno = EBADF;
+                    else if (err == ERROR_BROKEN_PIPE || err == ERROR_HANDLE_EOF) {
+                        MTHREAD_ONLY(LeaveCriticalSection(&_pioinfo(fd)->lock));
+                        return 0;
+                    }
+                    else
+                        errno = map_errno(err);
+
+                    MTHREAD_ONLY(LeaveCriticalSection(&_pioinfo(fd)->lock));
+                    return -1;
+                }
+            }
+            len2 += read;
+            buf = (char *)buf + read;
+        }
+        ret += len;
+        if (size > 0)
+            goto retry;
+
+    } else {
     if (!ReadFile((HANDLE)_osfhnd(fd), buf, len, &read, pol)) {
         err = GetLastError();
         if (err != ERROR_IO_PENDING) {
@@ -5154,6 +5182,7 @@
         if (size > 0)
             goto retry;
     }
+    }
 
     MTHREAD_ONLY(LeaveCriticalSection(&_pioinfo(fd)->lock));

Regards,
Park Heesob

=end
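
The observation in #5, that ReadFile on a console handle counts characters rather than bytes, would account for the output shown in #4. Below is a minimal simulation of that idea in plain Ruby. CharCountingConsole is a hypothetical model written for this illustration, not code from Ruby or Windows, and it assumes the Japanese console used code page 932 (Windows-31J).

```ruby
# encoding: utf-8
# Hypothetical model of the behaviour described in this thread; it is not
# Ruby's or Windows' actual implementation.
class CharCountingConsole
  def initialize(text, code_page)
    # Console input buffer, one array element per character in the code page.
    @chars = text.encode(code_page).chars.to_a
  end

  # Consumes n *characters* from the buffer but hands back at most n *bytes*.
  def read(n)
    consumed = @chars.shift(n).join.force_encoding("BINARY")
    consumed[0, n]
  end
end

# Assumption: code page 932 (Windows-31J) on the Japanese console.
console = CharCountingConsole.new("あいうえおaiueo", "Windows-31J")
first  = console.read(6)
second = console.read(2)
p [first, first.length]
p [second, second.length]
```

In this model the first call consumes six characters (あいうえおa) but returns only the first six bytes, so the second call starts at "iu", the same pair of results as in #4. It does not attempt to model the trailing garbage bytes from the original report.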

#6 Updated by Usaku NAKAMURA over 3 years ago

  • Status changed from Open to Assigned
  • Assignee set to Masaya Tarui

#7 Updated by Hiroshi Nakamura almost 3 years ago

  • Target version changed from 2.0.0 to 1.9.3

#8 Updated by Motohiro KOSAKI almost 3 years ago

Tarui-san, ping?

#9 Updated by Masaya Tarui over 2 years ago

  • Status changed from Assigned to Third Party's Issue

Sorry for the delayed response.

Now, STDIN.read(n) on multibyte console input might return a String of n+1 bytes (as of r29980 and r30280).
The MS runtime's read never splits a multibyte character.

Also, it is difficult to STDIN.ungetc the last byte, because we wrap the C-level read function.

I think that
1) it is a Windows bug,
2) we don't have an API-based workaround,
and
3) a workaround can be applied in the application.

So I am changing the status to Third Party's Issue.
However, a patch is always welcome.

Thanks,
Masaya TARUI
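
As an illustration of point 3), here is one possible application-level workaround for the n+1 case described in #9: a small wrapper that keeps any surplus bytes for the next call. This is only a sketch, not something proposed in this thread. ExactByteReader and OverReadingIO are hypothetical names, and OverReadingIO merely imitates an IO that returns one byte more than requested.

```ruby
require "stringio"

# Hypothetical wrapper: returns exactly n bytes per call (fewer only at EOF)
# and keeps any extra bytes the underlying IO handed back.
class ExactByteReader
  def initialize(io)
    @io = io
    @pending = String.new # empty binary buffer for surplus bytes
  end

  def read(n)
    while @pending.bytesize < n
      chunk = @io.read(n - @pending.bytesize)
      break if chunk.nil? || chunk.empty?
      @pending << chunk.dup.force_encoding("BINARY")
    end
    return nil if @pending.empty? && n > 0
    @pending.slice!(0, n)
  end
end

# Hypothetical stand-in for the console: always reads one byte too many.
class OverReadingIO
  def initialize(bytes)
    @io = StringIO.new(bytes)
  end

  def read(n)
    @io.read(n + 1)
  end
end

data = [0x82, 0xA0, 0x82, 0xA2, 0x61, 0x62].pack("C*")
reader = ExactByteReader.new(OverReadingIO.new(data))
a = reader.read(3)
b = reader.read(3)
c = reader.read(1)
p [a.bytesize, b.bytesize, c]
```

Note that returning exactly n bytes can split a multibyte character, so the caller has to be prepared for that. The wrapper also does nothing about the garbage bytes in the original report.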
