Bug #21634
Combining read(1) with eof? causes results to drop out unexpectedly on Windows [Open]
Description
On Windows, when reading a file that contains an EOF character (\x1A), calling read(1) in a loop guarded by IO#eof? unexpectedly drops some of the results.
irb(main):001> IO.binwrite("txt", "abcd\x1A")
=> 5
irb(main):002> open("txt", "r") { p _1.read(1) until _1.eof? }; # works fine
"a"
"b"
"c"
"d"
"\x1A"
irb(main):003> open("txt", "rt") { p _1.read(1) until _1.eof? }; # fails
"b"
"d"
irb(main):004>
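For anyone reproducing this outside irb, here is a small script version of the loop above. It is a sketch: the helper name read_bytewise and the use of a temporary directory are my own choices, not part of the report. It writes the same five bytes and prints what the read(1)/eof? loop returns for "r" and "rt". Per the report, only the "rt" line is expected to lose bytes, and only on Windows.

```ruby
require "tmpdir"

# Hypothetical helper: read one byte at a time until IO#eof? reports true,
# mirroring the irb loop in the report.
def read_bytewise(path, mode)
  File.open(path, mode) do |f|
    results = []
    results << f.read(1) until f.eof?
    results
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "txt")
  File.binwrite(path, "abcd\x1A") # same five bytes as in the report

  %w[r rt].each do |mode|
    puts "#{mode.ljust(2)} => #{read_bytewise(path, mode).inspect}"
  rescue IOError => e
    # Report the error instead of aborting, so both modes are always shown.
    puts "#{mode.ljust(2)} => #{e.class}: #{e.message}"
  end
end
```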
The problem disappears if any one of the following lines is commented out (although doing so breaks other things):
- previous_mode = set_binary_mode_with_seek_cur(fptr); in io_read()
- flush_before_seek(fptr, false); in set_binary_mode_with_seek_cur()
- io_unread(fptr, discard_rbuf); in flush_before_seek()
Within io_unread(), rbuf.len should decrease as 5, 4, 3, ..., but it actually goes 4, 2, (end).
Since the state is already inconsistent at that point, the root cause appears to lie elsewhere.
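One way to look at the inconsistency from the Ruby side, rather than inside io_unread(), is to record IO#pos after every read(1). This is a hypothetical diagnostic (trace_positions is an illustrative name): if the buffer bookkeeping is consistent, the positions should step by exactly one per byte, so a jump of two would suggest a byte was consumed without being returned. IO#pos may itself touch the same buffering code, so the trace is a hint rather than proof.

```ruby
require "tmpdir"

# Hypothetical diagnostic: pair each byte returned by read(1) with IO#pos
# taken right after that read. With consistent buffering the positions
# count up by one per byte (1, 2, 3, ...).
def trace_positions(path, mode)
  File.open(path, mode) do |f|
    steps = []
    steps << [f.read(1), f.pos] until f.eof?
    steps
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "txt")
  File.binwrite(path, "abcd\x1A")

  %w[r rt].each do |mode|
    p [mode, trace_positions(path, mode)]
  rescue IOError => e
    # Show the error for this mode and continue with the next one.
    p [mode, e]
  end
end
```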
I found this on ruby master, but the same issue also occurs in ruby-1.9.3-p551 at least.
Updated by YO4 (Yoshinao Muramatsu) 20 days ago
An IO opened with mode "rt" reads with O_BINARY, even though it was opened with O_TEXT.
This causes fill_cbuf to read in O_TEXT mode unexpectedly when called from rb_io_eof.
I have opened PR #18410.
Updated by nobu (Nobuyoshi Nakada) 20 days ago
YO4 (Yoshinao Muramatsu) wrote in #note-1:
The IO that has mode_enc "rt" will read with O_BINARY but opend with O_TEXT.
This leads fill_cbuf using O_TEXT at rb_io_eof unexpectedly.
I made PR #18410.
Thank you for the patch.
However, it seems to change the behavior of IO#eof?.
With a file "txt" whose content is "abcd\x1A\r\n",
the current IO#eof? returns true at "\x1A", and furthermore reading stops there.
> .\miniruby.exe -v -e "open('txt', 'rt') {|f| p f.read(4); p f.eof?; p f.read}"
ruby 3.5.0dev (2025-10-11T06:00:21Z master e8f0e1423b) +PRISM [arm64-mswin64_140]
"abcd"
true
""
With your PR, however, "\x1A" simply no longer seems to be treated as EOF.
> .\miniruby-new.exe -v -e "open('txt', 'rt') {|f| p f.read(4); p f.eof?; p f.read}"
ruby 3.5.0dev (2025-10-11T06:05:13Z eof-and-fpos 6e568e9cb2) +PRISM [arm64-mswin64_140]
last_commit=Set O_BINARY correctly at rb_io_eof()
"abcd"
false
"\u001A\n"
Updated by YO4 (Yoshinao Muramatsu) 17 days ago
That is an interesting behavior I hadn't considered.
My understanding is that 'rt' mode uses universal newline conversion, and that
0x1A is treated as a regular character, on both Windows and other platforms.
For example:
>./miniruby -v -e "open('txt', 'rt') { |f| p f.read(4); p f.eof?; p f.read(1); f.rewind; p f.readline }"
ruby 3.5.0dev (2025-10-10T10:12:35Z master 4bf1475833) +PRISM [x64-mingw-ucrt]
"abcd"
true
nil
"abcd\u001A\n" # => 0x1A is read as regular character
On Windows there is little reason to use universal newline conversion on its own,
but the same applies when encoding conversion is used, which may slightly widen the impact.
>ruby -v -e "open('txt', 'r:CP932:UTF-8') { |f| p f.read(4); p f.eof?; p f.read(1); f.rewind; p f.readline }"
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [x64-mingw-ucrt]
"abcd"
true
nil
"abcd\u001A\n"
The behavior of IO#readline is as specified, and the existing IO#eof? behavior you pointed out seems to be unintended.
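That 0x1A passes through these conversions as an ordinary character can also be checked on a plain string, without going through IO. This sketch assumes String#encode's universal_newline: option is a fair stand-in for the newline decorator that "rt" and "r:CP932:UTF-8" apply when reading; it approximates the IO code path rather than exercising it.

```ruby
# Universal newline conversion alone: CRLF is folded to LF.
newline_only = "abcd\x1A\r\n".encode(universal_newline: true)

# CP932 -> UTF-8 transcoding plus universal newline, mirroring "r:CP932:UTF-8".
with_transcoding = "abcd\x1A\r\n".encode("UTF-8", "CP932", universal_newline: true)

# In both results the 0x1A byte stays in place as an ordinary character;
# only the line ending is rewritten.
p newline_only
p with_transcoding
p with_transcoding.encoding
```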
As a future goal, I want to remove the dependency on the Microsoft C runtime's read() function,
so I would like to eliminate any unexplained existing behavior beforehand.
In this issue I was focusing on the file position, but my patch also changed how IO#eof? behaves at 0x1A.
Unfortunately, the code paths affected by the patch appear to fall outside typical use cases
(e.g. reading a character stream with a binary read method),
so I cannot tell whether any existing scripts would be affected by this change.
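The "character stream read with a binary read method" situation can be shown on a single IO. As a sketch (the helper byte_vs_char_read is hypothetical, and it relies on IO#read(length) being documented as a byte read without conversion), the same "rt" file yields raw bytes from read(6) but a newline-converted line from readline.

```ruby
require "tmpdir"

# Hypothetical helper: read the same "rt" file once byte-wise and once
# line-wise. IO#read(length) reads bytes without any conversion, whereas
# IO#readline goes through the text-mode (universal newline) conversion.
def byte_vs_char_read(path)
  File.open(path, "rt") do |f|
    raw = f.read(6)   # byte-oriented: raw bytes, ASCII-8BIT
    f.rewind
    line = f.readline # character-oriented: CRLF converted to LF
    [raw, line]
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "txt")
  File.binwrite(path, "abcd\x1A\r\n")

  raw, line = byte_vs_char_read(path)
  p raw, raw.encoding
  p line, line.encoding
end
```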
To move forward, is there anything I can do? I would appreciate any advice.