Bug #21530
Closed
Is IO#eof? supposed to always block and read?
Description
I'm not sure whether or not this is expected behavior, but it seems like eof? blocks when called on $stdin.
For example:
if (str = $stdin.gets)
  $stderr.puts "read #{str}"
end
if $stdin.eof? # this call waits for input
  $stderr.puts "stdin is eof"
end
I think this is kind of odd behavior because if you input a string but do not input a newline, then hit ^D twice, $stdin should be at EOF, but eof? will block and wait for input. If you hit ^D a third time, $stdin will be EOF, but if you input a different character it will not be EOF.
Compare this C program:
#include <stdio.h>
#include <stdlib.h>
#define BUF_SIZE 4096
int main(int argc, char *argv[]) {
    char buf[BUF_SIZE];
    if (fgets(buf, BUF_SIZE, stdin)) {
        fprintf(stderr, "read %s\n", buf);
    }
    if (feof(stdin)) { // Does not block
        fprintf(stderr, "stdin is EOF\n");
    }
}
If you hit ^D twice with this C program, feof will return true for stdin. I would have expected the Ruby program and the C program to behave similarly, but they don't. Is this expected? The documentation indeed says that eof? will read, but shouldn't the IO be at EOF after the second ^D?
Thank you.
Updated by nobu (Nobuyoshi Nakada) 1 day ago
It has been changed intentionally, AFAIR, to allow read from the tty twice.
Updated by mame (Yusuke Endoh) 1 day ago
The short answer is: Ruby handles EOF in the Pascal style, not the C style.
In C, the FILE structure has an EOF flag. When a read(2) syscall returns 0, the EOF flag in the FILE structure is set. In the example provided, if you forcefully interrupt the input for fgets by pressing ^D twice, the EOF flag is set, and a subsequent call to feof returns true.
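To make the C-style behavior concrete, here is a minimal Ruby sketch that imitates a sticky EOF flag. The StickyEofReader class is hypothetical and exists only for illustration; it is not part of Ruby or of C's stdio. It assumes that a nil result, or a line with no trailing newline, means the underlying read reached end of input.

```ruby
require "stringio"

# Hypothetical wrapper (not a Ruby API): remembers that end of input was
# seen, loosely modeling the EOF flag kept in a C FILE structure.
class StickyEofReader
  def initialize(io)
    @io = io
    @eof_seen = false
  end

  def gets
    line = @io.gets
    # Assumption: nil, or a line lacking "\n", means the read hit end of input.
    @eof_seen = true if line.nil? || !line.end_with?("\n")
    line
  end

  # Like feof: reports the remembered flag and never reads.
  def eof?
    @eof_seen
  end
end

reader = StickyEofReader.new(StringIO.new("foo"))
line = reader.gets
sticky_eof = reader.eof?
puts "read #{line.inspect}, sticky eof flag: #{sticky_eof}"
```

The point of the sketch is only that eof? here answers "was EOF seen earlier?" from stored state, without touching the underlying stream.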
On the other hand, in Pascal and Ruby, the IO object itself does not have an EOF flag. Therefore, even if IO#gets is forcefully interrupted with a double ^D, the IO object does not remember this state, and a subsequent call to IO#eof? will attempt to read again, thus blocking.
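The blocking read can be observed without a terminal. The following is a small sketch that uses a pipe in place of $stdin; the 0.2 second timeout is an arbitrary value chosen for the demonstration.

```ruby
# Sketch: IO#eof? has to read in order to answer, so it waits on an
# empty pipe for as long as the write end is still open.
reader, writer = IO.pipe

checker = Thread.new { reader.eof? }

# Thread#join returns nil when the timeout expires, which means eof?
# is still waiting for input.
blocked = checker.join(0.2).nil?

# Closing the write end lets the pending read return zero bytes.
writer.close
at_eof = checker.value

puts "eof? blocked while the writer was open: #{blocked}"
puts "eof? result after the writer closed: #{at_eof}"
reader.close
```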
This is a trade-off, and neither approach is definitively "correct," but Ruby's stateless approach has some advantages:
- Simple and robust: There is no hidden state in an IO, which is good in itself. It avoids common C bugs related to incorrect feof() checks.
- Flexible: It works consistently for streams that can grow over time, like sockets or files being appended to (similar to tail -f).
What @nobu (Nobuyoshi Nakada) said is the second one. For example, you can continuously read from standard input or a growing file:
$ ruby -e 'p [1, $stdin.read]; p [2, $stdin.read]'
foo^D^D[1, "foo"]
bar^D^D[2, "bar"]
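The growing-file case can be sketched the same way. This example uses a temporary file and a second handle that appends to it; the file name prefix and the observations hash are made up for the illustration.

```ruby
require "tempfile"

observations = {}

Tempfile.create("growing") do |f|
  f.write("foo")
  f.flush
  f.rewind

  observations[:first_read] = f.read   # reads up to the current end of file
  observations[:eof_before] = f.eof?   # reads again and finds nothing

  # Another handle appends to the same file, as a logger might.
  File.open(f.path, "a") { |w| w.write("bar") }

  observations[:eof_after]   = f.eof?  # reads again and now finds data
  observations[:second_read] = f.read
end

p observations
```

Because no EOF state is remembered, the same File object goes from "at EOF" back to "not at EOF" once the file grows.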
FYI, a more detailed answer is written in the Japanese book "API design case study" by @akr (Akira Tanaka) who designed Ruby's IO. You may want to read it :-)
https://gihyo.jp/book/2016/978-4-7741-7802-8
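1.02 The feof function and the IO#eof? method: was EOF encountered in the past, or is the stream at EOF right now?
- End of file in C and Pascal
- An end of file that is easy for users to understand
- Summary
1.04 Removing the EOF flag: behavior that changes depending on the mode is not good
- The EOF flag in stdio
- The EOF flag in Ruby
- An attempt to reimplement the EOF flag
- Summary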
Updated by tenderlovemaking (Aaron Patterson) about 23 hours ago
- Status changed from Open to Rejected
mame (Yusuke Endoh) wrote in #note-2:
> The short answer is: Ruby handles EOF in the Pascal style, not the C style.
> In C, the FILE structure has an EOF flag. When a read(2) syscall returns 0, the EOF flag in the FILE structure is set. In the example provided, if you forcefully interrupt the input for fgets by pressing ^D twice, the EOF flag is set, and a subsequent call to feof returns true.
> On the other hand, in Pascal and Ruby, the IO object itself does not have an EOF flag. Therefore, even if IO#gets is forcefully interrupted with a double ^D, the IO object does not remember this state, and a subsequent call to IO#eof? will attempt to read again, thus blocking.
> This is a trade-off, and neither approach is definitively "correct," but Ruby's stateless approach has some advantages:
> - Simple and robust: There is no hidden state in an IO, which is good in itself. It avoids common C bugs related to incorrect feof() checks.
> - Flexible: It works consistently for streams that can grow over time, like sockets or files being appended to (similar to tail -f).
> What @nobu (Nobuyoshi Nakada) said is the second one. For example, you can continuously read from standard input or a growing file:
> $ ruby -e 'p [1, $stdin.read]; p [2, $stdin.read]'
> foo^D^D[1, "foo"]
> bar^D^D[2, "bar"]
Excellent. It makes sense. Thank you for the explanation and background information.
> FYI, a more detailed answer is written in the Japanese book "API design case study" by @akr (Akira Tanaka) who designed Ruby's IO. You may want to read it :-)
Great! I bought a copy and I'll read it! Thank you!