Bug #14400

IO#ungetc and IO#ungetbyte documentation is inconsistent with the behavior

Added by Eregon (Benoit Daloze) 24 days ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 2.6.0dev (2018-01-25 trunk 62035) [x86_64-linux]
[ruby-core:85108]

Description

The documentation of IO#ungetc states:

Pushes back one character (passed as a parameter) onto ios, such that a
subsequent buffered character read will return it. Only one character may be
pushed back before a subsequent read operation (that is, you will be able to
read only the last of several characters that have been pushed back). Has no
effect with unbuffered reads (such as IO#sysread).

And similarly for IO#ungetbyte:

Pushes back bytes (passed as a parameter) onto ios, such that a
subsequent buffered read will return it. Only one byte may be pushed back
before a subsequent read operation (that is, you will be able to read only the
last of several bytes that have been pushed back). Has no effect with
unbuffered reads (such as IO#sysread).

The part about only one byte/character is inconsistent with the actual behavior,
most notably because both of these methods accept a String with multiple characters as an argument.

STDIN.ungetc "Hello World!"
STDIN.read 12 #=> "Hello World!"

STDIN.ungetbyte "Foo Bar"
STDIN.read 7 #=> "Foo Bar"

(There are even specs for it:
https://github.com/ruby/spec/blob/7fa22023d69620ea3ff4d0ed2eb71fd7b02dd950/core/io/ungetc_spec.rb#L98
https://github.com/ruby/spec/blob/7fa22023d69620ea3ff4d0ed2eb71fd7b02dd950/core/io/ungetbyte_spec.rb#L21)

The statement "that is, you will be able to read only the last of several characters that have been pushed back" contradicts what actually happens.
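
The same behavior can be reproduced non-interactively. The following is a minimal sketch that uses a pipe in place of STDIN; `with_pipe` is a hypothetical helper introduced only for this illustration, not part of Ruby. It shows that a multi-character String is pushed back in full, and that after two small consecutive pushes both bytes can be read back, not only the last one.

```ruby
# Sketch: reproduce the pushback behavior on a pipe instead of STDIN.
# `with_pipe` is a hypothetical helper, not part of Ruby.
def with_pipe(data)
  reader, writer = IO.pipe
  writer.write(data)
  writer.close            # so that reads reach EOF instead of blocking
  yield reader
ensure
  reader&.close
end

# A multi-character String is pushed back in full.
multi = with_pipe("tail") do |io|
  io.ungetc("Hello World!")
  io.read(12)
end

# Two consecutive pushes: both bytes are readable, most recent first.
twice = with_pipe("tail") do |io|
  io.ungetbyte("a")
  io.ungetbyte("b")
  io.read(2)
end

p multi  # => "Hello World!"
p twice  # => "ba"
```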

The behavior with large Strings is confusing.
Arbitrarily large Strings seem to be accepted, but apparently only if there was no previous ungetbyte (or only if the buffer was empty?).

$ pry
[1] pry(main)> STDIN.ungetbyte "a"*10_000
=> nil
[2] pry(main)> STDIN.ungetbyte "a"*10_000
IOError: ungetbyte failed

$ pry
[1] pry(main)> STDIN.ungetbyte "a"*100_000
=> nil
[2] pry(main)> STDIN.ungetbyte "a"*100_000
IOError: ungetbyte failed
from (pry):2:in `ungetbyte'

$ pry
[1] pry(main)> STDIN.ungetbyte "a"*100_000
=> nil
[2] pry(main)> STDIN.read(100_000).size
=> 100000
[3] pry(main)> STDIN.ungetbyte "a"*100_000
=> nil
[4] pry(main)> STDIN.read(100_000).size
=> 100000

And it is not as simple as two consecutive ungetbyte calls being forbidden:

$ pry
[1] pry(main)> STDIN.ungetbyte "a"*10_000_000
=> nil
[2] pry(main)> STDIN.ungetbyte "a"
IOError: ungetbyte failed
from (pry):2:in `ungetbyte'

$ pry
[1] pry(main)> STDIN.ungetbyte "a"
=> nil
[2] pry(main)> STDIN.ungetbyte "a"
=> nil

So how are those methods supposed to behave?
Can the documentation be updated to match the behavior and/or the behavior be fixed to be simpler?

I also wonder when those methods are useful.
There seem to be very few uses in the stdlib.
Maybe they should just be removed?
It seems easy to write a custom IO wrapper/buffer that supports pushing characters/bytes back.
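
As a rough illustration of that last point, here is a minimal sketch of such a wrapper. `PushbackReader` and its methods are hypothetical names, not an existing Ruby API, and it handles bytes only (no character or encoding handling). It assumes the wrapped object responds to `read(length)` the way IO and StringIO do. Pushed-back bytes are kept in the wrapper's own String, so the amount that can be pushed back does not depend on the state of the IO's read buffer.

```ruby
require "stringio"

# Hypothetical wrapper (not part of Ruby): keeps pushed-back bytes in its own
# String instead of relying on the read buffer of the underlying IO.
class PushbackReader
  def initialize(io)
    @io = io
    @pending = String.new   # empty binary (ASCII-8BIT) buffer
  end

  # Push back a String or an Integer byte; the most recent push is read first.
  def ungetbyte(bytes)
    bytes = bytes.chr if bytes.is_a?(Integer)
    @pending.prepend(bytes.b)
    nil
  end

  # Read up to `length` bytes, draining pushed-back bytes first.
  # Returns nil at EOF when nothing is pending, like IO#read(length).
  def read(length)
    out = @pending.slice!(0, length)
    if out.bytesize < length
      rest = @io.read(length - out.bytesize)
      out << rest if rest
    end
    out.empty? && length > 0 ? nil : out
  end
end

reader = PushbackReader.new(StringIO.new("world"))
reader.ungetbyte("hello ")
reader.ungetbyte("a" * 100_000)   # a second, large push is accepted as well
p reader.read(100_000).size       # => 100000
p reader.read(11)                 # => "hello world"
```

With this design, consecutive pushes of any size behave the same way, which sidesteps the capacity-dependent failures shown in the sessions above.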
