Bug #20889
IO#ungetc and IO#ungetbyte should not cause IO#pos to report an inaccurate position
Description
require 'tempfile'

Tempfile.open(encoding: 'utf-8') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetbyte(93)
  f.pos # => -1; negative value is surprising!
end

Tempfile.open(encoding: 'utf-8') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-8'))
  f.pos # => -1; similar to the ungetbyte case
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  f.pos # => 0; maybe should be -2 to match the previous ungetc case?
end
It doesn't seem logical that IO#pos should ever be affected by IO#ungetc or IO#ungetbyte. The pushed characters or bytes aren't really in the stream source. The value of IO#pos implies that jumping directly to that position via IO#seek and reading from there would return the same character or byte that was pushed, but the pushed characters or bytes are lost when the operation to seek in the stream is performed. In the case where IO#pos is a negative value, attempting to seek to that position actually raises an exception.
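The last point can be checked without depending on how any particular Ruby version handles pushed bytes. The following is a minimal sketch, assuming a POSIX-like platform where a regular file rejects a negative absolute offset. The helper name negative_seek_outcome is made up for illustration. It seeks to -1, the value IO#pos reported in the ungetbyte example, and returns the class of the exception raised, which on Linux is expected to be Errno::EINVAL.

```ruby
require 'tempfile'

# Hypothetical helper: attempt an absolute seek to a negative offset on a
# regular file and return the exception class raised, or :no_error if the
# seek unexpectedly succeeded.
def negative_seek_outcome(offset = -1)
  outcome = :no_error
  Tempfile.create('pos-demo') do |f|
    f.write('0123456789')
    f.rewind
    begin
      f.seek(offset) # -1 is the value IO#pos reported after ungetbyte above
    rescue SystemCallError => e
      outcome = e.class
    end
  end
  outcome
end

puts negative_seek_outcome
```

So a position of -1 cannot be passed back to IO#seek, even though IO#pos returned it.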
In the IO#ungetc with character conversion case above, it seems unreasonable to make IO#pos report an even less correct position. In that case, the position would need to be adjusted by 2 bytes in reverse due to the internal encoding of the stream, but that is completely inconsistent with the behavior of IO#pos when reading from the stream normally, where it reports the underlying stream's byte position and not the number of transcoded bytes that have been read:
require 'tempfile'

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.getc.bytesize # => 2; due to the internal encoding of the stream
  f.pos # => 1; reports actual bytes read from the stream, not transcoded bytes
end
Attempting to use IO#pos when there are characters or bytes pushed into the read buffer by way of IO#ungetc or IO#ungetbyte should result in one of the following behaviors:
- Raise an exception
- Return the stream's position, clearing the read buffer entirely
- Return the stream's position, ignoring the pushed characters or bytes, and produce a warning
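
To make the three options concrete, here is a rough sketch of how each could behave. It is not a patch to IO and says nothing about how io.c would implement any of them. PushbackReader and its on_pushback parameter are hypothetical names invented for this illustration. The wrapper keeps pushed bytes in its own buffer, so the wrapped stream's position is never disturbed, and pos applies one of the three policies when that buffer is non-empty.

```ruby
require 'stringio'

# Hypothetical wrapper, not part of Ruby: pushed-back bytes live in a
# separate buffer instead of the wrapped IO's read buffer.
class PushbackReader
  # on_pushback selects what pos does while bytes are pushed back:
  #   :raise - raise an exception
  #   :clear - discard the pushed bytes, then return the stream position
  #   :warn  - keep the pushed bytes, warn, and return the stream position
  def initialize(io, on_pushback: :warn)
    @io = io
    @pushed = [] # most recently pushed byte is last
    @on_pushback = on_pushback
  end

  def ungetbyte(byte)
    @pushed.push(byte)
    nil
  end

  # Pushed bytes are returned first, most recent first, then the stream.
  def getbyte
    @pushed.pop || @io.getbyte
  end

  def pos
    unless @pushed.empty?
      case @on_pushback
      when :raise then raise IOError, 'position is undefined while bytes are pushed back'
      when :clear then @pushed.clear
      when :warn  then warn 'pos ignores pushed-back bytes'
      end
    end
    @io.pos
  end
end

reader = PushbackReader.new(StringIO.new('0123456789'), on_pushback: :clear)
reader.ungetbyte(93)
reader.pos     # => 0, the position of the wrapped stream
reader.getbyte # => 48, the pushed byte was discarded by pos
```

With on_pushback: :warn the same calls would leave the pushed byte in place, so getbyte would return 93 after pos reports 0. In none of the three modes can pos go negative.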