Bug #20889

IO#ungetc and IO#ungetbyte should not cause IO#pos to report an inaccurate position

Added by javanthropus (Jeremy Bopp) about 20 hours ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) [x86_64-linux]
[ruby-core:119895]

Description

require 'tempfile'

Tempfile.open(encoding: 'utf-8') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetbyte(93)
  f.pos       # => -1; negative value is surprising!
end

Tempfile.open(encoding: 'utf-8') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-8'))
  f.pos       # => -1; similar to the ungetbyte case
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  f.pos       # => 0; maybe should be -2 to match the previous ungetc case?
end

It doesn't seem logical that IO#pos should ever be affected by IO#ungetc or IO#ungetbyte. The pushed characters or bytes aren't really in the stream source. The value of IO#pos implies that jumping directly to that position via IO#seek and reading from there would return the same character or byte that was pushed, but the pushed characters or bytes are lost when the operation to seek in the stream is performed. In the case where IO#pos is a negative value, attempting to seek to that position actually raises an exception.
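The last point can be checked without involving the pushback buffer at all. The following is a minimal sketch; the helper name `seek_error_for` is hypothetical and not part of the report. It seeks to an absolute offset in a tempfile and returns whatever system-level exception results. It rescues `SystemCallError` rather than a specific errno class because the exact error may vary by platform (on Linux it is typically `Errno::EINVAL`).

```ruby
require 'tempfile'

# Hypothetical helper: returns the exception raised by seeking to the given
# absolute offset, or nil if the seek succeeds.
def seek_error_for(offset)
  error = nil
  Tempfile.open do |f|
    f.write('0123456789')
    f.rewind
    begin
      f.seek(offset)
    rescue SystemCallError => e
      error = e
    end
  end
  error
end

# -1 is the value IO#pos reported after ungetbyte in the first example.
negative = seek_error_for(-1)
puts negative ? "seek(-1) raised #{negative.class}" : 'seek(-1) succeeded'
```

A position that cannot be passed back to IO#seek is of little use to a caller, which is the core of the complaint above.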

In the IO#ungetc case with character conversion above, it seems unreasonable to make IO#pos report an even less correct position. To match the other cases, the position would need to move back by 2 bytes because of the stream's internal encoding. That would be completely inconsistent with how IO#pos behaves when reading from the stream normally, where it reports the underlying stream's byte position and not the number of transcoded bytes that have been read:

require 'tempfile'

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.getc.bytesize # => 2; due to the internal encoding of the stream
  f.pos           # => 1; reports actual bytes read from the stream, not transcoded bytes
end

Attempting to use IO#pos when there are characters or bytes pushed into the read buffer by way of IO#ungetc or IO#ungetbyte should result in one of the following behaviors:

  1. Raise an exception
  2. Return the stream's position, clearing the read buffer entirely
  3. Return the stream's position, ignoring the pushed characters or bytes, and produce a warning
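To make the three options concrete, here is a rough sketch of how each would look from the caller's side. It is not how IO is implemented and not a proposed patch: `PushbackReader` and its method names are hypothetical, and it keeps pushed-back bytes in its own stack on top of a StringIO so that the behavior is easy to follow.

```ruby
require 'stringio'

# Hypothetical wrapper, for illustration only. Pushed-back bytes live in a
# separate stack, so they never change the position reported for the stream.
class PushbackReader
  def initialize(io)
    @io = io
    @pushed = [] # pushed-back bytes; the last one pushed is read first
  end

  def ungetbyte(byte)
    @pushed.push(byte)
    nil
  end

  def getbyte
    @pushed.empty? ? @io.getbyte : @pushed.pop
  end

  # Option 1: raise an exception while pushed-back bytes are pending.
  def strict_pos
    raise IOError, 'position is undefined while pushed-back bytes are pending' unless @pushed.empty?
    @io.pos
  end

  # Option 2: return the stream's position and discard the pushed-back bytes.
  def pos_discarding_pushback
    @pushed.clear
    @io.pos
  end

  # Option 3: return the stream's position, ignore the pushed-back bytes,
  # and emit a warning.
  def pos
    warn 'pushed-back bytes are ignored by pos' unless @pushed.empty?
    @io.pos
  end
end

reader = PushbackReader.new(StringIO.new('0123456789'))
reader.ungetbyte(93)
reader.pos     # returns 0 and warns; never negative
reader.getbyte # returns 93, the pushed-back byte
reader.getbyte # returns 48, the first byte of the stream
reader.pos     # returns 1
```

In all three variants the reported position is one that can be handed to a seek on the underlying stream, which is the property the current behavior lacks.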
