Bug #21686
openIn combination with IO#ungetbyte, the write position may become unpredictable.
Description
In the current implementation, using IO#ungetbyte can cause IO#pos to become negative.
Writing to the same file descriptor in this state will result in unexpected write positions.
require 'tempfile'
Tempfile.open do |f|
f.write("0123456789")
f.rewind
f.getbyte # rbuf filled
f.ungetbyte("-") # f.pos => 0
f.ungetbyte("-") # f.pos => -1
f.write("x") # seek to -1 failed in io_unread
f.rewind
p f.read # => "0123456789x", wrote at the end, questionable
end
When internal io_unread() in io.c is called to prepare for a write operation while the actual file position is advancing due to reading,
lseek fails because it attempts to reset to a negative file position.
IO#seek has a similar issue.
Tempfile.open do |f|
f.write("0123456789")
f.rewind
f.getbyte # rbuf filled
f.ungetbyte("-") # f.pos => 0
f.ungetbyte("-") # f.pos => -1
f.seek(0, File::SEEK_CUR)
p f.pos # => 10, this points to the last of rbuf.
end
I am preparing the PR, but I understand that there is discussion about the position where the write occurs and the possibility of the pos becoming negative.
Updated by YO4 (Yoshinao Muramatsu) about 20 hours ago
I made PR #15204
Changed and Unchanged¶
The behavior of this PR is listed below.
Since IO#ungetc has a different file position behavior when combined with encoding conversion, IO#getbyte is used for explanation.
-
IO#ungetbytemoves the file position. This is to restore the file position as much as possible after executing IO#getbyte. -
After
IO#ungetbyteis invoked, the file position may become negative.
This is to avoid situations where the file position does not advance even after multiple getbyte calls, by restricting it to non-negative values. -
IO#posalways retains the buffer. Flushing the buffer when the file position is negative will not allow the next call to return the same file position. -
IO#seekmay raise an exception ifself.seek(0, File::SEEK_CUR)* is called. This occurs when the file position is negative.
Impact of Behavior Changes¶
Regarding IO#pos, since the typical use case for IO#ungetbyte is to push back the same data that was read, buffer retention or reset will yield identical results in this scenario.
The change to IO#seek is considered acceptable because, in the above typical use case, the file position never becomes negative, and the previous behavior is deemed of no practical value.
Updated by javanthropus (Jeremy Bopp) about 17 hours ago
#20919 may be related. If the PR for this issue is accepted, it likely determines how that issue should be addressed. There's already a PR that is several months old associated with it.
Updated by YO4 (Yoshinao Muramatsu) about 6 hours ago
As proposed in this issue, resolving this problem required determining a solution strategy across multiple methods with considering compatibility.
While I'm only an external contributor and not a core team member, I'll boldly say that the patch appears to merely mask the problem.
From my perspective, that PR seemed likely to cause issues by resetting pointers while leaving the encoding converter's state untouched. However, I hesitated to speak up because I wanted to find the actual conditions where it would cause problems.
As reviewers, they were required to be even more precise in their statements, and when solutions were unclear, they likely couldn't offer careless opinions.
We still need to explore another solution, like javanthropus's proposals in Bug #20889.
I hope my suggestion helps clarify the issues and move the case forward.