Bug #20919
IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding
Status: Open
Description
When transcoding characters, IO#seek and IO#pos= only clear the internal character buffer if IO#getc is called first:
require 'tempfile'

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  f.getc # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL NOT be cleared
  f.pos = 2
  f.getc # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  # Added a call to #getc here
  f.getc
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL be cleared now
  f.seek(2, :SET)
  # Same behavior for #pos=
  # f.pos = 2
  f.getc # => '2'.encode('utf-16le')
end
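For comparison, the behavior the report expects (characters pushed back with ungetc are discarded once seek or pos= moves the position) can be sketched in plain Ruby with a wrapper that keeps its own pushback stack. `PushbackReader` is a hypothetical class, not part of Ruby's IO API and not the fix proposed later in this thread; it only reproduces the expected result of the first example above, assuming the same `utf-8:utf-16le` setup.

```ruby
require 'tempfile'

# Hypothetical wrapper (not a Ruby API): pushed-back characters live in this
# object rather than in the IO's internal buffer, so seek and pos= can always
# discard them before moving.
class PushbackReader
  def initialize(io)
    @io = io
    @pushback = [] # most recently pushed-back character is last
  end

  # Return a pushed-back character first, otherwise read from the IO.
  def getc
    @pushback.empty? ? @io.getc : @pushback.pop
  end

  # Mirror IO#ungetc, which returns nil.
  def ungetc(char)
    @pushback.push(char)
    nil
  end

  # Drop pending pushback, then delegate the actual seek.
  def seek(offset, whence = IO::SEEK_SET)
    @pushback.clear
    @io.seek(offset, whence)
  end

  def pos=(offset)
    @pushback.clear
    @io.pos = offset
  end
end

result = nil
Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  reader = PushbackReader.new(f)
  reader.ungetc('a'.encode('utf-16le'))
  reader.seek(2, :SET)
  result = reader.getc
end

p result == '2'.encode('utf-16le')
```

Because the underlying IO never sees ungetc in this sketch, it sidesteps the buffer that IO#seek fails to clear. It also does not adjust `pos` for pending pushed-back characters, which a real implementation would have to handle.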
Updated by javanthropus (Jeremy Bopp) 5 months ago
- Subject changed from IO#seek does not clear the character buffer in some cases while transcoding to IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding
- Description updated (diff)
Updated by mjrzasa (Maciek Rząsa) 3 months ago (edited)
I've reproduced it without transcoding:
require 'tempfile'

Tempfile.open do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  f.getc # => 'a'
end
# => 'a'
Updated by mjrzasa (Maciek Rząsa) 3 months ago
It works OK with StringIO (unsurprisingly)
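require 'stringio'

StringIO.open do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # StringIO has no separate character buffer: at position 0, ungetc
  # inserts 'a' into the string itself, giving "a0123456789"
  f.seek(2)
  f.getc
end
# => "1"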
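The "1" above follows from how StringIO implements ungetc: there is no separate character buffer, so pushing a character back at position 0 inserts it at the front of the wrapped string, and `seek(2)` then lands on "1". A minimal check that inspects the buffer through `StringIO#string`:

```ruby
require 'stringio'

io = StringIO.new
io.write('0123456789')
io.rewind

# At position 0 there is nothing to step back over, so StringIO#ungetc
# inserts 'a' at the front of the string it wraps.
io.ungetc('a')
buffer = io.string.dup

# Seeking just moves an index into that (now longer) string.
io.seek(2)
char = io.getc

puts buffer # prints a0123456789
puts char   # prints 1
```

So StringIO does not "clear" anything on seek; the pushed-back character simply becomes part of the data.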
Updated by mjrzasa (Maciek Rząsa) 3 months ago
I reran the tests on 3.5.0, and it is indeed related to transcoding:
puts "Hello dev-ruby! #{RUBY_VERSION}"
require 'tempfile'

# No encoding, UTF-8 'a' pushed back
Tempfile.open do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  f.seek(2, :SET)
  puts f.getc # prints 2 (buffer cleared, correct)
end

# Transcoding, UTF-16LE 'a' pushed back
Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  puts f.getc # prints a ('a'.encode('utf-16le')); should be '2'.encode('utf-16le')
end

# No encoding, UTF-16LE 'a' pushed back
Tempfile.open do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  f.seek(2, :SET)
  puts f.getc # prints 2 (buffer cleared, correct)
end

# Transcoding, UTF-8 'a' pushed back
Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  puts f.getc # prints a2 (two characters); should be '2'.encode('utf-16le')
end
Hello dev-ruby! 3.5.0
2
a
2
a2
So the issue happened when an encoding was set in .open. Also, when a character that was not encoded as UTF-16LE was pushed back with ungetc, getc returned two characters.
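The four runs above vary only two things: whether an encoding is passed to Tempfile.open, and the encoding of the character pushed back with ungetc. A small harness can rerun the same matrix on another Ruby build. It is a sketch; the helper `seek_after_ungetc` and the `CASES` table are hypothetical names. It also reports IO#internal_encoding, which is nil when no internal encoding (and therefore no read transcoding) is configured, so it shows which cases actually transcode.

```ruby
require 'tempfile'

# Hypothetical helper: write the digits, push one character back, seek to
# offset 2, and return [internal encoding, character read by getc].
def seek_after_ungetc(open_opts, pushed)
  result = nil
  Tempfile.open(**open_opts) do |f|
    f.write('0123456789')
    f.rewind
    f.ungetc(pushed)
    f.seek(2, :SET)
    result = [f.internal_encoding, f.getc]
  end
  result
end

# Label => [options for Tempfile.open, character passed to ungetc]
CASES = {
  'no encoding, UTF-8 a'       => [{}, 'a'],
  'utf-8:utf-16le, UTF-16LE a' => [{ encoding: 'utf-8:utf-16le' }, 'a'.encode('utf-16le')],
  'no encoding, UTF-16LE a'    => [{}, 'a'.encode('utf-16le')],
  'utf-8:utf-16le, UTF-8 a'    => [{ encoding: 'utf-8:utf-16le' }, 'a'],
}.freeze

results = CASES.map { |label, (opts, pushed)| [label, seek_after_ungetc(opts, pushed)] }.to_h

puts "Ruby #{RUBY_VERSION}"
results.each do |label, (internal, char)|
  # With seek clearing the buffer, getc should yield "2" in every case.
  puts "#{label}: internal=#{internal.inspect} getc=#{char.inspect}"
end
```

On an affected build, the two `utf-8:utf-16le` rows are where the pushed-back "a" survives the seek.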
Updated by mjrzasa (Maciek Rząsa) 3 months ago
I have a draft of a fix for this one https://github.com/ruby/ruby/pull/12714
Updated by mjrzasa (Maciek Rząsa) 3 months ago
I believe the fix is ready for review https://github.com/ruby/ruby/pull/12714
Some CI jobs were failing (WebAssembly/Cygwin), but the failures do not seem to be related to my changes, and they are inconsistent (after rebasing, Cygwin passed and WebAssembly failed).
Updated by mjrzasa (Maciek Rząsa) 21 days ago
Folks, could I ask for a review (and potential merge) of the fix for this issue: https://github.com/ruby/ruby/pull/12714?