Bug #20919
IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding
Description
When transcoding characters, IO#seek and IO#pos= only clear the internal character buffer if IO#getc is called first:
require 'tempfile'
Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  f.getc       # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end
Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL NOT be cleared
  f.pos = 2
  f.getc       # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end
Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  # Added a call to #getc here
  f.getc
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL be cleared now
  f.seek(2, :SET)
  # Same behavior for #pos=
  #f.pos = 2
  f.getc       # => '2'.encode('utf-16le')
end
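The three cases above can be folded into one small probe, which makes it easier to compare Ruby versions. This is only a sketch: the helper name `pushback_survives_seek?` and its `prime_with_getc:` keyword are hypothetical and not part of any Ruby API, and the helper reports what it observes without assuming whether the running interpreter has the bug.

```ruby
require 'tempfile'

# Hypothetical helper (not part of Ruby): returns true when the character
# pushed back with #ungetc is still returned by #getc after #seek, that is,
# when the character buffer was NOT cleared by the seek.
def pushback_survives_seek?(prime_with_getc: false)
  Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
    f.write('0123456789')
    f.rewind
    # Optionally read one character first, as in the third case above
    f.getc if prime_with_getc
    pushed = 'a'.encode('utf-16le')
    f.ungetc(pushed)
    f.seek(2, :SET)
    f.getc == pushed
  end
end

puts "without a prior getc: #{pushback_survives_seek?}"
puts "with a prior getc:    #{pushback_survives_seek?(prime_with_getc: true)}"
```

On an affected interpreter the first line should print true, matching the first case in the description. The second line should print false, since the buffer is reported to be cleared once #getc has been called.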
        
Updated by javanthropus (Jeremy Bopp) 11 months ago
- Subject changed from "IO#seek does not clear the character buffer in some cases while transcoding" to "IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding"
- Description updated (diff)
        
Updated by javanthropus (Jeremy Bopp) 11 months ago
- Description updated (diff)
        
Updated by mjrzasa (Maciek Rząsa) 9 months ago · Edited
I've reproduced it without transcoding:
require 'tempfile'
Tempfile.open do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  f.getc       # => 'a'
end
# => 'a'
        
Updated by mjrzasa (Maciek Rząsa) 9 months ago
It works OK with StringIO (unsurprisingly):
require 'stringio'
StringIO.open do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # StringIO has no separate character buffer to clear
  f.seek(2)
  f.getc
end
# => "1"
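The "1" result follows from how StringIO handles a pushback at position 0: the character is prepended to the underlying string instead of being held in a separate buffer, so the later seek indexes into the modified string. A minimal sketch of that, where `stringio_pushback_demo` is a hypothetical helper name used only for illustration:

```ruby
require 'stringio'

# Hypothetical helper: returns the underlying string right after the
# pushback, together with the character read after seeking to offset 2.
def stringio_pushback_demo
  io = StringIO.new
  io.write('0123456789')
  io.rewind
  io.ungetc('a')            # at position 0 this prepends to the string
  contents = io.string.dup  # snapshot of the underlying string
  io.seek(2)
  [contents, io.getc]
end

contents, char = stringio_pushback_demo
puts contents  # a0123456789
puts char      # 1
```

So StringIO never returns the pushed-back character after the seek, but the offsets shift by one because the string itself grew.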
        
Updated by mjrzasa (Maciek Rząsa) 9 months ago
I reran the tests on 3.5.0, and the issue is indeed related to transcoding:
puts "Hello dev-ruby! #{RUBY_VERSION}"
require 'tempfile'
Tempfile.open() do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # No transcoding: the pushed-back character is discarded by the seek
  f.seek(2, :SET)
  puts f.getc       # => '2' (expected)
end
Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  puts f.getc       # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end
Tempfile.open() do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  # No transcoding: the pushed-back character is discarded by the seek
  f.seek(2, :SET)
  puts f.getc       # => '2' (expected)
end
Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  puts f.getc       # => 'a2' (two characters); should be '2'.encode('utf-16le')
end
Output:
Hello dev-ruby! 3.5.0
2
a
2
a2
So the issue occurs only when an encoding is passed to Tempfile.open. Also, when a character that had not been encoded as UTF-16LE was pushed back with ungetc, getc returned two characters.
        
Updated by mjrzasa (Maciek Rząsa) 9 months ago
I have a draft fix for this one: https://github.com/ruby/ruby/pull/12714
        
Updated by mjrzasa (Maciek Rząsa) 9 months ago
I believe the fix is ready for review: https://github.com/ruby/ruby/pull/12714
Some CI jobs (WebAssembly and Cygwin) were failing, but the failures do not seem to be related to my changes, and they are inconsistent: after rebasing, Cygwin passed and WebAssembly failed.
        
Updated by mjrzasa (Maciek Rząsa) 6 months ago
Folks, could I ask for a review (and potential merge) of the fix for this issue? https://github.com/ruby/ruby/pull/12714