Bug #20919

IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding

Added by javanthropus (Jeremy Bopp) 5 months ago. Updated 21 days ago.

Status: Open
Assignee: -
Target version: -
ruby -v: ruby 3.4.0dev (2024-11-28T12:38:16Z master 3af1a04741) +PRISM [x86_64-linux]
[ruby-core:120043]

Description

When transcoding characters, IO#seek and IO#pos= only clear the internal character buffer if IO#getc is called first:

require 'tempfile'

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a'.encode('utf-16le'))

  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)

  f.getc       # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a'.encode('utf-16le'))

  # Character buffer WILL NOT be cleared
  f.pos = 2

  f.getc       # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  # Added a call to #getc here
  f.getc

  f.ungetc('a'.encode('utf-16le'))

  # Character buffer WILL be cleared now
  f.seek(2, :SET)
  # Same behavior for #pos=
  #f.pos = 2

  f.getc       # => '2'.encode('utf-16le')
end
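
For convenience, the three cases above can be folded into one script. This is only a sketch: probe and its getc_first: / via: parameters are hypothetical names introduced for illustration, not part of Ruby's API, and the cleared/STALE label simply compares the result against '2'.encode('utf-16le'). What it prints depends on the Ruby build it runs on.

```ruby
require 'tempfile'

# Hypothetical helper (not part of Ruby or of the original report): runs one
# variant of the reproduction and returns what IO#getc yields after the file
# position has been changed.
def probe(getc_first:, via:)
  Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
    f.write('0123456789')
    f.rewind

    # Optionally read one character before pushing one back
    f.getc if getc_first
    f.ungetc('a'.encode('utf-16le'))

    case via
    when :seek then f.seek(2, :SET)
    when :pos  then f.pos = 2
    else raise ArgumentError, "unknown positioning method: #{via.inspect}"
    end

    f.getc
  end
end

expected = '2'.encode('utf-16le')

# Try every combination of "getc first?" and positioning method
[false, true].product(%i[seek pos]).each do |getc_first, via|
  result = probe(getc_first: getc_first, via: via)
  status = result == expected ? 'cleared' : 'STALE'
  puts format('getc_first=%-5s via=%-4s => %p (%s)', getc_first, via, result, status)
end
```

Comparing against the expected string, rather than eyeballing the output, avoids confusion when a terminal renders UTF-16LE bytes oddly.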

Updated by javanthropus (Jeremy Bopp) 5 months ago

  • Subject changed from IO#seek does not clear the character buffer in some cases while transcoding to IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding
  • Description updated (diff)

Updated by javanthropus (Jeremy Bopp) 5 months ago

  • Description updated (diff)

Updated by mjrzasa (Maciek Rząsa) 3 months ago · Edited

I've reproduced it without transcoding:

require 'tempfile'

Tempfile.open do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  f.getc       # => 'a'
end
# => 'a'

Updated by mjrzasa (Maciek Rząsa) 3 months ago

It works OK with StringIO (unsurprisingly):

require 'stringio'

StringIO.open do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # No stale 'a' is returned after the seek
  f.seek(2)
  f.getc
end
# => "1"

Updated by mjrzasa (Maciek Rząsa) 3 months ago

I reran the tests on 3.5.0, and the issue is indeed related to transcoding:

puts "Hello dev-ruby! #{RUBY_VERSION}"

require 'tempfile'
Tempfile.open() do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # No encoding options: the pushed-back character is not returned after seek
  f.seek(2, :SET)
  puts f.getc       # prints "2"
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a'.encode('utf-16le'))

  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)

  puts f.getc       # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end

Tempfile.open() do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a'.encode('utf-16le'))

  # No encoding options: the pushed-back character is not returned after seek
  f.seek(2, :SET)

  puts f.getc       # prints "2"
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a')

  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)

  puts f.getc       # prints "a2"; should be '2'.encode('utf-16le')
end

Hello dev-ruby! 3.5.0
2
a
2
a2

So the issue occurs only when an encoding is set on .open. Also, when a character that had not been encoded as UTF-16LE was passed to ungetc, getc returned two characters.
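
Since puts makes UTF-16LE results hard to tell apart, the same four combinations can be rerun with the raw bytes printed instead. This is a rough sketch, not a definitive test: bytes_after_seek and the case labels are hypothetical names introduced here, and the output will vary with the Ruby version.

```ruby
require 'tempfile'

# Hypothetical helper (not from the original report): pushes one character
# back, seeks to offset 2, and returns the bytes of the next character read.
def bytes_after_seek(open_opts, pushed)
  Tempfile.open(**open_opts) do |f|
    f.write('0123456789')
    f.rewind
    f.ungetc(pushed)
    f.seek(2, :SET)
    f.getc.bytes
  end
end

utf8_a  = 'a'
utf16_a = 'a'.encode('utf-16le')
transcoding = { encoding: 'utf-8:utf-16le' }

# Same order as the four blocks in the script above
cases = {
  'no encoding, plain arg'     => [{}, utf8_a],
  'transcoding, UTF-16LE arg'  => [transcoding, utf16_a],
  'no encoding, UTF-16LE arg'  => [{}, utf16_a],
  'transcoding, plain arg'     => [transcoding, utf8_a]
}

cases.each do |label, (opts, pushed)|
  puts "#{label}: #{bytes_after_seek(opts, pushed).inspect}"
end
```

Printing byte arrays makes a result such as the two-character one visible as distinct byte values rather than as glyphs that may run together on the terminal.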

Updated by mjrzasa (Maciek Rząsa) 3 months ago

I have a draft of a fix for this one: https://github.com/ruby/ruby/pull/12714

Updated by mjrzasa (Maciek Rząsa) 3 months ago

I believe the fix is ready for review: https://github.com/ruby/ruby/pull/12714
Some CI jobs (WebAssembly, Cygwin) were failing, but the failures appear unrelated to my changes and are inconsistent: after rebasing, Cygwin passed and WebAssembly failed.

Updated by mjrzasa (Maciek Rząsa) 21 days ago

Folks, could I ask for a review (and potential merge) of the fix for this issue? https://github.com/ruby/ruby/pull/12714
