Bug #21683

IO#each_codepoint does not take encoding into account when the IO uses encoding conversion for reading.

Added by YO4 (Yoshinao Muramatsu) about 13 hours ago.

Status: Open
Assignee: -
Target version: -
ruby -v: ruby 3.5.0dev (2025-11-03T10:33:44Z master 0832e954c9) +PRISM [x64-mingw-ucrt]
[ruby-core:123778]

Description

Without encoding conversion:

irb(main):001> open(File::NULL, 'r') { |f| f.ungetc(%Q[\u{3042}\u{3044}\u{3046}]); f.each_codepoint.map { |c| c.to_s(16) } }
=> ["3042", "3044", "3046"] # => valid

With encoding conversion:

irb(main):001> open(File::NULL, 'rt') { |f| f.ungetc(%Q[\u{3042}\u{3044}\u{3046}]); f.each_codepoint.map { |c| c.to_s(16) } }
=> ["e3", "81", "82", "e3", "81", "84", "e3", "81", "86"] # => invalid

Versions of Ruby prior to 3.4 lack commit 6cd98c24fe9aeea3829ac3d554a277f053cec0be (Allow IO#each_codepoint to work with ungetc even when encoding conversion active).
The same result can be reproduced with ungetbyte:

irb(main):001> open(File::NULL, 'rt') { |f| f.ungetbyte(%Q[\u{3042}\u{3044}\u{3046}]); p f.each_codepoint.map { |c| c.to_s(16) } }
=> ["e3", "81", "82", "e3", "81", "84", "e3", "81", "86"]
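For reference, the invalid results above are exactly the UTF-8 bytes of the pushed-back string, which suggests that each_codepoint is yielding individual bytes rather than decoded characters in this case. A minimal sketch that shows the correspondence without involving IO at all (the helper names `codepoints_hex` and `bytes_hex` are made up for this illustration):

```ruby
# Hypothetical helper names, used only for this illustration.

# Hex codepoints of a string, as each_codepoint is expected to yield them.
def codepoints_hex(str)
  str.each_codepoint.map { |c| c.to_s(16) }
end

# Hex values of the raw bytes of the same string.
def bytes_hex(str)
  str.each_byte.map { |b| b.to_s(16) }
end

sample = "\u{3042}\u{3044}\u{3046}" # \u escapes produce a UTF-8 string

p codepoints_hex(sample) # same values as the result without encoding conversion
p bytes_hex(sample)      # same values as the result with encoding conversion
```

This only illustrates what the two outputs represent; it does not exercise the IO code path where the problem occurs.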
