Bug #21683: IO#each_codepoint do not take care of encoding when IO uses encoding conversion for reading. - Ruby - Ruby Issue Tracking System

Added by YO4 (Yoshinao Muramatsu) 4 months ago. Updated 4 months ago.

Status: Closed
Assignee:
Target version:
ruby -v: ruby 3.5.0dev (2025-11-03T10:33:44Z master 0832e954c9) +PRISM [x64-mingw-ucrt]
Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
[ruby-core:123778]

Description

Without encoding conversion:

irb(main):001> open(File::NULL, 'r') { |f| f.ungetc(%Q[\u{3042}\u{3044}\u{3046}]); f.each_codepoint.map { |c| c.to_s(16) } }
=> ["3042", "3044", "3046"] # => valid

With encoding conversion:

irb(main):001> open(File::NULL, 'rt') { |f| f.ungetc(%Q[\u{3042}\u{3044}\u{3046}]); f.each_codepoint.map { |c| c.to_s(16) } }
=> ["e3", "81", "82", "e3", "81", "84", "e3", "81", "86"] # => invalid
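The nine values in the second result are exactly the UTF-8 bytes of the three characters, which suggests that each_codepoint read the pushed-back data byte by byte instead of decoding it. The following is a minimal sketch using only String methods (no IO is involved, so it should behave the same on affected and fixed builds); the variable names are chosen here for illustration.

```ruby
# The string pushed back with ungetc in the report: U+3042, U+3044, U+3046.
s = "\u{3042}\u{3044}\u{3046}"

# What each_codepoint is expected to yield: one value per character.
expected = s.codepoints.map { |c| c.to_s(16) }

# What the 'rt' case yielded in the report: one value per UTF-8 byte.
observed = s.bytes.map { |b| b.to_s(16) }

p expected
p observed
```

The first line printed matches the result marked valid above, and the second matches the result marked invalid.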

Ruby versions prior to 3.4 lack commit 6cd98c24fe9aeea3829ac3d554a277f053cec0be ("Allow IO#each_codepoint to work with unetc even when encoding conversion active").
Using ungetbyte reproduces the problem in the same way.

irb(main):001> open(File::NULL, 'rt') { |f| f.ungetbyte(%Q[\u{3042}\u{3044}\u{3046}]); p f.each_codepoint.map { |c| c.to_s(16) } }
=> ["e3", "81", "82", "e3", "81", "84", "e3", "81", "86"]

#1 Updated by nobu (Nobuyoshi Nakada) 4 months ago

Status changed from Open to Closed

Applied in changeset git|7e37e4e743a1ca1d5d7bbb87cdd9b943e3a4fe1d.

[Bug #21683] Respect reading encoding at each_codepoint
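A rough way to check a given Ruby build for this behavior is to compare what each_codepoint yields with String#codepoints of the data that was pushed back. The helper name `each_codepoint_matches?` below is hypothetical and is not part of Ruby or of the changeset; this is only a sketch of such a check, not the test that was added with the fix.

```ruby
require "stringio"

# Hypothetical helper: true when io.each_codepoint yields the same
# values as String#codepoints of the expected text.
def each_codepoint_matches?(io, expected)
  io.each_codepoint.to_a == expected.codepoints
end

text = "\u{3042}\u{3044}\u{3046}"

# StringIO performs no encoding conversion, so this serves as a baseline.
p each_codepoint_matches?(StringIO.new(text), text)

# The scenario from the report. The result depends on the Ruby build,
# so it is left commented out and no expected value is given:
# File.open(File::NULL, "rt") { |f| f.ungetc(text); p each_codepoint_matches?(f, text) }
```

On a build that includes the changeset above, the commented-out check would be expected to agree with the StringIO baseline.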
