Bug #3482: StringScanner#pos returns wrong character position if used with multibyte chars - Ruby - Ruby Issue Tracking System


Bug #3482

closed


Added by Quintus (Marvin Gülker) about 16 years ago. Updated about 15 years ago.

Status:

Rejected

Assignee:

Target version:

2.0.0

ruby -v:

ruby 1.9.2dev (2010-05-31 revision 28117) [x86_64-linux]

Backport:

Description

=begin
The StringScanner class from 1.9's stdlib works on bytes rather than on characters. This means that if you use the return value of StringScanner#pos to extract substrings from the original string, you get incorrect results:

irb(main):001:0> require "strscan"
=> true
irb(main):002:0> str = "abcädeföghi"
=> "abcädeföghi"
irb(main):003:0> ss = StringScanner.new(str)
=> #<StringScanner 0/13 @ "abc\xC3\xA4...">
irb(main):004:0> ss.scan_until(/ä/)
=> "abcä"
irb(main):005:0> ss.pos
=> 5
irb(main):006:0> ss.scan_until(/ö/)
=> "defö"
irb(main):007:0> ss.pos
=> 10
irb(main):008:0>

After the first scan_until I expected the position to be 4, and after the second, 8. Because "ä" and "ö" each take two bytes in UTF-8, the reported position ends up 2 ahead of the character position.

My Ruby version is ruby 1.9.1p378 (2010-01-10 revision 26273) [x86_64-linux], but I get the same behaviour with 1.9.2-preview3 (ruby 1.9.2dev (2010-05-31 revision 28117) [x86_64-linux]).
=end
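Since #pos reports a byte offset, one way to use it for extracting substrings is to slice the original string by bytes rather than by characters. The sketch below is an illustration, not part of the original report: it replays the irb session above and assumes Ruby 1.9.3 or later, because String#byteslice is not available in 1.9.1/1.9.2. The helper `char_offset` is a hypothetical name introduced here; it converts a byte offset into a character offset by measuring the byte prefix.

```ruby
# encoding: utf-8
require "strscan"

# Hypothetical helper (not part of strscan): convert a byte offset, as
# reported by StringScanner#pos, into a character offset.
# Assumes Ruby 1.9.3+ for String#byteslice.
def char_offset(string, byte_pos)
  string.byteslice(0, byte_pos).length
end

str = "abcädeföghi"          # 11 characters, 13 bytes in UTF-8
ss  = StringScanner.new(str)

byte_positions = []
char_positions = []

[/ä/, /ö/].each do |pattern|
  ss.scan_until(pattern)
  byte_positions << ss.pos                    # byte offsets: 5, then 10
  char_positions << char_offset(str, ss.pos)  # character offsets: 4, then 8
end

# Slicing by bytes with the byte offsets recovers the expected substrings.
first_part  = str.byteslice(0, byte_positions[0])                  # "abcä"
second_part = str.byteslice(byte_positions[0],
                            byte_positions[1] - byte_positions[0]) # "defö"

puts byte_positions.inspect
puts char_positions.inspect
puts first_part, second_part

# Newer strscan versions (bundled with Ruby 2.0+) report the character
# offset directly; the guard skips this on older versions.
puts ss.charpos if ss.respond_to?(:charpos)
```

The design choice is to keep #pos byte-based and convert only when a character count is actually needed, so slicing with byteslice never lands in the middle of a multibyte character.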

Related issues 1 (0 open — 1 closed)

Updated by mame (Yusuke Endoh) about 16 years ago
#1

Status changed from Open to Rejected

=begin
Hi,

This is the specified behavior. See the rdoc of StringScanner#pos.

FYI, IO#pos is also byte-oriented.
I guess StringScanner#pos follows the same convention: a #pos method is expected to be byte-oriented.

--
Yusuke Endoh mame@tsg.ne.jp
=end
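The IO#pos comparison can be checked with a short sketch. It uses StringIO, which implements the same reading interface as IO, as an in-memory stand-in for a real file; that substitution is a choice made for this illustration, not something from the comment. Reading one multibyte character advances the position by that character's byte length.

```ruby
# encoding: utf-8
require "stringio"

# StringIO stands in for a real IO object here (assumption for the sketch).
io = StringIO.new("äb")  # "ä" takes 2 bytes in UTF-8, "b" takes 1

first_char = io.getc     # reads one character: "ä"
byte_pos   = io.pos      # 2: the position counts bytes, not characters
rest       = io.read     # the remaining content: "b"

puts first_char, byte_pos, rest
```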
