Bug #7442: StringScanner#charpos vs StringScanner#pos - Ruby - Ruby Issue Tracking System

Added by zenspider (Ryan Davis) over 12 years ago. Updated over 12 years ago.

Status:

Closed

Assignee:

Target version:

3.0

ruby -v:

1.9.x

Backport:

[ruby-core:50190]

Description

=begin
I talked to Matz at RubyConf and he agreed this was a bug I should file. Sorry I took so long to do so.

As mentioned in #3482, StringScanner#pos is byte-oriented even when scanning multibyte strings. The reasoning was that IO#pos is byte-oriented, so this is to spec and functioning correctly. The problem is that StringScanner isn't just an IO: it also represents a String and the progress of scanning through it. Strings in 1.9+ must respect their encodings and, with a few exceptions, don't even support the idea of naked bytes. I think StringScanner must be able to respect that.

Given that ss is a StringScanner instance on a string with a valid encoding, getting the substring of the current progress via ss.string[0..ss.pos] can result in a String with invalid encoding. I propose that we add #charpos to make it possible to pull out a valid substring. It would also be useful for reporting proper offset or column information in the case of an error when you're using StringScanner as your lexer.
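
To make the byte/character mismatch concrete, here is a minimal sketch. The sample string and the variable names are illustrative assumptions, not taken from the original report. It compares slicing by character index (using the byte-oriented pos as if it were a character count) with slicing by bytes.

```ruby
require "strscan"

ss = StringScanner.new("caf\u00E9 au lait")  # "café au lait" in UTF-8
ss.scan(/\S+/)                               # consumes "café"

byte_pos = ss.pos                            # byte offset: "é" takes two bytes
by_chars = ss.string[0, ss.pos]              # treats the byte offset as a character count
by_bytes = ss.string.byteslice(0, ss.pos)    # slices by bytes, matching what #pos measures

puts byte_pos          # 5
puts by_chars.inspect  # "café " (one character too many)
puts by_bytes.inspect  # "café"
```

In this run the character-indexed slice overshoots by one character, because pos counts the two bytes of "é" separately.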

This is the code that I needed to get proper char-offsets (and substrings--I needed both for my purposes):

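require "strscan"

class StringScanner
  # Text scanned so far, sliced by bytes because #pos is a byte offset.
  def string_to_pos
    string.byteslice(0, pos)
  end

  # Current position counted in characters rather than bytes.
  def charpos
    string_to_pos.length
  end
end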
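
As a sketch of the error-reporting use case, the same byteslice approach can yield a character-based line and column. The helper name line_and_column and the sample input are hypothetical, chosen only for illustration; this is not part of StringScanner.

```ruby
require "strscan"

# Hypothetical helper: 1-based line and column of the scanner's
# current position, counted in characters instead of bytes.
def line_and_column(scanner)
  consumed = scanner.string.byteslice(0, scanner.pos)
  line = consumed.count("\n") + 1
  last_newline = consumed.rindex("\n")          # character index, or nil on line 1
  line_start = last_newline ? last_newline + 1 : 0
  [line, consumed.length - line_start + 1]
end

ss = StringScanner.new("x = 1\n\u00FCber = ?\n")  # second line starts with "über"
ss.scan(/x = 1\n/)
ss.scan(/\S+ = /)                                 # stops just before the "?"

line, column = line_and_column(ss)
puts "unexpected token at line #{line}, column #{column}"
# prints "unexpected token at line 2, column 8"
```

A column computed from the raw byte offset would be 9 here, since "ü" occupies two bytes in UTF-8.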

=end

Related issues 1 (0 open — 1 closed)

Updated by mame (Yusuke Endoh) over 12 years ago

Updated by zenspider (Ryan Davis) over 12 years ago

Updated by zenspider (Ryan Davis) over 12 years ago

Updated by mame (Yusuke Endoh) over 12 years ago