Feature #2645

Have a method in StringScanner which returns the position in characters rather than in bytes

Added by Stefano Crocco about 5 years ago. Updated over 2 years ago.

[ruby-core:27792]
Status: Rejected
Priority: Low
Assignee: Yui NARUSE

Description

=begin
In Ruby 1.9, StringScanner#pos returns the position as a number of bytes. I read on the Ruby mailing list (http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/352809) that this is because working with character-based indexes would be too slow. However, I think it would be nice if StringScanner also provided a method that returns the position in characters, even if it is slow. As I see it, the situation is the same as with StringScanner#get_byte and StringScanner#getch. I think this would be useful because, when using StringScanner, you are usually interested in characters rather than bytes.
=end
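
To make the request concrete, here is a minimal sketch of a character-based position derived from the byte-based one. The helper name char_pos is hypothetical (it is not a StringScanner method), and the sketch assumes a UTF-8 string and a scan pointer that sits on a character boundary.

```ruby
require 'strscan'

# Hypothetical helper, not part of StringScanner: character offset of the
# scan pointer, derived from the byte offset that StringScanner#pos returns.
def char_pos(scanner)
  # Take the bytes consumed so far and count them as characters.
  scanner.string.byteslice(0, scanner.pos).length
end

ss = StringScanner.new('äöü')
ss.getch                      # consume one character ("ä", two bytes in UTF-8)
byte_offset = ss.pos          # position in bytes
char_offset = char_pos(ss)    # position in characters
```

The helper re-counts the consumed prefix on every call, so its cost grows with the position, which matches the "it would be slow" caveat in the description.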


Related issues

Duplicates Ruby trunk - Feature #1159: I would like StringScanner to have a method that returns a character-based index Rejected 02/14/2009

History

#1 Updated by Kornelius Kalnbach about 5 years ago

=begin
+1...but what to name it?

  • char_pos
  • chpos
  • index (like String#index)

By the way, the documentation for StringScanner#pos states:

    In the 'terminated' position (i.e. the string
    is exhausted), this value is the length of the string.

This is not true:

    irb(main):002:0> s = StringScanner.new('äöü'); s.scan(/.*/); s.pos
    => 6
    irb(main):003:0> s.string.length
    => 3
=end
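
As a small illustration of the discrepancy reported above, the following sketch (assuming a UTF-8 string) compares the terminated position with both String#bytesize and String#length. The variable names are only for illustration.

```ruby
require 'strscan'

s = StringScanner.new('äöü')
s.scan(/.*/)                     # consume the whole string
end_pos   = s.pos                # byte offset at the terminated position
byte_size = s.string.bytesize    # length of the string in bytes
char_len  = s.string.length      # length of the string in characters
```

Here end_pos agrees with byte_size rather than char_len, so the terminated position is the byte length of the string.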

#2 Updated by Yui NARUSE about 5 years ago

  • Priority changed from Normal to Low

=begin
StringScanner's pos is related to IO#pos.

Feature #1159 is also about this (but in Japanese).

A problem is:

    ss = StringScanner.new("äöü")
    ss.get_byte
    ss.char_pos #=> what is this result?

Moreover, I doubt the use case.
Can you describe a more detailed use case?

the documentation for StringScanner#pos states:
In the 'terminated' position (i.e. the string
is exhausted), this value is the length of the string.

Thanks, I fixed the doc.
=end
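
The ambiguity raised in the question above can be made visible with a short sketch. It assumes a UTF-8 string and uses only String#byteslice and String#valid_encoding?. The variable names are illustrative.

```ruby
require 'strscan'

ss = StringScanner.new('äöü')
ss.get_byte                                  # consume only the first byte of "ä"
consumed = ss.string.byteslice(0, ss.pos)    # the bytes consumed so far
mid_character = !consumed.valid_encoding?    # true when the pointer is inside a character
```

After get_byte the consumed prefix is not valid UTF-8, so there is no obvious character index to report. Any character-position method would have to pick a convention for this case.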

#3 Updated by Kazuhiro NISHIYAMA almost 5 years ago

  • Category set to ext
  • Target version set to 2.0.0

#4 Updated by Kazuhiro NISHIYAMA almost 5 years ago

  • Status changed from Open to Feedback

#5 Updated by Thomas Leitner over 4 years ago

=begin
I had a similar problem: I wanted to extract a part of a StringScanner-backed string.

Consider the following use case:

  • The StringScanner ss is used to arrive at a certain position.
  • The current position is saved, i.e. start_pos = ss.pos.
  • Then ss is used to do some scanning, arriving at a new position: end_pos = ss.pos
  • Extracting the string between start_pos and end_pos using ss.string[start_pos..end_pos] does not work if the range contains multibyte characters.

My work-around is the following:

    # Extract the part of the StringScanner +strscan+ backed string specified by the +range+. This
    # method works correctly under Ruby 1.8 and Ruby 1.9.
    def extract_string(range, strscan)
      result = nil
      if RUBY_VERSION >= '1.9'
        begin
          enc = strscan.string.encoding
          strscan.string.force_encoding('ASCII-8BIT')
          result = strscan.string[range].force_encoding(enc)
        ensure
          strscan.string.force_encoding(enc)
        end
      else
        result = strscan.string[range]
      end
      result
    end
=end

#6 Updated by Yui NARUSE almost 3 years ago

  • Description updated (diff)

gettalong wrote:

I had a similar problem: I wanted to extract a part of a StringScanner-backed string.

Consider the following use case:

  • The StringScanner ss is used to arrive at a certain position.
  • The current position is saved, i.e. start_pos = ss.pos.
  • Then ss is used to do some scanning, arriving at a new position: end_pos = ss.pos
  • Extracting the string between start_pos and end_pos using ss.string[start_pos..end_pos] does not work if the range contains multibyte characters.

You can use String#byteslice.
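
A minimal sketch of that suggestion, following the steps of the quoted use case. The sample string and the patterns are made up for illustration. The sketch assumes a UTF-8 string and that both positions fall on character boundaries.

```ruby
require 'strscan'

ss = StringScanner.new('äöü: abc')
ss.scan(/ä/)             # arrive at a certain position
start_pos = ss.pos       # saved byte offset
ss.scan(/öü/)            # do some more scanning
end_pos = ss.pos         # new byte offset

# pos is the offset just past the last scanned byte, so pass a length
# (or an exclusive range) rather than the inclusive start_pos..end_pos.
part = ss.string.byteslice(start_pos, end_pos - start_pos)
```

Unlike the force_encoding work-around, byteslice leaves the scanner's string untouched and returns the slice in the string's own encoding.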

#7 Updated by Yusuke Endoh almost 3 years ago

  • Assignee set to Yui NARUSE

#8 Updated by Yui NARUSE over 2 years ago

  • Status changed from Feedback to Rejected
