=begin
In Ruby 1.9, StringScanner#pos returns the position as a byte offset. I read on the ruby-talk mailing list (http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/352809) that this is because working with character-based indexes would be too slow. However, I think it would be nice if StringScanner also provided a method that returned the position in characters, even if it were slow. As I see it, the situation is the same as with StringScanner#get_byte and StringScanner#getch: the former consumes one byte and the latter one character, and both are provided. This would be useful because, when using StringScanner, you are usually interested in characters rather than bytes.
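A minimal sketch of what a character-based position could look like, assuming a UTF-8 string and Ruby 1.9.3 or later (for String#byteslice). `char_pos` is a hypothetical helper, not part of StringScanner. It decodes the bytes scanned so far and counts characters, so it is linear in the scanned prefix, which is the "slow but convenient" trade-off described above. Later Ruby releases added StringScanner#charpos, which should return the same value.

```ruby
require 'strscan'

# Hypothetical helper (not part of StringScanner): converts the byte offset
# returned by StringScanner#pos into a character offset by counting the
# characters in the bytes scanned so far. Linear in the scanned prefix.
def char_pos(scanner)
  scanner.string.byteslice(0, scanner.pos).length
end

ss = StringScanner.new("h\u00e9llo w\u00f6rld") # "héllo wörld" in UTF-8
ss.scan(/\S+/)             # consumes "héllo"
byte_pos = ss.pos          # 6: "é" occupies 2 bytes in UTF-8
char_offset = char_pos(ss) # 5: five characters have been consumed
puts "pos=#{byte_pos} char_pos=#{char_offset}"
```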
=end
=begin
I had a similar problem: I wanted to extract part of the string backing a StringScanner.
Consider the following use case:
The StringScanner ss is used to advance to a certain position.
The current position is saved, i.e. start_pos = ss.pos.
ss is then used to do some more scanning, arriving at a new position: end_pos = ss.pos.
Extracting the text between start_pos and end_pos with ss.string[start_pos...end_pos] does not work if the string contains multibyte characters before end_pos, because String#[] treats integer indexes as character offsets, whereas StringScanner#pos is a byte offset.
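A small reproduction of this use case, assuming Ruby 1.9.3 or later (for String#byteslice) and a UTF-8 string; the sample text and variable names are only illustrative. The character-indexed slice returns the wrong text, while interpreting the same offsets as bytes returns exactly what was scanned.

```ruby
require 'strscan'

# Illustrative input: "café crème brûlée" in UTF-8 (é, è and û take 2 bytes each).
ss = StringScanner.new("caf\u00e9 cr\u00e8me br\u00fbl\u00e9e")

ss.scan(/\S+\s+/)    # advance past "café "
start_pos = ss.pos   # 6: "café " is 5 characters but 6 bytes
ss.scan(/\S+/)       # scan "crème"
end_pos = ss.pos     # 12

# String#[] with an integer range counts characters, so these byte offsets
# select the wrong text: "rème b" instead of "crème".
naive = ss.string[start_pos...end_pos]

# String#byteslice interprets the same offsets as bytes and yields the text
# consumed by the second scan: "crème".
by_bytes = ss.string.byteslice(start_pos...end_pos)

p naive, by_bytes
```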
My work-around is the following:
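  # Extract the part of the string backing the StringScanner +strscan+ that is
  # specified by the byte-offset +range+ (e.g. start_pos...end_pos, with both
  # ends taken from StringScanner#pos). Works under Ruby 1.8, 1.9 and later.
  def extract_string(range, strscan)
    str = strscan.string
    if str.respond_to?(:byteslice)
      # Ruby 1.9.3+: slice by bytes directly; the result keeps str's encoding.
      str.byteslice(range)
    elsif str.respond_to?(:force_encoding)
      # Ruby 1.9.1/1.9.2: index a binary copy by bytes, then restore the
      # original encoding on the result. Working on a copy leaves the
      # scanner's string untouched (and also works if it is frozen).
      part = str.dup.force_encoding('ASCII-8BIT')[range]
      part && part.force_encoding(str.encoding)
    else
      # Ruby 1.8: String#[] already indexes by bytes.
      str[range]
    end
  end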