
Backport #4028

substring selection and utf8 encoding problem

Added by barcala (Fco. Mario Barcala Rodríguez) over 8 years ago. Updated almost 8 years ago.

Status: Assigned
Priority: Normal
[ruby-core:33072]

Description

=begin
Substring selection does not work with some UTF-8 encoded strings. Below is an example. The first substring is extracted correctly, but the second is not (strange characters appear at the end of the substring).

It seems to occur when the string includes letters with umlauts, accents, etc.

$ irb

ruby-1.9.1-p378 > word = "Ábaco"
=> "Ábaco"
ruby-1.9.1-p378 > substr = word[word.length-1,word.length]
=> "o"
ruby-1.9.1-p378 > word = "Coordinador de ONG's do País Valenciano"
=> "Coordinador de ONG's do País Valenciano"
ruby-1.9.1-p378 > substr = word[word.length-1,word.length]
=> "o\x00\x00\x01\x00\x01\x00\x00\x00"
=end
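The report comes down to calling String#[](start, length) on a multibyte UTF-8 string with a length that runs past the end. On a fixed Ruby, the length is measured in characters and clamped to what remains, so both calls should return "o". The sketch below is a minimal regression check under that assumption. The `# encoding: utf-8` comment is only needed for 1.9, where source files are not UTF-8 by default. It also prints character count against byte count, since "í" and "Á" each take two bytes in UTF-8.

```ruby
# encoding: utf-8
# Minimal regression check for the behaviour reported above (a sketch,
# assuming a Ruby where String#[] clamps an oversized length).
words = ["Ábaco", "Coordinador de ONG's do País Valenciano"]

results = words.map do |word|
  # Deliberately ask for more characters than remain after the start index.
  substr = word[word.length - 1, word.length]
  # A correct result is a single, validly encoded character.
  [word, substr, substr.valid_encoding?]
end

results.each do |word, substr, valid|
  puts "#{word.inspect} -> #{substr.inspect} (valid encoding: #{valid})"
end

# String#length counts characters; String#bytesize counts UTF-8 bytes.
sample = words.last
puts "characters: #{sample.length}, bytes: #{sample.bytesize}"
```

On an affected build, the second line of output would show trailing garbage bytes like those in the irb session above, and `valid_encoding?` may also report the problem.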


Related issues

Is duplicate of Ruby trunk - Bug #2379: String#[] returns invalid values for short multibyte strings (Closed, 11/18/2009)

History

#1

Updated by barcala (Fco. Mario Barcala Rodríguez) over 8 years ago

=begin
The same error occurs in ruby-1.9.1-p430
=end

#2

Updated by barcala (Fco. Mario Barcala Rodríguez) over 8 years ago

=begin
It seems to be fixed in ruby-1.9.2-p0; I can't reproduce the error in that version.
=end

#3

Updated by barcala (Fco. Mario Barcala Rodríguez) over 8 years ago

=begin
The example above uses substring selection incorrectly. It should be:

ruby-1.9.1-p378 > word = "Ábaco"
=> "Ábaco"
ruby-1.9.1-p378 > substr = word[word.length-1,1]
=> "o"
ruby-1.9.1-p378 > word = "Coordinador de ONG's do País Valenciano"
=> "Coordinador de ONG's do País Valenciano"
ruby-1.9.1-p378 > substr = word[word.length-1,1]
=> "o"

This corrected example works fine, so the problem arises only when the second argument of the substring selection (the length) extends past the end of the string.
=end
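As the corrected example shows, the usual way to take the last character is to pass a length of 1. The sketch below, assuming a Ruby where this bug is fixed, lists a few equivalent forms, including negative indices. It also shows the two documented edge cases of the (start, length) form: a start equal to the string length yields an empty string, and a start beyond it yields nil.

```ruby
# encoding: utf-8
# Equivalent ways to take the last character of a UTF-8 string (a sketch).
# String#[](start, length) counts characters, not bytes, and clamps
# `length` to the characters that remain after `start`.
word = "Coordinador de ONG's do País Valenciano"

last_chars = {
  "explicit length of 1"  => word[word.length - 1, 1],
  "oversized length"      => word[word.length - 1, word.length],
  "negative index"        => word[-1],
  "negative index, len 1" => word[-1, 1],
}

last_chars.each { |how, s| puts "#{how}: #{s.inspect}" }

# Edge cases of the (start, length) form:
at_end   = word[word.length, 5]      # start == length -> ""
past_end = word[word.length + 1, 5]  # start past the end -> nil
p at_end, past_end
```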

#4

Updated by naruse (Yui NARUSE) over 8 years ago

  • Status changed from Open to Assigned
  • Assignee set to yugui (Yuki Sonoda)
  • Priority changed from 5 to Normal

=begin
Confirmed:
ruby 1.9.1p430 (2010-08-16 revision 28997) [x86_64-freebsd8.1]
ruby-1.9.1-p378 > word = "Coordinador de ONG's do País Valenciano"
=> "Coordinador de ONG's do País Valenciano"
ruby-1.9.1-p378 > substr = word[word.length-1,word.length]
=> "o\x00\x00\x01\x00\x01\x00\x00\x00"
=end
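For code that must still run on an affected 1.9.1 build, one possible workaround follows from the finding that the bug only appears when the length overshoots: clamp the length before calling String#[]. The helper below, `safe_substr`, is hypothetical (it is not part of Ruby) and has not been verified against 1.9.1-p378 or p430. It mirrors String#[] semantics for negative starts and out-of-range arguments.

```ruby
# encoding: utf-8
# Hypothetical workaround helper, not part of Ruby's API.
# Assumes the 1.9.1 bug is triggered only when `length` extends past the
# end of the string, so it never asks String#[] for more than remains.
def safe_substr(str, start, length)
  start += str.length if start < 0          # negative start counts from the end
  return nil if start < 0 || start > str.length || length < 0
  remaining = str.length - start            # characters left after start
  str[start, [length, remaining].min]
end

word = "Coordinador de ONG's do País Valenciano"
p safe_substr(word, word.length - 1, word.length)
p safe_substr(word, -1, 10)
```

Upgrading to 1.9.2-p0 or later, where the reporter could no longer reproduce the problem, makes the helper unnecessary.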
