Bug #19361: String#[Integer] is orders slower for strings with some UTF characters - Ruby - Ruby Issue Tracking System

Bug #19361

closed

String#[Integer] is orders slower for strings with some UTF characters

Added by vzdor (Vladimir Zdorovenco) almost 3 years ago. Updated almost 3 years ago.

Status: Rejected
Assignee:
Target version:
ruby -v: ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN

[ruby-core:111952]

Description

String#[] with an Integer index is not only slower on these strings than it is on ASCII-only strings, it is also slower than going through #each_char.

seq1

# s = '*' * 10e4
s = 'ф' * 10e4
count = 0
size = s.size
while count < size
  s[count]
  count += 1
end

seq2

ss = 'ф' * 10e4
s = ss.chars
count = 0
size = s.size
while count < size
  s[count]
  count += 1
end

On my computer seq1 runs in 11 seconds and seq2 in 0.5 seconds. The same
happens with the '克' character, and I'm sure it is not limited to those two.
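
For anyone who wants to reproduce the comparison, the two loops can be put side by side with the standard `Benchmark` module. This is only a sketch: `index_all` is a hypothetical helper name, and `n` is deliberately smaller than the 10e4 used above so the run finishes quickly. Absolute timings will differ by machine and Ruby version.

```ruby
require 'benchmark'

# Hypothetical helper: index every element of `indexable` exactly once,
# front to back, the same way seq1 and seq2 do. Returns the number of lookups.
def index_all(indexable)
  count = 0
  size = indexable.size
  while count < size
    indexable[count]
    count += 1
  end
  count
end

n = 10_000 # assumption: reduced from the report's 10e4 to keep the run short
ascii     = '*' * n
multibyte = 'ф' * n

Benchmark.bm(18) do |x|
  x.report('String#[] ascii')  { index_all(ascii) }
  x.report('String#[] utf-8')  { index_all(multibyte) }
  x.report('Array#[] (chars)') { index_all(multibyte.chars) }
end
```

Doubling `n` is a quick way to see how each variant scales.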

I would not have expected seq1 to be slower, since I never call s[n] more
than once for any given n.

It is a Debian package with some patches, but they do not touch string.c.

$ locale
LANG=en_US.UTF-8

Updated by byroot (Jean Boussier) almost 3 years ago
#1 [ruby-core:111957]

Status changed from Open to Rejected

This is expected. String#[Integer] doesn't return a byte but a character, which in UTF-8 may be of variable size, so Ruby has to scan the string from the beginning every time.
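
A small sketch of what that means in practice, assuming a UTF-8 source file (Ruby's default). The workarounds shown are suggestions added here for illustration, not something proposed in the thread: walk the string once with `each_char`, pay once for an Array with `chars` if random access is needed, or use the byte-oriented methods when byte offsets are what you actually want.

```ruby
s = 'ф' * 5

# Character count and byte count differ, so character index i does not
# map directly to a byte offset.
char_count = s.size        # number of characters
byte_count = s.bytesize    # number of bytes in the UTF-8 encoding
ascii      = s.ascii_only? # false for this string

# Option 1: visit every character in a single pass.
visited = []
s.each_char.with_index { |ch, i| visited << [i, ch] }

# Option 2: split once, then index the Array of one-character strings.
chars = s.chars
third = chars[2]

# Option 3: address bytes directly when byte offsets are acceptable.
second_char_bytes = s.byteslice(2, 2) # bytes 2..3, i.e. the second 'ф'

puts "chars=#{char_count} bytes=#{byte_count} ascii_only=#{ascii}"
```

Option 2 trades memory for speed, since it allocates one String per character, which is what seq2 above does.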
