Bug #19361
String#[Integer] is orders of magnitude slower for strings with some UTF-8 characters
Status: Rejected
Assignee: -
Target version: -
ruby -v: ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
Description
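On a string with multibyte characters, #[] is not only much slower than on an ASCII-only string of the same length, but also slower than splitting the string with #chars (or walking it with #each_char) and indexing the result.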
seq1
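```ruby
# s = '*' * 100_000   # ASCII-only variant, for comparison
s = 'ф' * 100_000      # 'ф' takes 2 bytes in UTF-8
count = 0
size = s.size
# Index every character exactly once, in order.
while count < size
  s[count]
  count += 1
end
```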
seq2
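```ruby
ss = 'ф' * 100_000
s = ss.chars   # split once into an Array of one-character strings
count = 0
size = s.size
# Array#[] is a constant-time lookup.
while count < size
  s[count]
  count += 1
end
```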
On my computer, seq1 runs in 11 seconds and seq2 in 0.5 seconds. The same happens with '克', and I'm sure with other characters as well.
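Both 'ф' and '克' are multibyte in UTF-8, unlike the ASCII '*' in the commented-out line of seq1. A quick check, assuming the source file uses Ruby's default UTF-8 source encoding:

```ruby
# Byte width of each character mentioned in the report, in UTF-8.
widths = ['*', 'ф', '克'].to_h { |c| [c, c.bytesize] }
widths.each { |char, bytes| puts "#{char.inspect}: #{bytes} byte(s)" }

ascii = '*' * 10
cyrillic = 'ф' * 10

# ascii_only? is true only when every character is 7-bit ASCII (one byte).
puts ascii.ascii_only?     # prints "true"
puts cyrillic.ascii_only?  # prints "false"

# Character count and byte count diverge for the multibyte string.
puts cyrillic.length       # prints "10"
puts cyrillic.bytesize     # prints "20"
```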
I would not have expected seq1 to be slower, since I never call s[n] more than once for any n.
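Because seq1 visits every index exactly once and in order, a single sequential pass yields the same characters without any rescanning. A minimal sketch using String#each_char; the `visited` and `first_three` variables are illustrative only:

```ruby
s = 'ф' * 100_000

# each_char advances through the string once, yielding each character in
# order, so the whole traversal is linear in the length of the string.
visited = 0
s.each_char { |_char| visited += 1 }
puts visited  # prints "100000"

# When the position is needed too, chain with_index instead of calling s[i].
first_three = []
s.each_char.with_index do |char, index|
  break if index == 3
  first_three << [index, char]
end
puts first_three.map(&:first).inspect  # prints "[0, 1, 2]"
```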
This Ruby is a Debian package with some patches, but none of them touch string.c.
$ locale
LANG=en_US.UTF-8
Updated by byroot (Jean Boussier) over 2 years ago
- Status changed from Open to Rejected
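This is expected. String#[Integer] doesn't return a byte but a character, which in UTF-8 may be of variable size, so Ruby has to scan the string from the beginning every time to find it. Indexing every position in a loop, as seq1 does, therefore takes quadratic time overall.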
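To illustrate the cost, here is a hypothetical pure-Ruby helper, `nth_char_by_scanning`, that finds the n-th character of a UTF-8 string by walking its bytes from the start. It is a simplified sketch under stated assumptions (valid UTF-8, non-negative n), not how string.c is actually written; MRI does this in C and has fast paths, for example for ASCII-only strings.

```ruby
# Hypothetical helper for illustration only: finds the n-th character
# (0-based, non-negative n) of a valid UTF-8 string by walking its bytes
# from the beginning, so each call costs time proportional to the offset.
def nth_char_by_scanning(str, n)
  return nil if n.negative?

  char_index = -1 # index of the most recent character start seen so far
  start = nil     # byte offset where the n-th character begins
  i = 0
  while i < str.bytesize
    byte = str.getbyte(i)
    # Bytes of the form 0b10xxxxxx continue a multibyte character;
    # any other byte starts a new one.
    unless (byte & 0xC0) == 0x80
      char_index += 1
      if char_index == n
        start = i
      elsif char_index == n + 1
        # The next character begins here, so the n-th one ends here.
        return str.byteslice(start, i - start)
      end
    end
    i += 1
  end
  # Either the n-th character is the last one, or n is out of range (nil).
  start && str.byteslice(start, str.bytesize - start)
end

s = 'aфb克'
matches = (0...s.length).all? { |i| nth_char_by_scanning(s, i) == s[i] }
puts matches  # prints "true"
```

Calling such a helper for n = 0, 1, ..., size - 1, as seq1 does, scans roughly 1 + 2 + ... + size characters in total, whereas indexing the Array produced by #chars costs the same for every position.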