Bug #13228

s[i]=c (assigning a character) is slower for String than for Array on Linux

Added by yoshiokatsuneo (Tsuneo Yoshioka) over 3 years ago. Updated over 3 years ago.

Target version:
ruby -v:
ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]


s[i]=c (assigning a character) is slower for String than for Array on Linux.

If I split the String into an Array, assign the characters, and join the Array back into a String,
it is much faster than assigning characters directly to the String.

For some reason, I don't see this performance difference on Mac OS X.

~$ time ruby -e 'N=100000; s="a"*N; N.times{s[Random.rand(N)]="Z"}; puts s' >/dev/null

real    0m0.879s
user    0m0.836s
sys     0m0.012s
~$ time ruby -e 'N=100000;s="a"*N;s=s.split(""); N.times{s[Random.rand(N)]="Z"}; puts s.join("")' >/dev/null

real    0m0.153s
user    0m0.108s
sys     0m0.016s

~$ uname -a
Linux aaaaaaaa 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
~$ ruby --version
ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]
~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:        16.04
Codename:       xenial
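
For anyone who wants to reproduce the comparison in one run, here is a rough sketch that puts both one-liners into a single script using the standard `benchmark` library. The method names `fill_string` and `fill_via_array` are made up for illustration and are not part of the original report. The timings depend on the Ruby version and the locale, so no expected numbers are given.

```ruby
require "benchmark"

N = 100_000

# Assign characters directly into the String (the slow case in this report).
def fill_string(n, rng)
  s = "a" * n
  n.times { s[rng.rand(n)] = "Z" }
  s
end

# Split into an Array of one-character Strings, assign, then join back.
def fill_via_array(n, rng)
  chars = ("a" * n).split("")
  n.times { chars[rng.rand(n)] = "Z" }
  chars.join("")
end

if __FILE__ == $PROGRAM_NAME
  Benchmark.bm(12) do |x|
    # The same seed is used for both runs so they touch the same indices.
    x.report("String#[]=") { fill_string(N, Random.new(1)) }
    x.report("Array#[]=")  { fill_via_array(N, Random.new(1)) }
  end
end
```

Both methods should return the same string for the same seed; only the time taken differs.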

Updated by wanabe (_ wanabe) over 3 years ago

  • ruby -v set to ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]

perf shows that ruby spent most of the time in search_nonascii().

$ perf record ruby -ve 'n=100000; s = "a" * n; t = Time.now; n.times do |i| s[i] = "z"; end; p Time.now - t'
ruby 2.5.0dev (2017-02-18 trunk 57652) [x86_64-linux]
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.858 MB (21558 samples) ]

$ perf report -n --stdio|head -20
# To display the header info, please use --header/--header-only options.
# Total Lost Samples: 0
# Samples: 21K of event 'cycles'
# Event count (approx.): 15739606081
# Overhead       Samples  Command   Shared Object      Symbol                                    
# ........  ............  ........  .................  ..........................................
    96.14%         20654  ruby      ruby               [.] search_nonascii
     0.18%            45  ruby      ruby               [.] ruby_yyparse
     0.18%            38  ruby      [wl]               [k] osl_readl
     0.17%            38  ruby      ruby               [.] vm_exec_core
     0.13%            27  ruby      [kernel.kallsyms]  [k] delay_tsc
     0.07%            16  ruby      ruby               [.] rb_str_splice_0
     0.07%            15  ruby      ruby               [.] gc_page_sweep
     0.07%            15  ruby      ruby               [.] rb_enc_from_index
     0.06%            13  ruby      ruby               [.] rb_str_update

This puzzles me: the script uses only ASCII characters, so the string should have RUBY_ENC_CODERANGE_7BIT.
But rb_str_splice_0() calls rb_str_modify() and clears the code-range information with ENC_CODERANGE_CLEAR().
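
As far as I know, the cached code range cannot be read directly from Ruby code. The closest thing observable from a script is `String#ascii_only?`, which reports whether the string is 7-bit and has to scan the string when no cached answer is available. The sketch below only shows that the answer stays correct across an assignment; it does not prove when the scan happens. The helper name `ascii_only_around_assignment` is hypothetical.

```ruby
# Returns [ascii_only? before, ascii_only? after] for one character assignment.
def ascii_only_around_assignment(str, index, char)
  before = str.ascii_only?  # true when every byte of the string is 7-bit
  str[index] = char         # String#[]= modifies the string in place
  after = str.ascii_only?   # must reflect the new content
  [before, after]
end

p ascii_only_around_assignment("a" * 10, 3, "z")       # prints [true, true]
p ascii_only_around_assignment("a" * 10, 3, "\u3042")  # prints [true, false]
```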

Updated by sorah (Sorah Fukumori) over 3 years ago

Could the difference come from the locale configuration rather than the OS?

sorah@yuuki ~ $ uname -a
Linux yuuki 4.9.6-gentoo-r1 #1 SMP Sun Feb 12 01:20:31 UTC 2017 x86_64 Intel(R) Celeron(R) CPU N3050 @ 1.60GHz GenuineIntel GNU/Linux

sorah@yuuki ~ $ time env LANG=C  ruby -e 'N=100000; s="a"*N; N.times{s[Random.rand(N)]="Z"}; puts s' >/dev/null

real    0m0.387s
user    0m0.229s
sys     0m0.085s
sorah@yuuki ~ $ time env LANG=en_US.UTF-8  ruby -e 'N=100000; s="a"*N; N.times{s[Random.rand(N)]="Z"}; puts s' >/dev/null

real    0m3.015s
user    0m2.919s
sys     0m0.079s

Updated by naruse (Yui NARUSE) over 3 years ago

  • Status changed from Open to Rejected

This is expected, because String index access requires character counting.
If you need performance and the string is ASCII or binary, you can set the string's encoding with String#force_encoding.
Ruby can then use direct index access.
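
A minimal sketch of that suggestion, assuming the data really is pure ASCII: the string is retagged with a single-byte encoding before the assignments, so a character index is the same as a byte offset. The method name `fill_with_encoding` is made up for illustration. Whether this is measurably faster depends on the Ruby version, so no timings are claimed here.

```ruby
# Overwrite random positions of an n-byte ASCII string after retagging it
# with the given encoding via String#force_encoding.
def fill_with_encoding(n, encoding, rng)
  s = ("a" * n).force_encoding(encoding)
  n.times { s[rng.rand(n)] = "Z" }
  s
end

ascii  = fill_with_encoding(1_000, Encoding::US_ASCII, Random.new(1))
binary = fill_with_encoding(1_000, Encoding::BINARY,   Random.new(1))

# The bytes are identical; only the encoding tag differs.
puts ascii.encoding
puts binary.encoding
puts ascii.bytes == binary.bytes
```

Note that force_encoding only changes the tag and never converts bytes, so it is only safe here because the content is known to be ASCII.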

Maybe your Mac's locale is LANG=C, so strings are handled in a single-byte encoding.
You can confirm this with Encoding.locale_charmap.
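
One possible way to check this on each machine is to print the locale charmap next to the encodings Ruby derives from it. The helper name `encoding_report` is hypothetical. My understanding is that a string literal in a `-e` script takes the locale encoding, while a literal in a script file defaults to UTF-8 unless a magic comment says otherwise, so treat the `literal` entry as dependent on how the code is run.

```ruby
# Collect the encodings that influence how strings are indexed by default.
def encoding_report
  {
    locale_charmap:   Encoding.locale_charmap,    # e.g. "UTF-8" under a UTF-8 locale
    default_external: Encoding.default_external,
    literal:          "a".encoding                # follows the source encoding
  }
end

encoding_report.each { |name, value| puts "#{name}: #{value}" }
```

Running it once with `LANG=C` and once with `LANG=en_US.UTF-8` should show whether the two environments differ.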


Updated by nobu (Nobuyoshi Nakada) over 3 years ago

  • Description updated (diff)
