Feature #18634
Updated by peterzhu2118 (Peter Zhu) over 2 years ago
# GitHub PR: https://github.com/ruby/ruby/pull/5660

# Feature description

This patch changes arrays to allocate through Variable Width Allocation. Similar to strings (implemented in ticket [#18239](https://bugs.ruby-lang.org/issues/18239)), arrays allocated through Variable Width Allocation are embedded, meaning the contents of the array directly follow the array object headers.

When an array is resized, we fall back to allocating memory through the malloc heap. If the array was initially allocated in a larger slot, the unused part of that slot is wasted memory. However, in the benchmarks below, we can see that this wastage does not cause memory usage to increase significantly.

# What's next

We're working on implementing cross size pool compaction for Variable Width Allocation. This will allow us to both downsize objects (to save memory) and upsize objects (to improve cache performance).

We're going to continue implementing more types on Variable Width Allocation, such as Objects, Hashes, and ISeqs.

# Benchmark setup

Benchmarking was done on a bare-metal Ubuntu machine on AWS. All benchmark results use glibc by default, except when jemalloc is explicitly specified.

```
$ uname -a
Linux 5.13.0-1014-aws #15~20.04.1-Ubuntu SMP Thu Feb 10 17:55:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
```

glibc version:

```
$ ldd --version
ldd (Ubuntu GLIBC 2.31-0ubuntu9.2) 2.31
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
```

jemalloc version:

```
$ apt list --installed | grep jemalloc

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libjemalloc-dev/focal,now 5.2.1-1ubuntu1 amd64 [installed]
libjemalloc2/focal,now 5.2.1-1ubuntu1 amd64 [installed,automatic]
```

To measure memory usage over time, the [mstat tool](https://github.com/bpowers/mstat) was used.
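The result tables in this report carry an Improvement column. The figures are consistent with a ratio oriented so that values above 1.00x favour the branch: branch / master for metrics where higher is better (RPS, FPS, i/s), and master / branch where lower is better (latency, time, memory). That reading is inferred from the numbers rather than stated in the report, and the helper name `improvement` and its `higher_is_better` keyword are hypothetical. A minimal sketch under that assumption:

```ruby
# Hypothetical helper (not part of the patch): returns the ratio oriented so
# that a value above 1.0 means the branch did better than master.
def improvement(branch, master, higher_is_better:)
  ratio = higher_is_better ? branch.to_f / master : master.to_f / branch
  ratio.round(2)
end

# Values taken from the railsbench glibc table.
puts format("RPS:  %.2fx", improvement(810.38, 809.50, higher_is_better: true))  # 1.00x
puts format("p100: %.2fx", improvement(5.53, 6.02, higher_is_better: false))     # 1.09x
```

Under this convention a latency that drops from 6.02 ms to 5.53 ms reads as 1.09x, not 0.92x.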
master was benchmarked on commit [bec492c77e](https://github.com/ruby/ruby/commit/bec492c77ed7659cafd2447cd042acde489c8d28). The branch was rebased on top of the same commit.

## railsbench

For railsbench, we ran the [railsbench benchmark](https://github.com/k0kubun/railsbench/blob/master/bin/bench). For both the performance and memory benchmarks, 25 runs were conducted for each combination (branch + glibc, master + glibc, branch + jemalloc, master + jemalloc).

For both glibc and jemalloc allocators, there is not a significant change in RPS, response times, or max memory usage. We can see in the RSS over time graph that the memory behavior of the branch and master is very similar.

### glibc

```
+-----------------------+--------+--------+-------------+
|                       | Branch | master | Improvement |
+-----------------------+--------+--------+-------------+
| RPS                   | 810.38 | 809.50 |       1.00x |
| p50 (ms)              |   1.20 |   1.20 |       1.00x |
| p90 (ms)              |   1.32 |   1.32 |       1.00x |
| p99 (ms)              |   1.75 |   1.72 |       0.98x |
| p100 (ms)             |   5.53 |   6.02 |       1.09x |
| Max memory usage (MB) |  90.19 |  90.45 |       1.00x |
+-----------------------+--------+--------+-------------+
```

![](https://user-images.githubusercontent.com/15860699/157101671-98568350-8960-4a33-8e55-856ab32a4bc1.png)

### jemalloc

```
+-----------------------+--------+--------+-------------+
|                       | Branch | master | Improvement |
+-----------------------+--------+--------+-------------+
| RPS                   | 834.04 | 840.81 |       0.99x |
| p50 (ms)              |   1.18 |   1.17 |       0.99x |
| p90 (ms)              |   1.27 |   1.26 |       0.99x |
| p99 (ms)              |   1.69 |   1.65 |       0.98x |
| p100 (ms)             |   5.54 |   7.03 |       1.27x |
| Max memory usage (MB) |  88.50 |  87.48 |       0.99x |
+-----------------------+--------+--------+-------------+
```

![](https://user-images.githubusercontent.com/15860699/157101712-27d3e02f-4611-45b6-9c8b-c5983c301817.png)

## discourse

Discourse was benchmarked through the [`script/bench.rb`](https://github.com/discourse/discourse/blob/main/script/bench.rb) benchmarking script.
The response times for the `home` endpoint and RSS memory usage are shown below.

We see a slight increase in memory usage (5%) with glibc and an insignificant memory usage increase with jemalloc. We don't see big differences in response times.

### glibc

```
+-----------+--------+--------+-------------+
|           | Branch | master | Improvement |
+-----------+--------+--------+-------------+
| p50 (ms)  |     75 |     76 |       1.01x |
| p90 (ms)  |     88 |     90 |       1.02x |
| p99 (ms)  |    248 |    261 |       1.05x |
| RSS (MB)  | 364.48 | 383.80 |       1.05x |
+-----------+--------+--------+-------------+
```

### jemalloc

```
+-----------+--------+--------+-------------+
|           | Branch | master | Improvement |
+-----------+--------+--------+-------------+
| p50 (ms)  |     73 |     73 |       1.00x |
| p90 (ms)  |     84 |     86 |       1.02x |
| p99 (ms)  |    241 |    242 |       1.00x |
| RSS (MB)  | 347.56 | 349.86 |       1.01x |
+-----------+--------+--------+-------------+
```

## rdoc generation

In rdoc generation, we see a small improvement in performance with glibc and no change in performance with jemalloc. We see a small max memory usage increase for both glibc and jemalloc. However, the RSS over time graph shows that except for the very end, the branch actually has lower memory usage than master.
### glibc

```
+-----------------------+--------+--------+-------------+
|                       | Branch | master | Improvement |
+-----------------------+--------+--------+-------------+
| Time (s)              |  17.81 |  18.11 |       1.02x |
| Max memory usage (MB) | 287.74 | 283.24 |       0.98x |
+-----------------------+--------+--------+-------------+
```

![](https://user-images.githubusercontent.com/15860699/157101976-805bde67-897e-473e-a2b7-16cdba7d21e4.png)

### jemalloc

```
+-----------------------+--------+--------+-------------+
|                       | Branch | master | Improvement |
+-----------------------+--------+--------+-------------+
| Time (s)              |  17.59 |  17.46 |       0.99x |
| Max memory usage (MB) | 289.92 | 277.30 |       0.96x |
+-----------------------+--------+--------+-------------+
```

![](https://user-images.githubusercontent.com/15860699/157102010-ad5cd8b9-91ab-4058-8e1b-35bdf2af47a4.png)

## optcarrot

We don't see a change in performance in optcarrot.

```
+------+--------+--------+-------------+
|      | Branch | master | Improvement |
+------+--------+--------+-------------+
| FPS  |  43.10 |  43.25 |       1.00x |
+------+--------+--------+-------------+
```

## Liquid benchmarks

We don't see a big change in performance in liquid benchmarks.

```
+----------------------+--------+--------+-------------+
|                      | Branch | master | Improvement |
+----------------------+--------+--------+-------------+
| Parse (i/s)          |  39.57 |  40.43 |       0.98x |
| Render (i/s)         | 129.78 | 130.22 |       1.00x |
| Parse & Render (i/s) |  28.43 |  28.89 |       0.98x |
+----------------------+--------+--------+-------------+
```

## Microbenchmarks

These microbenchmarks are very favourable for VWA since the arrays created have a length of 10, so they are embedded with VWA but allocated on the malloc heap on master.
```
+-------------+--------+--------+-------------+
|             | Branch | master | Improvement |
+-------------+--------+--------+-------------+
| Array#first | 2.282k | 2.014k |       1.13x |
| Array#last  | 2.095k | 2.092k |       1.00x |
| Array#[0]=  | 2.232k | 2.079k |       1.07x |
| Array#[-1]= | 2.181k | 2.064k |       1.06x |
| Array#each  | 319.92 | 314.22 |       1.02x |
+-------------+--------+--------+-------------+
```

{{collapse(Benchmark source code)
```ruby
require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "benchmark-ips"
end

COUNT = 10_000

# Pre-allocate the arrays so that only the access is measured.
arrays = []
COUNT.times do
  arrays << Array.new(10)
end

Benchmark.ips do |x|
  # Each report receives the iteration count and loops manually.
  x.report("Array#first") do |times|
    i = 0
    while i < times
      COUNT.times { |i| arrays[i].first }
      i += 1
    end
  end

  x.report("Array#last") do |times|
    i = 0
    while i < times
      COUNT.times { |i| arrays[i].last }
      i += 1
    end
  end

  x.report("Array#[0]=") do |times|
    i = 0
    while i < times
      COUNT.times { |i| arrays[i][0] = 0 }
      i += 1
    end
  end

  x.report("Array#[-1]=") do |times|
    i = 0
    while i < times
      COUNT.times { |i| arrays[i][-1] = 9 }
      i += 1
    end
  end

  x.report("Array#each") do |times|
    i = 0
    while i < times
      COUNT.times { |i| arrays[i].each { |x| } }
      i += 1
    end
  end
end
```
}}
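To check on a given build whether the length-10 arrays used in the microbenchmarks are embedded, one option is to inspect the JSON produced by `ObjectSpace.dump` from the `objspace` standard library, which reports an `embedded` field for embedded arrays. This is a minimal sketch, not part of the patch; the helper name `embedded_array?` is hypothetical, and the result for any particular length depends on the Ruby version and build, so no expected output is shown.

```ruby
require "json"
require "objspace"

# Hypothetical helper: true when ObjectSpace.dump marks the array as embedded,
# i.e. its elements live in the object slot instead of a malloc'd buffer.
def embedded_array?(ary)
  info = JSON.parse(ObjectSpace.dump(ary))
  info["embedded"] == true
end

[0, 3, 10, 100].each do |len|
  ary = Array.new(len)
  puts format("length %3d: embedded=%s, memsize=%d bytes",
              len, embedded_array?(ary), ObjectSpace.memsize_of(ary))
end

# Growing an array well past any slot size moves it to the malloc heap.
grown = Array.new(10)
1_000.times { grown << nil }
puts "after growth: embedded=#{embedded_array?(grown)}"
```

Running this on master and on the branch should show where the two differ for `Array.new(10)`.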