Project

General

Profile

Feature #18634

Updated by peterzhu2118 (Peter Zhu) over 2 years ago

# GitHub PR: https://github.com/ruby/ruby/pull/5660 

 # Feature description 

 This patch changes arrays to allocate through Variable Width Allocation. 

 Similar to strings (implemented in ticket [#18239](https://bugs.ruby-lang.org/issues/18239)), arrays allocated through Variable Width Allocation are embedded, meaning the contents of the array directly follow the array object headers. 

 When an array is resized, we fallback to allocating memory through the malloc heap. If the array was initially allocated in a larger slot, it would result in wastage of memory. However, in the benchmarks below, we can see that this wastage does not cause memory usage to increase significantly. 

 # What's next 

 We're working on implementing cross size pool compaction for Variable Width Allocation. This will allow us to both downsize objects (to save memory) and upsize objects (to improve cache performance). 

 We're going to continue on implementing more types on Variable Width Allocation, such as Objects, Hashes, and ISeqs. 

 # Benchmark setup 

 Benchmarking was done on a bare-metal Ubuntu machine on AWS. All benchmark results are using glibc by default, except when jemalloc is explicitly specified. 

 ``` 
 $ uname -a 
 Linux 5.13.0-1014-aws #15~20.04.1-Ubuntu SMP Thu Feb 10 17:55:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux 
 ``` 

 glibc version: 

 ``` 
 $ ldd --version 
 ldd (Ubuntu GLIBC 2.31-0ubuntu9.2) 2.31 
 Copyright (C) 2020 Free Software Foundation, Inc. 
 This is free software; see the source for copying conditions.    There is NO 
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
 Written by Roland McGrath and Ulrich Drepper. 
 ``` 

 jemalloc version: 

 ``` 
 $ apt list --installed | grep jemalloc 

 WARNING: apt does not have a stable CLI interface. Use with caution in scripts. 

 libjemalloc-dev/focal,now 5.2.1-1ubuntu1 amd64 [installed] 
 libjemalloc2/focal,now 5.2.1-1ubuntu1 amd64 [installed,automatic] 
 ``` 

 To measure memory usage over time, the [mstat tool](https://github.com/bpowers/mstat) was used. 

 master was benchmarked on commit [bec492c77e](https://github.com/ruby/ruby/commit/bec492c77ed7659cafd2447cd042acde489c8d28). The branch was rebased on top of the same commit. 

 ## railsbench 

 For railsbench, we ran the [railsbench benchmark](https://github.com/k0kubun/railsbench/blob/master/bin/bench). For both the performance and memory benchmarks, 25 runs were conducted for each combination (branch + glibc, master + glibc, branch + jemalloc, master + jemalloc). 

 For both glibc and jemalloc allocators, there is not a significant change in RPS, response times, or max memory usage. We can see in the RSS over time graph that the memory behavior of the branch and master is very similar. 

 ### glibc 

 ``` 
 +-----------------------+--------+--------+-------------+ 
 |                         | Branch | master | Improvement | 
 +-----------------------+--------+--------+-------------+ 
 | RPS                     | 810.38 | 809.50 | 1.00x         | 
 | p50 (ms)                | 1.20     | 1.20     | 1.00x         | 
 | p90 (ms)                | 1.32     | 1.32     | 1.00x         | 
 | p99 (ms)                | 1.75     | 1.72     | 0.98x         | 
 | p100 (ms)               | 5.53     | 6.02     | 1.09x 0.92x         | 
 | Max memory usage (MB) | 90.19    | 90.45    | 1.00x         | 
 +-----------------------+--------+--------+-------------+ 
 ``` 

 ![](https://user-images.githubusercontent.com/15860699/157101671-98568350-8960-4a33-8e55-856ab32a4bc1.png) 


 ### jemalloc 

 ``` 
 +-----------------------+--------+--------+-------------+ 
 |                         | Branch | master | Improvement | 
 +-----------------------+--------+--------+-------------+ 
 | RPS                     | 834.04 | 840.81 | 0.99x         | 
 | p50 (ms)                | 1.18     | 1.17     | 0.99x         | 
 | p90 (ms)                | 1.27     | 1.26     | 0.99x         | 
 | p99 (ms)                | 1.69     | 1.65     | 0.98x         | 
 | p100 (ms)               | 5.54     | 7.03     | 1.27x         | 
 | Max memory usage (MB) | 88.50    | 87.48    | 0.99x         | 
 +-----------------------+--------+--------+-------------+ 
 ``` 

 ![](https://user-images.githubusercontent.com/15860699/157101712-27d3e02f-4611-45b6-9c8b-c5983c301817.png) 

 ## discourse 

 Discourse was benchmarked through the [`script/bench.rb`](https://github.com/discourse/discourse/blob/main/script/bench.rb) benchmarking script. The response times for the `home` endpoint and RSS memory usage is shown below. 

 We see a slight increase in memory usage (5%) with glibc and an insignificant memory usage increase with jemalloc. We don't see big differences in response times. 

 ### glibc 

 ``` 
 +-----------+--------+--------+-------------+ 
 |             | Branch | master | Improvement | 
 +-----------+--------+--------+-------------+ 
 | p50 (ms)    | 75       | 76       | 1.01x         | 
 | p90 (ms)    | 88       | 90       | 1.02x         | 
 | p99 (ms)    | 248      | 261      | 1.05x         | 
 | RSS (MB)    | 364.48 | 383.80 | 1.05x         | 
 +-----------+--------+--------+-------------+ 
 ``` 

 ### jemalloc 

 ``` 
 +-----------+--------+--------+-------------+ 
 |             | Branch | master | Improvement | 
 +-----------+--------+--------+-------------+ 
 | p50 (ms)    | 73       | 73       | 1.00x         | 
 | p90 (ms)    | 84       | 86       | 1.02x         | 
 | p99 (ms)    | 241      | 242      | 1.00x         | 
 | RSS (MB)    | 347.56 | 349.86 | 1.01x         | 
 +-----------+--------+--------+-------------+ 
 ``` 

 ## rdoc generation 

 In rdoc generation, we see a small improvement in performance in glibc and no change in performance for jemalloc. We see a small max memory usage increase for both glibc and jemalloc. Howevver, the RSS over time graph shows that except for the very end, the branch actually has lower memory usage than master. 

 ### glibc 

 ``` 
 +-----------------------+--------+--------+-------------+ 
 |                         | Branch | master | Improvement | 
 +-----------------------+--------+--------+-------------+ 
 | Time (s)                | 17.81    | 18.11    | 1.02x         | 
 | Max memory usage (MB) | 287.74 | 283.24 | 0.98x         | 
 +-----------------------+--------+--------+-------------+ 
 ``` 

 ![](https://user-images.githubusercontent.com/15860699/157101976-805bde67-897e-473e-a2b7-16cdba7d21e4.png) 

 ### jemalloc 

 ``` 
 +-----------------------+--------+--------+-------------+ 
 |                         | Branch | master | Improvement | 
 +-----------------------+--------+--------+-------------+ 
 | Time (s)                | 17.59    | 17.46    | 0.99x         | 
 | Max memory usage (MB) | 289.92 | 277.30 | 0.96x         | 
 +-----------------------+--------+--------+-------------+ 
 ``` 

 ![](https://user-images.githubusercontent.com/15860699/157102010-ad5cd8b9-91ab-4058-8e1b-35bdf2af47a4.png) 

 ## optcarrot 

 We don't see a change in performance in optcarrot. 

 ``` 
 +------+--------+--------+-------------+ 
 |        | Branch | master | Improvement | 
 +------+--------+--------+-------------+ 
 | FPS    | 43.10    | 43.25    | 1.00x         | 
 +------+--------+--------+-------------+ 
 ``` 

 ## Liquid benchmarks 

 We don't see a big change in performance in liquid benchmarks. 

 ``` 
 +----------------------+--------+--------+-------------+ 
 |                        | Branch | master | Improvement | 
 +----------------------+--------+--------+-------------+ 
 | Parse (i/s)            | 39.57    | 40.43    | 0.98x         | 
 | Render (i/s)           | 129.78 | 130.22 | 1.00x         | 
 | Parse & Render (i/s) | 28.43    | 28.89    | 0.98x         | 
 +----------------------+--------+--------+-------------+ 
 ``` 

 ## Microbenchmarks 

 These microbenchmarks are very favourable for VWA since the arrays created have a length of 10, so they are embedded in VWA and allocated on the malloc heap for master. 

 ``` 
 +-------------+--------+--------+-------------+ 
 |               | Branch | master | Improvement | 
 +-------------+--------+--------+-------------+ 
 | Array#first | 2.282k | 2.014k | 1.13x         | 
 | Array#last    | 2.095k | 2.092k | 1.00x         | 
 | Array#[0]=    | 2.232k | 2.079k | 1.07x         | 
 | Array#[-1]= | 2.181k | 2.064k | 1.06x         | 
 | Array#each    | 319.92 | 314.22 | 1.02x         | 
 +-------------+--------+--------+-------------+ 
 ``` 

 {{collapse(Benchmark source code) 


 ```ruby 
 require "bundler/inline" 
 gemfile do 
   source "https://rubygems.org" 
   gem "benchmark-ips" 
 end 

 COUNT = 10_000 

 arrays = [] 

 COUNT.times do 
   arrays << Array.new(10) 
 end 

 Benchmark.ips do |x| 
   x.report("Array#first") do |times| 
     i = 0 
     while i < times 
       COUNT.times { |i| arrays[i].first } 
       i += 1 
     end 
   end 

   x.report("Array#last") do |times| 
     i = 0 
     while i < times 
       COUNT.times { |i| arrays[i].last } 
       i += 1 
     end 
   end 

   x.report("Array#[0]=") do |times| 
     i = 0 
     while i < times 
       COUNT.times { |i| arrays[i][0] = 0 } 
       i += 1 
     end 
   end 

   x.report("Array#[-1]=") do |times| 
     i = 0 
     while i < times 
       COUNT.times { |i| arrays[i][-1] = 9 } 
       i += 1 
     end 
   end 

   x.report("Array#each") do |times| 
     i = 0 
     while i < times 
       COUNT.times { |i| arrays[i].each { |x| } } 
       i += 1 
     end 
   end 
 end 
 ``` 
 }} 

Back