Feature #18634

Variable Width Allocation: Arrays

Added by peterzhu2118 (Peter Zhu) 4 months ago. Updated 3 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:107910]

Description

GitHub PR: https://github.com/ruby/ruby/pull/5660

Feature description

This patch changes arrays to allocate through Variable Width Allocation.

Similar to strings (implemented in ticket #18239), arrays allocated through Variable Width Allocation are embedded, meaning the contents of the array directly follow the object header.

When an array is resized, we fall back to allocating memory through the malloc heap. If the array was initially allocated in a larger slot, the unused portion of the slot is wasted. However, the benchmarks below show that this wastage does not significantly increase memory usage.
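A rough way to observe this from Ruby is `ObjectSpace.memsize_of`, which reports an array's size and jumps once the contents move off the slot and onto the malloc heap. This is only an illustration, not part of the patch; the exact transition lengths and sizes depend on the Ruby version, build, and slot size the array landed in.

```ruby
require "objspace"

# Grow an array and print the reported size whenever it changes.
# Small arrays are embedded (contents live in the slot); once the array
# outgrows the slot, the contents are reallocated on the malloc heap
# and the reported memsize jumps.
ary = []
prev = ObjectSpace.memsize_of(ary)
64.times do |i|
  ary << i
  size = ObjectSpace.memsize_of(ary)
  puts "length=#{ary.length} memsize=#{size}" if size != prev
  prev = size
end
```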

What's next

We're working on implementing cross size pool compaction for Variable Width Allocation. This will allow us to both downsize objects (to save memory) and upsize objects (to improve cache performance).

We're going to continue implementing more types on Variable Width Allocation, such as Objects, Hashes, and ISeqs.

Benchmark setup

Benchmarking was done on a bare-metal Ubuntu machine on AWS. All benchmark results are using glibc by default, except when jemalloc is explicitly specified.

$ uname -a
Linux 5.13.0-1014-aws #15~20.04.1-Ubuntu SMP Thu Feb 10 17:55:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

glibc version:

$ ldd --version
ldd (Ubuntu GLIBC 2.31-0ubuntu9.2) 2.31
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

jemalloc version:

$ apt list --installed | grep jemalloc

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libjemalloc-dev/focal,now 5.2.1-1ubuntu1 amd64 [installed]
libjemalloc2/focal,now 5.2.1-1ubuntu1 amd64 [installed,automatic]

To measure memory usage over time, the mstat tool was used.

master was benchmarked on commit bec492c77e. The branch was rebased on top of the same commit.

railsbench

For railsbench, 25 runs were conducted for each combination (branch + glibc, master + glibc, branch + jemalloc, master + jemalloc), for both the performance and memory benchmarks.

For both glibc and jemalloc allocators, there is not a significant change in RPS, response times, or max memory usage. We can see in the RSS over time graph that the memory behavior of the branch and master is very similar.

glibc

+-----------------------+--------+--------+-------------+
|                       | Branch | master | Improvement |
+-----------------------+--------+--------+-------------+
| RPS                   | 810.38 | 809.50 | 1.00x       |
| p50 (ms)              | 1.20   | 1.20   | 1.00x       |
| p90 (ms)              | 1.32   | 1.32   | 1.00x       |
| p99 (ms)              | 1.75   | 1.72   | 0.98x       |
| p100 (ms)             | 5.53   | 6.02   | 1.09x       |
| Max memory usage (MB) | 90.19  | 90.45  | 1.00x       |
+-----------------------+--------+--------+-------------+

jemalloc

+-----------------------+--------+--------+-------------+
|                       | Branch | master | Improvement |
+-----------------------+--------+--------+-------------+
| RPS                   | 834.04 | 840.81 | 0.99x       |
| p50 (ms)              | 1.18   | 1.17   | 0.99x       |
| p90 (ms)              | 1.27   | 1.26   | 0.99x       |
| p99 (ms)              | 1.69   | 1.65   | 0.98x       |
| p100 (ms)             | 5.54   | 7.03   | 1.27x       |
| Max memory usage (MB) | 88.50  | 87.48  | 0.99x       |
+-----------------------+--------+--------+-------------+

discourse

Discourse was benchmarked through the script/bench.rb benchmarking script. The response times for the home endpoint and RSS memory usage are shown below.

We see a slight decrease in memory usage (5%) with glibc and an insignificant change in memory usage with jemalloc. We don't see big differences in response times.

glibc

+-----------+--------+--------+-------------+
|           | Branch | master | Improvement |
+-----------+--------+--------+-------------+
| p50 (ms)  | 75     | 76     | 1.01x       |
| p90 (ms)  | 88     | 90     | 1.02x       |
| p99 (ms)  | 248    | 261    | 1.05x       |
| RSS (MB)  | 364.48 | 383.80 | 1.05x       |
+-----------+--------+--------+-------------+

jemalloc

+-----------+--------+--------+-------------+
|           | Branch | master | Improvement |
+-----------+--------+--------+-------------+
| p50 (ms)  | 73     | 73     | 1.00x       |
| p90 (ms)  | 84     | 86     | 1.02x       |
| p99 (ms)  | 241    | 242    | 1.00x       |
| RSS (MB)  | 347.56 | 349.86 | 1.01x       |
+-----------+--------+--------+-------------+

rdoc generation

In rdoc generation, we see a small improvement in performance with glibc and no change in performance with jemalloc. We see a small increase in max memory usage for both glibc and jemalloc. However, the RSS over time graph shows that, except at the very end, the branch actually has lower memory usage than master.

glibc

+-----------------------+--------+--------+-------------+
|                       | Branch | master | Improvement |
+-----------------------+--------+--------+-------------+
| Time (s)              | 17.81  | 18.11  | 1.02x       |
| Max memory usage (MB) | 287.74 | 283.24 | 0.98x       |
+-----------------------+--------+--------+-------------+

jemalloc

+-----------------------+--------+--------+-------------+
|                       | Branch | master | Improvement |
+-----------------------+--------+--------+-------------+
| Time (s)              | 17.59  | 17.46  | 0.99x       |
| Max memory usage (MB) | 289.92 | 277.30 | 0.96x       |
+-----------------------+--------+--------+-------------+

optcarrot

We don't see a change in performance in optcarrot.

+------+--------+--------+-------------+
|      | Branch | master | Improvement |
+------+--------+--------+-------------+
| FPS  | 43.10  | 43.25  | 1.00x       |
+------+--------+--------+-------------+

Liquid benchmarks

We don't see a big change in performance in liquid benchmarks.

+----------------------+--------+--------+-------------+
|                      | Branch | master | Improvement |
+----------------------+--------+--------+-------------+
| Parse (i/s)          | 39.57  | 40.43  | 0.98x       |
| Render (i/s)         | 129.78 | 130.22 | 1.00x       |
| Parse & Render (i/s) | 28.43  | 28.89  | 0.98x       |
+----------------------+--------+--------+-------------+

Microbenchmarks

These microbenchmarks are very favourable for VWA since the arrays created have a length of 10, so they are embedded under VWA but allocated on the malloc heap on master.

+-------------+--------+--------+-------------+
|             | Branch | master | Improvement |
+-------------+--------+--------+-------------+
| Array#first | 2.282k | 2.014k | 1.13x       |
| Array#last  | 2.095k | 2.092k | 1.00x       |
| Array#[0]=  | 2.232k | 2.079k | 1.07x       |
| Array#[-1]= | 2.181k | 2.064k | 1.06x       |
| Array#each  | 319.92 | 314.22 | 1.02x       |
+-------------+--------+--------+-------------+
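For reference, these microbenchmarks follow roughly the shape below (the actual scripts are linked in the next section; the iteration count here is arbitrary and only illustrative): allocate a length-10 array each iteration and call the method under test, reporting thousands of iterations per second as in the table above.

```ruby
require "benchmark"

# Illustrative sketch of the Array#first microbenchmark: each iteration
# allocates a fresh length-10 array (embedded under VWA, malloc-backed
# on master) and calls the method under test.
N = 200_000
elapsed = Benchmark.realtime do
  N.times { Array.new(10) { |i| i }.first }
end
puts format("Array#first: %.0fk i/s", N / elapsed / 1000.0)
```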

Benchmark source code

Updated by Eregon (Benoit Daloze) 3 months ago

Improvement is Branch/master?
It seems inconsistent for these 2 example lines:

| p100 (ms)             | 5.53   | 6.02   | 0.92x       |
| p100 (ms)             | 5.54   | 7.03   | 1.27x       |

i.e. the Branch takes less time for both, but once improvement is <1x and once >1x.

Actions #2

Updated by peterzhu2118 (Peter Zhu) 3 months ago

  • Description updated (diff)

Updated by peterzhu2118 (Peter Zhu) 3 months ago

Thanks for catching that. It was a math error.

Improvement for numbers where higher is better (e.g. RPS, IPS) is calculated as branch/master.

Improvement for numbers where lower is better (e.g. response times) is calculated as master/branch.

In other words, we want to target improvement >= 1 to not make performance worse.
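The two conventions above can be expressed as a small illustrative helper (not part of any benchmark script), checked here against numbers from the railsbench tables:

```ruby
# Improvement ratio as defined above: branch/master when higher is better
# (RPS, IPS), master/branch when lower is better (response times, memory).
# A ratio >= 1.0 means the branch is no worse than master.
def improvement(branch, master, higher_is_better:)
  higher_is_better ? branch.fdiv(master) : master.fdiv(branch)
end

# railsbench glibc RPS (higher is better): 810.38 vs 809.50
puts improvement(810.38, 809.50, higher_is_better: true).round(2)  # => 1.0
# railsbench jemalloc p100 (lower is better): 5.54 vs 7.03
puts improvement(5.54, 7.03, higher_is_better: false).round(2)     # => 1.27
```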

Actions #4

Updated by peterzhu2118 (Peter Zhu) 3 months ago

  • Status changed from Open to Closed

Applied in changeset git|a51f30c6712798fc07e57f692d0d0e5ccc59acf1.


[Feature #18634] Implement Arrays on Variable Width Allocation

This commit implements arrays on Variable Width Allocation. This allows
longer arrays to be embedded (i.e. contents directly follow the object
header) which improves performance through better cache locality.
