Bug #19438
openRuby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmark
Description
Introduction¶
Recently I had been going through some of the old benchmarks in the Ruby Great Implementation Shootout from around 2010.
As an experiment, one night I ran the benchmarks against Ruby 3.2.0, Ruby 3.2.0 --yjit, TruffleRuby, TruffleRuby +GraalVM, and Ruby 2.6.10.
Most results were as expected. However there was a benchmark that Ruby 2.6.10 was consistently outperforming all new Rubies on.
Method¶
After pairing with @eightbitraptor (Matt V-H), we discovered that this old benchmark was remarkably similar to an existing benchmark in the /benchmark
directory, so_k_nucleotide.yml. We decided to go with that benchmark. For brevity I have not included the full 150 lines of the benchmark here.
I tested this benchmark out with 100 runs using benchmark-driver
against Ruby 2.7,3.0,3.1,3.2. (I had discovered that 2.7 was even faster than 2.6.).
It appears that about half of the regression occured from 2.7 -> 3.0; the other half from 3.0 -> 3.2. One other interesting finding is that each minor version does appear to regress
from the last, even if a little.
Code¶
This is my benchmark running code and harnass. The full code and data can be found here
RUNS = 100
results = Hash.new { |h, k| h[k] = [] }
RUNS.times do |i|
puts i
run = `benchmark-driver so_k_nucleotide.yml --chruby '2.7.5;3.0.5;3.1.3;3.2.0' -o simple`
run.scan(/\d\.\d\.\d/).each_with_index do |version, index|
results[version] << run.scan(/\d\.\d\d\d/)[index]
end
end
require 'csv'
columns = results.keys
outdata = CSV.generate do |csv|
csv << columns
RUNS.times do |i|
csv << columns.map { |c| results[c][i] }
end
end
File.write("output.csv", outdata)
Data¶
Ruby 2.7.5 was consistently ~18-20% faster than Ruby 3.2.0 in this Benchmark
Next Steps¶
I am happy to help investigate or learn more about this regression if anyone has any ideas.
Updated by byroot (Jean Boussier) almost 2 years ago
- Subject changed from Ruby 2.7 -> 3.2 Performance Regression to Ruby 2.7 -> 3.2 Performance Regression in CSV
CSV being a gem, it's very likely that the regression isn't in ruby itself, but in the successive versions of the csv gem being bundled with Ruby:
ruby 3.2 => csv 3.2.6
ruby 3.1 => csv 3.2.2
ruby 3.0 => csv 3.1.9
ruby 2.7 => csv 3.1.2
ruby 2.6 => csv 3.0.9
So if you wish to investigate this further, the best start would be to benchmark csv 3.0.9
on Ruby 3.2, and csv 3.2.6
on Ruby 2.7 or 2.6.
I'm not very familiar with the CSV
gem, but CSV as a format is losely defined, so I really wouldn't be surprised if over the years various fixes were added to handle corner cases and that it would impact performance.
Updated by byroot (Jean Boussier) almost 2 years ago
- Subject changed from Ruby 2.7 -> 3.2 Performance Regression in CSV to Ruby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmark
Oh, I'm sorry, I misread your benchmark and thought it was a CSV based benchmark, I now see it's radically different, please disregard my previous answer.
Updated by nickborromeo (Nick Borromeo) almost 2 years ago
I took a look at the benchmark, and I was wondering what exactly it was trying to benchmark. It looks like its benchmarking the ability to sum up the word count of a given substring. If you agree that that is what is being benchmarked, one thing we can do is change the benchmark to use tally
instead of manually incrementing the count. I ran the benchmark with this change and got the following result.
❯ benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5
Calculating -------------------------------------
2.7.5 3.0.5 3.1.2 3.2.1
so_k_nucleotide 1.671 1.728 1.893 2.045 i/s - 1.000 times in 0.598269s 0.578549s 0.528261s 0.489104s
Comparison:
so_k_nucleotide
3.2.1: 2.0 i/s
3.1.2: 1.9 i/s - 1.08x slower
3.0.5: 1.7 i/s - 1.18x slower
2.7.5: 1.7 i/s - 1.22x slower
The method to count the frequency looks something like this
def word_tally(seq, length)
words = []
n = seq.length - length + 1
(0 ... length).each do |f|
(f ... n).step(length) do |i|
words << seq[i,length]
end
end
table = words.tally
[n,table]
end
In some of the iterations that I played around with, the slow down seems to be coming from the incrementing of the value of a particular key.
- taking the substring
- accessing the value of that key from the Hash
Updated by byroot (Jean Boussier) almost 2 years ago
If you agree that that is what is being benchmarked, one thing we can do is change the benchmark to use tally instead of manually incrementing the count.
So this is a benchmark that was popularized by the "benchmark-game", if the goal is to look better on that benchmark (I personally don't care for it) then yes that is a solution.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/knucleotide.html#knucleotide
However I think the interesting part of this issue, would be to figure out why that specific implementation got slower over time.
It would be interesting to break down the algorithm in smaller pieces and benchmark them idependently to see which operation exactly got slower.
Updated by nickborromeo (Nick Borromeo) almost 2 years ago
However I think the interesting part of this issue, would be to figure out why that specific implementation got slower over time.
I definitely agree!
It would be interesting to break down the algorithm in smaller pieces and benchmark them idependently to see which operation exactly got slower.
before I did the change to tally
I also broke it down to smaller steps and ran the benchmark, apologies for the wall of text to follow; a direct quote from a discussion I had with others looking at this
I may have found something interesting while playing around with the benchmark tool. I wanted to see if we can narrow it down to a specific thing that might be slow. so I got rid of the the sort_by_freq
and find_seq
methods for now and focused mostly on the frecuency
method since stackprof identified that as the slowest one
Just so we have a baseline here is the code and the benchmark of the frecuency
method untouched
def frecuency(seq, length)
frequencies = Hash.new(0)
ns = seq.length + 1 - length
for i in (0 ... ns)
frequencies[seq[i, length]] += 1
end
[ns, frequencies]
end
Calculating -------------------------------------
2.7.5 3.0.5 3.1.2 3.2.1
so_k_nucleotide 7.530 6.524 6.056 6.294 i/s - 1.000 times in 0.132808s 0.153277s 0.165117s 0.158884s
Comparison:
so_k_nucleotide
2.7.5: 7.5 i/s
3.0.5: 6.5 i/s - 1.15x slower
3.2.1: 6.3 i/s - 1.20x slower
3.1.2: 6.1 i/s - 1.24x slower
Now lets look at String#[]
as the only operation
def frecuency(seq, length)
frequencies = Hash.new(0)
ns = seq.length + 1 - length
for i in (0 ... ns)
seq[i, length]
# frequencies[seq[i, length]] += 1
end
[ns, frequencies]
end
❯ benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5
Calculating -------------------------------------
2.7.5 3.0.5 3.1.2 3.2.1
so_k_nucleotide 14.000 14.225 13.797 14.096 i/s - 1.000 times in 0.071427s 0.070300s 0.072481s 0.070944s
Comparison:
so_k_nucleotide
3.0.5: 14.2 i/s
3.2.1: 14.1 i/s - 1.01x slower
2.7.5: 14.0 i/s - 1.02x slower
3.1.2: 13.8 i/s - 1.03x slower
Now lets look at String#[]
and Hash#[]=
def frecuency(seq, length)
frequencies = Hash.new(0)
ns = seq.length + 1 - length
for i in (0 ... ns)
frequencies[seq[i, length]]
end
[ns, frequencies]
end
Calculating -------------------------------------
2.7.5 3.0.5 3.1.2 3.2.1
so_k_nucleotide 12.516 12.248 12.058 13.034 i/s - 1.000 times in 0.079900s 0.081645s 0.082931s 0.076725s
Comparison:
so_k_nucleotide
3.2.1: 13.0 i/s
2.7.5: 12.5 i/s - 1.04x slower
3.0.5: 12.2 i/s - 1.06x slower
3.1.2: 12.1 i/s - 1.08x slower
However the interesting bit is when we assign things to variables or when we increment the counter for a hash value which I think is Hash#[]=
def frecuency(seq, length)
frequencies = Hash.new(0)
ns = seq.length + 1 - length
for i in (0 ... ns)
x = seq[i, length]
end
[ns, frequencies]
end
❯ benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5
Calculating -------------------------------------
2.7.5 3.0.5 3.1.2 3.2.1
so_k_nucleotide 13.641 12.170 11.857 12.001 i/s - 1.000 times in 0.073306s 0.082168s 0.084335s 0.083325s
Comparison:
so_k_nucleotide
2.7.5: 13.6 i/s
3.0.5: 12.2 i/s - 1.12x slower
3.2.1: 12.0 i/s - 1.14x slower
3.1.2: 11.9 i/s - 1.15x slower
This is the same as the baseline in the start of the comment
❯ benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5
Calculating -------------------------------------
2.7.5 3.0.5 3.1.2 3.2.1
so_k_nucleotide 7.626 6.474 6.135 6.239 i/s - 1.000 times in 0.131126s 0.154468s 0.163005s 0.160283s
Comparison:
so_k_nucleotide
2.7.5: 7.6 i/s
3.0.5: 6.5 i/s - 1.18x slower
3.2.1: 6.2 i/s - 1.22x slower
3.1.2: 6.1 i/s - 1.24x slower
I think we might need to look into the assignment methods instead of the access methods for string and hash 🤔