https://bugs.ruby-lang.org/https://bugs.ruby-lang.org/favicon.ico?17113305112023-02-15T18:20:24ZRuby Issue Tracking SystemRuby master - Bug #19438: Ruby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmarkhttps://bugs.ruby-lang.org/issues/19438?journal_id=1018802023-02-15T18:20:24Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul><li><strong>Subject</strong> changed from <i>Ruby 2.7 -> 3.2 Performance Regression</i> to <i>Ruby 2.7 -> 3.2 Performance Regression in CSV</i></li></ul><p>CSV being a gem, it's very likely that the regression isn't in ruby itself, but in the successive versions of the csv gem being bundled with Ruby:</p>
<pre><code>ruby 3.2 => csv 3.2.6
ruby 3.1 => csv 3.2.2
ruby 3.0 => csv 3.1.9
ruby 2.7 => csv 3.1.2
ruby 2.6 => csv 3.0.9
</code></pre>
<p>So if you wish to investigate this further, the best start would be to benchmark <code>csv 3.0.9</code> on Ruby 3.2, and <code>csv 3.2.6</code> on Ruby 2.7 or 2.6.</p>
<p>I'm not very familiar with the <code>CSV</code> gem, but CSV as a format is losely defined, so I really wouldn't be surprised if over the years various fixes were added to handle corner cases and that it would impact performance.</p> Ruby master - Bug #19438: Ruby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmarkhttps://bugs.ruby-lang.org/issues/19438?journal_id=1018812023-02-15T18:22:07Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul><li><strong>Subject</strong> changed from <i>Ruby 2.7 -> 3.2 Performance Regression in CSV</i> to <i>Ruby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmark</i></li></ul><p>Oh, I'm sorry, I misread your benchmark and thought it was a CSV based benchmark, I now see it's radically different, please disregard my previous answer.</p> Ruby master - Bug #19438: Ruby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmarkhttps://bugs.ruby-lang.org/issues/19438?journal_id=1021392023-03-04T00:40:17Znickborromeo (Nick Borromeo)
<ul></ul><p>I took a look at the benchmark, and I was wondering what exactly it was trying to benchmark. It looks like its benchmarking the ability to sum up the word count of a given substring. If you agree that that is what is being benchmarked, one thing we can do is change the benchmark to use <code>tally</code> instead of manually incrementing the count. I ran the benchmark with this change and got the following result.</p>
<pre><code>❯ benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5
Calculating -------------------------------------
2.7.5 3.0.5 3.1.2 3.2.1
so_k_nucleotide 1.671 1.728 1.893 2.045 i/s - 1.000 times in 0.598269s 0.578549s 0.528261s 0.489104s
Comparison:
so_k_nucleotide
3.2.1: 2.0 i/s
3.1.2: 1.9 i/s - 1.08x slower
3.0.5: 1.7 i/s - 1.18x slower
2.7.5: 1.7 i/s - 1.22x slower
</code></pre>
<p>The method to count the frequency looks something like this</p>
<pre><code> def word_tally(seq, length)
words = []
n = seq.length - length + 1
(0 ... length).each do |f|
(f ... n).step(length) do |i|
words << seq[i,length]
end
end
table = words.tally
[n,table]
end
</code></pre>
<p>In some of the iterations that I played around with, the slow down seems to be coming from the incrementing of the value of a particular key.</p>
<ul>
<li>taking the substring</li>
<li>accessing the value of that key from the Hash</li>
</ul> Ruby master - Bug #19438: Ruby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmarkhttps://bugs.ruby-lang.org/issues/19438?journal_id=1021432023-03-04T09:44:55Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><blockquote>
<p>If you agree that that is what is being benchmarked, one thing we can do is change the benchmark to use tally instead of manually incrementing the count.</p>
</blockquote>
<p>So this is a benchmark that was popularized by the "benchmark-game", if the goal is to look better on that benchmark (I personally don't care for it) then yes that is a solution.</p>
<p><a href="https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/knucleotide.html#knucleotide" class="external">https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/knucleotide.html#knucleotide</a></p>
<p>However I think the interesting part of this issue, would be to figure out why that specific implementation got slower over time.</p>
<p>It would be interesting to break down the algorithm in smaller pieces and benchmark them idependently to see which operation exactly got slower.</p> Ruby master - Bug #19438: Ruby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmarkhttps://bugs.ruby-lang.org/issues/19438?journal_id=1021592023-03-06T18:10:06Znickborromeo (Nick Borromeo)
<ul></ul><blockquote>
<p>However I think the interesting part of this issue, would be to figure out why that specific implementation got slower over time.</p>
</blockquote>
<p>I definitely agree!</p>
<blockquote>
<p>It would be interesting to break down the algorithm in smaller pieces and benchmark them idependently to see which operation exactly got slower.</p>
</blockquote>
<p>before I did the change to <code>tally</code> I also broke it down to smaller steps and ran the benchmark, apologies for the wall of text to follow; a direct quote from a discussion I had with others looking at this</p>
<hr>
<p>I may have found something interesting while playing around with the benchmark tool. I wanted to see if we can narrow it down to a specific thing that might be slow. so I got rid of the the <code>sort_by_freq</code> and <code>find_seq</code> methods for now and focused mostly on the <code>frecuency</code> method since stackprof identified that as the <em>slowest</em> one</p>
<p>Just so we have a baseline here is the code and the benchmark of the <code>frecuency</code> method untouched</p>
<pre><code> def frecuency(seq, length)
frequencies = Hash.new(0)
ns = seq.length + 1 - length
for i in (0 ... ns)
frequencies[seq[i, length]] += 1
end
[ns, frequencies]
end
</code></pre>
<pre><code>Calculating -------------------------------------
2.7.5 3.0.5 3.1.2 3.2.1
so_k_nucleotide 7.530 6.524 6.056 6.294 i/s - 1.000 times in 0.132808s 0.153277s 0.165117s 0.158884s
Comparison:
so_k_nucleotide
2.7.5: 7.5 i/s
3.0.5: 6.5 i/s - 1.15x slower
3.2.1: 6.3 i/s - 1.20x slower
3.1.2: 6.1 i/s - 1.24x slower
</code></pre>
<p>Now lets look at <code>String#[]</code> as the only operation</p>
<pre><code> def frecuency(seq, length)
frequencies = Hash.new(0)
ns = seq.length + 1 - length
for i in (0 ... ns)
seq[i, length]
# frequencies[seq[i, length]] += 1
end
[ns, frequencies]
end
</code></pre>
<pre><code>❯ benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5
Calculating -------------------------------------
2.7.5 3.0.5 3.1.2 3.2.1
so_k_nucleotide 14.000 14.225 13.797 14.096 i/s - 1.000 times in 0.071427s 0.070300s 0.072481s 0.070944s
Comparison:
so_k_nucleotide
3.0.5: 14.2 i/s
3.2.1: 14.1 i/s - 1.01x slower
2.7.5: 14.0 i/s - 1.02x slower
3.1.2: 13.8 i/s - 1.03x slower
</code></pre>
<p>Now lets look at <code>String#[]</code> and <code>Hash#[]=</code></p>
<pre><code> def frecuency(seq, length)
frequencies = Hash.new(0)
ns = seq.length + 1 - length
for i in (0 ... ns)
frequencies[seq[i, length]]
end
[ns, frequencies]
end
</code></pre>
<pre><code>Calculating -------------------------------------
2.7.5 3.0.5 3.1.2 3.2.1
so_k_nucleotide 12.516 12.248 12.058 13.034 i/s - 1.000 times in 0.079900s 0.081645s 0.082931s 0.076725s
Comparison:
so_k_nucleotide
3.2.1: 13.0 i/s
2.7.5: 12.5 i/s - 1.04x slower
3.0.5: 12.2 i/s - 1.06x slower
3.1.2: 12.1 i/s - 1.08x slower
</code></pre>
<p>However the interesting bit is when we assign things to variables or when we increment the counter for a hash value which I think is <code>Hash#[]=</code></p>
<pre><code> def frecuency(seq, length)
frequencies = Hash.new(0)
ns = seq.length + 1 - length
for i in (0 ... ns)
x = seq[i, length]
end
[ns, frequencies]
end
</code></pre>
<pre><code>❯ benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5
Calculating -------------------------------------
2.7.5 3.0.5 3.1.2 3.2.1
so_k_nucleotide 13.641 12.170 11.857 12.001 i/s - 1.000 times in 0.073306s 0.082168s 0.084335s 0.083325s
Comparison:
so_k_nucleotide
2.7.5: 13.6 i/s
3.0.5: 12.2 i/s - 1.12x slower
3.2.1: 12.0 i/s - 1.14x slower
3.1.2: 11.9 i/s - 1.15x slower
</code></pre>
<p>This is the same as the baseline in the start of the comment</p>
<pre><code>❯ benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5
Calculating -------------------------------------
2.7.5 3.0.5 3.1.2 3.2.1
so_k_nucleotide 7.626 6.474 6.135 6.239 i/s - 1.000 times in 0.131126s 0.154468s 0.163005s 0.160283s
Comparison:
so_k_nucleotide
2.7.5: 7.6 i/s
3.0.5: 6.5 i/s - 1.18x slower
3.2.1: 6.2 i/s - 1.22x slower
3.1.2: 6.1 i/s - 1.24x slower
</code></pre>
<p>I think we might need to look into the assignment methods instead of the access methods for string and hash 🤔</p>