Project

General

Profile

Actions

Bug #19438

open

Ruby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmark

Added by nick.schwaderer (Nicholas Schwaderer) almost 2 years ago. Updated almost 2 years ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:112431]

Description

Introduction

Recently I had been going through some of the old benchmarks in the Ruby Great Implementation Shootout from around 2010.

As an experiment, one night I ran the benchmarks against Ruby 3.2.0, Ruby 3.2.0 --yjit, TruffleRuby, TruffleRuby +GraalVM, and Ruby 2.6.10.

Most results were as expected. However there was a benchmark that Ruby 2.6.10 was consistently outperforming all new Rubies on.

Method

After pairing with @eightbitraptor (Matt V-H), we discovered that this old benchmark was remarkably similar to an existing benchmark in the /benchmark
directory, so_k_nucleotide.yml. We decided to go with that benchmark. For brevity I have not included the full 150 lines of the benchmark here.

I tested this benchmark out with 100 runs using benchmark-driver against Ruby 2.7,3.0,3.1,3.2. (I had discovered that 2.7 was even faster than 2.6.).

It appears that about half of the regression occured from 2.7 -> 3.0; the other half from 3.0 -> 3.2. One other interesting finding is that each minor version does appear to regress
from the last, even if a little.

Code

This is my benchmark running code and harnass. The full code and data can be found here

RUNS = 100

results = Hash.new { |h, k| h[k] = [] }
RUNS.times do |i|
  puts i
  run = `benchmark-driver so_k_nucleotide.yml --chruby '2.7.5;3.0.5;3.1.3;3.2.0' -o simple`
  run.scan(/\d\.\d\.\d/).each_with_index do |version, index|
    results[version] << run.scan(/\d\.\d\d\d/)[index]
  end
end

require 'csv'

columns = results.keys
outdata = CSV.generate do |csv|
  csv << columns
  RUNS.times do |i|
    csv << columns.map { |c| results[c][i] }
  end
end

File.write("output.csv", outdata)

Data

Ruby 2.7.5 was consistently ~18-20% faster than Ruby 3.2.0 in this Benchmark

Screenshot 2023-02-15 at 13 16 10

Next Steps

I am happy to help investigate or learn more about this regression if anyone has any ideas.

Updated by byroot (Jean Boussier) almost 2 years ago

  • Subject changed from Ruby 2.7 -> 3.2 Performance Regression to Ruby 2.7 -> 3.2 Performance Regression in CSV

CSV being a gem, it's very likely that the regression isn't in ruby itself, but in the successive versions of the csv gem being bundled with Ruby:

ruby 3.2 => csv 3.2.6
ruby 3.1 => csv 3.2.2
ruby 3.0 => csv 3.1.9
ruby 2.7 => csv 3.1.2
ruby 2.6 => csv 3.0.9

So if you wish to investigate this further, the best start would be to benchmark csv 3.0.9 on Ruby 3.2, and csv 3.2.6 on Ruby 2.7 or 2.6.

I'm not very familiar with the CSV gem, but CSV as a format is losely defined, so I really wouldn't be surprised if over the years various fixes were added to handle corner cases and that it would impact performance.

Updated by byroot (Jean Boussier) almost 2 years ago

  • Subject changed from Ruby 2.7 -> 3.2 Performance Regression in CSV to Ruby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmark

Oh, I'm sorry, I misread your benchmark and thought it was a CSV based benchmark, I now see it's radically different, please disregard my previous answer.

Updated by nickborromeo (Nick Borromeo) almost 2 years ago

I took a look at the benchmark, and I was wondering what exactly it was trying to benchmark. It looks like its benchmarking the ability to sum up the word count of a given substring. If you agree that that is what is being benchmarked, one thing we can do is change the benchmark to use tally instead of manually incrementing the count. I ran the benchmark with this change and got the following result.

❯ benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5
Calculating -------------------------------------
                          2.7.5       3.0.5       3.1.2       3.2.1
     so_k_nucleotide      1.671       1.728       1.893       2.045 i/s -       1.000 times in 0.598269s 0.578549s 0.528261s 0.489104s

Comparison:
                  so_k_nucleotide
               3.2.1:         2.0 i/s
               3.1.2:         1.9 i/s - 1.08x  slower
               3.0.5:         1.7 i/s - 1.18x  slower
               2.7.5:         1.7 i/s - 1.22x  slower

The method to count the frequency looks something like this

      def word_tally(seq, length)
        words = []
        n =  seq.length - length + 1
        (0 ... length).each do |f|
            (f ... n).step(length) do |i|
                words << seq[i,length]
            end
        end
        table = words.tally

        [n,table]
      end

In some of the iterations that I played around with, the slow down seems to be coming from the incrementing of the value of a particular key.

  • taking the substring
  • accessing the value of that key from the Hash

Updated by byroot (Jean Boussier) almost 2 years ago

If you agree that that is what is being benchmarked, one thing we can do is change the benchmark to use tally instead of manually incrementing the count.

So this is a benchmark that was popularized by the "benchmark-game", if the goal is to look better on that benchmark (I personally don't care for it) then yes that is a solution.

https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/knucleotide.html#knucleotide

However I think the interesting part of this issue, would be to figure out why that specific implementation got slower over time.

It would be interesting to break down the algorithm in smaller pieces and benchmark them idependently to see which operation exactly got slower.

Updated by nickborromeo (Nick Borromeo) almost 2 years ago

However I think the interesting part of this issue, would be to figure out why that specific implementation got slower over time.

I definitely agree!

It would be interesting to break down the algorithm in smaller pieces and benchmark them idependently to see which operation exactly got slower.

before I did the change to tally I also broke it down to smaller steps and ran the benchmark, apologies for the wall of text to follow; a direct quote from a discussion I had with others looking at this


I may have found something interesting while playing around with the benchmark tool. I wanted to see if we can narrow it down to a specific thing that might be slow. so I got rid of the the sort_by_freq and find_seq methods for now and focused mostly on the frecuency method since stackprof identified that as the slowest one

Just so we have a baseline here is the code and the benchmark of the frecuency method untouched

      def frecuency(seq, length)
        frequencies = Hash.new(0)
        ns          = seq.length + 1 - length

        for i in (0 ... ns)
          frequencies[seq[i, length]] += 1
        end

        [ns, frequencies]
      end
Calculating -------------------------------------
                          2.7.5       3.0.5       3.1.2       3.2.1
     so_k_nucleotide      7.530       6.524       6.056       6.294 i/s -       1.000 times in 0.132808s 0.153277s 0.165117s 0.158884s

Comparison:
                  so_k_nucleotide
               2.7.5:         7.5 i/s
               3.0.5:         6.5 i/s - 1.15x  slower
               3.2.1:         6.3 i/s - 1.20x  slower
               3.1.2:         6.1 i/s - 1.24x  slower

Now lets look at String#[] as the only operation

      def frecuency(seq, length)
        frequencies = Hash.new(0)
        ns          = seq.length + 1 - length

        for i in (0 ... ns)
          seq[i, length]
          # frequencies[seq[i, length]] += 1
        end

        [ns, frequencies]
      end
❯ benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5
Calculating -------------------------------------
                          2.7.5       3.0.5       3.1.2       3.2.1
     so_k_nucleotide     14.000      14.225      13.797      14.096 i/s -       1.000 times in 0.071427s 0.070300s 0.072481s 0.070944s

Comparison:
                  so_k_nucleotide
               3.0.5:        14.2 i/s
               3.2.1:        14.1 i/s - 1.01x  slower
               2.7.5:        14.0 i/s - 1.02x  slower
               3.1.2:        13.8 i/s - 1.03x  slower

Now lets look at String#[] and Hash#[]=

      def frecuency(seq, length)
        frequencies = Hash.new(0)
        ns          = seq.length + 1 - length

        for i in (0 ... ns)
          frequencies[seq[i, length]]
        end

        [ns, frequencies]
      end
Calculating -------------------------------------
                          2.7.5       3.0.5       3.1.2       3.2.1
     so_k_nucleotide     12.516      12.248      12.058      13.034 i/s -       1.000 times in 0.079900s 0.081645s 0.082931s 0.076725s

Comparison:
                  so_k_nucleotide
               3.2.1:        13.0 i/s
               2.7.5:        12.5 i/s - 1.04x  slower
               3.0.5:        12.2 i/s - 1.06x  slower
               3.1.2:        12.1 i/s - 1.08x  slower

However the interesting bit is when we assign things to variables or when we increment the counter for a hash value which I think is Hash#[]=

      def frecuency(seq, length)
        frequencies = Hash.new(0)
        ns          = seq.length + 1 - length

        for i in (0 ... ns)
          x = seq[i, length]
        end

        [ns, frequencies]
      end
❯ benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5
Calculating -------------------------------------
                          2.7.5       3.0.5       3.1.2       3.2.1
     so_k_nucleotide     13.641      12.170      11.857      12.001 i/s -       1.000 times in 0.073306s 0.082168s 0.084335s 0.083325s

Comparison:
                  so_k_nucleotide
               2.7.5:        13.6 i/s
               3.0.5:        12.2 i/s - 1.12x  slower
               3.2.1:        12.0 i/s - 1.14x  slower
               3.1.2:        11.9 i/s - 1.15x  slower

This is the same as the baseline in the start of the comment

❯ benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5
Calculating -------------------------------------
                          2.7.5       3.0.5       3.1.2       3.2.1
     so_k_nucleotide      7.626       6.474       6.135       6.239 i/s -       1.000 times in 0.131126s 0.154468s 0.163005s 0.160283s

Comparison:
                  so_k_nucleotide
               2.7.5:         7.6 i/s
               3.0.5:         6.5 i/s - 1.18x  slower
               3.2.1:         6.2 i/s - 1.22x  slower
               3.1.2:         6.1 i/s - 1.24x  slower

I think we might need to look into the assignment methods instead of the access methods for string and hash 🤔

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0