Misc #21833

Switch default hash from SipHash13 to XXH3?

Added by samyron (Scott Myron) about 14 hours ago. Updated about 2 hours ago.

Status: Open
Assignee: -
[ruby-core:124478]

Description

Has there been any consideration of switching to a different hash implementation?

I've searched through the issues and haven't found anything related to switching the default hash from SipHash13 to anything else.

I created a branch that switches rb_memhash from SipHash13 to XXH3.

I wrote a few simple benchmarks and ran them on my M1 MacBook Air. The results are very promising.

% cat ~/string_hash.yml 
prelude: |
  # Generate sets of random lowercase strings at five lengths (3 to 65536 characters)
  TINY_STRINGS = Array.new(100) { Array.new(3).map { (97 + rand(26)).chr }.join }.freeze
  SMALL_STRINGS = Array.new(100) { Array.new(8).map { (97 + rand(26)).chr }.join }.freeze
  MED_STRINGS  = Array.new(100) { Array.new(20).map { (97 + rand(26)).chr }.join }.freeze
  LARGE_STRINGS  = Array.new(100) { Array.new(200).map { (97 + rand(26)).chr }.join }.freeze
  HUGE_STRINGS  = Array.new(100) { Array.new(65536).map { (97 + rand(26)).chr }.join }.freeze

benchmark:
  tiny_strings: |
    TINY_STRINGS.each { |s| s.hash }
  
  small_strings: |
    SMALL_STRINGS.each { |s| s.hash }

  medium_strings: |
    MED_STRINGS.each { |s| s.hash }

  large_strings: |
    LARGE_STRINGS.each { |s| s.hash }

  huge_strings: |
    HUGE_STRINGS.each { |s| s.hash }

% benchmark-driver ~/string_hash.yml \
  -e ruby-master::~/.rubies/ruby-master/bin/ruby \
  -e ruby-xxhash::~/.rubies/ruby-xxhash/bin/ruby  \
  --output compare
Warming up --------------------------------------
        tiny_strings    262.513k i/s -    283.844k times in 1.081258s (3.81μs/i)
       small_strings    259.803k i/s -    280.445k times in 1.079454s (3.85μs/i)
      medium_strings    249.553k i/s -    267.531k times in 1.072041s (4.01μs/i)
       large_strings    116.426k i/s -    126.005k times in 1.082275s (8.59μs/i)
        huge_strings     498.481 i/s -     500.000 times in 1.003047s (2.01ms/i)
Calculating -------------------------------------
                     ruby-master  ruby-xxhash 
        tiny_strings    264.070k     288.960k i/s -    787.538k times in 2.982305s 2.725421s
       small_strings    259.941k     286.229k i/s -    779.407k times in 2.998394s 2.723019s
      medium_strings    249.249k     283.952k i/s -    748.658k times in 3.003655s 2.636561s
       large_strings    116.572k     240.823k i/s -    349.278k times in 2.996244s 1.450351s
        huge_strings     500.164       5.296k i/s -      1.495k times in 2.989019s 0.282263s

Comparison:
                     tiny_strings
         ruby-xxhash:    288960.1 i/s 
         ruby-master:    264070.2 i/s - 1.09x  slower

                    small_strings
         ruby-xxhash:    286229.0 i/s 
         ruby-master:    259941.5 i/s - 1.10x  slower

                   medium_strings
         ruby-xxhash:    283952.5 i/s 
         ruby-master:    249249.0 i/s - 1.14x  slower

                    large_strings
         ruby-xxhash:    240823.1 i/s 
         ruby-master:    116571.9 i/s - 2.07x  slower

                     huge_strings
         ruby-xxhash:      5296.5 i/s 
         ruby-master:       500.2 i/s - 10.59x  slower
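
As a quick sanity check, the "x slower" ratios can be recomputed from the i/s figures in the comparison output. The sketch below only copies the reported numbers and divides them; the constant and method names (`RESULTS`, `RATIOS`, `slowdown`) are made up for illustration and are not part of benchmark-driver.

```ruby
# i/s figures copied from the "Comparison" output above: [ruby-xxhash, ruby-master]
RESULTS = {
  "tiny_strings"   => [288960.1, 264070.2],
  "small_strings"  => [286229.0, 259941.5],
  "medium_strings" => [283952.5, 249249.0],
  "large_strings"  => [240823.1, 116571.9],
  "huge_strings"   => [5296.5, 500.2],
}.freeze

# How many times slower ruby-master is than ruby-xxhash, rounded to two decimals.
def slowdown(xxhash_ips, master_ips)
  (xxhash_ips / master_ips).round(2)
end

RATIOS = RESULTS.transform_values { |(xxhash, master)| slowdown(xxhash, master) }

RATIOS.each { |name, ratio| puts format("%-15s %.2fx", name, ratio) }
```

Read this way, the gain is roughly 9 to 14% for strings of up to 20 characters and grows to about 2x at 200 characters and about 10x at 65536 characters.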

Running something a bit more real-world:

% cat ~/json_parse.yml 
prelude: |
  require 'json'
  activitypub_json_txt = File.read("/Users/scott/Development/json/benchmark/data/activitypub.json")
  twitter_json_txt = File.read("/Users/scott/Development/json/benchmark/data/twitter.json")
  citm_catalog_json_txt = File.read("/Users/scott/Development/json/benchmark/data/citm_catalog.json")
  ohai_json_txt = File.read("/Users/scott/Development/json/benchmark/data/ohai.json")

benchmark:
  parse_activitypub_json: |
    JSON.parse(activitypub_json_txt)
  parse_twitter_json_txt: |
    JSON.parse(twitter_json_txt)
  parse_citm_catalog_json_txt: |
    JSON.parse(citm_catalog_json_txt)
  parse_ohai_json_txt: |
    JSON.parse(ohai_json_txt)

% benchmark-driver ~/json_parse.yml \
  -e ruby-master::~/.rubies/ruby-master/bin/ruby \
  -e ruby-xxhash::~/.rubies/ruby-xxhash/bin/ruby  \
  --output compare
Warming up --------------------------------------
     parse_activitypub_json     10.969k i/s -     12.023k times in 1.096043s (91.16μs/i)
     parse_twitter_json_txt      1.169k i/s -      1.265k times in 1.082330s (855.60μs/i)
parse_citm_catalog_json_txt     591.782 i/s -     600.000 times in 1.013887s (1.69ms/i)
        parse_ohai_json_txt     12.000k i/s -     12.782k times in 1.065168s (83.33μs/i)
Calculating -------------------------------------
                            ruby-master  ruby-xxhash 
     parse_activitypub_json     10.986k      11.071k i/s -     32.908k times in 2.995440s 2.972542s
     parse_twitter_json_txt      1.162k       1.172k i/s -      3.506k times in 3.016331s 2.991486s
parse_citm_catalog_json_txt     588.758      601.926 i/s -      1.775k times in 3.014820s 2.948868s
        parse_ohai_json_txt     10.747k      12.400k i/s -     35.999k times in 3.349753s 2.903138s

Comparison:
                  parse_activitypub_json
                ruby-xxhash:     11070.7 i/s 
                ruby-master:     10986.0 i/s - 1.01x  slower

                  parse_twitter_json_txt
                ruby-xxhash:      1172.0 i/s 
                ruby-master:      1162.3 i/s - 1.01x  slower

             parse_citm_catalog_json_txt
                ruby-xxhash:       601.9 i/s 
                ruby-master:       588.8 i/s - 1.02x  slower

                     parse_ohai_json_txt
                ruby-xxhash:     12400.0 i/s 
                ruby-master:     10746.8 i/s - 1.15x  slower
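
The JSON benchmark above reads data files from a local checkout, so it cannot be rerun as-is elsewhere. A minimal stand-in, assuming only the stdlib `json` and `benchmark` libraries, might look like the sketch below. The payload is synthetic and the helper names (`synthetic_payload`, `time_parse`) are hypothetical; it is not the data used above, so the numbers will not match. Running the same script under each Ruby build gives a rough comparison.

```ruby
require 'json'
require 'benchmark'

# Hypothetical stand-in payload: many small objects that repeat the same
# string keys, so parsing exercises string hashing for the Hash keys.
def synthetic_payload(records: 1_000)
  rows = Array.new(records) do |i|
    { "id" => i, "name" => "user#{i}", "tags" => %w[a b c], "active" => i.even? }
  end
  JSON.generate("rows" => rows)
end

# Wall-clock seconds taken to parse the same document `iterations` times.
def time_parse(json_txt, iterations: 50)
  Benchmark.realtime { iterations.times { JSON.parse(json_txt) } }
end

iterations = 50
json_txt = synthetic_payload
elapsed = time_parse(json_txt, iterations: iterations)
puts format("%s: %.1f parses/s", RUBY_DESCRIPTION, iterations / elapsed)
```

A single wall-clock measurement like this is much noisier than benchmark-driver's warmup and repeated runs, so small differences (such as the 1 to 2% seen above) would likely be lost in the noise.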

Admittedly, I'm neither a hash expert nor a cryptographer. I haven't found any known vulnerabilities in XXH3.


Related issues: 2 (0 open, 2 closed)

Related to Ruby - Feature #16851: Ruby hashing algorithm could be improved using Tabulation Hashing (Feedback)
Related to Ruby - Feature #13017: Switch SipHash from SipHash24 to SipHash13 (Closed)