Project

General

Profile

Bug #16121

Stop making a redundant hash copy in Hash#dup

Added by dylants (Dylan Thacker-Smith) 26 days ago. Updated 25 days ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.7.0dev (2019-08-23T16:41:09Z master b38ab0a3a9) [x86_64-darwin18]
[ruby-core:94507]

Description

Problem

I noticed while profiling object allocations that Hash#dup was allocating 2 objects instead of only 1 as expected. I looked for alternatives for comparison and found that Hash[hash] created a copy with only a single object allocation and seemed to be more than twice as fast. Reading the source code revealed the difference was that Hash#dup creates a copy of the Hash, then rehashes the copy. However, rehashing is done by making a copy of the hash, so the first copy before rehashing was unnecessary.

Solution

I changed the code to just use rehashing to make the copy of the hash to improve performance while also preserving the existing behaviour.

Benchmark

require 'benchmark'

N = 100000

def report(x, name)
  x.report(name) do
    N.times do
      yield
    end
  end
end

hashes = {
  small_hash: { a: 1 },
  larger_hash: 20.times.map { |i| [('a'.ord + i).chr.to_sym, i] }.to_h
}

Benchmark.bmbm do |x|
  hashes.each do |name, hash|
    report(x, "#{name}.dup") do
      hash.dup
    end
  end
end

results on master

                      user     system      total        real
small_hash.dup    0.401350   0.001638   0.402988 (  0.404608)
larger_hash.dup   7.218548   0.433616   7.652164 (  7.695990)

results with the attached patch

                      user     system      total        real
small_hash.dup    0.336733   0.002425   0.339158 (  0.341760)
larger_hash.dup   6.617343   0.398407   7.015750 (  7.070282)

Files

0001-Remove-redundant-Check_Type-after-to_hash.diff.txt (624 Bytes) 0001-Remove-redundant-Check_Type-after-to_hash.diff.txt [PATCH 1/4] Remove redundant Check_Type after to_hash dylants (Dylan Thacker-Smith), 08/23/2019 07:55 PM
0002-Fix-freeing-and-clearing-destination-hash-in-Hash.diff.txt (1.57 KB) 0002-Fix-freeing-and-clearing-destination-hash-in-Hash.diff.txt [PATCH 2/4] Fix freeing and clearing destination hash in Hash#initialize_copy dylants (Dylan Thacker-Smith), 08/23/2019 07:55 PM
0003-Remove-dead-code-paths-in-rb_hash_initialize_copy.diff.txt (1.12 KB) 0003-Remove-dead-code-paths-in-rb_hash_initialize_copy.diff.txt [PATCH 3/4] Remove dead code paths in rb_hash_initialize_copy dylants (Dylan Thacker-Smith), 08/23/2019 07:55 PM
0004-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt (1.35 KB) 0004-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt [PATCH 4/4] Stop making a redundant hash copy in Hash#dup dylants (Dylan Thacker-Smith), 08/23/2019 07:55 PM

History

Updated by dylants (Dylan Thacker-Smith) 25 days ago

I split up the patch into the 4 attached patches, because I realized there were a few changes that could be emphasized independently, each with their own description of the change. Hopefully that will make the optimization itself in the last patch easier to review.

Technically that makes this a bug fix, since "Patch 2/4" does fix a memory leak in the following contrived code

h = 9.times.map { |i| [i, i] }.to_h
h.send(:initialize_copy, {})

where the st_table for the hash h won't get freed with st_free_table because it doesn't try to call that on the if (RHASH_AR_TABLE_P(hash2)) { conditional block in rb_hash_initialize_copy. Mostly I didn't want to ignore it and turn into undefined behaviour from the optimization.

#2

Updated by dylants (Dylan Thacker-Smith) 25 days ago

  • File deleted (0001-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt)

Also available in: Atom PDF