Bug #20301 (open)
`Set#add?` does two hash look-ups
Description
A common usage of `Set`s is to keep track of seen objects, and do something different whenever an object is seen for the first time, e.g.:
```ruby
SEEN_VALUES = Set.new

def receive_value(value)
  if SEEN_VALUES.add?(value)
    puts "Saw #{value} for the first time."
  else
    puts "Already seen #{value}, ignoring."
  end
end

receive_value(1) # Saw 1 for the first time.
receive_value(2) # Saw 2 for the first time.
receive_value(3) # Saw 3 for the first time.
receive_value(1) # Already seen 1, ignoring.
```
Readers might reasonably assume that `add?` only looks up into the set a single time, but it's actually doing two separate look-ups! (source)
```ruby
class Set
  def add?(o)
    # 1. `include?(o)` looks up into `@hash`
    # 2. if the value isn't there, `add(o)` does a second look-up into `@hash`
    add(o) unless include?(o)
  end
end
```
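The double look-up can be observed by counting `#hash` invocations with a wrapper key (the `CountingKey` class below is illustrative, not from the report). On CRuby versions where `Set#add?` is implemented as shown above (e.g. 3.3), adding a brand-new element calls `#hash` twice, while re-adding an already-seen element calls it only once:

```ruby
require "set"

# Illustrative wrapper that counts how often Ruby asks for its hash value.
class CountingKey
  attr_reader :value, :hash_calls

  def initialize(value)
    @value = value
    @hash_calls = 0
  end

  def hash
    @hash_calls += 1
    @value.hash
  end

  def eql?(other)
    other.is_a?(CountingKey) && value.eql?(other.value)
  end
end

set = Set.new
key = CountingKey.new(:a)

set.add?(key) # first time: the element is new
puts "hash calls after first add?: #{key.hash_calls}"

before = key.hash_calls
set.add?(key) # second time: the element is already present
puts "hash calls for second add?: #{key.hash_calls - before}"
```

The exact counts depend on the Ruby implementation and version, but the return-value contract of `add?` (the set itself for a new element, `nil` otherwise) is stable.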
This gets especially expensive if the values are large hashes, arrays, or objects whose `#hash` is expensive to compute.
We could optimize this if it were possible to set a value in a hash and retrieve the value that was already there, in a single operation. I propose adding `Hash#exchange_value` to do exactly that. If it existed, we could re-implement `#add?` as:
```ruby
class Set
  def add?(o)
    # Only requires a single look-up into `@hash`!
    self unless @hash.exchange_value(o, true)
  end
end
```
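For clarity, here is a pure-Ruby sketch of the semantics the proposed `Hash#exchange_value` would have (the method name comes from this proposal; the emulation below is mine and still performs two look-ups — only the native implementation in the PR can do it in one):

```ruby
class Hash
  # Illustrative emulation of the proposed semantics: store `new_value`
  # under `key` and return whatever value (or nil) was stored before.
  # NOTE: this pure-Ruby version still does two look-ups; the point of
  # the proposal is a native implementation that does only one.
  def exchange_value_emulated(key, new_value)
    old = self[key]
    self[key] = new_value
    old
  end
end

h = {}
p h.exchange_value_emulated(:a, true) # nil (nothing was there before)
p h.exchange_value_emulated(:a, true) # true (the previous value)
```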
Here's a proof-of-concept implementation: https://github.com/ruby/ruby/pull/10093
Theory
How much of a benefit this has depends on 2 factors:

- How much `#hash` is called, which depends on how many new objects are added to the set.
  - If every object is new, then `#hash` used to be called twice on every `#add?`. This is where this improvement makes the biggest (2x!) change.
  - If every object has already been seen, then `#hash` was never being called twice before anyway, so there would be no improvement. It's important not to regress in this case, because many use cases of sets don't deal with many distinct objects, but just need to do quick checks against an existing set.
  - Every other case lies somewhere in between those two, depending on the % of objects which are new.
- How slow `#hash` is to compute for the key.
  - If the hash is slow to compute, this change will make a bigger improvement.
  - If the hash value is fast to compute, then it won't matter as much. Even if we called it half as often, it's a minority of the total time, so it won't have much net impact.
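The first factor can be observed with a small script (illustrative only; the real benchmark lives in the PR). It uses the stdlib `Benchmark` module to time `add?` on keys with a deliberately slow `#hash`, once when every object is new and once when every object already exists in the set:

```ruby
require "set"
require "benchmark"

# Illustrative key with a deliberately expensive #hash (not the PR's
# benchmark; see https://github.com/ruby/ruby/pull/10093 for that).
class SlowHashKey
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def hash
    # Busy-work to make hashing measurably slow (deterministic per id).
    1000.times.reduce(@id.hash) { |acc, i| (acc ^ i).hash }
  end

  def eql?(other)
    other.is_a?(SlowHashKey) && id == other.id
  end
end

keys = Array.new(1_000) { |i| SlowHashKey.new(i) }

fresh = Benchmark.realtime do
  set = Set.new
  keys.each { |k| set.add?(k) } # every object is new: worst case
end

set = Set.new
keys.each { |k| set.add(k) }
seen = Benchmark.realtime do
  keys.each { |k| set.add?(k) } # every object preexists: best case
end

puts format("all new: %.4fs, all seen: %.4fs", fresh, seen)
```

Absolute numbers will vary by machine and Ruby version; the point is only to make the two scenarios from the theory above measurable.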
Benchmark summary
| | All objects are new | All objects are preexisting |
|---|---|---|
| objects with slow `#hash` | 100.0% | ~0.0% |
| objects with fast `#hash` | 24.5% | 4.6% |
As we can see, this change makes a huge improvement in the cases where it helps and, crucially, doesn't slow down the cases where it can't.
For the complete benchmark source code and results, see the PR: https://github.com/ruby/ruby/pull/10093