Project

General

Profile

Actions

Bug #16237

closed

Set#union performance issue

Added by lionelperrin (Lionel Perrin) about 2 years ago. Updated about 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.6.2p47 (2019-03-13 revision 67232) [x86_64-linux]
[ruby-core:95221]

Description

I've discovered a very large difference in performances between Set#union and the use of a hash to compute this union.
The issue can be shown with the code below.

Here are the benchmark results:

       user     system      total        real
hash  0.562500   0.078125   0.640625 (  0.639528)
uniq 54.375000   0.312500  54.687500 ( 54.772007)
set 37.312500   0.125000  37.437500 ( 37.474751)
mix1 43.718750   0.000000  43.718750 ( 43.890027)
mix2 37.109375   0.000000  37.109375 ( 37.190839)

The code example:

require 'benchmark'
require 'set'

REPEAT = 1_000

$DISTINCT_COUNT = 100_000_000

def random_array(size=1_000)
  Array.new(size) { Random.rand(1..$DISTINCT_COUNT) }
end

def uniq
  ret = []
  REPEAT.times do
    ret += random_array
    ret.uniq!
  end
  ret
end

def set
  ret = Set.new
  REPEAT.times do
    ret += Set.new(random_array)
  end
  ret
end

def mix1
  ret = Set.new
  REPEAT.times do
    ret += random_array
  end
  ret
end

def mix2
  ret = Set.new
  REPEAT.times do
    ret += random_array.uniq
  end
  ret
end

def hash
  ret = {}
  REPEAT.times do
    random_array.each { |v| ret[v] = nil }
  end
  ret.keys
end

Benchmark.bm do |x|
  %i(hash uniq set mix1 mix2).each do |i|
    x.report(i, &method(i))
  end
end

Updated by mame (Yusuke Endoh) about 2 years ago

  • Status changed from Open to Closed

This is a spec. The reason is because Set#union duplicates self and then merges the argument, which makes your "set" implementation O( n2 ). You may want to use Set#merge instead of Set#union in this case.

def set2
  ret = Set.new
  REPEAT.times do
    ret.merge(Set.new(random_array))
  end
  ret
end
       user     system      total        real
hash  0.404887   0.011980   0.416867 (  0.416911)
set2  0.468687   0.004019   0.472706 (  0.473242)
Actions

Also available in: Atom PDF