Project

General

Profile

Feature #15281

Speed up Set#intersect with size check.

Added by RGBD (Oleg Zubchenko) 17 days ago. Updated 14 days ago.

Status:
Open
Priority:
Normal
Target version:
-
[ruby-core:89696]

Description

Current implementation computes set intersection s1 & s2 in O(s2.size) time.
It can be reduced to O([s1.size, s2.size].min) time.

Additional speedup comes from using #each instead of #do_with_enum.

See files attached for benchmarks.

Pull Request

P.S. using benchmark-ips gem

intersect.rb (1.91 KB) intersect.rb RGBD (Oleg Zubchenko), 11/03/2018 12:57 PM
intersect_standalone.rb (671 Bytes) intersect_standalone.rb RGBD (Oleg Zubchenko), 11/03/2018 12:58 PM
benchmark_results.txt (3.68 KB) benchmark_results.txt RGBD (Oleg Zubchenko), 11/03/2018 01:19 PM

History

#1 Updated by RGBD (Oleg Zubchenko) 17 days ago

  • Description updated (diff)

#2 Updated by RGBD (Oleg Zubchenko) 17 days ago

  • Description updated (diff)

#4 [ruby-core:89697] Updated by marcandre (Marc-Andre Lafortune) 17 days ago

Thanks for the patch.

Sadly, I don't think we can do that as Sets are ordered. That optimization would change the order of the resulting Set in some cases.

#5 [ruby-core:89699] Updated by RGBD (Oleg Zubchenko) 17 days ago

marcandre (Marc-Andre Lafortune) wrote:

Thanks for the patch.

Sadly, I don't think we can do that as Sets are ordered. That optimization would change the order of the resulting Set in some cases.

Where in the documentation can I read that?

this says order is uncertain. I thought it means noone should make assumptions about it.
Top of the page states:

Set implements a collection of unordered values with no duplicates. This is a hybrid of Array's intuitive inter-operation facilities and Hash's fast lookup.

#6 [ruby-core:89700] Updated by marcandre (Marc-Andre Lafortune) 17 days ago

  • Assignee set to matz (Yukihiro Matsumoto)

RGBD (Oleg Zubchenko) wrote:

Set implements a collection of unordered values with no duplicates. This is a hybrid of Array's intuitive inter-operation facilities and Hash's fast lookup.

Good point.

Note that this documentation is 15 years old and predates Hash being ordered.

I'd be enclined to say that Set should be officially ordered, even if "mathematically" speaking that doesn't make sense. I'm assigning this to Matz.

#7 [ruby-core:89704] Updated by duerst (Martin Dürst) 17 days ago

marcandre (Marc-Andre Lafortune) wrote:

I'd be enclined to say that Set should be officially ordered, even if "mathematically" speaking that doesn't make sense. I'm assigning this to Matz.

I'd be inclined to say that Set should be officially UNordered, because they are mathematically unordered.

#8 [ruby-core:89733] Updated by ahorek (Pavel Rosický) 14 days ago

we can do that as Sets are ordered

IMO if they are, it's an implementation detail that you shouldn't rely on. Also there's more room to optimize unordered sets.

if you need an ordered (or indexed?) set, it should be a subclass that implements this behaviour on the top of the generic unordered set. Something like https://ruby-doc.org/stdlib-2.5.3/libdoc/set/rdoc/SortedSet.html

Also available in: Atom PDF