Feature #15281
openSpeed up Set#intersect with size check.
Added by RGBD (Oleg Zubchenko) about 6 years ago. Updated over 4 years ago.
Description
Current implementation computes set intersection s1 & s2 in O(s2.size) time.
It can be reduced to O([s1.size, s2.size].min) time.
Additional speedup comes from using #each instead of #do_with_enum.
See files attached for benchmarks.
P.S. using benchmark-ips gem
Files
intersect.rb (1.91 KB) intersect.rb | RGBD (Oleg Zubchenko), 11/03/2018 12:57 PM | ||
intersect_standalone.rb (671 Bytes) intersect_standalone.rb | RGBD (Oleg Zubchenko), 11/03/2018 12:58 PM | ||
benchmark_results.txt (3.68 KB) benchmark_results.txt | RGBD (Oleg Zubchenko), 11/03/2018 01:19 PM |
Updated by RGBD (Oleg Zubchenko) about 6 years ago
- File benchmark_results.txt benchmark_results.txt added
Updated by marcandre (Marc-Andre Lafortune) about 6 years ago
Thanks for the patch.
Sadly, I don't think we can do that as Set
s are ordered. That optimization would change the order of the resulting Set
in some cases.
Updated by RGBD (Oleg Zubchenko) about 6 years ago
marcandre (Marc-Andre Lafortune) wrote:
Thanks for the patch.
Sadly, I don't think we can do that as
Set
s are ordered. That optimization would change the order of the resultingSet
in some cases.
Where in the documentation can I read that?
this says order is uncertain. I thought it means noone should make assumptions about it.
Top of the page states:
Set implements a collection of unordered values with no duplicates. This is a hybrid of Array's intuitive inter-operation facilities and Hash's fast lookup.
Updated by marcandre (Marc-Andre Lafortune) about 6 years ago
- Assignee set to matz (Yukihiro Matsumoto)
RGBD (Oleg Zubchenko) wrote:
Set implements a collection of unordered values with no duplicates. This is a hybrid of Array's intuitive inter-operation facilities and Hash's fast lookup.
Good point.
Note that this documentation is 15 years old and predates Hash being ordered.
I'd be enclined to say that Set should be officially ordered, even if "mathematically" speaking that doesn't make sense. I'm assigning this to Matz.
Updated by duerst (Martin Dürst) about 6 years ago
marcandre (Marc-Andre Lafortune) wrote:
I'd be enclined to say that Set should be officially ordered, even if "mathematically" speaking that doesn't make sense. I'm assigning this to Matz.
I'd be inclined to say that Set should be officially UNordered, because they are mathematically unordered.
Updated by ahorek (Pavel Rosický) about 6 years ago
we can do that as Sets are ordered
IMO if they are, it's an implementation detail that you shouldn't rely on. Also there's more room to optimize unordered sets.
if you need an ordered (or indexed?) set, it should be a subclass that implements this behaviour on the top of the generic unordered set. Something like https://ruby-doc.org/stdlib-2.5.3/libdoc/set/rdoc/SortedSet.html
Updated by Eregon (Benoit Daloze) over 5 years ago
I think making Set unordered would be a big breaking change, similar to making Hash unordered (as it was in 1.8).
Maybe people forgot, but I find it pretty bad to work with unordered Hashes,
e.g., every iteration method like each
yields a confusing order changing on potentially every mutation.
I'm also fairly confident there is code out there relying on Set ordering.
This proposal doesn't propose to make Set unordered though, but it would introduce non-determinism in the order of elements returned by Set#&.
That seems not ideal.
Also, this trivial optimization can already be done at the application level if needed, and there the change of ordering is obvious and conscious:
set1, set2 = ...
if set1.size < set2.size
set1 & set2
else
set2 & set1
end
So I don't think it's worth introducing non-determinism/changing order in Set for something that can be expressed so simply in application code, with the advantage of the ordering trade-off being clear if done in application code.
Updated by nobu (Nobuyoshi Nakada) over 5 years ago
- Assignee changed from matz (Yukihiro Matsumoto) to knu (Akinori MUSHA)
The author of set.rb is knu.
Updated by nobu (Nobuyoshi Nakada) over 5 years ago
- Status changed from Open to Assigned
Updated by knu (Akinori MUSHA) over 5 years ago
It is clearly stated that Set is an unordered collection and it doesn't even guarantee any determinacy about the order of elements, and as you know, it's just it happened to become ordered and deterministic when the underlying data structure (Hash) got an order.
That being said, I can understand the need for an ordered set if there are already some users using Set as such. We could probably introduce OrderedSet for easier migration before making this "breaking" change (and maybe some others to follow) to Set, I guess?
Updated by sawa (Tsuyoshi Sawada) over 5 years ago
There is a mathematical term "ordered set", which means a different thing: an algebra in which the elements are given intrinsic order.
The concept in question should be named Tuple
or UniqueTuple
.
Updated by chrisseaton (Chris Seaton) over 5 years ago
I'm not sure it should be called Tuple
- to me that implies that elements can appear more than once - that it's a multi-set - and this isn't the case.
Updated by Eregon (Benoit Daloze) over 5 years ago
Just a note, for compatibility there is often a gap between what the documentation states and what Ruby programs assume.
So even though the documentation says it's unordered, I would guess there are programs relying on Sets being ordered since Ruby 1.9.
Maybe not so many programs, but I think we should evaluate before changing.
Updated by marcandre (Marc-Andre Lafortune) over 4 years ago
Apparently the linked PR has been merged last december 31st.
I note there's no NEWS entry.