Feature #16989
openSets: need ♥️
Description
I am opening a series of feature requests on Set
, all of them based on this usecase.
The main usecase I have in mind is my recent experience with RuboCop
. I noticed a big number of frozen arrays being used only to later call include?
on them. This is O(n)
instead of O(1)
.
Trying to convert them to Set
s causes major compatibility issues, as well as very frustrating situations and some cases that would make them much less efficient.
Because of these incompatibilities, RuboCop
is in the process of using a custom class based on Array
with optimized include?
and ===
. RuboCop
runs multiple checks on Ruby code. Those checks are called cops. RuboCop
performance is (IMO) pretty bad and some cops currently are in O(n^2)
where n is the size of the code being inspected. Even given these extremely inefficient cops, optimizing the 100+ such arrays (most of which are quite small btw) gave a 5% speed boost.
RuboCop PRs for reference: https://github.com/rubocop-hq/rubocop-ast/pull/29
https://github.com/rubocop-hq/rubocop/pull/8133
My experience tells me that there are many other opportunities to use Set
s that are missed because Set
s are not builtin, not known enough and have no shorthand notation.
In this issue I'd like to concentrate the discussion on the following request: Set
s should be core objects, in the same way that Complex
were not and are now. Some of the upcoming feature requests would be easier (or only possible) to implement were Set
s builtin.
Updated by marcandre (Marc-Andre Lafortune) over 4 years ago
- Related to Feature #16990: Sets: operators compatibility with Array added
Updated by marcandre (Marc-Andre Lafortune) over 4 years ago
- Related to Feature #16991: Sets: add Set#join added
Updated by marcandre (Marc-Andre Lafortune) over 4 years ago
- Related to Feature #16992: Sets: officially ordered added
Updated by marcandre (Marc-Andre Lafortune) over 4 years ago
- Related to Feature #16993: Sets: from hash keys using Hash#key_set added
Updated by marcandre (Marc-Andre Lafortune) over 4 years ago
- Related to Feature #16994: Sets: shorthand for frozen sets of symbols / strings added
Updated by marcandre (Marc-Andre Lafortune) over 4 years ago
- Related to Feature #16995: Sets: <=> should be specialized added
Updated by nobu (Nobuyoshi Nakada) over 4 years ago
- Status changed from Open to Assigned
- Assignee set to knu (Akinori MUSHA)
Updated by mame (Yusuke Endoh) over 4 years ago
We discussed this issue at the last-month meeting.
https://github.com/ruby/dev-meeting-log/blob/master/DevelopersMeeting20200720Japan.md
matz: Positive to introduce Set into core. But we need to first introduce a set literal. { x, y, z } is good, but JavaScript uses it as another meanings, so it would bring confusion. I have no idea.
knu: As a set library maintainer, I'll first communicate, and try to solve the easy issues that can be sorted out in the library side.
Updated by marcandre (Marc-Andre Lafortune) over 4 years ago
That's great news.
Was there discussion for a frozen set literal of symbols / strings as I proposed in #16994? For sets that need to be mutable or containing different types of objects, I find the existing Set[...]
quite convenient.
Moreover, I would like to avoid having to add a magic comment # frozen_set_literals: true
...
Updated by Dan0042 (Daniel DeLorme) over 4 years ago
matz: Positive to introduce Set into core. But we need to first introduce a set literal. { x, y, z } is good, but JavaScript uses it as another meanings, so it would bring confusion.
I have to ask, is [lack of] similarity with javascript really a problem? Apparently Python uses { x, y, z }
(and set()
for an empty set). #5478 has a lot of discussion on Set literals but ultimately nothing beats the simplicity and obviousness of { x, y, z }
Updated by Eregon (Benoit Daloze) over 4 years ago
IMHO there is no need for a literal Set notation (which would anyway restrict the type of keys, etc).
Some of the upcoming feature requests would be easier (or only possible) to implement were Sets builtin.
@marcandre (Marc-Andre Lafortune) Which feature requests except having a literal notation (#16994) require Set to be builtin? I think only #16993.
I think we could make good progress here by addressing the other feature requests first.
Also moving Set to core makes might make it more complicated to backport to older Rubies.
(Unfortunately Set is only a default gem in 3.0, but it's probably easy enough to require some Set additions with a different require name)
Updated by marcandre (Marc-Andre Lafortune) over 4 years ago
Eregon (Benoit Daloze) wrote in #note-11:
IMHO there is no need for a literal Set notation (which would anyway restrict the type of keys, etc).
I agree. Set[...]
works very well already for constructing sets at runtime out of any time of elements.
Some of the upcoming feature requests would be easier (or only possible) to implement were Sets builtin.
@marcandre (Marc-Andre Lafortune) Which feature requests except having a literal notation (#16994) require Set to be builtin? I think only #16993.
Indeed, the static notation requires a change to the Ruby parser.
#16993 would be easier and faster if builtin, although it is implementable with a transform_values
as I showed in the issue.
A bigger issue (and more important feature imo) is interoperability with Array #16990. If not in core, this would require monkeypatching a bunch of Array
methods if not in core and would also require looping in Ruby so would not be as efficient.
I think we could make good progress here by addressing the other feature requests first.
Also moving Set to core makes might make it more complicated to backport to older Rubies.
If we want to backport to older Rubies, this could still partly be done with a gem, we'll just need some conditional definitions.
Updated by Dan0042 (Daniel DeLorme) over 4 years ago
which would anyway restrict the type of keys
I don't get that part. If you have a literal like { x, y, z }
then x/y/z can be any type of object. There's no restriction. But I agree Set[]
is short and good enough.
Updated by Eregon (Benoit Daloze) over 4 years ago
Also, does moving Set to core mean rewriting it to C?
I think that would be suboptimal, because every implementation would have to maintain its own native implementation then (or reuse the C extension).
It would also hurt readability.
Much of Ruby's performance can be achieved by JIT'ing Ruby code, and Hash is well optimized, so I'm not sure it's a performance gain either.
BTW SortedSet has this rbtree
usage, that seems problematic to move in core (core should not depend on RubyGems):
https://github.com/ruby/ruby/blob/eada6350332155972f19bad52bd8621f607520a2/lib/set.rb#L708
Only moving Set but not SortedSet in core would avoid that issue though, but be somewhat inconsistent.
Updated by Eregon (Benoit Daloze) over 4 years ago
Dan0042 (Daniel DeLorme) wrote in #note-13:
I don't get that part. If you have a literal like
{ x, y, z }
then x/y/z can be any type of object. There's no restriction. But I agreeSet[]
is short and good enough.
Right, I was thinking of #16994.
Set[]
seems already good enough as a general syntax, and no confusion with JS object literals / potentially future Ruby Hash shorthand literals.
Updated by marcandre (Marc-Andre Lafortune) over 4 years ago
Eregon (Benoit Daloze) wrote in #note-14:
Also, does moving Set to core mean rewriting it to C?
I think that would be suboptimal, because every implementation would have to maintain its own native implementation then (or reuse the C extension).
I'm not sure, but Set
is quite small, most of the processing uses Hash
...
Only moving Set but not SortedSet in core would avoid that issue though, but be somewhat inconsistent.
I should have clarified: while I am convinced that Set
is a basic fundamental class, I have yet to use SortedSet
or see it in use. I would keep it out of core (in set
).
Updated by matz (Yukihiro Matsumoto) over 4 years ago
I agree with some of your proposals (#16990, #16991, #16993, #16995). I want @knu (Akinori MUSHA) to work on this. If I missed something, he will tell us.
I strongly disagree with #16994. There's no evidence we need frozen sets of strings or symbols that much. Even if we do, I think frozen arrays should come first.
I weakly disagree with #16992. Currently set orders are determined by the internal hash. We may change the implementation in the future to improve performance or memory overhead. Fixing the order could possibly restrict the future implementation choice.
Matz.
Updated by knu (Akinori MUSHA) over 4 years ago
OK, I think it's good for us now to diverge from the original philosophy of Set when I first wrote it and pursue the performance and integrity with other parts of Ruby. There are many parts in Set where I avoided optimization in order to retain extensibility (like subclassing and stuff), but I'll unlock the bar.
I'm also planning to remove SortedSet and leave it to an external gem because of the partial dependency on rbtree and the fallback implementation which performs quite poorly.
I'm not absolutely sure about introducing literals in the form of { a, b, c }
because I myself is the one who is quite familiar with the shorthand notation introduced in ES6 and would like to have something similar in Ruby. 😆
Updated by marcandre (Marc-Andre Lafortune) over 4 years ago
knu (Akinori MUSHA) wrote in #note-18:
OK, I think it's good for us now to diverge from the original philosophy of Set when I first wrote it and pursue the performance and integrity with other parts of Ruby. There are many parts in Set where I avoided optimization in order to retain extensibility (like subclassing and stuff), but I'll unlock the bar.
I'm also planning to remove SortedSet and leave it to an external gem because of the partial dependency on rbtree and the fallback implementation which performs quite poorly.
I'm not absolutely sure about introducing literals in the form of
{ a, b, c }
because I myself is the one who is quite familiar with the shorthand notation introduced in ES6 and would like to have something similar in Ruby. 😆
This sounds great! 👍
Let me know if I can help.
Updated by marcandre (Marc-Andre Lafortune) over 4 years ago
matz (Yukihiro Matsumoto) wrote in #note-17:
I agree with some of your proposals (#16990, #16991, #16993, #16995). I want @knu (Akinori MUSHA) to work on this. If I missed something, he will tell us.
Thank you for reviewing these :-)
I strongly disagree with #16994. There's no evidence we need frozen sets of strings or symbols that much. Even if we do, I think frozen arrays should come first.
Right, I agree that there is a larger discussion about frozen & static literals to be had.
Would a non-frozen notation for Set of strings / symbols have a chance of being accepted? I would like to avoid Set.new(%w[a list of words])
as can be currently seen in RuboCop
or in Rails:
https://github.com/rails/rails/blob/master/actionpack/lib/action_dispatch/http/cache.rb#L128
https://github.com/rails/rails/blob/master/actionpack/lib/action_dispatch/http/headers.rb#L25-L44
I think that many gems have simply not really taken care about Sets
. For example, this file in Bundler
defines a 40-element array constant, only to call include?
on it...
https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/settings.rb#L9
I feel that a builtin way to write this might help.
Updated by mame (Yusuke Endoh) about 4 years ago
FYI: SortedSet has been extracted out from set.rb since 3.0. So there is no longer the dependency problem.
Updated by akr (Akira Tanaka) about 4 years ago
I like extending Hash instead of incorporating Set.
For example, #inspect omit hash values.
(Of course, modifying Hash#inspect is clearly not acceptable, I think some compromise is possible)
Updated by knu (Akinori MUSHA) about 4 years ago
akr (Akira Tanaka) wrote in #note-22:
I like extending Hash instead of incorporating Set.
For example, #inspect omit hash values.
(Of course, modifying Hash#inspect is clearly not acceptable, I think some compromise is possible)
How would you implement methods and operators of Set in Hash? Would they only work for hashes that meet special conditions?
Updated by marcandre (Marc-Andre Lafortune) about 4 years ago
akr (Akira Tanaka) wrote in #note-22:
I like extending Hash instead of incorporating Set.
I don't understand the impacts
One question thing that having builtin set would change:
# shareable_constant_value: literal
X = Set[1, 2, 3] # => error, or acceptable (and frozen)
I would like it to be acceptable and frozen.
Updated by akr (Akira Tanaka) about 4 years ago
knu (Akinori MUSHA) wrote in #note-23:
akr (Akira Tanaka) wrote in #note-22:
I like extending Hash instead of incorporating Set.
For example, #inspect omit hash values.
(Of course, modifying Hash#inspect is clearly not acceptable, I think some compromise is possible)How would you implement methods and operators of Set in Hash? Would they only work for hashes that meet special conditions?
- define a special constant (Hash::DummyValue = Object.new)
- Hash#inspect doesn't show the value: {1 => Hash::DummyValue}.inspect #=> "{1}"
-
{}
can be used as an empty set -
hash << v
would be defined ashash[v] = Hash::DummyValue
- Hash#each doesn't work well as Set but we can use Hash#each_key.
- Hash#to_a doesn't work well as Set but we can use Hash#keys.
Updated by marcandre (Marc-Andre Lafortune) about 4 years ago
I still don't see the benefit of not simply having Set
in core.
Moreover, does this make ary & set
possible? efficient?
Updated by duerst (Martin Dürst) about 4 years ago
marcandre (Marc-Andre Lafortune) wrote in #note-26:
I still don't see the benefit of not simply having
Set
in core.
One direction that distinguishes Ruby from some other programming languages is big classes. A typical example is that there's no Stack
class, the stack interface is just part of Array
. I don't know the history, but I think the original idea was to make the set interface also be part of Array. Your ary & set
example also alludes to that idea. Most of that interface (maybe all) is currently available, but it's not very efficient. Using Hash
for sets eliminates the efficiency problem, as we know.
Moreover, does this make
ary & set
possible? efficient?
I haven't checked the details, but it shouldn't be too difficult to implement this, in a reasonably efficient way. If there are any efficiency problems, they would be on the Array side, not on the Hash side.
Another advantage of using Hash may be that we can get to a syntax that is very close to the actual Mathematical set syntax. We would like to write {1, 2, 3}
, which may even be possible. At the least, it should be possible to move to the following: {1 => Hash::DummyValue, 2, 3}
, i.e. if the first key has a special value, then the remaining values can be left out. We may want to choose a better name for Hash::DummyValue
, maybe something like Hash::SetDummy
.
Updated by akr (Akira Tanaka) about 4 years ago
akr (Akira Tanaka) wrote in #note-25:
- Hash#each doesn't work well as Set but we can use Hash#each_key.
To be fair, "Hash#each doesn't work well" means Enumerable methods doesn't work well.
enum_for(:each_key)
can be used as {1, 2}.enum_for(:each_key).max
, though.
(I forgot to mention that Ruby can interpret {1, 2}
as { 1 => Hash::SetDummy, 2 => Hash::SetDummy }
. I feel the name SetDummy is better than DummyValue. Thanks.)
It may be possible to solve this by Hash#each yields key
instead of [key, value]
if value
is Hash::SetDummy
.
I doubt it will be accepted, though.
Updated by knu (Akinori MUSHA) about 4 years ago
I don't like the idea of making array & hash
possible, as that would go against the trends of making Ruby type-friendly and error-prone.
akr (Akira Tanaka) wrote in #note-25:
- define a special constant (Hash::DummyValue = Object.new)
- Hash#inspect doesn't show the value: {1 => Hash::DummyValue}.inspect #=> "{1}"
{}
can be used as an empty sethash << v
would be defined ashash[v] = Hash::DummyValue
- Hash#each doesn't work well as Set but we can use Hash#each_key.
- Hash#to_a doesn't work well as Set but we can use Hash#keys.
I'd appreciate the list as implementation details, but if it were like some hashes are used as sets, programs might get less readable.
For example, a program that goes hash = {}; hash << 1; hash[:key] = :value
would likely be a mistake, but that couldn't be easily detected without explicit type annotation.
Updated by knu (Akinori MUSHA) about 4 years ago
knu (Akinori MUSHA) wrote in #note-29:
For example, a program that goes
hash = {}; hash << 1; hash[:key] = :value
would likely be a mistake, but that couldn't be easily detected without explicit type annotation.
Well, even with type annotation as long as it is a direct Hash instance.
Updated by marcandre (Marc-Andre Lafortune) about 4 years ago
I'm sorry, I am completely confused by this discussion, I can't make sense of it.
From what I read:
set = {:a, :b, :c} # or whatever notation is used
set.class # => Hash ???
set[:a] # => Hash::SetDummy ???
set[:elem] = :something_else # => what does that mean?
set.to_a # => [:a, :b, :c, [:elem, :something_else]] ???
set.transform_values { ... } ???
set + [1, 2, 3] # => ???
set + Set[:d, :e] # => ???
Am I understanding correctly?
duerst (Martin Dürst) wrote in #note-27:
Using
Hash
for sets eliminates the efficiency problem, as we know.
Are you talking about using Hash
internally, or having Hash's class and API?
Another advantage of using Hash may be that we can get to a syntax that is very close to the actual Mathematical set syntax. We would like to write
{1, 2, 3}
, which may even be possible.
I don't understand how this might be related in any way to using a Hash? What meaning we decide to give to {a, b, c}
is completely independent to what {a: :a, b: :b}
means, or to what %w{a, b, c}
means, or to {|a, b| c}
for that matter.
At the least, it should be possible to move to the following:
{1 => Hash::DummyValue, 2, 3}
, i.e. if the first key has a special value, then the remaining values can be left out. We may want to choose a better name forHash::DummyValue
, maybe something likeHash::SetDummy
.
Are you suggesting that instead of writing Set[1, 2, 3]
we would literally write {1 => Hash::SetDummy, 2, 3}
?
Updated by matz (Yukihiro Matsumoto) about 4 years ago
I agree with adding Set
by default. But no set literals nor SortedSet
at the moment.
Matz.
Updated by Student (Nathan Zook) almost 4 years ago
marcandre (Marc-Andre Lafortune) wrote in #note-31:
I'm sorry, I am completely confused by this discussion, I can't make sense of it.
From what I read:
set = {:a, :b, :c} # or whatever notation is used set.class # => Hash ??? set[:a] # => Hash::SetDummy ??? set[:elem] = :something_else # => what does that mean? set.to_a # => [:a, :b, :c, [:elem, :something_else]] ??? set.transform_values { ... } ??? set + [1, 2, 3] # => ??? set + Set[:d, :e] # => ???
Am I understanding correctly?
duerst (Martin Dürst) wrote in #note-27:
Using
Hash
for sets eliminates the efficiency problem, as we know.Are you talking about using
Hash
internally, or having Hash's class and API?Another advantage of using Hash may be that we can get to a syntax that is very close to the actual Mathematical set syntax. We would like to write
{1, 2, 3}
, which may even be possible.I don't understand how this might be related in any way to using a Hash? What meaning we decide to give to
{a, b, c}
is completely independent to what{a: :a, b: :b}
means, or to what%w{a, b, c}
means, or to{|a, b| c}
for that matter.At the least, it should be possible to move to the following:
{1 => Hash::DummyValue, 2, 3}
, i.e. if the first key has a special value, then the remaining values can be left out. We may want to choose a better name forHash::DummyValue
, maybe something likeHash::SetDummy
.Are you suggesting that instead of writing
Set[1, 2, 3]
we would literally write{1 => Hash::SetDummy, 2, 3}
?
THIS.
Sets are in no way related to Hashes. The fact that they are/can be implemented as hash keys is convenient, but surfacing such an implementations detail violates programming best practice to a crazy amount.
Updated by Eregon (Benoit Daloze) almost 4 years ago
I don't think mixing Hash and Set is good at all. They have fundamentally different APIs.
Updated by greggzst (Grzegorz Jakubiak) over 3 years ago
Is there a chance this feature lands in 3.1?
Updated by greggzst (Grzegorz Jakubiak) about 3 years ago
Any update on that?
Updated by knu (Akinori MUSHA) almost 3 years ago
I've prepared a PR: https://github.com/ruby/ruby/pull/5563
Updated by matz (Yukihiro Matsumoto) almost 3 years ago
I agree with merging the pull request above, and see how it goes.
Matz.
Updated by Anonymous almost 3 years ago
- Status changed from Assigned to Closed
Applied in changeset git|dd3501bb9580951623a9aa7c2f86f7c98f9d6b9c.
Make Set a builtin feature [Feature #16989]
Updated by knu (Akinori MUSHA) almost 3 years ago
- Status changed from Closed to Open
Updated by hsbt (Hiroshi SHIBATA) 10 months ago
- Status changed from Open to Assigned