Feature #16993
openSets: from hash keys using Hash#key_set
Description
To create a set from hash keys currently implies a temporary array for all keys, rehashing all those keys and rebuilding a hash. Instead, the hash could be copied and its values set to true.
h = {a: 1}
# Now:
Set.new(h.keys) # => Set[:a]
# After
h.key_set # => Set[:a], efficiently.
Updated by marcandre (Marc-Andre Lafortune) over 5 years ago
- Related to Feature #16989: Sets: need ♥️ added
Updated by sawa (Tsuyoshi Sawada) over 5 years ago
Would
Set.new(h.each_key)
not work?
Updated by marcandre (Marc-Andre Lafortune) over 5 years ago
sawa (Tsuyoshi Sawada) wrote in #note-2:
Would
Set.new(h.each_key)not work?
It will definitely work, but it will be the same as Set.new(h.keys) (a bit slower because enumerator is a bit slow). It still iterates on the keys, and add them to the new hash.
Here's a comparison:
# frozen_string_literal: true
require 'set'
require 'benchmark/ips'
class Hash
def key_set
s = Set.new
s.instance_variable_set(:@hash, transform_values { true })
s
end
end
size = 50
h = (1..size).to_h { |i| ['x' * i, nil] }
Benchmark.ips do |x|
x.report("key_set") { h.key_set }
x.report("keys") { Set.new(h.keys) }
x.report("new(each_key)") { Set.new(h.each_key) }
x.report("each_key{}") { h2 = {}; h.each_key {|k| h2[k] = true} ; h2 }
The last example builds a Hash, not a Set, but it is to show that you can not be quicker than that if you rehash the keys.
I get these results:
Calculating -------------------------------------
key_set 244.549k (± 7.4%) i/s - 1.219M in 5.017876s
keys 82.661k (± 2.3%) i/s - 417.400k in 5.052408s
new(each_key) 75.002k (± 5.0%) i/s - 377.400k in 5.045102s
each_key{} 114.757k (± 3.8%) i/s - 582.000k in 5.079700s
If you increase the size, the ratio will be bigger.
A builtin keyset would be even faster, since it would avoid calling the block { true }; yielding is not super fast in Ruby.
Updated by greggzst (Grzegorz Jakubiak) about 1 year ago
I opened a PR to add this functionality https://github.com/ruby/ruby/pull/12750
Updated by nobu (Nobuyoshi Nakada) about 1 year ago
This kind of extensions should be done in the corresponding library side, set.rb.
Updated by Dan0042 (Daniel DeLorme) about 1 year ago
I like the concept but I'm going to bikeshed the naming: by itself "key_set" is unclear if you don't already know what it does. It sounds like the purpose is to set a key (like instance_variable_set). I'd prefer something a bit more descriptive like "keys_to_set"
Updated by greggzst (Grzegorz Jakubiak) about 1 year ago
nobu (Nobuyoshi Nakada) wrote in #note-5:
This kind of extensions should be done in the corresponding library side, set.rb.
Thanks for pointing that out. Here's the PR https://github.com/ruby/set/pull/40
Dan0042 (Daniel DeLorme) wrote in #note-6:
I like the concept but I'm going to bikeshed the naming: by itself "key_set" is unclear if you don't already know what it does. It sounds like the purpose is to set a key (like instance_variable_set). I'd prefer something a bit more descriptive like "keys_to_set"
Good point I addressed that
Updated by mame (Yusuke Endoh) 12 months ago
This was discussed at the dev meeting. @matz (Yukihiro Matsumoto) wanted to hear more about the use cases for this method first.
Also, since Set is currently in a halfway state where it is not fully built-in, a built-in Hash#key_set might require a special hack like Enumerable#to_set.
https://github.com/ruby/ruby/blob/7c88cbb4a6c486348c44be24941f17ef8be6b329/prelude.rb#L34
@akr (Akira Tanaka) pointed out the possible consistency issue of Hash#compare_by_identity. Since Hash#transform_values keeps this compare_by_identity flag, the proposed patch will assign a Hash with compare_by_identity flag to @hash in Set. We will need to review it carefully to make sure it does not break the implicit assumptions of set.rb.
Updated by mame (Yusuke Endoh) 11 months ago
Given the convention of deriving methods such as key_set from keys, where the former returns a set instead of an array, one might expect a corresponding derivation like Kernel#instance_variable_set for Kernel#instance_variables. Unfortunately, this would conflict with an existing method name.
Updated by Dan0042 (Daniel DeLorme) 11 months ago
mame (Yusuke Endoh) wrote in #note-9:
Given the convention of deriving methods such as
key_setfromkeys, where the former returns a set instead of an array, one might expect a corresponding derivation likeKernel#instance_variable_setforKernel#instance_variables. Unfortunately, this would conflict with an existing method name.
In https://github.com/ruby/set/pull/40 the name was changed to #keys_to_set, so this comment doesn't apply anymore.
Then again I wish all optimizations like this could be implemented at the VM level, so that hash.keys.to_set would automatically be efficient rather than having to explicitly use the efficient version #keys_to_set.