Bug #18582
Closed: Hash.group_by not grouping correctly with SortedSets
Description
With Ruby 3.0.3, when SortedSets are used as the group_by value for a Hash, equal SortedSets are not grouped together as they should be.
This works correctly in Ruby 2.7.1 (when the rbtree gem is not present; not tested with the rbtree gem).
It also works correctly with Sets as the group_by value in both 2.7.1 and 3.0.3.
This test code:
require 'set'
require 'sorted_set' if RUBY_VERSION > '3'
puts RUBY_VERSION
# works when keys are Sets
s1 = Set['fubar']
s2 = Set['fubar']
warn "expected #{s1} to equal #{s2}" unless s1 == s2
grouped = { 'a' => s1, 'b' => s2 }.group_by { |_, v| v }
puts "grouped by Sets: #{grouped}"
warn "expected 1 key in hash grouped by Sets, got #{grouped.keys.size}" unless grouped.keys.size == 1
# 3.0.3 fails when keys are SortedSets
ss1 = SortedSet['fubar']
ss2 = SortedSet['fubar']
warn "expected #{ss1} to equal #{ss2}" unless ss1 == ss2
grouped = { 'a' => ss1, 'b' => ss2 }.group_by { |_, v| v }
puts "grouped by SortedSets: #{grouped}"
warn "expected 1 key in hash grouped by SortedSets, got #{grouped.keys.size}" unless grouped.keys.size == 1
prints this under 2.7.1:
2.7.1
grouped by Sets: {#<Set: {"fubar"}>=>[["a", #<Set: {"fubar"}>], ["b", #<Set: {"fubar"}>]]}
grouped by SortedSets: {#<SortedSet: {"fubar"}>=>[["a", #<SortedSet: {"fubar"}>], ["b", #<SortedSet: {"fubar"}>]]}
but prints this under 3.0.3:
3.0.3
grouped by Sets: {#<Set: {"fubar"}>=>[["a", #<Set: {"fubar"}>], ["b", #<Set: {"fubar"}>]]}
grouped by SortedSets: {#<SortedSet: {"fubar"}>=>[["a", #<SortedSet: {"fubar"}>]], #<SortedSet: {"fubar"}>=>[["b", #<SortedSet: {"fubar"}>]]}
expected 1 key in hash grouped by SortedSets, got 2
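The output above shows two keys that print identically and pass the `ss1 == ss2` check, yet stay separate. A likely reading is that Enumerable#group_by (which Hash inherits) collects results into a Hash, and Hash keys are matched with #hash and #eql?, not #==. The sketch below reproduces that pattern with a hypothetical `LooseBox` class (not part of any library) that defines #== but inherits Object's identity-based #eql? and #hash.

```ruby
# Hypothetical class: defines #== but inherits Object's identity-based #eql? and #hash.
class LooseBox
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Value equality, used by checks like `a == b`
  def ==(other)
    other.is_a?(LooseBox) && value == other.value
  end
end

a = LooseBox.new('fubar')
b = LooseBox.new('fubar')

p a == b       # true: our own #==
p a.eql?(b)    # false: inherited identity check
grouped = { 'a' => a, 'b' => b }.group_by { |_, v| v }
p grouped.size # 2: the resulting Hash matches keys with #hash and #eql?
```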
Updated by nobu (Nobuyoshi Nakada) about 3 years ago
- Description updated (diff)
- Status changed from Open to Third Party's Issue
It is not caused by the Ruby version, but by whether SortedSet
uses the RBTree gem or not.
$ ruby2.7 -rset -e 's1 = SortedSet["fubar"]; s2 = SortedSet["fubar"]; p s1.eql?(s2)'
true
$ gem2.7 i --user sorted_set
Fetching set-1.0.2.gem
Fetching rbtree-0.4.5.gem
Fetching sorted_set-1.0.3.gem
WARNING: You don't have /Users/nobu/.gem/ruby/2.7.0/bin in your PATH,
gem executables will not run.
Building native extensions. This could take a while...
Successfully installed rbtree-0.4.5
Successfully installed set-1.0.2
Successfully installed sorted_set-1.0.3
Parsing documentation for rbtree-0.4.5
Installing ri documentation for rbtree-0.4.5
Parsing documentation for set-1.0.2
Installing ri documentation for set-1.0.2
Parsing documentation for sorted_set-1.0.3
Installing ri documentation for sorted_set-1.0.3
Done installing documentation for rbtree, set, sorted_set after 0 seconds
3 gems installed
$ ruby2.7 -rset -e 's1 = SortedSet["fubar"]; s2 = SortedSet["fubar"]; p s1.eql?(s2)'
false
The RBTree gem needs to implement the #hash and #eql? methods for its objects to be usable as hash keys.
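As a rough illustration of that contract, here is a sketch of a hypothetical `SortedBag` value class (an assumption for illustration, not the rbtree or sorted_set API) that defines #eql? and a #hash that agrees with it, so two equal instances collapse into one group_by key.

```ruby
# Hypothetical value class (not part of set, sorted_set or rbtree), used only to
# show the two methods a Hash key needs.
class SortedBag
  include Enumerable

  def initialize(*items)
    @items = items.sort
  end

  def each(&block)
    @items.each(&block)
  end

  # Loose equality, like the report's `ss1 == ss2` check (1 == 1.0 is true)
  def ==(other)
    other.is_a?(SortedBag) && to_a == other.to_a
  end

  # Strict equality used for Hash keys (1.eql?(1.0) is false)
  def eql?(other)
    other.is_a?(SortedBag) && to_a.eql?(other.to_a)
  end

  # Must return equal values for any two objects that are eql?
  def hash
    [SortedBag, to_a].hash
  end
end

b1 = SortedBag.new('fubar')
b2 = SortedBag.new('fubar')
grouped = { 'a' => b1, 'b' => b2 }.group_by { |_, v| v }
p grouped.size # 1: both entries land under one key
```

Comparing the element arrays with Array#eql? rather than == keeps `SortedBag.new(1)` and `SortedBag.new(1.0)` as separate keys, which matches the Set behavior Mike mentions in his patch comments.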
Updated by mike@carltons.us (Mike Carlton) about 3 years ago
Thank you very much Nobu for your quick response.
For anyone who stumbles upon this page: I used this quick-and-dirty monkey patch to add the necessary functionality to RBTree (until RBTree is updated); with it, SortedSet works as expected for me in Ruby 3.0:
require 'rbtree'
class RBTree
  # conditionally define these methods so that, if a future rbtree upgrade gains them, we don't override them
  unless RBTree.instance_methods(false).include?(:eql?)
    class_eval <<-END, __FILE__, __LINE__+1
      def eql?(other)
        # we could use 'self == other' (RBTree already implements ==), but if we do then
        # we wind up with SortedSet[1].eql?(SortedSet[1.0]) but !(1.eql?(1.0)) and !(Set[1].eql?(Set[1.0]))
        # we'll take a chance on a 64-bit collision instead
        self.hash == other.hash
      end
    END
  end
  unless RBTree.instance_methods(false).include?(:hash)
    class_eval <<-END, __FILE__, __LINE__+1
      # Ruby's hash.c implements something like MurmurHash on keys and values
      # Ruby also starts with a unique seed in each process (so {a:1}.hash is different in every process)
      # We'll do something much simpler, but good enough for our purposes
      def hash
        result = 0
        self.each do |k, v|
          # result ^= k.hash; result ^= v.hash is not correct: RBTree[a:1,b:2].hash would equal RBTree[a:2,b:1].hash
          # result ^= [ k, v ].hash would create a lot of unnecessary allocations and garbage
          # Ruby internals use gcc's 128b integer type where possible and Object#hash returns a 64b integer,
          # so we'll take advantage of that and just create a 128b Integer hash instead of hashes of Arrays
          # (keeping key and value bits apart like this still gives RBTree[a:1,b:2] and RBTree[a:2,b:1] the
          # same hash, but that cannot happen with SortedSet, whose values are always 'true')
          # In the SortedSet usage we put the values in the upper half, where they cancel in pairs,
          # so half the time we'll have a 64b value (does not really matter, but numbers are easier to read)
          result ^= (v.hash << 64) ^ k.hash
        end
        result
      end
    END
  end
end
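One caveat about the per-entry mix in the patch, `(v.hash << 64) ^ k.hash`: it keeps key bits and value bits in separate ranges. After XOR-ing all entries, the result is just "XOR of value hashes" next to "XOR of key hashes", so swapping values between keys does not change it. That is harmless for SortedSet, where every value is `true`, but since the patched #eql? compares hashes, it would matter for general RBTree use. The sketch below compares that mix with hashing each `[k, v]` pair as a unit. It works on plain arrays of pairs through two hypothetical helpers, `split_xor_hash` and `pair_xor_hash`, rather than on RBTree itself.

```ruby
# Hypothetical helpers that fold a list of [key, value] pairs into one Integer.

# Mix used in the patch: value hash in the upper bits, key hash in the lower bits.
def split_xor_hash(pairs)
  pairs.reduce(0) { |acc, (k, v)| acc ^ ((v.hash << 64) ^ k.hash) }
end

# Alternative: hash each pair as a unit (allocates one Array per entry), then XOR.
def pair_xor_hash(pairs)
  pairs.reduce(0) { |acc, (k, v)| acc ^ [k, v].hash }
end

x = [[:a, 1], [:b, 2]]
y = [[:a, 2], [:b, 1]] # same keys, values swapped

p split_xor_hash(x) == split_xor_hash(y) # true: keys and values never interact
p pair_xor_hash(x) == pair_xor_hash(y)   # false, barring an unlikely collision
```

Both helpers XOR the per-entry values, so the result does not depend on iteration order. The trade-off is the one the patch comments already name: `[k, v].hash` costs an Array allocation per entry.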
Updated by nobu (Nobuyoshi Nakada) about 3 years ago
You can use RBTree.method_defined?(:eql?, false) and so on, instead of RBTree.instance_methods(false).include?.
Updated by mike@carltons.us (Mike Carlton) about 3 years ago
Ah, I had tried method_defined?, but it returns true (for the inherited Kernel#eql?).
I did not realize that method_defined? also accepts an inherit=false argument.
Thank you.
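Nobu's suggestion and Mike's observation can be checked side by side. The sketch below uses a hypothetical `Tree` class as a stand-in for RBTree, so that it runs without the rbtree gem. It shows that `method_defined?(:eql?)` is true through the inherited Kernel#eql?, while `method_defined?(:eql?, false)` only looks at the class itself. It then uses that second form to guard the definitions, as the patch does with `instance_methods(false).include?`. The inherit argument requires Ruby 2.6 or later.

```ruby
# Hypothetical stand-in for RBTree: it defines neither #eql? nor #hash itself.
class Tree
  attr_reader :items

  def initialize(*items)
    @items = items.sort
  end
end

p Tree.method_defined?(:eql?)                  # true: Kernel#eql? is inherited
p Tree.method_defined?(:eql?, false)           # false: Tree has no eql? of its own
p Tree.instance_methods(false).include?(:eql?) # false: the check used in the patch

# Add value-based eql?/hash only if the class does not already define its own.
class Tree
  unless method_defined?(:eql?, false)
    def eql?(other)
      other.is_a?(Tree) && items.eql?(other.items)
    end
  end

  unless method_defined?(:hash, false)
    def hash
      [Tree, items].hash
    end
  end
end

p Tree.method_defined?(:eql?, false)  # true
p Tree.new(2, 1).eql?(Tree.new(1, 2)) # true
```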