Hash.group_by not grouping correctly with SortedSets

ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x86_64-darwin20]


With Ruby 3.0.3, when SortedSets are used as the group_by value for a Hash, equal SortedSets are not grouped together as they should be.
This works correctly in Ruby 2.7.1 (when the rbtree gem is not present; not tested with the rbtree gem).

Using Sets as the group_by value works correctly in both 2.7.1 and 3.0.3.

This test code:

require 'set'
require 'sorted_set' if RUBY_VERSION > '3'


# works when keys are Sets
s1 = Set['fubar']
s2 = Set['fubar']
warn "expected #{s1} to equal #{s2}" unless s1 == s2

grouped = { 'a' => s1, 'b' => s2 }.group_by { |_, v| v }
puts "grouped by Sets: #{grouped}"
warn "expected 1 key in hash grouped by Sets, got #{grouped.keys.size}" unless grouped.keys.size == 1

# 3.0.3 fails when keys are SortedSets
ss1 = SortedSet['fubar']
ss2 = SortedSet['fubar']
warn "expected #{ss1} to equal #{ss2}" unless ss1 == ss2

grouped = { 'a' => ss1, 'b' => ss2 }.group_by { |_, v| v }
puts "grouped by SortedSets: #{grouped}"
warn "expected 1 key in hash grouped by SortedSets, got #{grouped.keys.size}" unless grouped.keys.size == 1

prints this under 2.7.1:

grouped by Sets: {#<Set: {"fubar"}>=>[["a", #<Set: {"fubar"}>], ["b", #<Set: {"fubar"}>]]}
grouped by SortedSets: {#<SortedSet: {"fubar"}>=>[["a", #<SortedSet: {"fubar"}>], ["b", #<SortedSet: {"fubar"}>]]}

but prints this under 3.0.3:

grouped by Sets: {#<Set: {"fubar"}>=>[["a", #<Set: {"fubar"}>], ["b", #<Set: {"fubar"}>]]}
grouped by SortedSets: {#<SortedSet: {"fubar"}>=>[["a", #<SortedSet: {"fubar"}>]], #<SortedSet: {"fubar"}>=>[["b", #<SortedSet: {"fubar"}>]]}
expected 1 key in hash grouped by SortedSets, got 2

Updated by nobu (Nobuyoshi Nakada) over 2 years ago

It is not because of the Ruby version, but because of whether SortedSet uses the RBTree gem or not.

$ ruby2.7 -rset -e 's1 = SortedSet["fubar"]; s2 = SortedSet["fubar"]; p s1.eql?(s2)'

$ gem2.7 i --user sorted_set
Fetching set-1.0.2.gem
Fetching rbtree-0.4.5.gem
Fetching sorted_set-1.0.3.gem
WARNING:  You don't have /Users/nobu/.gem/ruby/2.7.0/bin in your PATH,
	  gem executables will not run.
Building native extensions. This could take a while...
Successfully installed rbtree-0.4.5
Successfully installed set-1.0.2
Successfully installed sorted_set-1.0.3
Parsing documentation for rbtree-0.4.5
Installing ri documentation for rbtree-0.4.5
Parsing documentation for set-1.0.2
Installing ri documentation for set-1.0.2
Parsing documentation for sorted_set-1.0.3
Installing ri documentation for sorted_set-1.0.3
Done installing documentation for rbtree, set, sorted_set after 0 seconds
3 gems installed

$ ruby2.7 -rset -e 's1 = SortedSet["fubar"]; s2 = SortedSet["fubar"]; p s1.eql?(s2)'

The RBTree gem needs to implement the #hash and #eql? methods for its objects to be usable as hash keys.
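
As a minimal sketch of why this matters, independent of the rbtree gem: Hash (and therefore group_by) finds keys through #hash and #eql?, not through #==. The Box and KeyBox classes and the group_count helper below are hypothetical stand-ins made up for illustration; they are not part of set, sorted_set, or rbtree.

```ruby
# Hypothetical stand-in for a container that defines only #==,
# so it falls back to the identity-based Object#hash and Kernel#eql?.
class Box
  attr_reader :items

  def initialize(*items)
    @items = items
  end

  def ==(other)
    other.is_a?(Box) && items == other.items
  end
end

# The same container, but also usable as a Hash key because it
# defines #eql? and #hash consistently with each other.
class KeyBox < Box
  def eql?(other)
    other.is_a?(KeyBox) && items.eql?(other.items)
  end

  def hash
    items.hash
  end
end

# Groups two equal values and reports how many groups came out.
def group_count(klass)
  { 'a' => klass.new('fubar'), 'b' => klass.new('fubar') }
    .group_by { |_, v| v }
    .size
end

puts group_count(Box)     # 2: equal by ==, but treated as distinct keys
puts group_count(KeyBox)  # 1: equal keys land in the same group
```

The Box case mirrors the symptom in the report: the two values compare equal with ==, yet group_by produces two keys.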

Updated by (Mike Carlton) over 2 years ago

Thank you very much Nobu for your quick response.

For anyone who stumbles upon this page, I used this quick and dirty monkey patch to add the necessary functionality to RBTree (until RBTree is updated); with this SortedSet works as expected for me in ruby 3.0:

require 'rbtree'

class RBTree
  # define these methods conditionally so that, if rbtree gains them in a future upgrade, we don't override them
  unless RBTree.instance_methods(false).include?(:eql?)
    class_eval <<-END, __FILE__, __LINE__+1
      def eql?(other)
        # we could use 'self == other' (RBTree already implements ==), but if we do then
        # we wind up with SortedSet[1].eql?(SortedSet[1.0]) but !(1.eql?(1.0)) and !(Set[1].eql?(Set[1.0]))
        # we'll take a chance on a 64-bit collision instead
        self.hash == other.hash
      end
    END
  end

  unless RBTree.instance_methods(false).include?(:hash)
    class_eval <<-END, __FILE__, __LINE__+1
      # Ruby hash.c implements something like MurmurHash on keys and values
      # Ruby also starts with a unique seed in each instance (so {a:1}.hash is different in every process)
      # We'll do something much simpler, but good enough for our purposes
      def hash
        result = 0
        self.each do |k, v|
          # result ^= k.hash; result ^= v.hash is not correct: RBTree[a:1,b:2].hash would equal RBTree[a:2,b:1].hash
          # result ^= [k, v].hash would create a lot of unnecessary allocations and garbage
          # Ruby internals use the gcc 128b integer type where possible and Object.hash returns a 64b integer,
          # so we'll take advantage of that and just create a 128b Integer hash instead of hashes of Arrays
          # In the SortedSet usage, the values are always 'true'; we will put this in the upper half as they'll
          # cancel and half the time we'll have a 64b value (does not really matter, but numbers are easier to read)
          result ^= (v.hash << 64) ^ k.hash
        end
        result
      end
    END
  end
end

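The distinction between == and eql? that the comments in the patch rely on can be seen with core classes alone. This is only a small illustration using Integer, Float, and Hash; it does not load rbtree or sorted_set, and the variable name h is arbitrary.

```ruby
# == compares numeric value across Integer and Float...
p 1 == 1.0       # true

# ...while eql? also requires the same class, which is what Hash lookup uses.
p 1.eql?(1.0)    # false

h = { 1 => :int }
p h.key?(1)      # true
p h[1.0]         # nil, because 1.0 is not eql? to the stored key 1
```
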

Updated by nobu (Nobuyoshi Nakada) over 2 years ago

You can use RBTree.method_defined?(:eql?, false) and so on, instead of RBTree.instance_methods(false).include?.
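
A short sketch of the difference, using two hypothetical classes (Plain and Custom) rather than RBTree itself. The optional second argument to method_defined? is available from Ruby 2.6 onward.

```ruby
# Hypothetical class that defines no methods of its own.
class Plain
end

# Hypothetical class that overrides #eql? itself.
class Custom
  def eql?(other)
    other.is_a?(Custom)
  end
end

# With the default (inherit = true), the inherited Kernel#eql? is found.
p Plain.method_defined?(:eql?)                    # true

# With inherit = false, only methods defined directly on the class count.
p Plain.method_defined?(:eql?, false)             # false
p Custom.method_defined?(:eql?, false)            # true

# Equivalent to the instance_methods(false) check used in the patch.
p Plain.instance_methods(false).include?(:eql?)   # false
p Custom.instance_methods(false).include?(:eql?)  # true
```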

Updated by (Mike Carlton) over 2 years ago

Ah, I had tried method_defined?, but it returns true (because of the inherited Kernel#eql?).

I did not realize that method_defined? also accepts an inherit=false argument.

Thank you.


