Project

General

Profile

Bug #16850

Object#hash doesn't behave as documented

Added by ana06 (Ana Maria Martinez Gomez) 3 months ago. Updated 3 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:98290]

Description

From Ruby 2.7 Object class documentation:

The hash value is used along with eql? by the Hash class to determine if two objects reference the same hash key.

From this I expect that overwritting the Object#hash method changes the Hashing algorithm. But this is only the case for some objets. Consider the following code:

class Object
  def hash
    i_dont_exist
  end 
end

class Ana 
end

hash = {}
hash[3] = true
hash[3] = false
puts 'Integers are not using the new hash method'
hash[Ana.new] = true
puts 'this will not be executed because the class Ana uses the new hash method'

The output of executing this code is:

Integers are not using the new hash method
Traceback (most recent call last):
        1: from a.rb:13:in `<main>'
a.rb:3:in `hash': undefined local variable or method `i_dont_exist' for #<Ana:0x000055626f0b7a30> (NameError)

This proves that Integer hash method is not used to determine if two objects reference the same hash key, as said in the documentation. Check the any_hash method for the other classes affected.

I think that either the behavior should be modified (aiming for a consistency along different classes and following the documentation) or the documentation updated.

Updated by jeremyevans0 (Jeremy Evans) 3 months ago

I consider this an implementation detail. It never makes sense to override the hash calculation for the cases where Ruby uses the built-in calculation instead of calling #hash, so I don't think Ruby should change behavior to increase consistency. However, we could explicitly document the behavior. Something like:

diff --git a/hash.c b/hash.c
index e75a431c09..b7d320f7c2 100644
--- a/hash.c
+++ b/hash.c
@@ -315,6 +315,9 @@ objid_hash(VALUE obj)
  * implementations of Ruby.  If you need a stable identifier across Ruby
  * invocations and implementations you will need to generate one with a custom
  * method.
+ *
+ * Certain core classes such as Integer use built-in hash calculations and
+ * do not call the #hash method when used as a hash key.
  *--
  * \private
  *++

Thoughts on that?

Updated by ana06 (Ana Maria Martinez Gomez) 3 months ago

That looks good to me :+1: I may add String as well (such as Integer and String).

Updated by matz (Yukihiro Matsumoto) 3 months ago

+1 The updates LGTM.

Matz.

Updated by Dan0042 (Daniel DeLorme) 3 months ago

Would it be feasible to raise some sort of error when trying to redefine Integer#hash ?
Not just for this case, but I feel there were other cases where ruby bypasses the method call to use the built-in implementation directly. Can't recall exactly atm.
Making certain methods un-overridable could be a useful signal that "this is not going to work as you expect".

#5

Updated by jeremyevans (Jeremy Evans) 3 months ago

  • Status changed from Open to Closed

Applied in changeset git|de29a022acb93691dfc50db852cb04f763565072.


Document that #hash is not called for certain core classes [ci skip]

Fixes [Bug #16850]

Updated by jeremyevans0 (Jeremy Evans) 3 months ago

Dan0042 (Daniel DeLorme) wrote in #note-4:

Would it be feasible to raise some sort of error when trying to redefine Integer#hash ?
Not just for this case, but I feel there were other cases where ruby bypasses the method call to use the built-in implementation directly. Can't recall exactly atm.
Making certain methods un-overridable could be a useful signal that "this is not going to work as you expect".

It's probably feasible, but I don't think it is a good idea. You can do it in pure Ruby if you want:

module HashOverrideRaise
  def method_added(method)
    raise "don't override hash" if method == :hash
    super
  end

  def include(*)
    super
    raise "don't override hash" unless instance_method(:hash).owner == Kernel
  end
end

Integer.extend HashOverrideRaise

However, this doesn't handle the case in Ana's example, because you are defining Object#hash, and not Integer#hash.

The cases where Ruby bypasses method calls to use built-in code are different than this case. Ruby will default to use the faster built-in code, but if you define a method, Ruby will recognize it and start calling the method. This approach is usually referred to as deoptimization.

In this case, I don't think it makes sense to change Ruby's behavior. There is no reason to override #hash for the cases where Ruby uses a built-in implementation. So the fact that Ruby does not call #hash in all cases should not be a problem. If you override #eql?, Ruby will still respect that and treat objects differently:

h = {}
s = ''
def s.eql?(_) false end
h[s] = 1
h[s] = 2
h
# => {""=>1, ""=>2}

Updated by Dan0042 (Daniel DeLorme) 3 months ago

Thanks for the insight into bypass vs deoptimization.

Of course if you override #hash and #eql? to return true for previously false cases then it wouldn't work, but anyway I don't see much use or reason for doing this on Integer or String. The added documentation is plenty good enough.

Also available in: Atom PDF