Bug #21921
openHash inconsistent ==, >=, <= behavior
Description
Hash seems to have very inconsistent behavior for ==, >=, and <=.
Given that below h1 == h2 is false and that they have the same number of keys, I would expect <= and >= to also be false.
However, surprisingly h1 <= h2 and h2 >= h1 are true, while all other permutations are false.
h1 = {}.compare_by_identity.tap { _1["one"] = 1 } # => {"one" => 1}
h2 = {"one" => 1} # => {"one" => 1}
h1 == h2 # => false
h2 == h1 # => false
h1 >= h2 # => false
h1 <= h2 # => true
h2 >= h1 # => true
h2 <= h1 # => false
Updated by mame (Yusuke Endoh) 1 day ago
The current specification of Hash#<= is to check if all elements of the supposedly smaller Hash are included in the supposedly larger Hash. I think the reported behaviors are actually working as specified.
https://github.com/ruby/ruby/blob/master/doc/language/hash_inclusion.rdoc
Did this behavior cause any issues in a real-world application? If you just noticed the inconsistency, I think it's fine to keep the status quo.
If we really must fix it, raising an exception when comparing a compare_by_identity Hash with a normal Hash via <= or >= (leaving Hash#== as is) seems reasonable.
Updated by cohen (Cohen Carlisle) about 19 hours ago
ยท Edited
This has not affected any real-world app I use. I noticed this while working on a bidirectional hash gem (bihash@rubygems).
However, I don't think the behavior is consistent with the spec as written:
The spec merely says that keys are compared with ==, but I believe they are actually compared with eql? (for normal hashes).
If keys were just compared with ==, all the above comparisons would return true.
I think the spec could also be improved in other ways, assuming there will be no behavioral change:
The spec doesn't mention that before comparison, keys must first be found via the supposedly larger hash's lookup method.
E.g., In h1 <= h2, keys from h1 are looked up in h2 using h2's lookup method, so when h2 is a normal hash it finds the key, but when h2 is an identity hash it fails.
The crux of this issue seems to be that the lookup method differs between identity and normal hashes, which isn't mentioned in the spec.
Raising when comparing identity and normal hashes could also work, but I would expand that raise to include < and >, since they suffer the same problems:
h1 = {}.compare_by_identity
h2 = {}
h1["one"] = 1 # h1 = {"one" => 1}.compare_by_identity
h2["one"] = 1
h2["two"] = 2 # h2 = {"one", "two" => 2}
h1 < h2 # => true
h2 > h1 # => true
h1 = {}
h2 = {}.compare_by_identity
h1["one"] = 1 # h1 = {"one" => 1}
h2["one"] = 1
h2["two"] = 2 # h2 = {"one", "two" => 2}.compare_by_identity
h1 < h2 # => false (surprising, different than above)
h2 > h1 # => false (surprising, different than above)
I think there are actually three interrelated issues here:
- When comparing for inclusion, hashes take keys from the maybe smaller hash and look them up in the maybe larger hash, yielding inconsistent results when identity and normal hashes mix (original and newest example).
- The equality semantics of inclusion are different from
Hash#==semantics, leading to further confusion (original example). - The inclusion specs don't seem consistent with the current behavior and the method/operator docs could call this out more strongly.
Updated by mame (Yusuke Endoh) about 16 hours ago
The documentation for Hash inclusion is a bit hard to understand due to its mathematical phrasing.
The spec merely says that keys are compared with
==, but I believe they are actually compared witheql?(for normal hashes).
The documentation isn't actually saying such a thing. I assume you misread the following sentence:
An entry
h0[k0]in one hash is equal to an entryh1[k1]in another hash if and only if the two keys are equal (k0 == k1) and their two values are equal (h0[k0] == h1[h1]).
This sentence is merely defining the phrase "is equal to". It does not describe the hash lookup mechanism itself.
- Subset (included in or equal to another):
- Hash
h0is a subset of hashh1(see Hash#<=) if each entry inh0is equal to an entry inh1.
This defines the spec of Hash#<=. It says that h0 is a subset of h1 when every entry (key-value pair) included in h0 is equal to some entry included in h1.
While the lookup method is not explicitly stated here either, I think it's quite natural to use h1's lookup method when trying to find if those entries exist in h1.
Raising when comparing identity and normal hashes could also work, but I would expand that raise to include
<and>, since they suffer the same problems:
That makes sense. However, I don't really think we should change the behavior at the risk of breaking existing working code.
Updated by mame (Yusuke Endoh) about 15 hours ago
By the way, I noticed the following behavior, which feels like it might not satisfy the specification:
h1 = {}.compare_by_identity
h1["one"] = true
h1["one"] = true
h2 = { "one" => true }
h1 <= h2 # current: false, expected(?): true
The two "one" => true entries in h1 can both be considered to be included in h2, so the spec seems to imply this should return true. However, I'm torn on whether we should explicitly describe such implementation details in the documentation.