Array Intersection does not seem to use hash
According to the documentation for Array#&, comparison is done using hash and eql? However, this does not seem to be the case in 2.5.0.
If two instances are .eql? but their hashes are not, an array & should be empty:
([Var.new('a')] & [Var.new('b')]).empty?
This test works in 2.4.2 (and all earlier versions), but fails in 2.5.0. See attached script.
#1 [ruby-core:84575] Updated by marcandre (Marc-Andre Lafortune) 10 months ago
If two instances are .eql? but their hashes are not
hash are ill-defined. The definition of
hash is that any two objects that are
eql? must have identical
hash values. Note that objects that are not
eql? may also have identical
hash values, although it is preferable when they don't.
#2 [ruby-core:86241] Updated by styrmis (Stefan Magnuson) 7 months ago
We (myself and Lewis Buckley) investigated this issue as part of the Ruby Hack Day at Cookpad.
We were able to replicate the issue on Ruby 2.6 (trunk)—digging a bit deeper we found that the unexpected behaviour only occurs when both arrays have <= 16 elements.
This is because in Ruby 2.5 a performance enhancement was added to avoid building a hash when the array sizes are small.
The version for small arrays compares elements using
#eql? whereas the implementation in 2.4 defers to the
Hash implementation of equality checking which appears to use the hash directly (and not the
We discussed with Matz during the event, and the conclusion at the time was that:
- The example as given has an inconsistent implementation (between the implementation of
Val#hash), and as such the behaviour is undefined, and could reasonably change between versions
- The documentation for
Object#eql?could be extended to make this clearer
So one possible improvement would be to update the documentation for
Object#eql? to make clear that when
#eql? is overridden it should be kept consistent with the result of a comparison of the
#hash of two objects.
Another possible solution might be to make both variants (for both large and small arrays) use the same method of comparison, i.e. both
#eql? or both hash equivalence.
In a related issue it is suggested that the given
#eql? method implementation is invalid. In terms of the documentation, it might be helpful to give some guidelines regarding how
#eql? implementations in subclasses should work, e.g.
- Where possible, override the
#hashmethod instead of
#eql?, which should avoid the inconsistency posed in this issue
- When overriding
#eql?, take care when adding logic which does not make use of
#eql?on the related objects, e.g. the
truereturn value in the attached example is completely unrelated to
#3 [ruby-core:86295] Updated by styrmis (Stefan Magnuson) 7 months ago
- File object-eql-documentation-tweak.patch object-eql-documentation-tweak.patch added
- File array-eql-by-hash.patch array-eql-by-hash.patch added
Attached are two patches:
- One which uses hash comparison for both the optimised "small" array branch and the default implementation of the
Arrayset operations (essentially restoring the behaviour of Ruby < 2.5)
- Another which adds a warning to the documentation for
Array#eql?, stating that the link between hash equality and
eql?must not be broken in subclasses
As the behaviour being relied upon in Ruby < 2.5 is not well defined (
#eql? has been modified to not take the hashes of compared objects into account), then perhaps updating the documentation is the better approach.
If backwards compatibility of the (undefined) behaviour is to be maintained, then I believe the patch to
Array achieves this, though based on its poor performance (see below) it is perhaps of limited value.
All feedback is of course very welcome!
SMALL_ARRAY_MIN=9 SMALL_ARRAY_MAX=16 SMALL_ARRAY_ITERATIONS=10000 ../install/bin/ruby benchmark/driver.rb --pattern='array_small' --executables='HEAD::../install-trunk/bin/ruby; PATCH::../install/bin/ruby;' -r 10 --format=markdown
Execution time (sec)
Speedup ratio: compare with the result of `HEAD' (greater is better)