Bug #14263
closedArray Intersection does not seem to use hash
Description
According to the documentation for Array#&, comparison is done using hash and eql? However, this does not seem to be the case in 2.5.0.
If two instances are .eql? but their hashes are not, an array & should be empty:
([Var.new('a')] & [Var.new('b')]).empty?
This test works in 2.4.2 (and all earlier versions), but fails in 2.5.0. See attached script.
Files
Updated by marcandre (Marc-Andre Lafortune) almost 7 years ago
If two instances are .eql? but their hashes are not
Then eql?
or hash
are ill-defined. The definition of hash
is that any two objects that are eql?
must have identical hash
values. Note that objects that are not eql?
may also have identical hash
values, although it is preferable when they don't.
Updated by styrmis (Stefan Magnuson) almost 7 years ago
- File test-14264.rb test-14264.rb added
We (myself and Lewis Buckley) investigated this issue as part of the Ruby Hack Day at Cookpad.
We were able to replicate the issue on Ruby 2.6 (trunk)—digging a bit deeper we found that the unexpected behaviour only occurs when both arrays have <= 16 elements.
This is because in Ruby 2.5 a performance enhancement was added to avoid building a hash when the array sizes are small.
The version for small arrays compares elements using #eql?
whereas the implementation in 2.4 defers to the Hash
implementation of equality checking which appears to use the hash directly (and not the #eql?
method).
We discussed with Matz during the event, and the conclusion at the time was that:
- The example as given has an inconsistent implementation (between the implementation of
Val#eql?
andVal#hash
), and as such the behaviour is undefined, and could reasonably change between versions - The documentation for
Object#eql?
could be extended to make this clearer
So one possible improvement would be to update the documentation for Object#eql?
to make clear that when #eql?
is overridden it should be kept consistent with the result of a comparison of the #hash
of two objects.
Another possible solution might be to make both variants (for both large and small arrays) use the same method of comparison, i.e. both #eql?
or both hash equivalence.
In a related issue it is suggested that the given #eql?
method implementation is invalid. In terms of the documentation, it might be helpful to give some guidelines regarding how #eql?
implementations in subclasses should work, e.g.
- Where possible, override the
#hash
method instead of#eql?
, which should avoid the inconsistency posed in this issue - When overriding
#eql?
, take care when adding logic which does not make use of#eql?
on the related objects, e.g. thetrue
return value in the attached example is completely unrelated tov#eql?
Updated by styrmis (Stefan Magnuson) over 6 years ago
- File array-eql-by-hash.patch array-eql-by-hash.patch added
- File object-eql-documentation-tweak.patch object-eql-documentation-tweak.patch added
Attached are two patches:
- One which uses hash comparison for both the optimised "small" array branch and the default implementation of the
Array
set operations (essentially restoring the behaviour of Ruby < 2.5) - Another which adds a warning to the documentation for
Array#eql?
, stating that the link between hash equality andeql?
must not be broken in subclasses
As the behaviour being relied upon in Ruby < 2.5 is not well defined (#eql?
has been modified to not take the hashes of compared objects into account), then perhaps updating the documentation is the better approach.
If backwards compatibility of the (undefined) behaviour is to be maintained, then I believe the patch to Array
achieves this, though based on its poor performance (see below) it is perhaps of limited value.
All feedback is of course very welcome!
Benchmarks¶
SMALL_ARRAY_MIN=9 SMALL_ARRAY_MAX=16 SMALL_ARRAY_ITERATIONS=10000 ../install/bin/ruby benchmark/driver.rb --pattern='array_small' --executables='HEAD::../install-trunk/bin/ruby; PATCH::../install/bin/ruby;' -r 10 --format=markdown
benchmark results:
Execution time (sec)
name | HEAD | PATCH |
---|---|---|
array_small_and | 0.796 | 3.925 |
array_small_diff | 0.645 | 3.277 |
array_small_or | 0.908 | 0.913 |
Speedup ratio: compare with the result of `HEAD' (greater is better)
name | PATCH |
---|---|
array_small_and | 0.203 |
array_small_diff | 0.197 |
array_small_or | 0.994 |
Updated by jeremyevans0 (Jeremy Evans) over 5 years ago
- Related to Bug #14573: rb_ary_or doesn't check objects hash when the array contains less than SMALL_ARRAY_LEN added
Updated by jeremyevans (Jeremy Evans) over 5 years ago
- Status changed from Open to Closed
Applied in changeset git|8bccbf3cfea8c1059d40db5b5e61900cf6cf29d3.
Add more documentation on #eql?/#hash relationship [ci skip]
Fixes [Bug #14263]