Project

General

Profile

Bug #14263

Array Intersection does not seem to use hash

Added by gkellogg (Gregg Kellogg) 10 months ago. Updated 7 months ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:84573]

Description

According to the documentation for Array#&, comparison is done using hash and eql? However, this does not seem to be the case in 2.5.0.

If two instances are .eql? but their hashes are not, an array & should be empty:

([Var.new('a')] & [Var.new('b')]).empty?

This test works in 2.4.2 (and all earlier versions), but fails in 2.5.0. See attached script.

array_intersection.rb (535 Bytes) array_intersection.rb Failing array intersection test gkellogg (Gregg Kellogg), 12/31/2017 09:58 PM
test-14264.rb (956 Bytes) test-14264.rb styrmis (Stefan Magnuson), 03/21/2018 02:13 PM
array-eql-by-hash.patch (2.09 KB) array-eql-by-hash.patch Patch to compare by hash in Array set operations for all array sizes styrmis (Stefan Magnuson), 03/25/2018 09:38 PM
object-eql-documentation-tweak.patch (825 Bytes) object-eql-documentation-tweak.patch Patch to documentation for Object#eql? to add warning regarding hash equality styrmis (Stefan Magnuson), 03/25/2018 09:38 PM

History

#1 [ruby-core:84575] Updated by marcandre (Marc-Andre Lafortune) 10 months ago

If two instances are .eql? but their hashes are not

Then eql? or hash are ill-defined. The definition of hash is that any two objects that are eql? must have identical hash values. Note that objects that are not eql? may also have identical hash values, although it is preferable when they don't.

#2 [ruby-core:86241] Updated by styrmis (Stefan Magnuson) 7 months ago

We (myself and Lewis Buckley) investigated this issue as part of the Ruby Hack Day at Cookpad.

We were able to replicate the issue on Ruby 2.6 (trunk)—digging a bit deeper we found that the unexpected behaviour only occurs when both arrays have <= 16 elements.

This is because in Ruby 2.5 a performance enhancement was added to avoid building a hash when the array sizes are small.

The version for small arrays compares elements using #eql? whereas the implementation in 2.4 defers to the Hash implementation of equality checking which appears to use the hash directly (and not the #eql? method).

We discussed with Matz during the event, and the conclusion at the time was that:

  • The example as given has an inconsistent implementation (between the implementation of Val#eql? and Val#hash), and as such the behaviour is undefined, and could reasonably change between versions
  • The documentation for Object#eql? could be extended to make this clearer

So one possible improvement would be to update the documentation for Object#eql? to make clear that when #eql? is overridden it should be kept consistent with the result of a comparison of the #hash of two objects.

Another possible solution might be to make both variants (for both large and small arrays) use the same method of comparison, i.e. both #eql? or both hash equivalence.

In a related issue it is suggested that the given #eql? method implementation is invalid. In terms of the documentation, it might be helpful to give some guidelines regarding how #eql? implementations in subclasses should work, e.g.

  • Where possible, override the #hash method instead of #eql?, which should avoid the inconsistency posed in this issue
  • When overriding #eql?, take care when adding logic which does not make use of #eql? on the related objects, e.g. the true return value in the attached example is completely unrelated to v#eql?

#3 [ruby-core:86295] Updated by styrmis (Stefan Magnuson) 7 months ago

Attached are two patches:

  • One which uses hash comparison for both the optimised "small" array branch and the default implementation of the Array set operations (essentially restoring the behaviour of Ruby < 2.5)
  • Another which adds a warning to the documentation for Array#eql?, stating that the link between hash equality and eql? must not be broken in subclasses

As the behaviour being relied upon in Ruby < 2.5 is not well defined (#eql? has been modified to not take the hashes of compared objects into account), then perhaps updating the documentation is the better approach.

If backwards compatibility of the (undefined) behaviour is to be maintained, then I believe the patch to Array achieves this, though based on its poor performance (see below) it is perhaps of limited value.

All feedback is of course very welcome!

Benchmarks

SMALL_ARRAY_MIN=9 SMALL_ARRAY_MAX=16 SMALL_ARRAY_ITERATIONS=10000 ../install/bin/ruby benchmark/driver.rb --pattern='array_small' --executables='HEAD::../install-trunk/bin/ruby; PATCH::../install/bin/ruby;' -r 10 --format=markdown


benchmark results:
Execution time (sec)

name HEAD PATCH
array_small_and 0.796 3.925
array_small_diff 0.645 3.277
array_small_or 0.908 0.913

Speedup ratio: compare with the result of `HEAD' (greater is better)

name PATCH
array_small_and 0.203
array_small_diff 0.197
array_small_or 0.994

Also available in: Atom PDF