Bug #14263: Array Intersection does not seem to use hash - Ruby - Ruby Issue Tracking System

Actions

Copy link

Bug #14263

closed

Array Intersection does not seem to use hash

Added by gkellogg (Gregg Kellogg) over 7 years ago. Updated about 6 years ago.

Status:

Closed

Assignee:

Target version:

ruby -v:

2.5.0

Backport:

2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN

[ruby-core:84573]

Description

According to the documentation for Array#&, comparison is done using hash and eql? However, this does not seem to be the case in 2.5.0.

If two instances are .eql? but their hashes are not, an array & should be empty:

([Var.new('a')] & [Var.new('b')]).empty?

This test works in 2.4.2 (and all earlier versions), but fails in 2.5.0. See attached script.

Files

Download all files

array_intersection.rb (535 Bytes) array_intersection.rb	Failing array intersection test	gkellogg (Gregg Kellogg), 12/31/2017 09:58 PM
test-14264.rb (956 Bytes) test-14264.rb		styrmis (Stefan Magnuson), 03/21/2018 02:13 PM
array-eql-by-hash.patch (2.09 KB) array-eql-by-hash.patch	Patch to compare by hash in Array set operations for all array sizes	styrmis (Stefan Magnuson), 03/25/2018 09:38 PM
object-eql-documentation-tweak.patch (825 Bytes) object-eql-documentation-tweak.patch	Patch to documentation for Object#eql? to add warning regarding hash equality	styrmis (Stefan Magnuson), 03/25/2018 09:38 PM

Related issues 1 (0 open — 1 closed)

Actions

Copy link

#1 [ruby-core:84575]

Updated by marcandre (Marc-Andre Lafortune) over 7 years ago

If two instances are .eql? but their hashes are not

Then eql? or hash are ill-defined. The definition of hash is that any two objects that are eql? must have identical hash values. Note that objects that are not eql? may also have identical hash values, although it is preferable when they don't.

Actions

Copy link

#2 [ruby-core:86241]

Updated by styrmis (Stefan Magnuson) over 7 years ago

File test-14264.rb test-14264.rb added

We (myself and Lewis Buckley) investigated this issue as part of the Ruby Hack Day at Cookpad.

We were able to replicate the issue on Ruby 2.6 (trunk)—digging a bit deeper we found that the unexpected behaviour only occurs when both arrays have <= 16 elements.

This is because in Ruby 2.5 a performance enhancement was added to avoid building a hash when the array sizes are small.

The version for small arrays compares elements using #eql? whereas the implementation in 2.4 defers to the Hash implementation of equality checking which appears to use the hash directly (and not the #eql? method).

We discussed with Matz during the event, and the conclusion at the time was that:

The example as given has an inconsistent implementation (between the implementation of Val#eql? and Val#hash), and as such the behaviour is undefined, and could reasonably change between versions
The documentation for Object#eql? could be extended to make this clearer

So one possible improvement would be to update the documentation for Object#eql? to make clear that when #eql? is overridden it should be kept consistent with the result of a comparison of the #hash of two objects.

Another possible solution might be to make both variants (for both large and small arrays) use the same method of comparison, i.e. both #eql? or both hash equivalence.

In a related issue it is suggested that the given #eql? method implementation is invalid. In terms of the documentation, it might be helpful to give some guidelines regarding how #eql? implementations in subclasses should work, e.g.

Where possible, override the #hash method instead of #eql?, which should avoid the inconsistency posed in this issue
When overriding #eql?, take care when adding logic which does not make use of #eql? on the related objects, e.g. the true return value in the attached example is completely unrelated to v#eql?

Actions

Copy link Download all files

#3 [ruby-core:86295]

Updated by styrmis (Stefan Magnuson) over 7 years ago

File array-eql-by-hash.patch array-eql-by-hash.patch added
File object-eql-documentation-tweak.patch object-eql-documentation-tweak.patch added

Attached are two patches:

One which uses hash comparison for both the optimised "small" array branch and the default implementation of the Array set operations (essentially restoring the behaviour of Ruby < 2.5)
Another which adds a warning to the documentation for Array#eql?, stating that the link between hash equality and eql? must not be broken in subclasses

As the behaviour being relied upon in Ruby < 2.5 is not well defined (#eql? has been modified to not take the hashes of compared objects into account), then perhaps updating the documentation is the better approach.

If backwards compatibility of the (undefined) behaviour is to be maintained, then I believe the patch to Array achieves this, though based on its poor performance (see below) it is perhaps of limited value.

All feedback is of course very welcome!

Benchmarks¶

SMALL_ARRAY_MIN=9 SMALL_ARRAY_MAX=16 SMALL_ARRAY_ITERATIONS=10000 ../install/bin/ruby benchmark/driver.rb --pattern='array_small' --executables='HEAD::../install-trunk/bin/ruby; PATCH::../install/bin/ruby;' -r 10 --format=markdown

benchmark results:
Execution time (sec)

name	HEAD	PATCH
array_small_and	0.796	3.925
array_small_diff	0.645	3.277
array_small_or	0.908	0.913

Speedup ratio: compare with the result of `HEAD' (greater is better)

name	PATCH
array_small_and	0.203
array_small_diff	0.197
array_small_or	0.994

Actions

Copy link

Updated by jeremyevans0 (Jeremy Evans) about 6 years ago

Related to Bug #14573: rb_ary_or doesn't check objects hash when the array contains less than SMALL_ARRAY_LEN added

Actions

Copy link

Updated by jeremyevans (Jeremy Evans) about 6 years ago

Status changed from Open to Closed

Applied in changeset git|8bccbf3cfea8c1059d40db5b5e61900cf6cf29d3.

Add more documentation on #eql?/#hash relationship [ci skip]

Fixes [Bug #14263]

Actions

Copy link

Also available in: Atom PDF

Like0

Like0Like0Like0Like0Like0

Project

General

Profile

Ruby

Tags

Custom queries

Bug #14263

Array Intersection does not seem to use hash

Updated by marcandre (Marc-Andre Lafortune) over 7 years ago

Updated by styrmis (Stefan Magnuson) over 7 years ago

Updated by styrmis (Stefan Magnuson) over 7 years ago

Benchmarks¶

Updated by jeremyevans0 (Jeremy Evans) about 6 years ago

Updated by jeremyevans (Jeremy Evans) about 6 years ago