Feature #16146
openArray .difference allow custom comparison
Added by ngomez (Nancy Gomez) about 5 years ago. Updated about 5 years ago.
Description
Hello!
I wanted to know if there's any plan to implement the ability to define comparison between individual items in arrays.
Here is where I originally asked the question on Stack Overflow: https://stackoverflow.com/questions/57316775/is-there-any-way-to-specify-how-to-compare-of-array-of-objects-for-difference-f
But specifically, I noticed nearly all the existing Array functions allow us to define ways for comparison, such as #uniq but #difference does not :( This is quite unfortunate but I'm hoping it's just because this function was introduced in 2.6.0, and hopefully this feature can be implemented?
Thanks for any and all info :)
Updated by ngomez (Nancy Gomez) about 5 years ago
- Subject changed from Array .distance allow custom comparison to Array .difference allow custom comparison
Updated by duerst (Martin Dürst) about 5 years ago
Please provide an actual usage example directly in this issue, not only by reference.
Updated by shevegen (Robert A. Heiler) about 5 years ago
Seems to be the same link as https://bugs.ruby-lang.org/issues/16118; so perhaps accidental double
post; could close one of these two - or perhaps both. :D
As there was not yet any comment on the other post, I'll comment briefly here.
I think the standard way for custom comparisons in ruby is to define the strategy you wish to
employ in order to compare e. g elements in an array. It's a bit short-sighted to think that
one has to rely solely on .difference alone for that comparison.
such as #uniq but #difference does not :(
If the issue is related between #uniq and #difference then I think the issue request should
focus on precisely that part. This would make it easier for the ruby core team (respectively
matz) to consider the merits (pro/con) of the suggestion itself. It's too difficult to
try to understand what is meant in external sites otherwise, at the least I think it is.
Updated by ngomez (Nancy Gomez) about 5 years ago
As recommended I'll place in example directly here:
The feature I am requesting is to allow custom comparison of the Array #difference.
Currently it is comparing with each item's #eql and #hash but in the case of Objects you would need to redefine these to get the intended result.
In my case, I can not simply redefine these as getting the difference of the two arrays of Objects is one small portion of the overall logic, the objects I am using come from a third party which also uses the #eql and #hash for other logic I rely on. So I would have to set and somehow unset my definition (which not sure I can unset anyway) or I can just not use the #difference which seems like such a waste because it would be such an elegant and clear solution.
Here is an example of my desired usage, let's say I have the following class:
# Defined by a third party which uses #eql and #hash for other logic
class Num
def initialize(val)
@val = val
end
def val
@val
end
end
And I have the following arrays:
all = [Num.new(1), Num.new(2), Num.new(3)]
subset = [Num.new(1), Num.new(2)]
I want to find all the values that the subset
is missing from all
:
all.difference(subset)
=> [#<Num:0x00007fcae19e9540 @val=1>, #<Num:0x00007fcae19e9518 @val=2>, #<Num:0x00007fcae19e94f0 @val=3>]
But of course, #difference is comparing view #eql and #hash so I attempt to define how to compare:
all.difference(subset) { |a, b| a.val <=> b.val }
=> [#<Num:0x00007fcae19e9540 @val=1>, #<Num:0x00007fcae19e9518 @val=2>, #<Num:0x00007fcae19e94f0 @val=3>]
```
Still no luck, and I'm not 100% sure that the syntax is exactly correct anyway, so I take a look at the docs: https://ruby-doc.org/core-2.6/Array.html#method-i-difference
And notice no example shows custom comparison, and the implementation doesn't seem to support it either.
I would love it if instead this would be the result:
```
all.difference(subset) { |a, b| a.val <=> b.val }
=> [#<Num:0x00007fcae19e94f0 @val=3>]
```
Does that make sense?
Updated by Dan0042 (Daniel DeLorme) about 5 years ago
What you're asking for is a O(n²) operation. I'm not sure it's a good idea to make that so easy and transparent as a core method.
This is quite easy to code and makes the (inefficiency of) two loops more apparent:
all.reject{ |a| subset.find{ |s| s.val == a.val } }
The one advantage I can see to the <=> comparator though is that it could possibly result in an algorithm more efficient than O(n²). The implementation could sort the two lists and then do a diff-like operation.
only_all,intersection,only_subset = all.uniqdiff3(subset){ |a,b| a.val <=> b.val }
union = only_all + intersection + only_subset
Updated by jonathanhefner (Jonathan Hefner) about 5 years ago
Dan0042 (Daniel DeLorme) wrote:
What you're asking for is a O(n²) operation. I'm not sure it's a good idea to make that so easy and transparent as a core method.
Although the given example specified a two-argument block, the original post references uniq
, which accepts a one-argument block. Using a one-argument block can result in an O(n) implementation. Here is a different example:
Dependency = Struct.new(:name, :version)
deps1 = [Dependency.new("foo", "1.1"), Dependency.new("bar", "2.0"), Dependency.new("qux", "1.0")]
deps2 = [Dependency.new("foo", "1.0"), Dependency.new("bar", "1.0")]
deps1.difference(deps2, &:name)
# == [Dependency.new("qux", "1.0")]
deps1.difference(deps2){|dep| [dep.name, dep.version[/^\d+/]] }
# == [Dependency.new("bar", "2.0"), Dependency.new("qux", "1.0")]
Updated by duerst (Martin Dürst) about 5 years ago
jonathanhefner (Jonathan Hefner) wrote:
Dan0042 (Daniel DeLorme) wrote:
What you're asking for is a O(n²) operation. I'm not sure it's a good idea to make that so easy and transparent as a core method.
Although the given example specified a two-argument block, the original post references
uniq
, which accepts a one-argument block. Using a one-argument block can result in an O(n) implementation.
Well, that assumes that the block return values can be ordered. But it's easy to create cases where values can be compared for equality, but not ordered. Example: [1, 'abc']
.
Also, this proposal is about adding a block to difference
, but if we do that, we should add it to union
and intersection
and so on, too.
Updated by jonathanhefner (Jonathan Hefner) about 5 years ago
duerst (Martin Dürst) wrote:
Well, that assumes that the block return values can be ordered. But it's easy to create cases where values can be compared for equality, but not ordered. Example:
[1, 'abc']
.
I'm not sure that I follow. Here is a naive implementation of what I was envisioning:
def difference(*other_arrays, &block)
if block
rejected = other_arrays.flatten(1).map(&block).to_set
self.reject{|element| rejected.include?(block.call(element)) }
else
# original implementation
end
end
Also, this proposal is about adding a block to
difference
, but if we do that, we should add it tounion
andintersection
and so on, too.
Agreed, that would probably be best for consistency.
Updated by duerst (Martin Dürst) about 5 years ago
jonathanhefner (Jonathan Hefner) wrote:
duerst (Martin Dürst) wrote:
Well, that assumes that the block return values can be ordered. But it's easy to create cases where values can be compared for equality, but not ordered. Example:
[1, 'abc']
.I'm not sure that I follow.
In my sentence above, 'that' refers to 'an O(n) implementation'. Naive implementations don't have that problem, but they are slow (O(n²)).
Here is a naive implementation of what I was envisioning:
Updated by jonathanhefner (Jonathan Hefner) about 5 years ago
duerst (Martin Dürst) wrote:
In my sentence above, 'that' refers to 'an O(n) implementation'. Naive implementations don't have that problem, but they are slow (O(n²)).
But the implementation I posted is O(n). (Note that rejected
is a Set rather than an Array, so rejected.include?
should be O(1)).