Project

General

Profile

Feature #16146

Array .difference allow custom comparison

Added by ngomez (Nancy Gomez) about 1 month ago. Updated 6 days ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:94787]

Description

Hello!

I wanted to know if there's any plan to implement the ability to define comparison between individual items in arrays.

Here is where I originally asked the question on Stack Overflow: https://stackoverflow.com/questions/57316775/is-there-any-way-to-specify-how-to-compare-of-array-of-objects-for-difference-f

But specifically, I noticed nearly all the existing Array functions allow us to define ways for comparison, such as #uniq but #difference does not :( This is quite unfortunate but I'm hoping it's just because this function was introduced in 2.6.0, and hopefully this feature can be implemented?

Thanks for any and all info :)

History

#1

Updated by ngomez (Nancy Gomez) about 1 month ago

  • Subject changed from Array .distance allow custom comparison to Array .difference allow custom comparison

Updated by duerst (Martin Dürst) about 1 month ago

Please provide an actual usage example directly in this issue, not only by reference.

Updated by shevegen (Robert A. Heiler) about 1 month ago

Seems to be the same link as https://bugs.ruby-lang.org/issues/16118; so perhaps accidental double
post; could close one of these two - or perhaps both. :D

As there was not yet any comment on the other post, I'll comment briefly here.

I think the standard way for custom comparisons in ruby is to define the strategy you wish to
employ in order to compare e. g elements in an array. It's a bit short-sighted to think that
one has to rely solely on .difference alone for that comparison.

such as #uniq but #difference does not :(

If the issue is related between #uniq and #difference then I think the issue request should
focus on precisely that part. This would make it easier for the ruby core team (respectively
matz) to consider the merits (pro/con) of the suggestion itself. It's too difficult to
try to understand what is meant in external sites otherwise, at the least I think it is.

Updated by ngomez (Nancy Gomez) about 1 month ago

As recommended I'll place in example directly here:

The feature I am requesting is to allow custom comparison of the Array #difference.
Currently it is comparing with each item's #eql and #hash but in the case of Objects you would need to redefine these to get the intended result.

In my case, I can not simply redefine these as getting the difference of the two arrays of Objects is one small portion of the overall logic, the objects I am using come from a third party which also uses the #eql and #hash for other logic I rely on. So I would have to set and somehow unset my definition (which not sure I can unset anyway) or I can just not use the #difference which seems like such a waste because it would be such an elegant and clear solution.

Here is an example of my desired usage, let's say I have the following class:

# Defined by a third party which uses #eql and #hash for other logic
class Num
  def initialize(val)
    @val = val
  end

  def val
    @val
  end
end

And I have the following arrays:

all = [Num.new(1), Num.new(2), Num.new(3)]
subset = [Num.new(1), Num.new(2)]

I want to find all the values that the subset is missing from all:

all.difference(subset)
=> [#<Num:0x00007fcae19e9540 @val=1>, #<Num:0x00007fcae19e9518 @val=2>, #<Num:0x00007fcae19e94f0 @val=3>]

But of course, #difference is comparing view #eql and #hash so I attempt to define how to compare:

all.difference(subset) { |a, b| a.val <=> b.val }
=> [#<Num:0x00007fcae19e9540 @val=1>, #<Num:0x00007fcae19e9518 @val=2>, #<Num:0x00007fcae19e94f0 @val=3>]

Still no luck, and I'm not 100% sure that the syntax is exactly correct anyway, so I take a look at the docs: https://ruby-doc.org/core-2.6/Array.html#method-i-difference

And notice no example shows custom comparison, and the implementation doesn't seem to support it either.

I would love it if instead this would be the result:

all.difference(subset) { |a, b| a.val <=> b.val }
=> [#<Num:0x00007fcae19e94f0 @val=3>]

Does that make sense?

Updated by Dan0042 (Daniel DeLorme) about 1 month ago

What you're asking for is a O(n²) operation. I'm not sure it's a good idea to make that so easy and transparent as a core method.

This is quite easy to code and makes the (inefficiency of) two loops more apparent:

all.reject{ |a| subset.find{ |s| s.val == a.val } }

The one advantage I can see to the <=> comparator though is that it could possibly result in an algorithm more efficient than O(n²). The implementation could sort the two lists and then do a diff-like operation.

only_all,intersection,only_subset = all.uniqdiff3(subset){ |a,b| a.val <=> b.val }
union = only_all + intersection + only_subset

Updated by jonathanhefner (Jonathan Hefner) 13 days ago

Dan0042 (Daniel DeLorme) wrote:

What you're asking for is a O(n²) operation. I'm not sure it's a good idea to make that so easy and transparent as a core method.

Although the given example specified a two-argument block, the original post references uniq, which accepts a one-argument block. Using a one-argument block can result in an O(n) implementation. Here is a different example:

Dependency = Struct.new(:name, :version)

deps1 = [Dependency.new("foo", "1.1"), Dependency.new("bar", "2.0"), Dependency.new("qux", "1.0")]

deps2 = [Dependency.new("foo", "1.0"), Dependency.new("bar", "1.0")] 

deps1.difference(deps2, &:name)
# == [Dependency.new("qux", "1.0")]

deps1.difference(deps2){|dep| [dep.name, dep.version[/^\d+/]] }
# == [Dependency.new("bar", "2.0"), Dependency.new("qux", "1.0")]

Updated by duerst (Martin Dürst) 6 days ago

jonathanhefner (Jonathan Hefner) wrote:

Dan0042 (Daniel DeLorme) wrote:

What you're asking for is a O(n²) operation. I'm not sure it's a good idea to make that so easy and transparent as a core method.

Although the given example specified a two-argument block, the original post references uniq, which accepts a one-argument block. Using a one-argument block can result in an O(n) implementation.

Well, that assumes that the block return values can be ordered. But it's easy to create cases where values can be compared for equality, but not ordered. Example: [1, 'abc'].

Also, this proposal is about adding a block to difference, but if we do that, we should add it to union and intersection and so on, too.

Also available in: Atom PDF