Feature #14097

Add union and difference to Array

Added by ana06 (Ana Maria Martinez Gomez) 9 months ago. Updated 3 months ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
[ruby-core:83721]

Description

Currently there is a concat method in Ruby which does the same as +, but modifies the object. We could introduce union and difference methods, which would be the equivalents for the | and - operators. These operators are normally used less due to their lack of readability and expressiveness. You end up seeing things like:

array.concat(array2).uniq!

just because it is more readable. When it could be written like:

array |= array2

But, as this is not clear for some people, the new method will solve this problem:

array.union(array2)

And now this is clean and readable, as everybody expects from Ruby, the language focused on simplicity and productivity. ;)
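To make the comparison above concrete, here is a minimal sketch of the two existing spellings. The helper names merge_in_place and merge_with_operator are hypothetical and exist only for illustration. One detail worth noting is that uniq! returns nil when it removes nothing, so the chained concat(...).uniq! expression cannot be relied on for its return value.

```ruby
# Appends other to array and drops duplicates, modifying array itself.
# uniq! returns nil when nothing was removed, so return array explicitly.
def merge_in_place(array, other)
  array.concat(other).uniq!
  array
end

# Same elements through the operator; | returns a new array and leaves
# both operands untouched.
def merge_with_operator(array, other)
  array | other
end

a = [1, 2, 3]
merged = merge_with_operator(a, [3, 4]) # a itself is not modified here
merge_in_place(a, [3, 4])               # a itself is modified here
```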

Can I send a PR? :)


Related issues

Related to Ruby trunk - Feature #14105: Introduce xor as alias for Set#^ (Feedback)

History

#1 [ruby-core:83724] Updated by ana06 (Ana Maria Martinez Gomez) 9 months ago

This would also allow passing multiple arguments to union, which is currently not possible with the operator:

array.union(array1, array2)
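As a rough sketch of what a multi-argument union could compute, written as a standalone helper rather than as the proposed Array method: union_of is a hypothetical name, and this is only one possible way to get the same elements as chaining | by hand.

```ruby
# Hypothetical helper, not part of Ruby: union of any number of arrays.
# Returns a new array and keeps the first occurrence of each element.
def union_of(first, *others)
  # flatten(1) only joins the argument arrays together; nested arrays
  # inside them stay as they are. uniq then deduplicates in one pass.
  [first, *others].flatten(1).uniq
end

union_of([1, 2], [2, 3], [3, 4]) # same elements as [1, 2] | [2, 3] | [3, 4]
```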

#2 [ruby-core:83727] Updated by mame (Yusuke Endoh) 9 months ago

I'm neutral to your proposal itself. My two cents: Array#union should return a new array instead of modifying self, and Array#union! should be its modifying version.

#3 [ruby-core:83729] Updated by ana06 (Ana Maria Martinez Gomez) 9 months ago

I think it is a great idea. I do not understand why concat modifies the array, as most of the methods of the Array class have a ! method for that. Should I also introduce a concat! method?

#4 [ruby-core:83734] Updated by k0kubun (Takashi Kokubun) 9 months ago

I think it is a great idea. I do not understand why concat modifies the array, as most of the methods of the Array class have a ! method for that. Should I also introduce a concat! method?

Probably changing #concat to be non-destructive is too breaking for backward compatibility. We can use #+ for that purpose. And having #concat and #concat! in the same behavior would be just confusing.

#5 [ruby-core:83740] Updated by ana06 (Ana Maria Martinez Gomez) 9 months ago

What about introducing concat! with the same behaviour as concat and deprecating concat? Then we could in the future give concat the behaviour it deserves. It is confusing as well that this method modifies the object and I think we should fix this.

#6 [ruby-core:83742] Updated by jeremyevans0 (Jeremy Evans) 9 months ago

ana06 (Ana Maria Martinez Gomez) wrote:

What about introducing concat! with the same behaviour as concat and deprecating concat? Then we could in the future give concat the behaviour it deserves. It is confusing as well that this method modifies the object and I think we should fix this.

Regarding concat!, there seems to be a misunderstanding that methods should end with ! to be mutating. That is not the convention in the core classes. The core classes have many methods that are mutating but do not end in !. The convention regarding ! is if there is both a method with ! and a method without, the version with ! mutates and the version without returns a potentially modified copy.

The following Array methods that do not end in ! are mutating, so if you want to change concat for "consistency", you would have to change all of them:

  • clear
  • concat
  • delete
  • delete_at
  • delete_if
  • fill
  • insert
  • keep_if
  • pop
  • push
  • replace
  • shift
  • unshift

You'd also probably have to change String and Hash similarly if you wanted this "consistency" in regards to !.
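The convention described above can be checked with a small sketch. The mutates? helper is hypothetical and only compares the array contents before and after the block runs.

```ruby
# Hypothetical helper: true when running the block changed the
# contents of the array that was passed in.
def mutates?(array)
  before = array.dup
  yield array
  array != before
end

mutates?([1, 2]) { |a| a.concat([3]) } # concat has no !, yet it mutates
mutates?([1, 2]) { |a| a + [3] }       # + returns a new array
mutates?([1, 1]) { |a| a.uniq }        # uniq returns a copy
mutates?([1, 1]) { |a| a.uniq! }       # uniq! is the in-place variant
```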

In regards to union, not all arrays are sets, and I'm not in favor of introducing additional set-specific methods to Array. Set#union is already implemented.

#7 [ruby-core:83751] Updated by ana06 (Ana Maria Martinez Gomez) 9 months ago

I would say that the difference is that there are some methods where it is not expected that the Array is modified, and some others where you expect it. So, for example, with pop and push I don't think we should have two methods, one which modifies the object and another one which does not, but at least I would keep the Array operator methods consistent.

jeremyevans0 (Jeremy Evans) do you find it coherent having two union methods and only one concat? Won't that be confusing? You will always need to check the documentation, as you won't know when the object is modified and when not.

Also, union is an operation on Arrays for when you want to use them as sets for any reason. The main difference is that in an Array the order of the elements matters, so this is unrelated to the Set class.

#8 [ruby-core:83754] Updated by jeremyevans0 (Jeremy Evans) 9 months ago

ana06 (Ana Maria Martinez Gomez) wrote:

jeremyevans0 (Jeremy Evans) do you find it coherent having two union methods and only one concat? Won't that be confusing? You will always need to check the documentation, as you won't know when the object is modified and when not.

I don't think renaming concat to concat! makes things more coherent. We already have + for a concat that returns a new array. Yes, if you are unfamiliar with the methods you will probably need to read the documentation.

The array class already has a union operator (|) which returns a new array, and in combination with replace you can easily build union. union doesn't seem a common enough need to warrant adding as a separate core method.

#9 [ruby-core:83755] Updated by ana06 (Ana Maria Martinez Gomez) 9 months ago

jeremyevans0 (Jeremy Evans)

The array class already has a union operator (|) which returns a new array, and in combination with replace you can easily build union. union doesn't seem a common enough need to warrant adding as a separate core method.

Yes, an operator that is not clear for many people. I think Ruby deserves something more readable and elegant. Moreover, union would allow making the union of more than 2 arrays at the same time in a much more efficient way than applying | several times. So it is not only an "aesthetic" change, it is also a performance improvement.

I am not sure what you mean by using replace to build a union. Can you please elaborate?

#10 [ruby-core:83756] Updated by jeremyevans0 (Jeremy Evans) 9 months ago

ana06 (Ana Maria Martinez Gomez) wrote:

jeremyevans0 (Jeremy Evans)

The array class already has a union operator (|) which returns a new array, and in combination with replace you can easily build union. union doesn't seem a common enough need to warrant adding as a separate core method.

Yes, an operator that is not clear for many people. I think Ruby deserves something more readable and elegant.

The argument against the | operator could potentially apply to any operator. Most things are unclear until they are learned. Someone with no knowledge of English might find the | operator more clear than the union method.

Moreover, union would allow making the union of more than 2 arrays at the same time in a much more efficient way than applying | several times. So it is not only an "aesthetic" change, it is also a performance improvement.

You could build union without a nested application of |. No doubt you could get the maximum performance by implementing it in C, but I don't believe the cost of maintaining such code is worth it, considering how often it is used.

I am not sure what you mean by using replace to build a union. Can you please elaborate?

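class Array
  # Variant 1: apply | once per argument, then copy the result into self.
  def union(*other)
    ret = self
    other.each{|a| ret |= a}
    replace(ret)
  end
  # or
  # Variant 2: join all arrays, deduplicate once, then copy the result into self.
  def union(*other)
    tmp = other.unshift(self)
    tmp.flatten!(1)
    tmp.uniq!
    replace(tmp)
  end
end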

In similar cases in the past, the recommendation has often been to build the functionality as a gem, and if the gem gets popular and widely used, then it can be considered for inclusion in core.
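For readers who would rather not reopen Array, the same replace-based idea can be sketched as a standalone helper. The name union_in_place is hypothetical. The point of replace is that the original object keeps its identity, so every other reference to it sees the merged contents.

```ruby
# Hypothetical helper: replaces the contents of target with the union of
# target and all other arrays. Array#replace returns the receiver, so the
# method returns target itself rather than a copy.
def union_in_place(target, *others)
  target.replace([target, *others].flatten(1).uniq)
end

list  = [1, 2]
other_reference = list
union_in_place(list, [2, 3], [4]) # other_reference sees the change too
```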

#11 [ruby-core:83786] Updated by ana06 (Ana Maria Martinez Gomez) 9 months ago

The argument against the | operator could potentially apply to any operator. Most things are unclear until they are learned. Someone with no knowledge of English might find the | operator more clear than the union method.

Ruby is a language that claims to have an elegant syntax that is natural to read and easy to write. That's why it has readable methods for operators, and that's what people love about Ruby. But I am not saying that we should remove the operator, so you can keep using it. ;)

You could build union without a nested application of |. No doubt you could get the maximum performance by implementing it in C, but I don't believe the cost of maintaining such code is worth it, considering how often it is used.

It is some really simple code. And I will implement it, so why do you care about the effort to do it? It is someone else's effort, someone who is really happy to do it.

In similar cases in the past, the recommendation has often been to build the functionality as a gem, and if the gem gets popular and widely used, then it can be considered for inclusion in core.

What I want is to provide an efficient union of several arrays, which needs to be implemented in C. Implementing this in a Ruby gem makes no sense at all.

#12 [ruby-core:83787] Updated by jeremyevans0 (Jeremy Evans) 9 months ago

ana06 (Ana Maria Martinez Gomez) wrote:

You could build union without a nested application of |. No doubt you could get the maximum performance by implementing it in C, but I don't believe the cost of maintaining such code is worth it, considering how often it is used.

It is some really simple code. And I will implement it, so why do you care about the effort to do it? It is someone else's effort, someone who is really happy to do it.

I care because once it is added, it is impossible to remove without breaking backwards compatibility. This is not a one time cost of initial implementation, it's a perpetual maintenance cost.

I'm not in favor of adding methods to the core classes simply because they are useful in certain cases. If we added every method to the core classes that was useful in specific cases, we'd eventually have thousands of methods in each core class.

In similar cases in the past, the recommendation has often been to build the functionality as a gem, and if the gem gets popular and widely used, then it can be considered for inclusion in core.

What I want is to provide an efficient union of several arrays, which needs to be implemented in C. Implementing this in a Ruby gem makes no sense at all.

It appears you may not be aware that plenty of Ruby gems are implemented in C using Ruby's C-API and have the same performance as if they were part of Ruby core.

#13 [ruby-core:84234] Updated by ana06 (Ana Maria Martinez Gomez) 8 months ago

I'm not in favor of adding methods to the core classes simply because they are useful in certain cases. If we added every method to the core classes that was useful in specific cases, we'd eventually have thousands of methods in each core class.

It is

  • useful

  • more efficient

  • consistent with the concat method in the same class

  • readable

  • elegant

  • easy to use and to read

  • follows Ruby's principle of allowing things to be done in several ways

  • keeps people from using inefficient methods just to make the code understandable

It appears you may not be aware that plenty of Ruby gems are implemented in C using Ruby's C-API and have the same performance as if they were part of Ruby core.

Your example was in Ruby... But a Ruby gem makes no sense just to add two methods to a class where there is already a similar method for another operator. For something bigger I would agree with you... but this is just a method which makes sense to add to the class...

#14 Updated by ana06 (Ana Maria Martinez Gomez) 3 months ago

  • Backport deleted (2.3: UNKNOWN, 2.4: UNKNOWN)
  • Tracker changed from Bug to Feature

#15 Updated by matz (Yukihiro Matsumoto) 3 months ago

#16 [ruby-core:87140] Updated by matz (Yukihiro Matsumoto) 3 months ago

Thank you for the proposal.

I am not sure of your real intention. Do you want a mutating variation of the or-operator?
Or just a more readable alias of the or-operator?

Matz.

#17 [ruby-core:87170] Updated by duerst (Martin Dürst) 3 months ago

matz (Yukihiro Matsumoto) wrote:

I am not sure of your real intention. Do you want a mutating variation of the or-operator?
Or just a more readable alias of the or-operator?

mame (Yusuke Endoh) wrote:

I'm neutral to your proposal itself. My two cents: Array#union should return a new array instead of modifying self, and Array#union! should be its modifying version.

I would definitely prefer Yusuke's version to a version where Array#union is modifying. While the modifying version will occasionally be useful, in general, we should gently push people towards using non-modifying code.

#18 [ruby-core:87246] Updated by ana06 (Ana Maria Martinez Gomez) 3 months ago

matz (Yukihiro Matsumoto)

I am not sure of your real intention. Do you want a mutating variation of the or-operator?
Or just a more readable alias of the or-operator?

Thanks for taking a look at the issue. What I am proposing is a new union method that is an alias for | in the case of two arrays but that is also more efficient in the case of more than two arrays. Exactly as happens with + and concat: concat, apart from modifying the first array (which maybe shouldn't be the case), is more readable, and in the case of more than two arrays it is more efficient as well.

I also sent a PR with a possible implementation: https://github.com/ruby/ruby/pull/1747

#19 [ruby-core:87247] Updated by ana06 (Ana Maria Martinez Gomez) 3 months ago

And this is not necessarily related to Feature #14105. I would say that they are two different topics even if both of them aim for readability. In the case of Set it is only an alias, and there is no counterpart to what happens here with concat.

#20 [ruby-core:87257] Updated by Student (Nathan Zook) 3 months ago

I cannot say that I am a fan of this proposal. To be fair, I'm not a fan of #|.

Arrays are not sets. Trying to treat them as if they are is an error, and will create subtle problems.

What should be the result of the following operations?
[1, 1] | [1]
[1] | [1, 1]

Of course, there are more interesting examples. These two are to get you started.
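For reference only, this is what the existing operator returns for those two expressions, plus one case with a different element order. The sketch does not take a side on whether these results are the right ones.

```ruby
# Array#| drops duplicates from the result, including duplicates that
# were already present in the left operand, and keeps first occurrences
# in order (left operand first, then the right one).
p([1, 1] | [1])       # prints [1]
p([1] | [1, 1])       # prints [1]
p([3, 1, 3] | [2, 1]) # prints [3, 1, 2]
```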

I don't care what the results currently are. I don't care what you think they should be. I can present extremely strong arguments for various answers. For this reason, I believe that #| is an ill-defined concept.

Generalizing an ill-defined concept is a world of pain.

If you insist on treating objects of one class as if they were members of a different class, there should be bumps in the road to at least warn you that maybe this is a bad idea.

I'm not going to argue that we should remove or deprecate #|. I don't think of myself as a fanatic. But encouraging this sort of abuse of the type system just creates problems.
