Feature #14097
closedAdd union and difference to Array
Description
Currently there is a concat method in ruby which does the same as +, but modifying the object. We could introduce a union and difference methods, which would be the equivalent for the |
and -
operators. This operators are normally less used due to lack of readability and expressivity. You end seeing thinks like:
array.concat(array2).uniq!
just because it is more readable. When it could be written like:
array |= array2
But, as this is not clear for some people, the new method will solve this problem:
array.union(array2)
And now this is clean and readable, as everybody expect from Ruby, the language focused on simplicity and productivity. ;)
Can I send a PR? :)
Updated by ana06 (Ana Maria Martinez Gomez) about 7 years ago
This will also allow to add multiple arguments to the union, which is currently not possible:
array.union(array1, array2)
Updated by mame (Yusuke Endoh) about 7 years ago
I'm neutral to your proposal itself. My two cents: Array#union
should return a new array instead of modifying self, and Array#union!
should be its modifying version.
Updated by ana06 (Ana Maria Martinez Gomez) about 7 years ago
I think it is a great idea. I do not understand why concat
modify the array, as most of the method of the Array class has a !
method for that. Should I also introduced a concat!
method?
Updated by k0kubun (Takashi Kokubun) about 7 years ago
I think it is a great idea. I do not understand why concat modify the array, as most of the method of the Array class has a ! method for that. Should I also introduced a concat! method?
Probably changing #concat to be non-destructive is too breaking for backward compatibility. We can use #+ for that purpose. And having #concat and #concat! in the same behavior would be just confusing.
Updated by ana06 (Ana Maria Martinez Gomez) about 7 years ago
What about introducing concat!
with the same behaviour as concat
and deprecating concat
. Then we could in the feature give concat
the behaviour it deserves. It is confusing as well that this method modify the object and I think we should fix this.
Updated by jeremyevans0 (Jeremy Evans) about 7 years ago
ana06 (Ana Maria Martinez Gomez) wrote:
What about introducing
concat!
with the same behaviour asconcat
and deprecatingconcat
. Then we could in the feature giveconcat
the behaviour it deserves. It is confusing as well that this method modify the object and I think we should fix this.
Regarding concat!
, there seems to be a misunderstanding that methods should end with !
to be mutating. That is not the convention in the core classes. The core classes have many methods that are mutating but do not end in !
. The convention regarding !
is if there is both a method with !
and a method without, the version with !
mutates and the version without returns a potentially modified copy.
The following Array methods that do not end in !
are mutating, so if you want to change concat
for "consistency", you would have to change all of them:
- clear
- concat
- delete
- delete_at
- delete_if
- fill
- insert
- keep_if
- pop
- push
- replace
- shift
- unshift
You'd also probably have to change String and Hash similarly if you wanted this "consistency" in regards to !
.
In regards to union
, not all arrays are sets, and I'm not in favor of introducing additional set-specific methods to Array. Set#union
is already implemented.
Updated by ana06 (Ana Maria Martinez Gomez) about 7 years ago
I would that the difference is that there are some method where is not expected that the Array is modified, and some others where you expect it. So, for example, with pop
and push
I don't think we should have two methods, one which modify the object and another one which does it, but at least I would keep the Array operators methods consistent.
@jeremyevans0 (Jeremy Evans) do you find it coherent having you union method and only one concat? Won't be that confusing? you will always need to check the documentation as you won't know when the object is modified and when not.
Also, union
is an operation with Arrays when you when to use then as set for any reason. The main difference is that in the Array the order of the elements matter, so that is unrelated to the Set
class.
Updated by jeremyevans0 (Jeremy Evans) about 7 years ago
ana06 (Ana Maria Martinez Gomez) wrote:
@jeremyevans0 (Jeremy Evans) do you find it coherent having you union method and only one concat? Won't be that confusing? you will always need to check the documentation as you won't know when the object is modified and when not.
I don't think renaming concat
to concat!
makes things more coherent. We already have +
for a concat that returns a new array. Yes, if you are unfamiliar with the methods you will probably need to read the documentation.
The array class already has a union operator (|
) which returns a new array, and in combination with replace
you can easily build union
. union
doesn't seem a common enough need to warrant adding as a separate core method.
Updated by ana06 (Ana Maria Martinez Gomez) about 7 years ago
The array class already has a union operator (|) which returns a new array, and in combination with replace you can easily build union. union doesn't seem a common enough need to warrant adding as a separate core method.
Yes, a operator that is not clear for many people. I think Ruby deserve something more readable and elegant. Moreover, union
would allow to make the union of more than 2 arrays at the same time in a much more efficient way than applying |
several times. So it is not only an "stetic" change, it is also a performance improvement.
I am not sure what to mean what replace to build a union. Can you please elaborate?
Updated by jeremyevans0 (Jeremy Evans) about 7 years ago
ana06 (Ana Maria Martinez Gomez) wrote:
The array class already has a union operator (|) which returns a new array, and in combination with replace you can easily build union. union doesn't seem a common enough need to warrant adding as a separate core method.
Yes, a operator that is not clear for many people. I think Ruby deserve something more readable and elegant.
The argument against the |
operator could potentially apply to any operator. Most things are unclear until they are learned. Someone with no knowledge of English might find the |
operator more clear than the union
method.
Moreover,
union
would allow to make the union of more than 2 arrays at the same time in a much more efficient way than applying|
several times. So it is not only an "stetic" change, it is also a performance improvement.
You could build union
without a nested application of |
. No doubt you could get the maximum performance by implementing it in C, but I don't believe the cost of maintaining such code is worth it, considering how often it is used.
I am not sure what to mean what replace to build a union. Can you please elaborate?
class Array
def union(*other)
ret = self
other.each{|a| ret |= a}
replace(ret)
end
# or
def union(*other)
tmp = other.unshift(self)
tmp.flatten!(1)
tmp.uniq!
replace(tmp)
end
end
In similar cases in the past, the recommendation has often been to build the functionality as a gem, and if the gem gets popular and widely used, then it can be considered for inclusion in core.
Updated by ana06 (Ana Maria Martinez Gomez) about 7 years ago
The argument against the | operator could potentially apply to any operator. Most things are unclear until they are learned. Someone with no knowledge of English might find the | operator more clear than the union method.
Ruby is the language, which claims to have an elegant syntax that is natural to read and easy to write. And that's why it have readable method for operators. And that's what people love about Ruby. But I am not saying that we should remove the operator, so you can keep using it. ;)
You could build union without a nested application of |. No doubt you could get the maximum performance by implementing it in C, but I don't believe the cost of maintaining such code is worth it, considering how often it is used.
It is some really simple code. And I will implement it, why do you care about the effort to do it? It is someone else effort, who is really happy to do it.
In similar cases in the past, the recommendation has often been to build the functionality as a gem, and if the gem gets popular and widely used, then it can be considered for inclusion in core.
What I want is to provide an efficient union of several arrays, that need to be implemented in C. Implementing this in a Ruby gem makes no point at all.
Updated by jeremyevans0 (Jeremy Evans) about 7 years ago
ana06 (Ana Maria Martinez Gomez) wrote:
You could build union without a nested application of |. No doubt you could get the maximum performance by implementing it in C, but I don't believe the cost of maintaining such code is worth it, considering how often it is used.
It is some really simple code. And I will implement it, why do you care about the effort to do it? It is someone else effort, who is really happy to do it.
I care because once it is added, it is impossible to remove without breaking backwards compatibility. This is not a one time cost of initial implementation, it's a perpetual maintenance cost.
I'm not in favor of adding methods to the core classes simply because they are useful in certain cases. If we added every method to the core classes that was useful in specific cases, we'd eventually have thousands of methods in each core class.
In similar cases in the past, the recommendation has often been to build the functionality as a gem, and if the gem gets popular and widely used, then it can be considered for inclusion in core.
What I want is to provide an efficient union of several arrays, that need to be implemented in C. Implementing this in a Ruby gem makes no point at all.
It appears you may not be aware that plenty of ruby gems are implemented in C using ruby's C-API and have same performance as if they were part of ruby core.
Updated by ana06 (Ana Maria Martinez Gomez) about 7 years ago
I'm not in favor of adding methods to the core classes simply because they are useful in certain cases. If we added every method to the core classes that was useful in specific cases, we'd eventually have thousands of methods in each core class.
It is
-
useful
-
more efficient
-
consistent with the
concat
method in the same class -
readable
-
elegant
-
easy to use and to read
-
follow Ruby principle of allowing to do thing in several ways
-
avoid that people use inefficient methods to make the code understandable
It appears you may not be aware that plenty of ruby gems are implemented in C using ruby's C-API and have same performance as if they were part of ruby core.
Your example was in Ruby... But it makes no sense a Ruby gem just to add two methods in a class where there is already a similar method for another operator. For something bigger I would agree with you... but this is just a method which makes sense to add to the class...
Updated by ana06 (Ana Maria Martinez Gomez) over 6 years ago
- Tracker changed from Bug to Feature
- Backport deleted (
2.3: UNKNOWN, 2.4: UNKNOWN)
Updated by matz (Yukihiro Matsumoto) over 6 years ago
- Related to Feature #14105: Introduce xor as alias for Set#^ added
Updated by matz (Yukihiro Matsumoto) over 6 years ago
Thank you for the proposal.
I am not sure your real intention. Do you want mutating variation of or-operator?
Or just more readable alias of or-operator?
Matz.
Updated by duerst (Martin Dürst) over 6 years ago
matz (Yukihiro Matsumoto) wrote:
I am not sure your real intention. Do you want mutating variation of or-operator?
Or just more readable alias of or-operator?
mame (Yusuke Endoh) wrote:
I'm neutral to your proposal itself. My two cents:
Array#union
should return a new array instead of modifying self, andArray#union!
should be its modifying version.
I would definitely prefer Yusuke's version to a version where Array#union is not modifying. While the modifying version will occasionally be useful, in general, we should gently push people towards using non-modifying code.
Updated by ana06 (Ana Maria Martinez Gomez) over 6 years ago
I am not sure your real intention. Do you want mutating variation of or-operator?
Or just more readable alias of or-operator?
Thanks for taking a look at the issue. What I am proposing is a new union method that it is an alias for |
in the case of two arrays but that it is also more efficient in the case of more than two arrays. Exactly as it happens with +
and concat
. concat
, apart from modifying the first array (which maybe shouldn't be the case) is more readable, but in the case on more than two arrays is more efficient as well.
I also send a PR with a possible implementation: https://github.com/ruby/ruby/pull/1747
Updated by ana06 (Ana Maria Martinez Gomez) over 6 years ago
and this is not necessarily related to Feature #14105. I would say that they are two different topics even if both of them aim for readability. In the case of set is only an alias and there is not a similar case as it happens here with concat
Updated by Student (Nathan Zook) over 6 years ago
I cannot say that I am a fan of this proposal. To be fair, I'm not a fan of #|.
Arrays are not sets. Trying to treat them as if they are is an error, and will create subtle problems.
What should be the result of the following operations?
[1, 1] | [1]
[1] | [1, 1]
Of course, there are more interesting examples. These two are to get you started.
I don't care what the results currently are. I don't care what you think they should be. I can present extremely strong arguments for various answers. For this reason, I believe that #| is an ill-defined concept.
Generalizing an ill-defined concept is a world of pain.
If you insist on treating objects of one class as if they were members of a different class, there should be bumps in the road to at least warn you that maybe this is a bad idea.
I'm not going to argue that we should remove or deprecate #|. I don't think of myself as a fanatic. But encouraging this sort of abuse of the type system just creates problems.
Updated by mame (Yusuke Endoh) over 6 years ago
Ana, I watched the video of your EuRuKo talk. I could understand the background of your proposal.
However, I'm now getting confused. I thought you agreed that Array#union
should be non-destructive, but in your talk you explained it was a destructive union operator. What are you driving at?
Updated by ana06 (Ana Maria Martinez Gomez) over 6 years ago
However, I'm now getting confused. I thought you agreed that Array#union should be non-destructive, but in your talk you explained it was a destructive union operator. What are you driving at?
In the talk I wanted to focus on the necessity of having an union
method without entering in the discussion of if it should be non-destructive or not. It didn't seem that important to me if it is needed to add a !
or not. But you are right, having union!
in the voting would have been better. @matz (Yukihiro Matsumoto) gave me the same feedback after the talk. ;)
Updated by mame (Yusuke Endoh) over 6 years ago
- Status changed from Open to Assigned
- Assignee set to matz (Yukihiro Matsumoto)
Thank you for the answer. I'm assigning this ticket to matz.
Updated by ana06 (Ana Maria Martinez Gomez) over 6 years ago
@Student as I just answered in the PR about the same concern, I'll answer here to:
I think there are some uses of case. First, it could be that you have an array in your application, would it be worthwhile to convert it to set just to be able to do a union? Second, what happen if you want to add the elements of an array to another array only if they were not already there but keeping the order? you can not do this with a set. Third, what if after the union you want to apply a method in the Array class that is not in the Set class (like for example
sort_by
)?
Updated by ana06 (Ana Maria Martinez Gomez) over 6 years ago
As @mame (Yusuke Endoh) already said, I gave a talk at Euruko about this issue:
https://youtu.be/jUc8InwoA-E?t=2m54s
I did an standing up voting. As in the video you can not see the audience, here is the picture I took during the talk with the result of the voting:
https://user-images.githubusercontent.com/16052290/44840875-5cd3bf00-ac42-11e8-84e9-815bc6473719.jpg
The quality of the picture is not really good, but I have pixelized the faces of the recognizable people wearing a red cord (as they don't want to appear on pictures). I hope is fine.
There are more picture from the conference organization in better quality, which I will add as soon as they send them to me.
Updated by mame (Yusuke Endoh) over 6 years ago
I'm now for the addition. By the way, don't you need Array#intersection
?
Updated by matz (Yukihiro Matsumoto) over 6 years ago
The final proposal seems reasonable. Accepted.
Matz.
Updated by shevegen (Robert A. Heiler) over 6 years ago
Funny idea with the picture. :D
Updated by ana06 (Ana Maria Martinez Gomez) over 6 years ago
The final proposal seems reasonable. Accepted.
Great to read that! 🎉
I have just rebased in the PR, looking forward to see the Array#union
method in Ruby!
@shevegen
Funny idea with the picture. :D
Thanks ;)
Updated by nobu (Nobuyoshi Nakada) over 6 years ago
- Status changed from Assigned to Closed
Applied in changeset trunk|r64787.
Add union method to Array
I introduce a union
method equivalent to the |
operator, but which
accept more than array as argument. This improved readability, and it
is also coherent with the +
operator, which has a similar concat
method. The method doesn't modify the original object and return a new
object instead. It is plan to introduce a union!
method as well.
Tests and documentation are included.
It solves partially https://bugs.ruby-lang.org/issues/14097
[Fix GH-1747] [Feature #14097]
From: Ana María Martínez Gómez ammartinez@suse.de
Updated by ana06 (Ana Maria Martinez Gomez) over 6 years ago
@nobu (Nobuyoshi Nakada) this is not closed yet as https://github.com/ruby/ruby/pull/1758 is not merged yet. I have just rebased. ;)
Updated by ana06 (Ana Maria Martinez Gomez) over 6 years ago
Now it is closed! ;)