Feature #19787
Add Enumerable#uniq_map, Enumerable::Lazy#uniq_map, Array#uniq_map and Array#uniq_map!
Status: Open
Description
I would like to propose a collection of new methods: Enumerable#uniq_map, Enumerable::Lazy#uniq_map, Array#uniq_map and Array#uniq_map!.
TL;DR: It's a drop-in replacement for .map { ... }.uniq, with (hopefully) better performance.
I've quite often had to map over an array and get its unique elements. It occurred to me when doing so recently that Ruby doesn't have a short-form method for doing that, similar to how .flat_map { ... } replaces .map { ... }.flatten and .filter_map { ... } replaces .map { ... }.compact (with minor differences). I think these new methods could be beneficial both for performance and for writing more succinct code.
I've got a draft PR up with some initial benchmarks in the description: https://github.com/ruby/ruby/pull/8140.
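For readers who want to see the proposed behaviour concretely, here is a minimal pure-Ruby sketch. It is not the implementation from the PR (which is written in C); the standalone helper `uniq_map` and the sample `orders` data are hypothetical and exist only for illustration.

```ruby
# Hypothetical helper illustrating the proposed semantics; NOT the PR's C code.
# It maps each item through the block and keeps the first occurrence of each
# result, without building the intermediate mapped array.
def uniq_map(enumerable)
  seen = {}
  enumerable.each do |item|
    value = yield(item)
    # Hash keys are compared with #hash / #eql?, as Array#uniq does.
    seen[value] = value unless seen.key?(value)
  end
  seen.values
end

# Hypothetical sample data: records with a foreign key.
orders = [
  { id: 1, customer_id: 10 },
  { id: 2, customer_id: 20 },
  { id: 3, customer_id: 10 }
]

chained  = orders.map { |order| order[:customer_id] }.uniq
one_pass = uniq_map(orders) { |order| order[:customer_id] }
```

Both `chained` and `one_pass` hold the unique customer ids in first-seen order; the only intended difference is that the second form skips the intermediate array.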
Updated by mame (Yusuke Endoh) 2 months ago
It does not make sense to me to provide .foo_map { ... } for all patterns like .map { ... }.foo.
flat_map was introduced for some reasons: https://blade.ruby-lang.org/ruby-core/26287
Is .map { ... }.uniq such a very frequent idiom? .uniq_map { ... } is not as concise as .map { ... }.uniq. Scala doesn't seem to provide uniqMap.
I think the only reason for introducing uniq_map is to avoid creating an intermediate array. However, there is a .lazy for exactly that purpose: .lazy.map { ... }.uniq.to_a.
Considering the above, I think the motivation is too weak to provide uniq_map.
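A small sketch of the .lazy alternative mentioned above may help. The `words` array is made-up sample data; `Enumerator::Lazy#map` and `Enumerator::Lazy#uniq` are existing core methods.

```ruby
# Hypothetical sample data.
words = %w[apple avocado banana blueberry cherry]

# Eager chain: builds a full intermediate array of first letters, then dedups.
eager = words.map { |word| word[0] }.uniq

# Lazy chain: each element flows through map and uniq one at a time,
# so no intermediate mapped array is materialised before to_a.
lazy = words.lazy.map { |word| word[0] }.uniq.to_a
```

Both chains produce the same list of unique first letters; whether the lazy form is actually faster depends on the workload and is not measured here.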
Updated by joshuay03 (Joshua Young) 2 months ago
mame (Yusuke Endoh) wrote in #note-3:
Is .map { ... }.uniq such a very frequent idiom?
I work on a Rails codebase and it's most commonly used to iterate through foreign keys / associated records and get the unique values (quite often in tests). That's the only real frequent use case I've come across, but I would say that outside of that, it's not an uncommon chaining pattern either.
.uniq_map { ... } is not as concise as .map { ... }.uniq.
I'm a bit mixed on this point. I see where you're coming from, and I think the same argument can be made about #flat_map and #filter_map. But it follows a similar naming pattern where the alternative chained methods are in the name, so I personally feel like it's equally concise?
I would also like to point out that with #uniq_map, you don't need to read all the way to .uniq before inferring the output. This might help when the body of the #map is quite complex, but you could argue that this is a code quality / style problem...
some_array.map do |item|
  if some_condition
    some_method(item)
  else
    some_other_method(item)
  end
end.uniq
# vs
some_array.uniq_map do |item|
  if some_condition
    some_method(item)
  else
    some_other_method(item)
  end
end
Scala doesn't seem to provide uniqMap.
Sorry, this is the first Ruby issue I've created or been involved with, so I'm not sure why this was pointed out. Is this a usual consideration for new features?
Considering the above, I think the motivation is too weak to provide uniq_map.
Your points are very valid, and I appreciate the response. What is the usual process for deciding on whether or not to accept a feature?
Updated by austin (Austin Ziegler) 2 months ago
joshuay03 (Joshua Young) wrote in #note-4:
Is .map { ... }.uniq such a very frequent idiom?
I work on a Rails codebase and it's most commonly used to iterate through foreign keys / associated records and get the unique values (quite often in tests). That's the only real frequent use case I've come across, but I would say that outside of that, it's not an uncommon chaining pattern either.
Wouldn’t it make more sense, then, to do uniq { … }.map { … }? Yes, there’s a small bit of extra code, but it means that you’re in most cases going to be performing less work than either .map { … }.uniq or uniq_map { … }, because you’re reducing to unique instances before mapping them. If your computations are complex enough that they should be done before #uniq, I would bypass the need for #uniq altogether and use #reduce.
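As a rough illustration of the reordering suggested above, the sketch below uses a hypothetical `Comment` struct with an `author_id` foreign key. It assumes the block given to #uniq computes the same key that #map extracts; with a different key, the two orderings are not guaranteed to agree.

```ruby
# Hypothetical record type and data, standing in for associated records.
Comment = Struct.new(:id, :author_id)
comments = [Comment.new(1, 7), Comment.new(2, 9), Comment.new(3, 7)]

# Map first, then deduplicate the mapped values.
map_then_uniq = comments.map(&:author_id).uniq

# Deduplicate the records by the same key first, then map the survivors.
uniq_then_map = comments.uniq(&:author_id).map(&:author_id)
```

With this data both expressions return the same ids in the same order. Note that the second form evaluates the key once per record inside #uniq and again for each surviving record inside #map.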
.uniq_map { ... } is not as concise as .map { ... }.uniq.
I'm a bit mixed on this point. I see where you're coming from, and I think the same argument can be made about #flat_map and #filter_map. But it follows a similar naming pattern where the alternative chained methods are in the name, so I personally feel like it's equally concise?
I would also like to point out that with #uniq_map, you don't need to read all the way to .uniq before inferring the output. This might help when the body of the #map is quite complex, but you could argue that this is a code quality / style problem...
The #map block being complex is even more of an argument for reversing the flow or using #reduce.
some_array.map do |item|
  if some_condition
    some_method(item)
  else
    some_other_method(item)
  end
end.uniq
# vs
some_array.uniq_map do |item|
  if some_condition
    some_method(item)
  else
    some_other_method(item)
  end
end
Both of these could be shorter if you use brace blocks (which should be preferred when using the output of a method like #map).
some_array.map { some_condition ? some_method(_1) : some_other_method(_1) }.uniq
From a reduced-operations perspective, #reduce is going to be faster than most anything else, and there are multiple options for the operation, but Set is likely to be your best case:
some_array.reduce(Set.new) { _1.add(some_condition ? some_method(_2) : some_other_method(_2)) }.to_a
Uniquely mapped in one pass, although I think still less efficient than uniq {}.map {} because you have to map the items before determining that they are unique (via Set#add). One could implement this without Set, but that would require a bit more work:
some_array.reduce({}) {
  v = some_condition ? some_method(_2) : some_other_method(_2)
  _1[v] = true unless _1.key?(v)
  _1
}.keys
These are both slightly less efficient than your C code because I don’t believe that there’s a way to preallocate a Hash size from Ruby (there have been proposals, but I don’t believe any have been accepted).
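The two #reduce variants can be tried out with the sketch below. The definitions of `some_condition`, `some_method` and `some_other_method` are hypothetical stand-ins for the placeholders used in this thread, and the block parameters are named explicitly (accumulator first, element second) rather than using _1 and _2.

```ruby
require "set"

# Hypothetical stand-ins for the placeholder names used in the thread.
def some_condition
  true
end

def some_method(item)
  item.abs
end

def some_other_method(item)
  item
end

some_array = [1, -1, 2, -2, 3]

# Set-based accumulator: Set#add returns the set, so the block result
# is the accumulator for the next iteration.
via_set = some_array.reduce(Set.new) { |seen, item|
  seen.add(some_condition ? some_method(item) : some_other_method(item))
}.to_a

# Hash-based accumulator: the block must return the hash explicitly.
via_hash = some_array.reduce({}) { |seen, item|
  value = some_condition ? some_method(item) : some_other_method(item)
  seen[value] = true unless seen.key?(value)
  seen
}.keys

# Reference result from the chained form under discussion.
baseline = some_array.map { |item|
  some_condition ? some_method(item) : some_other_method(item)
}.uniq
```

All three expressions yield the same unique values in first-seen order for this input; no claim is made here about their relative speed.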
Scala doesn't seem to provide uniqMap.
Sorry, this is the first Ruby issue I've created or being involved with, so I'm not sure why this was pointed out. Is this a usual consideration for new features?
When trying to add a functional shorthand, it is common to compare it against other functional languages to see if it or a close synonym is commonly used because many people working with functional operations find it useful to have such a shorthand. So, not unusual.
Considering the above, I think the motivation is too weak to provide uniq_map.
Your points are very valid, and I appreciate the response. What is the usual process for deciding on whether or not to accept a feature?
Consensus-building mostly.
I don’t think that #uniq_map is a good addition because it is only sugar over .map {}.uniq and cannot sugar over .map {}.uniq {}, and I think that, with the exception of the intermediate hash-size preallocation, #reduce or .uniq {}.map {} will be at least as efficient as #uniq_map. I don’t have benchmarks, though.
Updated by rubyFeedback (robert heiler) 2 months ago
joshuay03 wrote:
What is the usual process for deciding on whether or not to accept a feature?
Ultimately you only have to convince matz. :)
However, matz may also request additional information, use cases, and the
"usefulness" of suggestions. Once features are added it is difficult to remove
them due to backwards compatibility.
I am not really invested in the proposal here, so I will not comment much at
all. The way I use Ruby, I use .map {} a lot, and .uniq sometimes, but I
don't think I really had major use cases for combining the above into one
method call. Note that I don't use .flat_map either - I kind of prefer
to stay with one-word methods when possible. They seem to make more sense to
my brain. (I understand a rationale for e. g. library authors where efficiency
may be more important, but personally I use ruby as kind of "syntax sugar"
over C, the operating system and everyday tasks - ruby is really like the
ultimate glue language the way I use it. But that's just a side comment,
I completely understand different people using ruby differently; just for my
own use cases I don't seem to need .uniq_map or .flat_map. By the way, I also
find it harder to remember the method names for two-word methods, e. g.
.map_uniq or .map_flat; that's also one reason I stick with oldschool method
chaining. Perhaps I am getting old ... .lazy is a bit different in that it
also defers using something at "when it is needed", along with the functional
use cases it has, which I think is different to both .flat_map and .uniq_map.)