Feature #19787
openAdd Enumerable#uniq_map, Enumerable::Lazy#uniq_map, Array#uniq_map and Array#uniq_map!
Added by joshuay03 (Joshua Young) over 1 year ago. Updated 8 months ago.
Description
I would like to propose a collection of new methods, Enumerable#uniq_map
, Enumerable::Lazy#uniq_map
, Array#uniq_map
and Array#uniq_map!
.
TL;DR: It's a drop in replacement for .map { ... }.uniq
, with (hopefully) better performance.
I've quite often had to map over an array and get its unique elements. It occurred to me when doing so recently that Ruby doesn't have a short form method for doing that, similar to how .flat_map { ... }
replaces .map { ... }.flatten
and .filter_map { ... }
replaces .map { ... }.compact
(with minor differences). I think these new methods could be beneficial both in terms of better performance and writing more succinct code.
I've got a draft PR up with some initial benchmarks in the description: https://github.com/ruby/ruby/pull/10269.
Updated by mame (Yusuke Endoh) over 1 year ago
It does not make sense to me to provide .foo_map { ... }
for all patterns like .map { ... }.foo
.
flat_map
was introduced for some reasons: https://blade.ruby-lang.org/ruby-core/26287
Is .map { ... }.uniq
such a very frequent idiom? .uniq_map { ... }
is not as concise as .map { ... }.uniq
. Scala doesn't seem to provide uniqMap
.
I think the only reason for introducing uniq_map
is to avoid creating an intermediate array. However, there is a .lazy
for exactly that purpose: .lazy.map { ... }.uniq.to_a
.
Considering the above, I think the motivation is too weak to provide uniq_map
.
Updated by joshuay03 (Joshua Young) over 1 year ago
mame (Yusuke Endoh) wrote in #note-3:
Is
.map { ... }.uniq
such a very frequent idiom?
I work on a Rails codebase and it's most commonly used to iterate through foreign keys / associated records and get the unique values (quite often in tests). That's the only real frequent use case I've come across, but I would say that outside of that, it's not an uncommon chaining pattern either.
.uniq_map { ... }
is not as concise as.map { ... }.uniq
.
I'm a bit mixed on this point. I see where you're coming from, and I think the same argument can be made about #flat_map
and #filter_map
. But it follows a similar naming pattern where the alternative chained methods are in the name, so I personally feel like it's it's equally as concise?
I would also like to point out that with #uniq_map
, you don't need to read all the way to .uniq
before inferring the output. This might help when the body of the #map
is quite complex, but you could argue that this is a code quality / style problem...
some_array.map do |item|
if some_condition
some_method(item)
else
some_other_method(item)
end
end.uniq
# vs
some_array.uniq_map do |item|
if some_condition
some_method(item)
else
some_other_method(item)
end
end
Scala doesn't seem to provide
uniqMap
.
Sorry, this is the first Ruby issue I've created or being involved with, so I'm not sure why this was pointed out. Is this a usual consideration for new features?
Considering the above, I think the motivation is too weak to provide
uniq_map
.
Your points are very valid, and I appreciate the response. What is the usual process for deciding on whether or not to accept a feature?
Updated by austin (Austin Ziegler) over 1 year ago
joshuay03 (Joshua Young) wrote in #note-4:
Is
.map { ... }.uniq
such a very frequent idiom?I work on a Rails codebase and it's most commonly used to iterate through foreign keys / associated records and get the unique values (quite often in tests). That's the only real frequent use case I've come across, but I would say that outside of that, it's not an uncommon chaining pattern either.
Wouldn’t it make more sense, then, to do uniq { … }.map { … }
? Yes, there’s a small bit of extra code, but it means that you’re in most cases going to be performing less work than either .map { … }.uniq
or uniq_map { … }
, because you’re reducing to unique instances before mapping them. If your computations are complex enough that they should be done before #uniq
, I would bypass the need for #uniq
altogether and use #reduce
.
.uniq_map { ... }
is not as concise as.map { ... }.uniq
.I'm a bit mixed on this point. I see where you're coming from, and I think the same argument can be made about
#flat_map
and#filter_map
. But it follows a similar naming pattern where the alternative chained methods are in the name, so I personally feel like it's it's equally as concise?I would also like to point out that with
#uniq_map
, you don't need to read all the way to.uniq
before inferring the output. This might help when the body of the#map
is quite complex, but you could argue that this is a code quality / style problem...
The #map
block being complex is even more of an argument for reversing the flow or using #reduce
.
some_array.map do |item| if some_condition some_method(item) else some_other_method(item) end end.uniq # vs some_array.uniq_map do |item| if some_condition some_method(item) else some_other_method(item) end end
Both of these could be shorter if you use brace blocks (which should be preferred when using the output of a method like #map
).
some_array.map { some_condition ? some_method(_1) : some_other_method(_1) }.uniq
From a reduced operations perspective, #reduce
is going to be faster than most anything else, and there are multiple options for the operation, but Set
is likely to be your best case:
some_array.reduce(Set.new) { _2.add(some_condition ? some_method(_2) : some_other_method(_2)) }.to_a
Uniquely mapped in one pass, although I think still less efficient than uniq {}.map {}
because you have to map the items before determining that they are unique (via Set#add
). One could implement this without Set
, but that would require a bit more work:
some_array.reduce({}) {
v = some_condition ? some_method(_2) : some_other_method(_2)
_1[v] = true unless _1.key?(v)
_1
}.keys
These are both slightly less efficient than your C code because I don’t believe that there’s a way to preallocate a Hash size from Ruby (there have been proposals, but I don’t believe any have been accepted).
Scala doesn't seem to provide
uniqMap
.Sorry, this is the first Ruby issue I've created or being involved with, so I'm not sure why this was pointed out. Is this a usual consideration for new features?
When trying to add a functional shorthand, it is common to compare it against other functional languages to see if it or a close synonym is commonly used because many people working with functional operations find it useful to have such a shorthand. So, not unusual.
Considering the above, I think the motivation is too weak to provide
uniq_map
.Your points are very valid, and I appreciate the response. What is the usual process for deciding on whether or not to accept a feature?
Consensus-building mostly.
I don’t think that #uniq_map
is a good addition because it is only sugar over .map {}.uniq
and cannot sugar over .map {}.uniq {}
, and I think that — with the exception of the intermediate hash-size preallocation — #reduce
or .uniq {}.map {}
will be as or more efficient than #uniq_map
. I don’t have benchmarks, though.
Updated by rubyFeedback (robert heiler) over 1 year ago
joshuay03 wrote:
What is the usual process for deciding on whether or not to accept a feature?
Ultimately you only have to convince matz. :)
However had, matz may also request additional information and/or use case and
"usefulness" of suggestions. Once features are added it is difficult to remove
them due to backwards compatibility.
I am not really invested in the proposal here, so I will not comment much at
all. The way how I use ruby I use .map {} a lot, and .uniq sometimes, but I
don't think I really had major use cases for combining the above into one
method call. Note that I also don't use .flat_map either - I kind of prefer
to stay with one-word methods when possible. They seem to make more sense to
my brain. (I understand a rationale for e. g. library authors where efficiency
may be more important, but personally I use ruby as kind of "syntax sugar"
over C, the operating system and everyday tasks - ruby is really like the
ultimate glue language the way how I use it. But that's just a side comment,
I completely understand different people using ruby differently; just for my
own use cases I don't seem to need .uniq_map or .flat_map. By the way, I also
find it harder to remember the method names for two-word methods, e. g.
.map_uniq or .map_flat; that's also one reason I stick with oldschool method
chaining. Perhaps I am getting old ... .lazy is a bit different in that it
also defers using something at "when it is needed", along with the functional
use cases it has, which I think is different to both .flat_map and .uniq_map.)
Updated by joshuay03 (Joshua Young) 8 months ago · Edited
austin (Austin Ziegler) wrote in #note-5:
Wouldn’t it make more sense, then, to do
uniq { … }.map { … }
? Yes, there’s a small bit of extra code, but it means that you’re in most cases going to be performing less work than either.map { … }.uniq
oruniq_map { … }
, because you’re reducing to unique instances before mapping them.
That would be very confusing code to read imo given that the body of both blocks would need to be the same. E.g.
# your suggestion
> [1, 1, 2, 3, 4].uniq { |i| i % 2 }.map { |i| i % 2 }
=> [1, 0]
# vs the more common approach
> [1, 1, 2, 3, 4].map { |i| i % 2 }.uniq
=> [1, 0]
# vs my proposal
> [1, 1, 2, 3, 4].uniq_map { |i| i % 2 }
=> [1, 0]
And sure you can abstract it into a proc but I don't see why one would pick that syntax in the first place.
If your computations are complex enough that they should be done before
#uniq
, I would bypass the need for#uniq
altogether and use#reduce
.
Fair, but the beauty of #uniq
is that it's declarative. With #reduce
you'd need to infer the intent as well as the result, like with your examples, which is why I'm suggesting an alternative which is still easy to comprehend and also more performant by design.
The
#map
block being complex is even more of an argument for reversing the flow or using#reduce
.
Both of these could be shorter if you use brace blocks (which should be preferred when using the output of a method like
#map
).
I should point out that that was a very contrived example. Obviously not all map operations are this simple (especially in the context of something like Rails). There's also the varying style preferences to keep in mind (multiline block delimiters, ternaries and numbered block params).
From a reduced operations perspective,
#reduce
is going to be faster than most anything else, and there are multiple options for the operation, butSet
is likely to be your best case:
Uniquely mapped in one pass, although I think still less efficient than
uniq {}.map {}
because you have to map the items before determining that they are unique (viaSet#add
). One could implement this withoutSet
, but that would require a bit more work:
These are both slightly less efficient than your C code because I don’t believe that there’s a way to preallocate a Hash size from Ruby (there have been proposals, but I don’t believe any have been accepted).
These are great examples, but I stand by my comment above that the mental overhead is not worth it in most cases unless your goal is simply performance. The primary reason for my request is succinctness, with performance coming second.
When trying to add a functional shorthand, it is common to compare it against other functional languages to see if it or a close synonym is commonly used because many people working with functional operations find it useful to have such a shorthand. So, not unusual.
That makes sense to me, thanks.
Consensus-building mostly.
I don’t think that
#uniq_map
is a good addition because it is only sugar over.map {}.uniq
and cannot sugar over.map {}.uniq {}
, and I think that — with the exception of the intermediate hash-size preallocation —#reduce
or.uniq {}.map {}
will be as or more efficient than#uniq_map
. I don’t have benchmarks, though.
I appreciate your response.
Updated by joshuay03 (Joshua Young) 8 months ago
rubyFeedback (robert heiler) wrote in #note-6:
Ultimately you only have to convince matz. :)
However had, matz may also request additional information and/or use case and
"usefulness" of suggestions. Once features are added it is difficult to remove
them due to backwards compatibility.
Thanks, that's a good point re: backwards compatibility.
I am not really invested in the proposal here, so I will not comment much at
all. The way how I use ruby I use .map {} a lot, and .uniq sometimes, but I
don't think I really had major use cases for combining the above into one
method call. Note that I also don't use .flat_map either - I kind of prefer
to stay with one-word methods when possible. They seem to make more sense to
my brain. (I understand a rationale for e. g. library authors where efficiency
may be more important, but personally I use ruby as kind of "syntax sugar"
over C, the operating system and everyday tasks - ruby is really like the
ultimate glue language the way how I use it. But that's just a side comment,
I completely understand different people using ruby differently; just for my
own use cases I don't seem to need .uniq_map or .flat_map. By the way, I also
find it harder to remember the method names for two-word methods, e. g.
.map_uniq or .map_flat; that's also one reason I stick with oldschool method
chaining. Perhaps I am getting old ... .lazy is a bit different in that it
also defers using something at "when it is needed", along with the functional
use cases it has, which I think is different to both .flat_map and .uniq_map.)
I appreciate the feedback.