Project

General

Profile

Actions

Feature #19787

open

Add Enumerable#uniq_map, Enumerable::Lazy#uniq_map, Array#uniq_map and Array#uniq_map!

Feature #19787: Add Enumerable#uniq_map, Enumerable::Lazy#uniq_map, Array#uniq_map and Array#uniq_map!

Added by joshuay03 (Joshua Young) about 2 years ago. Updated over 1 year ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:114309]

Description

I would like to propose a collection of new methods, Enumerable#uniq_map, Enumerable::Lazy#uniq_map, Array#uniq_map and Array#uniq_map!.

TL;DR: It's a drop in replacement for .map { ... }.uniq, with (hopefully) better performance.

I've quite often had to map over an array and get its unique elements. It occurred to me when doing so recently that Ruby doesn't have a short form method for doing that, similar to how .flat_map { ... } replaces .map { ... }.flatten and .filter_map { ... } replaces .map { ... }.compact (with minor differences). I think these new methods could be beneficial both in terms of better performance and writing more succinct code.

I've got a draft PR up with some initial benchmarks in the description: https://github.com/ruby/ruby/pull/10269.

Updated by joshuay03 (Joshua Young) about 2 years ago Actions #1

  • Description updated (diff)

Updated by joshuay03 (Joshua Young) about 2 years ago Actions #2

  • Description updated (diff)

Updated by mame (Yusuke Endoh) about 2 years ago Actions #3 [ruby-core:114313]

It does not make sense to me to provide .foo_map { ... } for all patterns like .map { ... }.foo.

flat_map was introduced for some reasons: https://blade.ruby-lang.org/ruby-core/26287

Is .map { ... }.uniq such a very frequent idiom? .uniq_map { ... } is not as concise as .map { ... }.uniq. Scala doesn't seem to provide uniqMap.

I think the only reason for introducing uniq_map is to avoid creating an intermediate array. However, there is a .lazy for exactly that purpose: .lazy.map { ... }.uniq.to_a.

Considering the above, I think the motivation is too weak to provide uniq_map.

Updated by joshuay03 (Joshua Young) about 2 years ago Actions #4 [ruby-core:114314]

mame (Yusuke Endoh) wrote in #note-3:

Is .map { ... }.uniq such a very frequent idiom?

I work on a Rails codebase and it's most commonly used to iterate through foreign keys / associated records and get the unique values (quite often in tests). That's the only real frequent use case I've come across, but I would say that outside of that, it's not an uncommon chaining pattern either.

.uniq_map { ... } is not as concise as .map { ... }.uniq.

I'm a bit mixed on this point. I see where you're coming from, and I think the same argument can be made about #flat_map and #filter_map. But it follows a similar naming pattern where the alternative chained methods are in the name, so I personally feel like it's it's equally as concise?

I would also like to point out that with #uniq_map, you don't need to read all the way to .uniq before inferring the output. This might help when the body of the #map is quite complex, but you could argue that this is a code quality / style problem...

some_array.map do |item|
  if some_condition
    some_method(item)
  else
    some_other_method(item)
  end
end.uniq

# vs

some_array.uniq_map do |item|
  if some_condition
    some_method(item)
  else
    some_other_method(item)
  end
end

Scala doesn't seem to provide uniqMap.

Sorry, this is the first Ruby issue I've created or being involved with, so I'm not sure why this was pointed out. Is this a usual consideration for new features?

Considering the above, I think the motivation is too weak to provide uniq_map.

Your points are very valid, and I appreciate the response. What is the usual process for deciding on whether or not to accept a feature?

Updated by austin (Austin Ziegler) about 2 years ago Actions #5 [ruby-core:114320]

joshuay03 (Joshua Young) wrote in #note-4:

Is .map { ... }.uniq such a very frequent idiom?

I work on a Rails codebase and it's most commonly used to iterate through foreign keys / associated records and get the unique values (quite often in tests). That's the only real frequent use case I've come across, but I would say that outside of that, it's not an uncommon chaining pattern either.

Wouldn’t it make more sense, then, to do uniq { … }.map { … }? Yes, there’s a small bit of extra code, but it means that you’re in most cases going to be performing less work than either .map { … }.uniq or uniq_map { … }, because you’re reducing to unique instances before mapping them. If your computations are complex enough that they should be done before #uniq, I would bypass the need for #uniq altogether and use #reduce.

.uniq_map { ... } is not as concise as .map { ... }.uniq.

I'm a bit mixed on this point. I see where you're coming from, and I think the same argument can be made about #flat_map and #filter_map. But it follows a similar naming pattern where the alternative chained methods are in the name, so I personally feel like it's it's equally as concise?

I would also like to point out that with #uniq_map, you don't need to read all the way to .uniq before inferring the output. This might help when the body of the #map is quite complex, but you could argue that this is a code quality / style problem...

The #map block being complex is even more of an argument for reversing the flow or using #reduce.

some_array.map do |item|
  if some_condition
    some_method(item)
  else
    some_other_method(item)
  end
end.uniq

# vs

some_array.uniq_map do |item|
  if some_condition
    some_method(item)
  else
    some_other_method(item)
  end
end

Both of these could be shorter if you use brace blocks (which should be preferred when using the output of a method like #map).

some_array.map { some_condition ? some_method(_1) : some_other_method(_1) }.uniq

From a reduced operations perspective, #reduce is going to be faster than most anything else, and there are multiple options for the operation, but Set is likely to be your best case:

some_array.reduce(Set.new) { _2.add(some_condition ? some_method(_2) : some_other_method(_2)) }.to_a

Uniquely mapped in one pass, although I think still less efficient than uniq {}.map {} because you have to map the items before determining that they are unique (via Set#add). One could implement this without Set, but that would require a bit more work:

some_array.reduce({}) {
  v = some_condition ? some_method(_2) : some_other_method(_2)
 _1[v] = true unless _1.key?(v)
 _1
}.keys

These are both slightly less efficient than your C code because I don’t believe that there’s a way to preallocate a Hash size from Ruby (there have been proposals, but I don’t believe any have been accepted).

Scala doesn't seem to provide uniqMap.

Sorry, this is the first Ruby issue I've created or being involved with, so I'm not sure why this was pointed out. Is this a usual consideration for new features?

When trying to add a functional shorthand, it is common to compare it against other functional languages to see if it or a close synonym is commonly used because many people working with functional operations find it useful to have such a shorthand. So, not unusual.

Considering the above, I think the motivation is too weak to provide uniq_map.

Your points are very valid, and I appreciate the response. What is the usual process for deciding on whether or not to accept a feature?

Consensus-building mostly.

I don’t think that #uniq_map is a good addition because it is only sugar over .map {}.uniq and cannot sugar over .map {}.uniq {}, and I think that — with the exception of the intermediate hash-size preallocation — #reduce or .uniq {}.map {} will be as or more efficient than #uniq_map. I don’t have benchmarks, though.

Updated by rubyFeedback (robert heiler) about 2 years ago Actions #6 [ruby-core:114321]

joshuay03 wrote:

What is the usual process for deciding on whether or not to accept a feature?

Ultimately you only have to convince matz. :)

However had, matz may also request additional information and/or use case and
"usefulness" of suggestions. Once features are added it is difficult to remove
them due to backwards compatibility.

I am not really invested in the proposal here, so I will not comment much at
all. The way how I use ruby I use .map {} a lot, and .uniq sometimes, but I
don't think I really had major use cases for combining the above into one
method call. Note that I also don't use .flat_map either - I kind of prefer
to stay with one-word methods when possible. They seem to make more sense to
my brain. (I understand a rationale for e. g. library authors where efficiency
may be more important, but personally I use ruby as kind of "syntax sugar"
over C, the operating system and everyday tasks - ruby is really like the
ultimate glue language the way how I use it. But that's just a side comment,
I completely understand different people using ruby differently; just for my
own use cases I don't seem to need .uniq_map or .flat_map. By the way, I also
find it harder to remember the method names for two-word methods, e. g.
.map_uniq or .map_flat; that's also one reason I stick with oldschool method
chaining. Perhaps I am getting old ... .lazy is a bit different in that it
also defers using something at "when it is needed", along with the functional
use cases it has, which I think is different to both .flat_map and .uniq_map.)

Updated by joshuay03 (Joshua Young) over 1 year ago Actions #7

  • Description updated (diff)

Updated by joshuay03 (Joshua Young) over 1 year ago · Edited Actions #8 [ruby-core:117198]

austin (Austin Ziegler) wrote in #note-5:

Wouldn’t it make more sense, then, to do uniq { … }.map { … }? Yes, there’s a small bit of extra code, but it means that you’re in most cases going to be performing less work than either .map { … }.uniq or uniq_map { … }, because you’re reducing to unique instances before mapping them.

That would be very confusing code to read imo given that the body of both blocks would need to be the same. E.g.

# your suggestion

 > [1, 1, 2, 3, 4].uniq { |i| i % 2 }.map { |i| i % 2 }
=> [1, 0]

# vs the more common approach

 > [1, 1, 2, 3, 4].map { |i| i % 2 }.uniq
=> [1, 0]

# vs my proposal

 > [1, 1, 2, 3, 4].uniq_map { |i| i % 2 }
=> [1, 0]

And sure you can abstract it into a proc but I don't see why one would pick that syntax in the first place.

If your computations are complex enough that they should be done before #uniq, I would bypass the need for #uniq altogether and use #reduce.

Fair, but the beauty of #uniq is that it's declarative. With #reduce you'd need to infer the intent as well as the result, like with your examples, which is why I'm suggesting an alternative which is still easy to comprehend and also more performant by design.

The #map block being complex is even more of an argument for reversing the flow or using #reduce.

Both of these could be shorter if you use brace blocks (which should be preferred when using the output of a method like #map).

I should point out that that was a very contrived example. Obviously not all map operations are this simple (especially in the context of something like Rails). There's also the varying style preferences to keep in mind (multiline block delimiters, ternaries and numbered block params).

From a reduced operations perspective, #reduce is going to be faster than most anything else, and there are multiple options for the operation, but Set is likely to be your best case:

Uniquely mapped in one pass, although I think still less efficient than uniq {}.map {} because you have to map the items before determining that they are unique (via Set#add). One could implement this without Set, but that would require a bit more work:

These are both slightly less efficient than your C code because I don’t believe that there’s a way to preallocate a Hash size from Ruby (there have been proposals, but I don’t believe any have been accepted).

These are great examples, but I stand by my comment above that the mental overhead is not worth it in most cases unless your goal is simply performance. The primary reason for my request is succinctness, with performance coming second.

When trying to add a functional shorthand, it is common to compare it against other functional languages to see if it or a close synonym is commonly used because many people working with functional operations find it useful to have such a shorthand. So, not unusual.

That makes sense to me, thanks.

Consensus-building mostly.

I don’t think that #uniq_map is a good addition because it is only sugar over .map {}.uniq and cannot sugar over .map {}.uniq {}, and I think that — with the exception of the intermediate hash-size preallocation — #reduce or .uniq {}.map {} will be as or more efficient than #uniq_map. I don’t have benchmarks, though.

I appreciate your response.

Updated by joshuay03 (Joshua Young) over 1 year ago Actions #9 [ruby-core:117199]

rubyFeedback (robert heiler) wrote in #note-6:

Ultimately you only have to convince matz. :)

However had, matz may also request additional information and/or use case and
"usefulness" of suggestions. Once features are added it is difficult to remove
them due to backwards compatibility.

Thanks, that's a good point re: backwards compatibility.

I am not really invested in the proposal here, so I will not comment much at
all. The way how I use ruby I use .map {} a lot, and .uniq sometimes, but I
don't think I really had major use cases for combining the above into one
method call. Note that I also don't use .flat_map either - I kind of prefer
to stay with one-word methods when possible. They seem to make more sense to
my brain. (I understand a rationale for e. g. library authors where efficiency
may be more important, but personally I use ruby as kind of "syntax sugar"
over C, the operating system and everyday tasks - ruby is really like the
ultimate glue language the way how I use it. But that's just a side comment,
I completely understand different people using ruby differently; just for my
own use cases I don't seem to need .uniq_map or .flat_map. By the way, I also
find it harder to remember the method names for two-word methods, e. g.
.map_uniq or .map_flat; that's also one reason I stick with oldschool method
chaining. Perhaps I am getting old ... .lazy is a bit different in that it
also defers using something at "when it is needed", along with the functional
use cases it has, which I think is different to both .flat_map and .uniq_map.)

I appreciate the feedback.

Actions

Also available in: PDF Atom