Feature #8887
closedmin(n), max(n), min_by(n), max_by(n)
Description
How about adding an optional argument, n, for Enumerable#{min,max,min_by,max_by} to
return minimum/maximum n elements as an array.
Example:
- [6, 0, 3, 3, 8, 3, 5, 0, 6].min(4) #=> [0, 0, 3, 3]
- [6, 0, 3, 3, 8, 3, 5, 0, 6].max(4) #=> [5, 6, 6, 8]
- [6, 0, 3, 3, 8, 3, 5, 0, 6].min_by(4) {|v| (v-5)**2 } #=> [5, 6, 6, 3]
- [6, 0, 3, 3, 8, 3, 5, 0, 6].max_by(4) {|v| (v-5)**2 } #=> [3, 8, 0, 0]
These methods are similar to sort follows first or last.
- e.min(n) is similar to e.sort.first(n)
- e.max(n) is similar to e.sort.last(n)
- e.min_by(n) {...} is similar to e.sort_by {...}.first(n)
- e.max_by(n) {...} is similar to e.sort_by {...}.last(n)
However e.min(n), e.max(n), e.min_by(n), e.max_by(n) are
less memory consuming and can be faster.
They use memory proportional to n, not e.
They doesn't sort whole e.
I feel their use is not rare.
I found several use after searching.
[ruby-talk:123508], [ruby-list:40939], [ruby-talk:273980]
http://d.hatena.ne.jp/mjh/20101024/1287901875
http://stackoverflow.com/questions/11094874/get-top-n-elements-from-ruby-array-of-hash-values
http://www.math.kobe-u.ac.jp/~kodama/tips-ruby-sized_pqueue.html
https://bitbucket.org/sterlingcamden/topn
Also, e.max(n) can be used to implement weighted random sampling.
Pavlos S. Efraimidis, Paul G. Spirakis
Weighted random sampling with a reservoir
Information Processing Letters
Volume 97, Issue 5 (16 March 2006)
% ./ruby -e '
module Enumerable
def wsample(n)
self.max_by(n) {|v| rand ** (1.0/yield(v)) }
end
end
e = (-20..20).to_a*10000
a = e.wsample(20000) {|x|
Math.exp(-(x/5.0)**2) # normal distribution
}
# a is 20000 samples from e.
p a.length
h = a.group_by {|x| x }
-10.upto(10) {|x| puts "*" * (h[x].length/30.0).to_i if h[x] }
'
20000
*
***
******
***********
******************
*****************************
*****************************************
****************************************************
***************************************************************
********************************************************************
***********************************************************************
***********************************************************************
**************************************************************
****************************************************
***************************************
***************************
******************
***********
*******
***
*
Any comments?
Files
Updated by Hanmac (Hans Mackowiak) about 11 years ago
hm i am curios in what order max(n)
and max_by(n)
should return the elements?
like [6, 0, 3, 3, 8, 3, 5, 0, 6].max(4)
:
should it return sort.last(n) #> [5, 6, 6, 8]
or is that better? [8, 6, 6, 5]
because 8 is bigger than 6 ?
PS: min_by(n)
and max_by(n)
(and maybe sort_by(n)
) should maybe implemented different without take, so that it max_by(n)
picks only the first n
biggest objects without sorting the entire test of the Enumerator
?
also about your patch, e.min_by(n).size
should respect also n
and also enum_size
, so it needs to return n
when n < as size
, and size
when n > size
(for size != nil
)
Updated by akr (Akira Tanaka) about 11 years ago
2013/9/10 Hanmac (Hans Mackowiak) hanmac@gmx.de:
Issue #8887 has been updated by Hanmac (Hans Mackowiak).
hm i am curios in what order max(n) and max_by(n) should return the elements?
like [6, 0, 3, 3, 8, 3, 5, 0, 6].max(4):
should it return sort.last(n) #> [5, 6, 6, 8]
or is that better? [8, 6, 6, 5] because 8 is bigger than 6 ?
I think sorted order (minimum element first, maximum element last) is intuitive.
PS: min_by(n) and max_by(n) (and maybe sort_by(n)) should maybe implemented different without take, so that it max_by(n) picks only the first n biggest objects without sorting the entire test of the Enumerator?
The methods accumelates 4n elements, "sort" them and drop 3n elements
repeatedly.
Also, the "sort" is based on quick sort but doesn't sort a half of
divided elements.
also about your patch, e.min_by(n).size should respect also n and also enum_size, so it needs to return n when n < as size, and size when n > size (for size != nil)
As far as I understand, the size method returns number of yield
invocaitons of the enumerator.
The enumerators, e.min_by(n) and e.max_by(n), yields e.size times.
So they returns e.size which is not related to n.
For example, e.reject.size returns e.size as follows.
(1..10).to_a.reject.size #=> 10
This doesn't mean the result array size of reject method.¶
Tanaka Akira
Updated by akr (Akira Tanaka) about 11 years ago
slide updated
Updated by matz (Yukihiro Matsumoto) almost 11 years ago
Agreed. I think 2.2 is a good time to add these enhancement.
Matz.
Updated by nobu (Nobuyoshi Nakada) almost 11 years ago
- Category set to core
- Status changed from Open to Assigned
- Assignee set to akr (Akira Tanaka)
Updated by nobu (Nobuyoshi Nakada) almost 11 years ago
- Description updated (diff)
Updated by akr (Akira Tanaka) almost 11 years ago
- Status changed from Assigned to Closed
- % Done changed from 0 to 100
Applied in changeset r44958.
-
enum.c: Enumerable#{min,min_by,max,max_by} extended to take an
optional argument.
(nmin_cmp): New function.
(nmin_block_cmp): Ditto
(nmin_filter): Ditto.
(nmin_i): Ditto.
(nmin_run): Ditto.
(enum_min): Call nmin_run if the optional argument is given.
(nmin_max): Ditto.
(nmin_min_by): Ditto.
(nmin_max_by): Ditto. -
range.c: Range#{min,max} extended to take an optional argument.
(range_min): Call range_first if the optional argument is given.
(range_max): Call rb_call_super if the optional argument is given.
[ruby-core:57111] [Feature #8887]
Updated by DavidEGrayson (David Grayson) about 10 years ago
Hello. I think the Ruby team should reconsider the ordering of the array returned by max
and max_by
when the n
argument is provided. It makes much more sense to me that it would be sorted in descending order, meaning that the most extreme/special element would be first. For example, I would expect max
to behave like this:
[1, 2, 10, 22, 23].max(3) # => [23, 22, 10]
It seems that min/min_by and max/max_by should be mirrors of each other. For example, I would expect that max_by(n) { |x| x.foo }
would be equivalent to min_by(n) { |x| -x.foo }
, assuming that foo
returns a number.
That leaves us with two choices for the sorting: put the most extreme element first, or put the most extreme element last. I think putting the most extreme element first makes the most sense. Whenever I see any list of top-ranked items, the highest-ranked item is usually first. Here are some examples from the internet: http://imgur.com/a/MPsJl
I looked around on StackOverflow.com for Ruby users who might benefit from the new argument to min/min_by/max/max_by. Almost everyone asking a question about it wanted to use something like max_by
and wanted to have the most extreme element be listed first:
- http://stackoverflow.com/questions/19804696
- http://stackoverflow.com/questions/11094874
- http://stackoverflow.com/questions/9390444
However, I should note that in one case, the questioner wanted the order of the original enumerable to be preserved (http://stackoverflow.com/questions/9459447), and in another case they didn't specify (http://stackoverflow.com/questions/12241165). I didn't find anyone who wanted the most extreme element to be last.
I also looked for similar features in other programming languages. The closest thing I found was Python's heapq.nlargest
method, which does return the most extreme element first:
https://docs.python.org/3/library/heapq.html#heapq.nlargest
In SQL, if you are making a query to get the top 10 rows, I think you must do something like this, which puts the most extreme row first:
select * from players order by score desc limit 10
In conclusion, I think it is more natural for the max
and max_by
methods to return arrays with the most extreme element first. I hope the Ruby team will do this before Ruby 2.2.0 is released, since it will be hard to change it later without breaking a lot of people's code.
--David Grayson
P.S. Some other Ruby developers I know here in Las Vegas have agreed with this letter and wanted their names to be included:
- Jason Arhart
- Paul Grayson
Updated by jhettich (Jan Hettich) about 10 years ago
A strong case can be made for returning the elements in their original order. That ordering information is otherwise lost; whereas, the client can always sort the resulting elements in whatever order she wants. This behavior would also be more consistent with other enumerable methods, such as select.
Updated by sawa (Tsuyoshi Sawada) about 10 years ago
If descending order is to adopted, a thing that has to be made clear is how to order different objects that are in a tie with respect to the measure in question.
[0, 1, 2, 3, 4, 5, 6, 7, 8].max_by(6){|e| e % 3}
Which of the following should this return?
[1, 4, 7, 2, 5, 8] # ascending, stable
[2, 5, 8, 1, 4, 7] # descending, stable
[8, 5, 2, 7, 4, 1] # descending, reversed stable
Updated by akr (Akira Tanaka) about 10 years ago
David Grayson wrote:
Hello. I think the Ruby team should reconsider the ordering of the array returned by
max
andmax_by
when then
argument is provided. It makes much more sense to me that it would be sorted in descending order, meaning that the most extreme/special element would be first. For example, I would expectmax
to behave like this:[1, 2, 10, 22, 23].max(3) # => [23, 22, 10]
It seems reasonable.
Other opinions?
Note that the max(n) implementation doesn't maintain the order of original enumerable.
So it is difficult to preserve the original order or stablility.