Actions

## Feature #8887

closed

### min(n), max(n), min_by(n), max_by(n)

Status:
Closed
Priority:
Normal
Target version:
-
[ruby-core:57111]

Description

return minimum/maximum n elements as an array.

Example:

• [6, 0, 3, 3, 8, 3, 5, 0, 6].min(4) #=> [0, 0, 3, 3]
• [6, 0, 3, 3, 8, 3, 5, 0, 6].max(4) #=> [5, 6, 6, 8]
• [6, 0, 3, 3, 8, 3, 5, 0, 6].min_by(4) {|v| (v-5)**2 } #=> [5, 6, 6, 3]
• [6, 0, 3, 3, 8, 3, 5, 0, 6].max_by(4) {|v| (v-5)**2 } #=> [3, 8, 0, 0]

These methods are similar to sort follows first or last.

• e.min(n) is similar to e.sort.first(n)
• e.max(n) is similar to e.sort.last(n)
• e.min_by(n) {...} is similar to e.sort_by {...}.first(n)
• e.max_by(n) {...} is similar to e.sort_by {...}.last(n)

However e.min(n), e.max(n), e.min_by(n), e.max_by(n) are
less memory consuming and can be faster.
They use memory proportional to n, not e.
They doesn't sort whole e.

I feel their use is not rare.
I found several use after searching.

Also, e.max(n) can be used to implement weighted random sampling.

Pavlos S. Efraimidis, Paul G. Spirakis
Weighted random sampling with a reservoir
Information Processing Letters
Volume 97, Issue 5 (16 March 2006)

```  % ./ruby -e '
module Enumerable
def wsample(n)
self.max_by(n) {|v| rand ** (1.0/yield(v)) }
end
end
e = (-20..20).to_a*10000
a = e.wsample(20000) {|x|
Math.exp(-(x/5.0)**2) # normal distribution
}
# a is 20000 samples from e.
p a.length
h = a.group_by {|x| x }
-10.upto(10) {|x| puts "*" * (h[x].length/30.0).to_i if h[x] }
'
20000
*
***
******
***********
******************
*****************************
*****************************************
****************************************************
***************************************************************
********************************************************************
***********************************************************************
***********************************************************************
**************************************************************
****************************************************
***************************************
***************************
******************
***********
*******
***
*
```

Files

#### Updated by Hanmac (Hans Mackowiak)over 7 years ago

hm i am curios in what order `max(n)` and `max_by(n)` should return the elements?

like `[6, 0, 3, 3, 8, 3, 5, 0, 6].max(4)`:
should it return `sort.last(n) #> [5, 6, 6, 8]`
or is that better? `[8, 6, 6, 5]` because 8 is bigger than 6 ?

PS: `min_by(n)` and `max_by(n)` (and maybe `sort_by(n)`) should maybe implemented different without take, so that it `max_by(n)` picks only the first `n` biggest objects without sorting the entire test of the `Enumerator`?

also about your patch, `e.min_by(n).size` should respect also `n` and also `enum_size`, so it needs to return `n` when `n < as size`, and `size` when `n > size` (for `size != nil`)

#### Updated by akr (Akira Tanaka)over 7 years ago

2013/9/10 Hanmac (Hans Mackowiak) hanmac@gmx.de:

Issue #8887 has been updated by Hanmac (Hans Mackowiak).

hm i am curios in what order max(n) and max_by(n) should return the elements?

like [6, 0, 3, 3, 8, 3, 5, 0, 6].max(4):
should it return sort.last(n) #> [5, 6, 6, 8]
or is that better? [8, 6, 6, 5] because 8 is bigger than 6 ?

I think sorted order (minimum element first, maximum element last) is intuitive.

PS: min_by(n) and max_by(n) (and maybe sort_by(n)) should maybe implemented different without take, so that it max_by(n) picks only the first n biggest objects without sorting the entire test of the Enumerator?

The methods accumelates 4n elements, "sort" them and drop 3n elements
repeatedly.
Also, the "sort" is based on quick sort but doesn't sort a half of
divided elements.

also about your patch, e.min_by(n).size should respect also n and also enum_size, so it needs to return n when n < as size, and size when n > size (for size != nil)

As far as I understand, the size method returns number of yield
invocaitons of the enumerator.
The enumerators, e.min_by(n) and e.max_by(n), yields e.size times.
So they returns e.size which is not related to n.

For example, e.reject.size returns e.size as follows.

(1..10).to_a.reject.size #=> 10

This doesn't mean the result array size of reject method.
--
Tanaka Akira

Actions #4

#### Updated by akr (Akira Tanaka)over 7 years ago

• File deleted (maxn.pdf)

slide updated

#### Updated by matz (Yukihiro Matsumoto)about 7 years ago

Agreed. I think 2.2 is a good time to add these enhancement.

Matz.

• Category set to core
• Status changed from Open to Assigned
• Assignee set to akr (Akira Tanaka)

• Description updated (diff)

#### Updated by akr (Akira Tanaka)about 7 years ago

• Status changed from Assigned to Closed
• % Done changed from 0 to 100

Applied in changeset r44958.

• enum.c: Enumerable#{min,min_by,max,max_by} extended to take an
optional argument.
(nmin_cmp): New function.
(nmin_block_cmp): Ditto
(nmin_filter): Ditto.
(nmin_i): Ditto.
(nmin_run): Ditto.
(enum_min): Call nmin_run if the optional argument is given.
(nmin_max): Ditto.
(nmin_min_by): Ditto.
(nmin_max_by): Ditto.

• range.c: Range#{min,max} extended to take an optional argument.
(range_min): Call range_first if the optional argument is given.
(range_max): Call rb_call_super if the optional argument is given.

[ruby-core:57111] [Feature #8887]

#### Updated by DavidEGrayson (David Grayson)over 6 years ago

Hello. I think the Ruby team should reconsider the ordering of the array returned by `max` and `max_by` when the `n` argument is provided. It makes much more sense to me that it would be sorted in descending order, meaning that the most extreme/special element would be first. For example, I would expect `max` to behave like this:

```[1, 2, 10, 22, 23].max(3)  # => [23, 22, 10]
```

It seems that min/min_by and max/max_by should be mirrors of each other. For example, I would expect that `max_by(n) { |x| x.foo }` would be equivalent to `min_by(n) { |x| -x.foo }`, assuming that `foo` returns a number.

That leaves us with two choices for the sorting: put the most extreme element first, or put the most extreme element last. I think putting the most extreme element first makes the most sense. Whenever I see any list of top-ranked items, the highest-ranked item is usually first. Here are some examples from the internet: http://imgur.com/a/MPsJl

I looked around on StackOverflow.com for Ruby users who might benefit from the new argument to min/min_by/max/max_by. Almost everyone asking a question about it wanted to use something like `max_by` and wanted to have the most extreme element be listed first:

However, I should note that in one case, the questioner wanted the order of the original enumerable to be preserved (http://stackoverflow.com/questions/9459447), and in another case they didn't specify (http://stackoverflow.com/questions/12241165). I didn't find anyone who wanted the most extreme element to be last.

I also looked for similar features in other programming languages. The closest thing I found was Python's `heapq.nlargest` method, which does return the most extreme element first:

https://docs.python.org/3/library/heapq.html#heapq.nlargest

In SQL, if you are making a query to get the top 10 rows, I think you must do something like this, which puts the most extreme row first:

```select * from players order by score desc limit 10
```

In conclusion, I think it is more natural for the `max` and `max_by` methods to return arrays with the most extreme element first. I hope the Ruby team will do this before Ruby 2.2.0 is released, since it will be hard to change it later without breaking a lot of people's code.

--David Grayson

P.S. Some other Ruby developers I know here in Las Vegas have agreed with this letter and wanted their names to be included:

• Jason Arhart
• Paul Grayson

#### Updated by jhettich (Jan Hettich)over 6 years ago

A strong case can be made for returning the elements in their original order. That ordering information is otherwise lost; whereas, the client can always sort the resulting elements in whatever order she wants. This behavior would also be more consistent with other enumerable methods, such as select.

#### Updated by sawa (Tsuyoshi Sawada)over 6 years ago

If descending order is to adopted, a thing that has to be made clear is how to order different objects that are in a tie with respect to the measure in question.

``````[0, 1, 2, 3, 4, 5, 6, 7, 8].max_by(6){|e| e % 3}
``````

Which of the following should this return?

``````[1, 4, 7, 2, 5, 8] # ascending, stable
[2, 5, 8, 1, 4, 7] # descending, stable
[8, 5, 2, 7, 4, 1] # descending, reversed stable
``````

#### Updated by akr (Akira Tanaka)over 6 years ago

David Grayson wrote:

Hello. I think the Ruby team should reconsider the ordering of the array returned by `max` and `max_by` when the `n` argument is provided. It makes much more sense to me that it would be sorted in descending order, meaning that the most extreme/special element would be first. For example, I would expect `max` to behave like this:

```[1, 2, 10, 22, 23].max(3)  # => [23, 22, 10]
```

It seems reasonable.

Other opinions?

Note that the max(n) implementation doesn't maintain the order of original enumerable.
So it is difficult to preserve the original order or stablility.

Actions

Also available in: Atom PDF