Feature #8887

min(n), max(n), min_by(n), max_by(n)

Added by Akira Tanaka over 1 year ago. Updated 8 months ago.

[ruby-core:57111]
Status:Closed
Priority:Normal
Assignee:Akira Tanaka

Description

How about adding an optional argument, n, for Enumerable#{min,max,min_by,max_by} to
return minimum/maximum n elements as an array.

Example:
* [6, 0, 3, 3, 8, 3, 5, 0, 6].min(4) #=> [0, 0, 3, 3]
* [6, 0, 3, 3, 8, 3, 5, 0, 6].max(4) #=> [5, 6, 6, 8]
* [6, 0, 3, 3, 8, 3, 5, 0, 6].min_by(4) {|v| (v-5)2 } #=> [5, 6, 6, 3]
* [6, 0, 3, 3, 8, 3, 5, 0, 6].max_by(4) {|v| (v-5)
2 } #=> [3, 8, 0, 0]

These methods are similar to sort follows first or last.

  • e.min(n) is similar to e.sort.first(n)
  • e.max(n) is similar to e.sort.last(n)
  • e.min_by(n) {...} is similar to e.sort_by {...}.first(n)
  • e.max_by(n) {...} is similar to e.sort_by {...}.last(n)

However e.min(n), e.max(n), e.min_by(n), e.max_by(n) are
less memory consuming and can be faster.
They use memory proportional to n, not e.
They doesn't sort whole e.

I feel their use is not rare.
I found several use after searching.

, ,
http://d.hatena.ne.jp/mjh/20101024/1287901875
http://stackoverflow.com/questions/11094874/get-top-n-elements-from-ruby-array-of-hash-values
http://www.math.kobe-u.ac.jp/~kodama/tips-ruby-sized_pqueue.html
https://bitbucket.org/sterlingcamden/topn

Also, e.max(n) can be used to implement weighted random sampling.

Pavlos S. Efraimidis, Paul G. Spirakis
Weighted random sampling with a reservoir
Information Processing Letters
Volume 97, Issue 5 (16 March 2006)

  % ./ruby -e '
  module Enumerable
    def wsample(n)
      self.max_by(n) {|v| rand ** (1.0/yield(v)) }
    end
  end
  e = (-20..20).to_a*10000
  a = e.wsample(20000) {|x|
    Math.exp(-(x/5.0)**2) # normal distribution
  }
  # a is 20000 samples from e.
  p a.length
  h = a.group_by {|x| x }
  -10.upto(10) {|x| puts "*" * (h[x].length/30.0).to_i if h[x] }
  '
  20000
  *
  ***
  ******
  ***********
  ******************
  *****************************
  *****************************************
  ****************************************************
  ***************************************************************
  ********************************************************************
  ***********************************************************************
  ***********************************************************************
  **************************************************************
  ****************************************************
  ***************************************
  ***************************
  ******************
  ***********
  *******
  ***
  *

Any comments?

min-n-and-max-n.patch Magnifier (15.2 KB) Akira Tanaka, 09/10/2013 10:28 PM

maxn.pdf (20.3 KB) Akira Tanaka, 09/28/2013 11:10 AM

Associated revisions

Revision 44958
Added by Akira Tanaka over 1 year ago

  • enum.c: Enumerable#{min,min_by,max,max_by} extended to take an
    optional argument.
    (nmin_cmp): New function.
    (nmin_block_cmp): Ditto
    (nmin_filter): Ditto.
    (nmin_i): Ditto.
    (nmin_run): Ditto.
    (enum_min): Call nmin_run if the optional argument is given.
    (nmin_max): Ditto.
    (nmin_min_by): Ditto.
    (nmin_max_by): Ditto.

  • range.c: Range#{min,max} extended to take an optional argument.
    (range_min): Call range_first if the optional argument is given.
    (range_max): Call rb_call_super if the optional argument is given.

[Feature #8887]

Revision 44958
Added by Akira Tanaka over 1 year ago

  • enum.c: Enumerable#{min,min_by,max,max_by} extended to take an
    optional argument.
    (nmin_cmp): New function.
    (nmin_block_cmp): Ditto
    (nmin_filter): Ditto.
    (nmin_i): Ditto.
    (nmin_run): Ditto.
    (enum_min): Call nmin_run if the optional argument is given.
    (nmin_max): Ditto.
    (nmin_min_by): Ditto.
    (nmin_max_by): Ditto.

  • range.c: Range#{min,max} extended to take an optional argument.
    (range_min): Call range_first if the optional argument is given.
    (range_max): Call rb_call_super if the optional argument is given.

[Feature #8887]

History

#1 Updated by Hans Mackowiak over 1 year ago

hm i am curios in what order max(n) and max_by(n) should return the elements?

like [6, 0, 3, 3, 8, 3, 5, 0, 6].max(4):
should it return sort.last(n) #> [5, 6, 6, 8]
or is that better? [8, 6, 6, 5] because 8 is bigger than 6 ?

PS: min_by(n) and max_by(n) (and maybe sort_by(n)) should maybe implemented different without take, so that it max_by(n) picks only the first n biggest objects without sorting the entire test of the Enumerator?

also about your patch, e.min_by(n).size should respect also n and also enum_size, so it needs to return n when n < as size, and size when n > size (for size != nil)

#2 Updated by Akira Tanaka over 1 year ago

2013/9/10 Hanmac (Hans Mackowiak) hanmac@gmx.de:

Issue #8887 has been updated by Hanmac (Hans Mackowiak).

hm i am curios in what order max(n) and max_by(n) should return the elements?

like [6, 0, 3, 3, 8, 3, 5, 0, 6].max(4):
should it return sort.last(n) #> [5, 6, 6, 8]
or is that better? [8, 6, 6, 5] because 8 is bigger than 6 ?

I think sorted order (minimum element first, maximum element last) is intuitive.

PS: min_by(n) and max_by(n) (and maybe sort_by(n)) should maybe implemented different without take, so that it max_by(n) picks only the first n biggest objects without sorting the entire test of the Enumerator?

The methods accumelates 4n elements, "sort" them and drop 3n elements
repeatedly.
Also, the "sort" is based on quick sort but doesn't sort a half of
divided elements.

also about your patch, e.min_by(n).size should respect also n and also enum_size, so it needs to return n when n < as size, and size when n > size (for size != nil)

As far as I understand, the size method returns number of yield
invocaitons of the enumerator.
The enumerators, e.min_by(n) and e.max_by(n), yields e.size times.
So they returns e.size which is not related to n.

For example, e.reject.size returns e.size as follows.

(1..10).to_a.reject.size #=> 10

This doesn't mean the result array size of reject method.
--
Tanaka Akira

#3 Updated by Akira Tanaka over 1 year ago

  • File maxn.pdf added

slide added

#4 Updated by Akira Tanaka over 1 year ago

  • File deleted (maxn.pdf)

#5 Updated by Akira Tanaka over 1 year ago

slide updated

#6 Updated by Yukihiro Matsumoto over 1 year ago

Agreed. I think 2.2 is a good time to add these enhancement.

Matz.

#7 Updated by Nobuyoshi Nakada over 1 year ago

  • Category set to core
  • Assignee set to Akira Tanaka
  • Status changed from Open to Assigned

#8 Updated by Nobuyoshi Nakada over 1 year ago

  • Description updated (diff)

#9 Updated by Akira Tanaka over 1 year ago

  • % Done changed from 0 to 100
  • Status changed from Assigned to Closed

Applied in changeset r44958.


  • enum.c: Enumerable#{min,min_by,max,max_by} extended to take an
    optional argument.
    (nmin_cmp): New function.
    (nmin_block_cmp): Ditto
    (nmin_filter): Ditto.
    (nmin_i): Ditto.
    (nmin_run): Ditto.
    (enum_min): Call nmin_run if the optional argument is given.
    (nmin_max): Ditto.
    (nmin_min_by): Ditto.
    (nmin_max_by): Ditto.

  • range.c: Range#{min,max} extended to take an optional argument.
    (range_min): Call range_first if the optional argument is given.
    (range_max): Call rb_call_super if the optional argument is given.

[Feature #8887]

#10 Updated by David Grayson 8 months ago

Hello. I think the Ruby team should reconsider the ordering of the array returned by max and max_by when the n argument is provided. It makes much more sense to me that it would be sorted in descending order, meaning that the most extreme/special element would be first. For example, I would expect max to behave like this:

[1, 2, 10, 22, 23].max(3)  # => [23, 22, 10]

It seems that min/min_by and max/max_by should be mirrors of each other. For example, I would expect that max_by(n) { |x| x.foo } would be equivalent to min_by(n) { |x| -x.foo }, assuming that foo returns a number.

That leaves us with two choices for the sorting: put the most extreme element first, or put the most extreme element last. I think putting the most extreme element first makes the most sense. Whenever I see any list of top-ranked items, the highest-ranked item is usually first. Here are some examples from the internet: http://imgur.com/a/MPsJl

I looked around on StackOverflow.com for Ruby users who might benefit from the new argument to min/min_by/max/max_by. Almost everyone asking a question about it wanted to use something like max_by and wanted to have the most extreme element be listed first:

However, I should note that in one case, the questioner wanted the order of the original enumerable to be preserved (http://stackoverflow.com/questions/9459447), and in another case they didn't specify (http://stackoverflow.com/questions/12241165). I didn't find anyone who wanted the most extreme element to be last.

I also looked for similar features in other programming languages. The closest thing I found was Python's heapq.nlargest method, which does return the most extreme element first:

https://docs.python.org/3/library/heapq.html#heapq.nlargest

In SQL, if you are making a query to get the top 10 rows, I think you must do something like this, which puts the most extreme row first:

select * from players order by score desc limit 10

In conclusion, I think it is more natural for the max and max_by methods to return arrays with the most extreme element first. I hope the Ruby team will do this before Ruby 2.2.0 is released, since it will be hard to change it later without breaking a lot of people's code.

--David Grayson

P.S. Some other Ruby developers I know here in Las Vegas have agreed with this letter and wanted their names to be included:

  • Jason Arhart
  • Paul Grayson

#11 Updated by Jan Hettich 8 months ago

A strong case can be made for returning the elements in their original order. That ordering information is otherwise lost; whereas, the client can always sort the resulting elements in whatever order she wants. This behavior would also be more consistent with other enumerable methods, such as select.

#12 Updated by Tsuyoshi Sawada 8 months ago

If descending order is to adopted, a thing that has to be made clear is how to order different objects that are in a tie with respect to the measure in question.

[0, 1, 2, 3, 4, 5, 6, 7, 8].max_by(6){|e| e % 3}

Which of the following should this return?

[1, 4, 7, 2, 5, 8] # ascending, stable
[2, 5, 8, 1, 4, 7] # descending, stable
[8, 5, 2, 7, 4, 1] # descending, reversed stable

#13 Updated by Akira Tanaka 8 months ago

David Grayson wrote:

Hello. I think the Ruby team should reconsider the ordering of the array returned by max and max_by when the n argument is provided. It makes much more sense to me that it would be sorted in descending order, meaning that the most extreme/special element would be first. For example, I would expect max to behave like this:

[1, 2, 10, 22, 23].max(3)  # => [23, 22, 10]

It seems reasonable.

Other opinions?

Note that the max(n) implementation doesn't maintain the order of original enumerable.
So it is difficult to preserve the original order or stablility.

Also available in: Atom PDF