Feature #8887

min(n), max(n), min_by(n), max_by(n)

Added by Akira Tanaka 7 months ago. Updated 2 months ago.

[ruby-core:57111]
Status:Closed
Priority:Normal
Assignee:Akira Tanaka
Category:core
Target version:-

Description

How about adding an optional argument, n, for Enumerable#{min,max,minby,maxby} to
return minimum/maximum n elements as an array.

Example:
* [6, 0, 3, 3, 8, 3, 5, 0, 6].min(4) #=> [0, 0, 3, 3]
* [6, 0, 3, 3, 8, 3, 5, 0, 6].max(4) #=> [5, 6, 6, 8]
* [6, 0, 3, 3, 8, 3, 5, 0, 6].minby(4) {|v| (v-5)**2 } #=> [5, 6, 6, 3]
* [6, 0, 3, 3, 8, 3, 5, 0, 6].max
by(4) {|v| (v-5)**2 } #=> [3, 8, 0, 0]

These methods are similar to sort follows first or last.

  • e.min(n) is similar to e.sort.first(n)
  • e.max(n) is similar to e.sort.last(n)
  • e.minby(n) {...} is similar to e.sortby {...}.first(n)
  • e.maxby(n) {...} is similar to e.sortby {...}.last(n)

However e.min(n), e.max(n), e.minby(n), e.maxby(n) are
less memory consuming and can be faster.
They use memory proportional to n, not e.
They doesn't sort whole e.

I feel their use is not rare.
I found several use after searching.

, ,
http://d.hatena.ne.jp/mjh/20101024/1287901875
http://stackoverflow.com/questions/11094874/get-top-n-elements-from-ruby-array-of-hash-values
http://www.math.kobe-u.ac.jp/~kodama/tips-ruby-sized_pqueue.html
https://bitbucket.org/sterlingcamden/topn

Also, e.max(n) can be used to implement weighted random sampling.

Pavlos S. Efraimidis, Paul G. Spirakis
Weighted random sampling with a reservoir
Information Processing Letters
Volume 97, Issue 5 (16 March 2006)

  % ./ruby -e '
  module Enumerable
    def wsample(n)
      self.max_by(n) {|v| rand ** (1.0/yield(v)) }
    end
  end
  e = (-20..20).to_a*10000
  a = e.wsample(20000) {|x|
    Math.exp(-(x/5.0)**2) # normal distribution
  }
  # a is 20000 samples from e.
  p a.length
  h = a.group_by {|x| x }
  -10.upto(10) {|x| puts "*" * (h[x].length/30.0).to_i if h[x] }
  '
  20000
  *
  ***
  ******
  ***********
  ******************
  *****************************
  *****************************************
  ****************************************************
  ***************************************************************
  ********************************************************************
  ***********************************************************************
  ***********************************************************************
  **************************************************************
  ****************************************************
  ***************************************
  ***************************
  ******************
  ***********
  *******
  ***
  *

Any comments?

min-n-and-max-n.patch Magnifier (15.2 KB) Akira Tanaka, 09/10/2013 10:28 PM

maxn.pdf (20.3 KB) Akira Tanaka, 09/28/2013 11:10 AM

Associated revisions

Revision 44958
Added by Akira Tanaka 2 months ago

  • enum.c: Enumerable#{min,minby,max,maxby} extended to take an
    optional argument.
    (nmincmp): New function.
    (nmin
    blockcmp): Ditto
    (nmin
    filter): Ditto.
    (nmini): Ditto.
    (nmin
    run): Ditto.
    (enummin): Call nminrun if the optional argument is given.
    (nminmax): Ditto.
    (nmin
    minby): Ditto.
    (nmin
    max_by): Ditto.

  • range.c: Range#{min,max} extended to take an optional argument.
    (rangemin): Call rangefirst if the optional argument is given.
    (rangemax): Call rbcall_super if the optional argument is given.

[Feature #8887]

History

#1 Updated by Hans Mackowiak 7 months ago

hm i am curios in what order max(n) and max_by(n) should return the elements?

like [6, 0, 3, 3, 8, 3, 5, 0, 6].max(4):
should it return sort.last(n) #> [5, 6, 6, 8]
or is that better? [8, 6, 6, 5] because 8 is bigger than 6 ?

PS: minby(n) and maxby(n) (and maybe sortby(n)) should maybe implemented different without take, so that it maxby(n) picks only the first n biggest objects without sorting the entire test of the Enumerator?

also about your patch, e.minby(n).size should respect also n and also enumsize, so it needs to return n when n < as size, and size when n > size (for size != nil)

#2 Updated by Akira Tanaka 7 months ago

2013/9/10 Hanmac (Hans Mackowiak) hanmac@gmx.de:

Issue #8887 has been updated by Hanmac (Hans Mackowiak).

hm i am curios in what order max(n) and max_by(n) should return the elements?

like [6, 0, 3, 3, 8, 3, 5, 0, 6].max(4):
should it return sort.last(n) #> [5, 6, 6, 8]
or is that better? [8, 6, 6, 5] because 8 is bigger than 6 ?

I think sorted order (minimum element first, maximum element last) is intuitive.

PS: minby(n) and maxby(n) (and maybe sortby(n)) should maybe implemented different without take, so that it maxby(n) picks only the first n biggest objects without sorting the entire test of the Enumerator?

The methods accumelates 4n elements, "sort" them and drop 3n elements
repeatedly.
Also, the "sort" is based on quick sort but doesn't sort a half of
divided elements.

also about your patch, e.minby(n).size should respect also n and also enumsize, so it needs to return n when n < as size, and size when n > size (for size != nil)

As far as I understand, the size method returns number of yield
invocaitons of the enumerator.
The enumerators, e.minby(n) and e.maxby(n), yields e.size times.
So they returns e.size which is not related to n.

For example, e.reject.size returns e.size as follows.

(1..10).to_a.reject.size #=> 10

This doesn't mean the result array size of reject method.
--
Tanaka Akira

#3 Updated by Akira Tanaka 7 months ago

  • File maxn.pdf added

slide added

#4 Updated by Akira Tanaka 7 months ago

  • File deleted (maxn.pdf)

#5 Updated by Akira Tanaka 7 months ago

slide updated

#6 Updated by Yukihiro Matsumoto 2 months ago

Agreed. I think 2.2 is a good time to add these enhancement.

Matz.

#7 Updated by Nobuyoshi Nakada 2 months ago

  • Category set to core
  • Status changed from Open to Assigned
  • Assignee set to Akira Tanaka

#8 Updated by Nobuyoshi Nakada 2 months ago

  • Description updated (diff)

#9 Updated by Akira Tanaka 2 months ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

Applied in changeset r44958.


  • enum.c: Enumerable#{min,minby,max,maxby} extended to take an
    optional argument.
    (nmincmp): New function.
    (nmin
    blockcmp): Ditto
    (nmin
    filter): Ditto.
    (nmini): Ditto.
    (nmin
    run): Ditto.
    (enummin): Call nminrun if the optional argument is given.
    (nminmax): Ditto.
    (nmin
    minby): Ditto.
    (nmin
    max_by): Ditto.

  • range.c: Range#{min,max} extended to take an optional argument.
    (rangemin): Call rangefirst if the optional argument is given.
    (rangemax): Call rbcall_super if the optional argument is given.

[Feature #8887]

Also available in: Atom PDF