Feature #3715

Enumerator#size and #size=

Added by marcandre (Marc-Andre Lafortune) almost 10 years ago. Updated over 8 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Target version:
[ruby-core:31762]

Description

=begin
It would be useful to be able to ask an Enumerator for the number of times it will yield, without having to actually iterate it.

For example:

(1..1000).to_a.permutation(4).size # => 994010994000 (instantly)

It would allow nice features like:

class Enumerator
  def with_progress
    return to_enum :with_progress unless block_given?
    out_of = size || "..."
    each_with_index do |obj, i|
      # each_with_index is zero-based; show a one-based count
      puts "Progress: #{i + 1} / #{out_of}"
      yield obj
    end
    puts "Done"
  end
end

# To display the progress of any iterator, one can daisy-chain with_progress:
20.times.with_progress.map do
  # do stuff here...
end

This would print "Progress: 1 / 20" and so on while the block does its work.

*** Proposed changes ***

  • Enumerator#size *

call-seq:
e.size -> int, Float::INFINITY or nil
e.size {block} -> int

Returns the size of the enumerator.
The form with no block given will do a lazy evaluation of the size without going through the enumeration. If the size cannot be determined, +nil+ is returned.
The form with a block will always iterate through the enumerator and return the number of times it yielded.

(1..100).to_a.permutation(4).size # => 94109400
loop.size # => Float::INFINITY

a = [1, 2, 3]
a.keep_if.size # => 3
a # => [1, 2, 3]
a.keep_if.size{false} # => 3
a # => []

[1, 2, 3].drop_while.size # => nil
[1, 2, 3].drop_while.size{|i| i < 3} # => 3 (the block is called for 1, 2 and 3)
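
The two forms above can be sketched in plain Ruby. This is only an illustration, not the code from the patch: permutation_count and count_yields are hypothetical helper names. The first computes n! / (n - r)! arithmetically, which is why a lazy size can answer instantly; the second mimics the block form by actually iterating and counting.

```ruby
# Hypothetical helper: number of r-permutations of n items, n! / (n - r)!,
# computed as a product so that no permutation is ever generated.
def permutation_count(n, r)
  return 0 if r > n || r < 0
  (n - r + 1..n).reduce(1, :*)
end

# Hypothetical helper mimicking the proposed block form: run the enumerator
# with the given block and count how many times it yields.
def count_yields(enum)
  count = 0
  enum.each do |*args|
    count += 1
    yield(*args) # the block's result is handed back to the underlying method
  end
  count
end

small = permutation_count(100, 4)
large = permutation_count(1000, 4)

a = [1, 2, 3]
yields = count_yields(a.keep_if) { false } # iterates, so keep_if really runs
```

Here small and large match the figures quoted in this request, and a ends up empty because counting with a block has to run keep_if for real.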

  • Enumerator#size= *

call-seq:
e.size = sz

Sets the size of the enumerator. If +sz+ is a Proc or a Method, it will be called each time +size+ is requested; otherwise +sz+ itself is returned.

first = [1, 2, 3]
second = [4, 5]
enum = Enumerator.new do |y|
  first.each{|o| y << o}
  second.each{|o| y << o}
end
enum.size # => nil
enum.size = ->(e){first.size + second.size}
enum.size # => 5
first << 42
enum.size # => 6
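
For comparison, a similar lazy size can be had without a setter in Ruby versions that ship Enumerator#size (2.0 and later, to my knowledge), because Enumerator.new accepts an optional size argument that may be a callable. This is a sketch of that existing mechanism, not of the #size= proposed here; note that the callable takes no arguments in this form.

```ruby
first  = [1, 2, 3]
second = [4, 5]

# The lambda is stored, not called, until #size is requested.
enum = Enumerator.new(-> { first.size + second.size }) do |y|
  first.each  { |o| y << o }
  second.each { |o| y << o }
end

size_before = enum.size # evaluated now, against the current arrays
first << 42
size_after = enum.size  # evaluated again, so it sees the new element
```

Because the lambda is re-evaluated on every call, size_before and size_after differ by one, mirroring the 5 and 6 in the example above.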

  • Kernel#to_enum / enum_for *

The only other API change is for #to_enum/#enum_for, which can accept a block for size calculation:

class Date
  def step(limit, step=1)
    unless block_given?
      return to_enum(:step, limit, step){|date| (limit - date).div(step) + 1}
    end
    # ...
  end
end
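
A self-contained variant of the same idea, runnable on Ruby versions where to_enum accepts a size block (2.0 and later, as far as I know). Countdown and each_tick are made-up names for illustration only. Unlike the proposal above, the block here ignores its arguments and relies on the closure instead.

```ruby
# Hypothetical class: counts down from a starting number to zero.
class Countdown
  def initialize(from)
    @from = from
  end

  def each_tick(step = 1)
    unless block_given?
      # The block computes the size lazily; it is only run when #size is called.
      return to_enum(:each_tick, step) { (@from / step) + 1 }
    end
    @from.step(0, -step) { |n| yield n }
  end
end

ticks = Countdown.new(10).each_tick(3)
lazy_size = ticks.size   # computed arithmetically, no iteration
values    = ticks.to_a   # actually iterates
```

The lazily computed size agrees with the number of values the enumerator produces when it is iterated.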

*** Implementation ***

I implemented support for #size for most built-in enumerator-producing methods (63 in all).

It is broken down into about 20 commits: http://github.com/marcandre/ruby/commits/enum_size

It begins with the implementation of Enumerator#size{=}: http://github.com/marcandre/ruby/commit/a92feb0

A combined patch is available here: http://gist.github.com/535974

Still missing are Dir#each, Dir.foreach, ObjectSpace.each_object, Range#step, Range#each, String#upto, String#gsub, String#each_line.

The enumerators whose #size returns +nil+ are:
Array#{r}index, {take|drop}_while
Enumerable#find{_index}, {take|drop}_while
IO: all methods

*** Notes ***

  • Returning +nil+ *

I feel it is best if IO#each_line.size and similar return +nil+ to avoid side effects.

We could have Array#find_index.size return the size of the array, with the understanding that this is the maximum number of times the enumerator will yield. Since a block can always contain a break statement, size could be understood as a maximum anyway, so it can definitely be argued that the definition should be the maximum number of times.

  • Arguments to size proc/lambda *

My implementation currently passes the receiver (the object on which the enumerator will call its method), followed by any arguments given when building the enumerator.

If Enumerator had getters (say Enumerator#base, Enumerator#call, Enumerator#args, see feature request #3714), passing the enumerator itself might be a better idea.

  • Does not dispatch through name *

It might be worth noting that the size dispatch is decided when the enumerator is created, not afterwards as a function of the class and method name:

[1,2,3].permutation(2).size # => 6
[1,2,3].to_enum(:permutation, 2).size # => nil

  • Size setter *

Although I personally like the idea that #size= can accept a Proc/Lambda to be called later, this has the downside that there is no getter, i.e. no way to get the Proc/Lambda back. I feel this is not an issue, but an alternative would be to also have a #size_proc getter and a #size_proc= setter (like Hash#default_proc and Hash#default_proc=).

I believe this addresses feature request #2673, although maybe in a different fashion. http://redmine.ruby-lang.org/issues/show/2673
=end


Related issues

Related to Ruby master - Feature #2673: the length for an enumerator generated by Array#permutation and Array#combination - Closed - matz (Yukihiro Matsumoto)
Related to Ruby master - Feature #3714: Add getters for Enumerator - Closed - knu (Akinori MUSHA)
Related to Ruby master - Feature #6636: Enumerable#size - Closed - marcandre (Marc-Andre Lafortune) - 06/24/2012