Feature #6636

Enumerable#size

Added by Marc-Andre Lafortune almost 2 years ago. Updated over 1 year ago.

[ruby-core:45805]
Status:Closed
Priority:Normal
Assignee:Marc-Andre Lafortune
Category:core
Target version:2.0.0

Description

Now that it has been made clear that Enumerable#count never calls #size and that we have Enumerable#lazy, let me propose again an API for a lazy way to get the size of an Enumerable: Enumerable#size.

  • call-seq:
  • enum.size # => nil, Integer or Float::INFINITY
  • Returns the number of elements that will be yielded, without going through
  • the iteration (i.e. lazy), or +nil+ if it can't be calculated lazily.
  • perm = (1..100).to_a.permutation(4)
  • perm.size # => 94109400
  • perm.each_cons(2).size # => 94109399
  • loop.size # => Float::INFINITY
  • [42].drop_while.size # => nil

About 66 core methods returning enumerators would have a lazy size, like each_slice, permutation or lazy.take.

A few would have size return nil:
Array#{r}index, {take|drop}while
Enumerable#find{
index}, {take|drop}_while
IO: all methods

Sized enumerators can also be created naturally by providing a block to to_enum/enum_for or a lambda to Enumerator.new.

Example for to_enum:

class Integer
  def composition
    return to_enum(:composition){ 1 << (self - 1) } unless block_given?
    yield [] if zero?
    downto(1) do |i|
      (self - i).composition do |comp|
        yield [i, *comp]
      end
    end
  end
end

4.composition.to_a
# => [[4], [3, 1], [2, 2], [2, 1, 1], [1, 3], [1, 2, 1], [1, 1, 2], [1, 1, 1, 1]]
42.composition.size # => 2199023255552

Example for Enumerator.new:

def lazy_product(*enums)
  sizer = ->{
    enums.inject(1) do |product, e|
      break if (size = e.size).nil?
      product * size
    end
  }
  Enumerator.new(sizer) do |yielder|
    # ... generate combinations
  end
end

lazy_product(1..4, (1..3).each_cons(2)).size # => 8
lazy_product(1..4, (1..3).cycle).size # => Float::INFINITY

enumsize.pdf (131 KB) Marc-Andre Lafortune, 07/01/2012 05:26 AM


Related issues

Related to ruby-trunk - Feature #3715: Enumerator#size and #size= Rejected 08/19/2010
Related to ruby-trunk - Bug #7298: Behavior of Enumerator.new different between 1.9.3 and 2.0.0 Closed 11/07/2012

Associated revisions

Revision 37495
Added by Marc-Andre Lafortune over 1 year ago

  • enumerator.c (enumerator_initialize): Warn when using deprecated form [Feature #6636]

Revision 37497
Added by Marc-Andre Lafortune over 1 year ago

  • enumerator: New method #size; constructor accepts size
    [Feature #6636]

  • include/ruby/intern.h: RETURNSIZEDENUMERATOR for support of
    sized enumerators

Revision 37498
Added by Marc-Andre Lafortune over 1 year ago

  • enumerator.c (objtoenum): Have #to_enum accept a block [Feature #6636]

Revision 37499
Added by Marc-Andre Lafortune over 1 year ago

  • enumerator.c: Support #size for enumerators created from enumerators [Feature #6636]

Revision 37500
Added by Marc-Andre Lafortune over 1 year ago

  • array.c: Support for Enumerator#size in trivial cases: each, eachindex, reverseeach, sortby, collect, collect!, select, select!, keepif, reject, reject!, delete_if [Feature #6636]

Revision 37501
Added by Marc-Andre Lafortune over 1 year ago

  • array.c (rbarypermutation): Support for Array#permutation.size [Feature #6636]

Revision 37502
Added by Marc-Andre Lafortune over 1 year ago

  • array.c (rbarycombination): Support for Array#combination.size [Feature #6636]

Revision 37503
Added by Marc-Andre Lafortune over 1 year ago

  • array.c (rbaryrepeatedpermutation): Support for repeatedpermutation.size [Feature #6636]

Revision 37504
Added by Marc-Andre Lafortune over 1 year ago

  • array.c (rbaryrepeatedcombination): Support for repeatedcombination.size [Feature #6636]

Revision 37505
Added by Marc-Andre Lafortune over 1 year ago

  • array.c (rbarycycle): Support for Array#cycle.size [Feature #6636]

Revision 37506
Added by Marc-Andre Lafortune over 1 year ago

  • vmeval.c (rbf_loop): Support for loop.size [Feature #6636]

Revision 37507
Added by Marc-Andre Lafortune over 1 year ago

  • enum.c: Support for enumerators created by Enumerable with forwarding: find_all, reject, ... [Feature #6636]

Revision 37508
Added by Marc-Andre Lafortune over 1 year ago

  • enum.c (enumeachslice): Support for Enumerable#each_slice.size [Feature #6636]

Revision 37509
Added by Marc-Andre Lafortune over 1 year ago

  • enum.c (enumeachcons): Support for Enumerable#each_cons.size [Feature #6636]

Revision 37510
Added by Marc-Andre Lafortune over 1 year ago

  • enum.c (enum_cycle): Support for Enumerable#cycle.size [Feature #6636]

Revision 37511
Added by Marc-Andre Lafortune over 1 year ago

  • hash.c: Support for enumerators created by Hash: delete_if, reject!, ... [Feature #6636]

Revision 37512
Added by Marc-Andre Lafortune over 1 year ago

  • hash.c: Support for enumerators created by ENV: each, each_value, ... [Feature #6636]

Revision 37513
Added by Marc-Andre Lafortune over 1 year ago

  • struct.c: Support for Struct's enumerators #size [Feature #6636]

Revision 37514
Added by Marc-Andre Lafortune over 1 year ago

  • numeric.c: Extract rubyfloatstep_size [Feature #6636]

Revision 37515
Added by Marc-Andre Lafortune over 1 year ago

  • numeric.c (num_step): Support for Numeric#step.size [Feature #6636]

Revision 37516
Added by Marc-Andre Lafortune over 1 year ago

  • range.c: Support for Range#size and Range#each.size [Feature #6636]

Revision 37517
Added by Marc-Andre Lafortune over 1 year ago

  • range.c: Support for range.step.size [Feature #6636]

Revision 37518
Added by Marc-Andre Lafortune over 1 year ago

  • numeric.c (intupto, intdownto): Support for Integer#{down|up}to.size [Feature #6636]

Revision 37519
Added by Marc-Andre Lafortune over 1 year ago

  • numeric.c (int_dotimes): Support for Integer#times.size [Feature #6636]

Revision 37520
Added by Marc-Andre Lafortune over 1 year ago

  • string.c: Support for String#{eachbyte,eachchar,each_codepoint}.size [Feature #6636]

Revision 37521
Added by Marc-Andre Lafortune over 1 year ago

  • enumerator.c: Support for lazy.size [Feature #6636]

Revision 37522
Added by Marc-Andre Lafortune over 1 year ago

  • enumerator.c: Support for lazy.{map|flat_map|...}.size [Feature #6636]

Revision 37523
Added by Marc-Andre Lafortune over 1 year ago

  • enumerator.c: Support for lazy.take.size [Feature #6636]

Revision 37524
Added by Marc-Andre Lafortune over 1 year ago

  • enumerator.c: Add support for lazy.drop.size [Feature #6636]

Revision 37525
Added by Marc-Andre Lafortune over 1 year ago

  • enumerator.c: Support for lazy.cycle.size [Feature #6636]

Revision 37526
Added by Marc-Andre Lafortune over 1 year ago

  • NEWS: Update for lazy size evaluation [Feature #6636]

History

#1 Updated by Marc-Andre Lafortune almost 2 years ago

Attaching one-minute slide

#2 Updated by Yusuke Endoh almost 2 years ago

  • Status changed from Open to Assigned

Received. Thank you!

Yusuke Endoh mame@tsg.ne.jp

#3 Updated by Yui NARUSE over 1 year ago

How about adding Enumerator#receiver and define yourself with it.

diff --git a/enumerator.c b/enumerator.c
index f01ddd5..8e3ae9a 100644
--- a/enumerator.c
+++ b/enumerator.c
@@ -942,6 +942,32 @@ enumerator_inspect(VALUE obj)
}

/*
+ * call-seq:
+ * e.receiver -> object
+ *
+ * Returns the receiver of this enumerator.
+ /
+
+static VALUE
+enumeratorreceiver(VALUE obj)
+{
+ struct enumerator *e;
+ VALUE eobj;
+
+ TypedData
GetStruct(obj, struct enumerator, &enumeratordatatype, e);
+ if (!e || e->obj == Qundef) {
+ return Qnil;
+ }
+
+ eobj = rb
attrget(obj, idreceiver);
+ if (NIL_P(eobj)) {
+ eobj = e->obj;
+ }
+
+ return eobj;
+}
+
+/

* Yielder
*/
static void
@@ -1748,6 +1774,7 @@ InitVMEnumerator(void)
rb
definemethod(rbcEnumerator, "feed", enumeratorfeed, 1);
rb
definemethod(rbcEnumerator, "rewind", enumeratorrewind, 0);
rb
definemethod(rbcEnumerator, "inspect", enumeratorinspect, 0);
+ rb
definemethod(rbcEnumerator, "receiver", enumerator_receiver, 0);

 /* Lazy */
 rb_cLazy = rb_define_class_under(rb_cEnumerator, "Lazy", rb_cEnumerator);

irb(main):007:0> e="abcde".enumfor(:eachbyte)
=> #
irb(main):009:0> def e.size; receiver.bytesize; end
=> nil
irb(main):010:0> e.size
=> 5

#4 Updated by Marc-Andre Lafortune over 1 year ago

Hi,

On Sat, Jul 21, 2012 at 2:43 AM, naruse (Yui NARUSE) naruse@airemix.jp wrote:

How about adding Enumerator#receiver and define yourself with it.

I agree receiver could be helpful. One would also need the method and the arguments, as in my request #3714.

Still, it doesn't really address the issue.

If someone wants to write a library to output the progression, for example, it is still not possible for a general enumerable/enumerator.

The proposal is so that:
- it is standard so anyone can depend on it and also create their own enumerables/enumerators
- it can help in some calculations
- it can help to have generic progression reports
- etc.

Thanks

#5 Updated by Yusuke Endoh over 1 year ago

  • Status changed from Assigned to Feedback

Marc-Andre Lafortune,

We discussed your slide at the developer meeting (7/21).

Matz was positive to the spec of return value: Integer, Float::
INFINITY, and nil.
However, we couldn't understand what API is proposed for creating
an Enumeartor with size.

So, please revise and elaborate your API according to these two
points:

  • Enumerator.new(size) is not acceptable because of compatibility:

    p Enumerator.new([1,2,3]).take(2) #=> [1, 2]

  • We cannot determine the size of enumerator when creating it:

    a = [1]
    e = a.permutation
    a << 2
    p e.to_a #=> [[1, 2], [2, 1]]

    So, the API may need to receive a code fragment that calculates
    size, such as a Proc.

Yusuke Endoh mame@tsg.ne.jp

#6 Updated by Marc-Andre Lafortune over 1 year ago

  • Status changed from Feedback to Open

Hi,

mame (Yusuke Endoh) wrote:

Matz was positive to the spec of return value: Integer, Float::
INFINITY, and nil.

:-)

However, we couldn't understand what API is proposed for creating
an Enumeartor with size.

So, please revise and elaborate your API according to these two
points:

  • Enumerator.new(size) is not acceptable because of compatibility:

    p Enumerator.new([1,2,3]).take(2) #=> [1, 2]

Agreed.
I am proposing Enumerator.new(size_lambda){ block }, i.e. only if a block is given, then the first argument can be a lambda/proc that can lazily compute the size.

The old syntax of Enumerator.new without a block does not change meaning.

  • We cannot determine the size of enumerator when creating it:

    a = [1]
    e = a.permutation
    a << 2
    p e.to_a #=> [[1, 2], [2, 1]]

    So, the API may need to receive a code fragment that calculates
    size, such as a Proc.

Agreed.
This is why I propose that to_enum accepts a block that can calculate the size, and Enumerator.new with a block can accept a lambda/proc for the same.

Does this address the concerns?

I will be glad to propose a set of patches so we can experiment with this.

Marc-André

#7 Updated by Yusuke Endoh over 1 year ago

Hello Marc-Andre

2012/7/24, marcandre (Marc-Andre Lafortune) ruby-core@marc-andre.ca:

  • Enumerator.new(size) is not acceptable because of compatibility:

    p Enumerator.new([1,2,3]).take(2) #=> [1, 2]

Agreed.
I am proposing Enumerator.new(size_lambda){ block }, i.e. only if a block
is given, then the first argument can be a lambda/proc that can lazily
compute the size.

This is just my guess, but matz will not like such a method whose
meaning of its argument varies depending on whether block is given
or not.

The old syntax of Enumerator.new without a block does not change meaning.

Is it okay that there is no way to specify size in this case?

  • We cannot determine the size of enumerator when creating it:

    a = [1]
    e = a.permutation
    a << 2
    p e.to_a #=> [[1, 2], [2, 1]]

    So, the API may need to receive a code fragment that calculates
    size, such as a Proc.

Agreed.
This is why I propose that to_enum accepts a block that can calculate the
size, and Enumerator.new with a block can accept a lambda/proc for the
same.

What argument(s) will the lambda/proc receive?

--
Yusuke Endoh mame@tsg.ne.jp

#8 Updated by Marc-Andre Lafortune over 1 year ago

Hi,

mame (Yusuke Endoh) wrote:

I am proposing Enumerator.new(size_lambda){ block }, i.e. only if a block
is given, then the first argument can be a lambda/proc that can lazily
compute the size.

This is just my guess, but matz will not like such a method whose
meaning of its argument varies depending on whether block is given
or not.

I understand the concern.

It could still be acceptable here because the other form is already documented as 'discouraged'. Maybe we should deprecate it?

Other possibility would be to add a different creator, e.g. Enumerator.sized(size_lambda){|yielder| ... }.

The old syntax of Enumerator.new without a block does not change meaning.

Is it okay that there is no way to specify size in this case?

This old syntax is already discouraged and to_enum/enum_for should be used instead.

This is why I propose that to_enum accepts a block that can calculate the
size, and Enumerator.new with a block can accept a lambda/proc for the
same.

What argument(s) will the lambda/proc receive?

We could consider passing the receiver and/or any arguments passed to to_enum, but I would propose to keep it simple and pass no arguments.

This is because enumerators are immutable and all information held in Enumerators should be accessible from the block/lambda anyways.

Marc-André

#9 Updated by Koichi Sasada over 1 year ago

  • Assignee changed from Yukihiro Matsumoto to Yusuke Endoh

#10 Updated by Yusuke Endoh over 1 year ago

  • Status changed from Open to Assigned
  • Assignee changed from Yusuke Endoh to Yukihiro Matsumoto
  • Target version changed from 2.0.0 to next minor

#11 Updated by Marc-Andre Lafortune over 1 year ago

Hi.

My understanding was this new feature would make it into Ruby 2.0. Did I misunderstand?

The implementation can be seen here: https://github.com/marcandre/ruby/compare/marcandre:trunk...marcandre:enum_size

Although the combined diff (https://github.com/marcandre/ruby/compare/marcandre:trunk...marcandre:enum_size.diff ) is pretty big, there are really just two interesting commits. The second adds #size and extends constructor. The third extends to_enum to accept a block. See https://github.com/marcandre/ruby/commit/add_enumerator_size and https://github.com/marcandre/ruby/commit/sized_to_enum

The first commit only warns on using the deprecated form with no block Enumerator.new(obj, *args), but compatibility is maintained.

The remaining commits add support for the different ways of creating enumerators that can evaluate lazily their size. A few remain to be implemented, in particular the lazy ones.

#12 Updated by Marc-Andre Lafortune over 1 year ago

I added support for lazy enumerators.

Yusuke, Matz, did I address your questions/concerns?

#13 Updated by Yukihiro Matsumoto over 1 year ago

After skimming your modifies, I feel they are decent.
Sorry for being late to check.

Matz.

#14 Updated by Marc-Andre Lafortune over 1 year ago

  • Assignee changed from Yukihiro Matsumoto to Marc-Andre Lafortune
  • Target version changed from next minor to 2.0.0

Hi,

matz (Yukihiro Matsumoto) wrote:

After skimming your modifies, I feel they are decent.

Thanks for looking at them. Very happy to read this :-)

I'll then commit these and finish implementing size for ranges of strings.

Marc-André

#15 Updated by Marc-Andre Lafortune over 1 year ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r37495.
Marc-Andre, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • enumerator.c (enumerator_initialize): Warn when using deprecated form [Feature #6636]

Also available in: Atom PDF