Feature #6636

Enumerable#size

Added by marcandre (Marc-Andre Lafortune) almost 12 years ago. Updated over 11 years ago.

Status:
Closed
Target version:
[ruby-core:45805]

Description

Now that it has been made clear that Enumerable#count never calls #size and that we have Enumerable#lazy, let me propose again an API for a lazy way to get the size of an Enumerable: Enumerable#size.

  call-seq:
    enum.size # => nil, Integer or Float::INFINITY

  Returns the number of elements that will be yielded, without going through
  the iteration (i.e. lazy), or +nil+ if it can't be calculated lazily.

    perm = (1..100).to_a.permutation(4)
    perm.size # => 94109400
    perm.each_cons(2).size # => 94109399
    loop.size # => Float::INFINITY
    [42].drop_while.size # => nil

About 66 core methods returning enumerators would have a lazy size, like each_slice, permutation or lazy.take.

A few would have size return nil:
Array#{r}index, {take|drop}_while
Enumerable#find{_index}, {take|drop}_while
IO: all methods
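
As a rough illustration of the two groups above, the following sketch compares a few enumerators whose size can be computed up front with a few whose length depends on what the block returns. It assumes a Ruby version in which this proposal has shipped (2.0 or later); the hash names `sized` and `unsized` are only for this example.

```ruby
# Enumerators whose size can be computed without iterating.
sized = {
  each_slice:  (1..10).each_slice(3).size,        # ceil(10 / 3)
  permutation: [1, 2, 3, 4].permutation(2).size,  # 4 * 3
  lazy_take:   (1..100).lazy.take(5).size,        # min(100, 5)
  endless:     loop.size                          # never terminates on its own
}

# Enumerators whose length depends on the block's result: size is nil.
unsized = {
  drop_while: [42].drop_while.size,
  take_while: [42].take_while.size,
  find_index: [42].find_index.size
}

sized.each   { |name, size| puts "#{name}: #{size.inspect}" }
unsized.each { |name, size| puts "#{name}: #{size.inspect}" }
```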

Sized enumerators can also be created naturally by providing a block to to_enum/enum_for or a lambda to Enumerator.new.

Example for to_enum:

class Integer
  def composition
    return to_enum(:composition){ 1 << (self - 1) } unless block_given?
    yield [] if zero?
    downto(1) do |i|
      (self - i).composition do |comp|
        yield [i, *comp]
      end
    end
  end
end

4.composition.to_a
# => [[4], [3, 1], [2, 2], [2, 1, 1], [1, 3], [1, 2, 1], [1, 1, 2], [1, 1, 1, 1]]
42.composition.size # => 2199023255552

Example for Enumerator.new:

def lazy_product(*enums)
  sizer = ->{
    enums.inject(1) do |product, e|
      break if (size = e.size).nil?
      product * size
    end
  }
  Enumerator.new(sizer) do |yielder|
    # ... generate combinations
  end
end

lazy_product(1..4, (1..3).each_cons(2)).size # => 8
lazy_product(1..4, (1..3).cycle).size # => Float::INFINITY
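
Since the body of lazy_product is elided above, here is a smaller, self-contained sketch of the same Enumerator.new(size_lambda) form. The helper name lazy_pairs is hypothetical and only handles two enumerables; it assumes a Ruby version in which Enumerator.new accepts a callable size.

```ruby
# Hypothetical helper for illustration: the cartesian product of two
# enumerables, with a size that is computed lazily from its inputs.
def lazy_pairs(left, right)
  sizer = -> {
    l = left.size
    r = right.size
    # nil if either side cannot report its size lazily
    l * r unless l.nil? || r.nil?
  }
  Enumerator.new(sizer) do |yielder|
    left.each { |x| right.each { |y| yielder << [x, y] } }
  end
end

pairs = lazy_pairs(1..3, %w[a b])
puts pairs.size.inspect      # computed as 3 * 2, without iterating
puts pairs.first(2).inspect  # iteration only happens here

unknown = lazy_pairs(1..3, [1].drop_while)
puts unknown.size.inspect    # drop_while has no lazy size
```

The lambda is called each time #size is requested, so nothing is fixed at construction time.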

Files

enumsize.pdf (131 KB), marcandre (Marc-Andre Lafortune), 07/01/2012 05:26 AM

Related issues 2 (0 open, 2 closed)

Related to Ruby master - Feature #3715: Enumerator#size and #size= (Rejected, 08/19/2010)
Related to Ruby master - Bug #7298: Behavior of Enumerator.new different between 1.9.3 and 2.0.0 (Closed, ayumin (Ayumu AIZAWA), 11/07/2012)

Updated by marcandre (Marc-Andre Lafortune) almost 12 years ago

Attaching one-minute slide

Updated by mame (Yusuke Endoh) almost 12 years ago

  • Status changed from Open to Assigned

Received. Thank you!

--
Yusuke Endoh

Updated by naruse (Yui NARUSE) over 11 years ago

How about adding Enumerator#receiver and defining size yourself with it?

diff --git a/enumerator.c b/enumerator.c
index f01ddd5..8e3ae9a 100644
--- a/enumerator.c
+++ b/enumerator.c
@@ -942,6 +942,32 @@ enumerator_inspect(VALUE obj)
 }

 /*
+ * call-seq:
+ *   e.receiver -> object
+ *
+ * Returns the receiver of this enumerator.
+ */
+
+static VALUE
+enumerator_receiver(VALUE obj)
+{
+    struct enumerator *e;
+    VALUE eobj;
+
+    TypedData_Get_Struct(obj, struct enumerator, &enumerator_data_type, e);
+    if (!e || e->obj == Qundef) {
+        return Qnil;
+    }
+
+    eobj = rb_attr_get(obj, id_receiver);
+    if (NIL_P(eobj)) {
+        eobj = e->obj;
+    }
+    return eobj;
+}
+
+/*
  * Yielder
  */
 static void
@@ -1748,6 +1774,7 @@ InitVM_Enumerator(void)
     rb_define_method(rb_cEnumerator, "feed", enumerator_feed, 1);
     rb_define_method(rb_cEnumerator, "rewind", enumerator_rewind, 0);
     rb_define_method(rb_cEnumerator, "inspect", enumerator_inspect, 0);
+    rb_define_method(rb_cEnumerator, "receiver", enumerator_receiver, 0);

     /* Lazy */
     rb_cLazy = rb_define_class_under(rb_cEnumerator, "Lazy", rb_cEnumerator);

irb(main):007:0> e="abcde".enum_for(:each_byte)
=> #<Enumerator: "abcde":each_byte>
irb(main):009:0> def e.size; receiver.bytesize; end
=> nil
irb(main):010:0> e.size
=> 5

Updated by marcandre (Marc-Andre Lafortune) over 11 years ago

Hi,

On Sat, Jul 21, 2012 at 2:43 AM, naruse (Yui NARUSE) wrote:

How about adding Enumerator#receiver and defining size yourself with it?

I agree receiver could be helpful. One would also need the method and the arguments, as in my request #3714.

Still, it doesn't really address the issue.

If someone wants to write a library that reports progress, for example, that is still not possible for a general enumerable/enumerator.

The proposal is so that:

  • it is standard so anyone can depend on it and also create their own enumerables/enumerators
  • it can help in some calculations
  • it can help to have generic progress reports
  • etc.
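
To make the progress-report use case concrete, here is a minimal sketch of what such a library could do once a standard #size exists. The method each_with_progress and its out: parameter are hypothetical, not part of Ruby or of this proposal.

```ruby
# Hypothetical helper for illustration: iterates any enumerable and prints
# "done/total" when the size is known lazily, or a bare count otherwise.
def each_with_progress(enum, out: $stderr)
  # size may be missing, nil, an Integer or Float::INFINITY
  total = enum.size if enum.respond_to?(:size)
  enum.each_with_index do |item, index|
    done = index + 1
    label = total.is_a?(Integer) ? "#{done}/#{total}" : done.to_s
    out.puts "processed #{label}"
    yield item
  end
end

each_with_progress([10, 20, 30].each_slice(2), out: $stdout) do |slice|
  # work on slice here
end
```

The helper never needs to know how the enumerator was built; it only relies on the nil / Integer / Float::INFINITY contract.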

Thanks

Updated by mame (Yusuke Endoh) over 11 years ago

  • Status changed from Assigned to Feedback

Marc-Andre Lafortune,

We discussed your slide at the developer meeting (7/21).

Matz was positive to the spec of return value: Integer,
Float::INFINITY, and nil.
However, we couldn't understand what API is proposed for creating
an Enumerator with size.

So, please revise and elaborate your API according to these two
points:

  • Enumerator.new(size) is not acceptable because of compatibility:

    p Enumerator.new([1,2,3]).take(2) #=> [1, 2]

  • We cannot determine the size of enumerator when creating it:

    a = [1]
    e = a.permutation
    a << 2
    p e.to_a #=> [[1, 2], [2, 1]]

    So, the API may need to receive a code fragment that calculates
    size, such as a Proc.
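
    The same point can be seen through #size itself. This sketch assumes a
    Ruby version in which the proposal has shipped; the variable names
    `before` and `after` are only for this example.

```ruby
a = [1]
e = a.permutation

before = e.size   # computed from a's length at the time of the call
a << 2
after = e.size    # recomputed; nothing was fixed when e was created

puts "before: #{before}, after: #{after}"
```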

--
Yusuke Endoh

Updated by marcandre (Marc-Andre Lafortune) over 11 years ago

  • Status changed from Feedback to Open

Hi,

mame (Yusuke Endoh) wrote:

Matz was positive to the spec of return value: Integer,
Float::INFINITY, and nil.

:-)

However, we couldn't understand what API is proposed for creating
an Enumerator with size.

So, please revise and elaborate your API according to these two
points:

  • Enumerator.new(size) is not acceptable because of compatibility:

    p Enumerator.new([1,2,3]).take(2) #=> [1, 2]

Agreed.
I am proposing Enumerator.new(size_lambda){ block }, i.e. only if a block is given, then the first argument can be a lambda/proc that can lazily compute the size.

The old syntax of Enumerator.new without a block does not change meaning.

  • We cannot determine the size of enumerator when creating it:

    a = [1]
    e = a.permutation
    a << 2
    p e.to_a #=> [[1, 2], [2, 1]]

    So, the API may need to receive a code fragment that calculates
    size, such as a Proc.

Agreed.
This is why I propose that to_enum accepts a block that can calculate the size, and Enumerator.new with a block can accept a lambda/proc for the same.

Does this address the concerns?

I will be glad to propose a set of patches so we can experiment with this.

Marc-André

Updated by mame (Yusuke Endoh) over 11 years ago

Hello Marc-Andre

2012/7/24, marcandre (Marc-Andre Lafortune) :

  • Enumerator.new(size) is not acceptable because of compatibility:

    p Enumerator.new([1,2,3]).take(2) #=> [1, 2]

Agreed.
I am proposing Enumerator.new(size_lambda){ block }, i.e. only if a block
is given, then the first argument can be a lambda/proc that can lazily
compute the size.

This is just my guess, but matz will not like a method in which
the meaning of an argument varies depending on whether a block is
given or not.

The old syntax of Enumerator.new without a block does not change meaning.

Is it okay that there is no way to specify size in this case?

  • We cannot determine the size of enumerator when creating it:

    a = [1]
    e = a.permutation
    a << 2
    p e.to_a #=> [[1, 2], [2, 1]]

    So, the API may need to receive a code fragment that calculates
    size, such as a Proc.

Agreed.
This is why I propose that to_enum accepts a block that can calculate the
size, and Enumerator.new with a block can accept a lambda/proc for the
same.

What argument(s) will the lambda/proc receive?

--
Yusuke Endoh

Updated by marcandre (Marc-Andre Lafortune) over 11 years ago

Hi,

mame (Yusuke Endoh) wrote:

I am proposing Enumerator.new(size_lambda){ block }, i.e. only if a block
is given, then the first argument can be a lambda/proc that can lazily
compute the size.

This is just my guess, but matz will not like a method in which
the meaning of an argument varies depending on whether a block is
given or not.

I understand the concern.

It could still be acceptable here because the other form is already documented as 'discouraged'. Maybe we should deprecate it?

Other possibility would be to add a different creator, e.g. Enumerator.sized(size_lambda){|yielder| ... }.

The old syntax of Enumerator.new without a block does not change meaning.

Is it okay that there is no way to specify size in this case?

This old syntax is already discouraged and to_enum/enum_for should be used instead.

This is why I propose that to_enum accepts a block that can calculate the
size, and Enumerator.new with a block can accept a lambda/proc for the
same.

What argument(s) will the lambda/proc receive?

We could consider passing the receiver and/or any arguments passed to to_enum, but I would propose to keep it simple and pass no arguments.

This is because enumerators are immutable and all information held in Enumerators should be accessible from the block/lambda anyways.
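
A minimal sketch of that argument: the size block below declares no parameters and still has everything it needs, because it closes over both the receiver and the method argument. The method name each_repeated is hypothetical. If a given Ruby version does pass the to_enum arguments to the block, a block without parameters simply ignores them.

```ruby
class Array
  # Hypothetical method for illustration: yields each element n times.
  def each_repeated(n)
    # The size block closes over self (via #size) and over n.
    return to_enum(:each_repeated, n) { size * n } unless block_given?

    each { |item| n.times { yield item } }
  end
end

enum = [1, 2, 3].each_repeated(2)
puts enum.size.inspect   # computed by the block, without iterating
puts enum.to_a.inspect
```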

Marc-André

Updated by ko1 (Koichi Sasada) over 11 years ago

  • Assignee changed from matz (Yukihiro Matsumoto) to mame (Yusuke Endoh)

Updated by mame (Yusuke Endoh) over 11 years ago

  • Status changed from Open to Assigned
  • Assignee changed from mame (Yusuke Endoh) to matz (Yukihiro Matsumoto)
  • Target version changed from 2.0.0 to 2.6

Updated by marcandre (Marc-Andre Lafortune) over 11 years ago

Hi.

My understanding was this new feature would make it into Ruby 2.0. Did I misunderstand?

The implementation can be seen here: https://github.com/marcandre/ruby/compare/marcandre:trunk...marcandre:enum_size

Although the combined diff (https://github.com/marcandre/ruby/compare/marcandre:trunk...marcandre:enum_size.diff ) is pretty big, there are really just two interesting commits. The second adds #size and extends constructor. The third extends to_enum to accept a block. See https://github.com/marcandre/ruby/commit/add_enumerator_size and https://github.com/marcandre/ruby/commit/sized_to_enum

The first commit only warns on using the deprecated form with no block Enumerator.new(obj, *args), but compatibility is maintained.

The remaining commits add support for the different ways of creating enumerators that can lazily evaluate their size. A few remain to be implemented, in particular the lazy ones.

Updated by marcandre (Marc-Andre Lafortune) over 11 years ago

I added support for lazy enumerators.

Yusuke, Matz, did I address your questions/concerns?

Updated by matz (Yukihiro Matsumoto) over 11 years ago

After skimming your modifications, I feel they are decent.
Sorry for being late to check.

Matz.

Updated by marcandre (Marc-Andre Lafortune) over 11 years ago

  • Assignee changed from matz (Yukihiro Matsumoto) to marcandre (Marc-Andre Lafortune)
  • Target version changed from 2.6 to 2.0.0

Hi,

matz (Yukihiro Matsumoto) wrote:

After skimming your modifications, I feel they are decent.

Thanks for looking at them. Very happy to read this :-)

I'll then commit these and finish implementing size for ranges of strings.

Marc-André

Updated by marcandre (Marc-Andre Lafortune) over 11 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r37495.
Marc-Andre, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • enumerator.c (enumerator_initialize): Warn when using deprecated form
    [Feature #6636]