Project

General

Profile

Actions

Bug #7696

closed

Lazy enumerators with state can't be rewound

Added by marcandre (Marc-Andre Lafortune) almost 12 years ago. Updated about 11 years ago.

Status:
Closed
Target version:
ruby -v:
r38800
Backport:
[ruby-core:51430]

Description

The 4 lazy enumerators requiring internal state, i.e. {take|drop}{_while}, don't work as expected after a couple next and a call to rewind.

For example:

e=(1..42).lazy.take(2)
e.next # => 1
e.next # => 2
e.rewind
e.next # => 1
e.next # => StopIteration: iteration reached an end, expected 2

This is related to #7691; the current API does not give an easy way to handle state.

Either there's a dedicated callback to rewind, or data must be attached to the yielder.


Related issues 2 (0 open2 closed)

Related to Ruby master - Bug #7691: 3 bugs with Lazy enumerators with stateClosed01/14/2013Actions
Related to Ruby master - Feature #8840: Yielder#stateRejectedmatz (Yukihiro Matsumoto)Actions
Actions #1

Updated by shugo (Shugo Maeda) almost 12 years ago

marcandre (Marc-Andre Lafortune) wrote:

The 4 lazy enumerators requiring internal state, i.e. {take|drop}{_while}, don't work as expected after a couple next and a call to rewind.

For example:

e=(1..42).lazy.take(2)
e.next # => 1
e.next # => 2
e.rewind
e.next # => 1
e.next # => StopIteration: iteration reached an end, expected 2

This is related to #7691; the current API does not give an easy way to handle state.

#7691 is the basically same issue as #6142.

Do you have a draft API spec in mind?
It might be too late to introduce a new API for 2.0.0, though.

Updated by marcandre (Marc-Andre Lafortune) almost 12 years ago

I had not see #6142. Yes, it's the same.

  1. Add attr_accessor to Yielder and document the fact that the yielder is unique to a single enumeration run.

    def drop(n)
    n = Backports::coerce_to_int(n)
    Lazy.new(self) do |yielder, *values|
    yielder.memo ||= n
    if yielder.memo > 0
    yielder.memo -= 1
    else
    yielder.yield(*values)
    end
    end
    end

  2. Optional named argument prepare for Lazy.new to be called before each:

    def drop(n)
    remain = nil
    Lazy.new(self, prepare: ->{remain = n}) do |yielder, *values|
    if remain > 0
    remain -= 1
    else
    yielder.yield(*values)
    end
    end
    end

  3. Add a method 'prepare' method to Enumerator::Lazy and encourage subclassing. prepare would be called before each

    class LazyDropper < Enumerator::Lazy
    def initialize(obj, n)
    @n = n
    super do |yielder, *values|
    if @remain > 0
    @remain -= 1
    else
    yielder.yield(*values)
    end
    end

    def prepare
      @remain = @n
    end
    

    end

MRI should use the same kind of mechanism for the enums requiring state

Updated by marcandre (Marc-Andre Lafortune) almost 12 years ago

Oh oh.

Let me add a third example to problems of handling states:

e = (1..6).lazy.drop(3)
e.flat_map{e}.force # => [*(1..6)]*3, should be  [*(1..3)]*3

I believe only the first solution would work for this.

Updated by marcandre (Marc-Andre Lafortune) almost 12 years ago

Here's a patch for take. Does it look ok?

diff --git a/enumerator.c b/enumerator.c
index b65712f..1522a3f 100644
--- a/enumerator.c
+++ b/enumerator.c
@@ -105,7 +105,7 @@
VALUE rb_cEnumerator;
VALUE rb_cLazy;
static ID id_rewind, id_each, id_new, id_initialize, id_yield, id_call, id_size;
-static ID id_eqq, id_next, id_result, id_lazy, id_receiver, id_arguments, id_method, id_force;
+static ID id_eqq, id_next, id_result, id_lazy, id_receiver, id_arguments, id_memo, id_method, id_force;
static VALUE sym_each, sym_cycle;

VALUE rb_eStopIteration;
@@ -1641,14 +1641,18 @@ lazy_zip(int argc, VALUE *argv, VALUE obj)
static VALUE
lazy_take_func(VALUE val, VALUE args, int argc, VALUE *argv)
{

  • NODE *memo = RNODE(args);
  • long remain;

  • VALUE memo = rb_ivar_get(argv[0], id_memo);

  • if (NIL_P(memo)) {

  •   memo = args;
    
  • }

    rb_funcall2(argv[0], id_yield, argc - 1, argv + 1);

  • if (--memo->u3.cnt == 0) {
  •   memo->u3.cnt = memo->u2.argc;
    
  • if ((remain = NUM2LONG(memo)-1) == 0) {
    return Qundef;
    }
    else {
  •   rb_ivar_set(argv[0], id_memo, LONG2NUM(remain));
      return Qnil;
    
    }
    }
    @@ -1666,7 +1670,6 @@ lazy_take_size(VALUE lazy)
    static VALUE
    lazy_take(VALUE obj, VALUE n)
    {
  • NODE *memo;
    long len = NUM2LONG(n);
    int argc = 1;
    VALUE argv[3];
    @@ -1680,9 +1683,8 @@ lazy_take(VALUE obj, VALUE n)
    argv[2] = INT2NUM(0);
    argc = 3;
    }
  • memo = NEW_MEMO(0, len, len);
    return lazy_set_method(rb_block_call(rb_cLazy, id_new, argc, argv,
  •                                    lazy_take_func, (VALUE) memo),
    
  •                                    lazy_take_func, n),
                         rb_ary_new3(1, n), lazy_take_size);
    

}

@@ -1955,6 +1957,7 @@ Init_Enumerator(void)
id_eqq = rb_intern("===");
id_receiver = rb_intern("receiver");
id_arguments = rb_intern("arguments");

  • id_memo = rb_intern("memo");
    id_method = rb_intern("method");
    id_force = rb_intern("force");
    sym_each = ID2SYM(id_each);
    diff --git a/test/ruby/test_lazy_enumerator.rb b/test/ruby/test_lazy_enumerator.rb
    index acd4843..35e92c9 100644
    --- a/test/ruby/test_lazy_enumerator.rb
    +++ b/test/ruby/test_lazy_enumerator.rb
    @@ -243,6 +243,23 @@ class TestLazyEnumerator < Test::Unit::TestCase
    assert_equal((1..5).to_a, take5.force, bug6428)
    end

  • def test_take_nested

  • bug7696 = '[ruby-core:51470]'

  • a = Step.new(1..10)

  • take5 = a.lazy.take(5)

  • assert_equal([*(1..5)]*5, take5.flat_map{take5}.force, bug7696)

  • end

  • def test_take_rewound

  • bug7696 = '[ruby-core:51470]'

  • e=(1..42).lazy.take(2)

  • assert_equal 1, e.next

  • assert_equal 2, e.next

  • e.rewind

  • assert_equal 1, e.next

  • assert_equal 2, e.next

  • end

  • def test_take_while
    a = Step.new(1..10)
    assert_equal(1, a.take_while {|i| i < 5}.first)

Updated by shugo (Shugo Maeda) almost 12 years ago

marcandre (Marc-Andre Lafortune) wrote:

Here's a patch for take. Does it look ok?

Does it work well for zip?
I wonder how arguments are rewound.

Updated by marcandre (Marc-Andre Lafortune) almost 12 years ago

The same idea will work for zip too; the arguments must be converted to enumerators only when yielder.memo is not set, i.e. every new yielder.

The arguments are never really rewound, but the first yielder holding them will not be reused after enum.rewind; a new yielder is given.
I haven't checked if that's the current behavior in other implementations, but if yielder is to be the state holder, that's the way it should work.

enum = (1..2).lazy.zip(1..2)
enum.next # => yielder.memo was nil, so (1..2).each is called and stored in the memo
enum.rewind # => the original yielder is discarded
enum.next # => the second yielder.memo is nil, so (1..2).each is called again and stored in the memo

I'm using the same idea in my backports gem to implement this in pure Ruby:

https://github.com/marcandre/backports/blob/master/lib/backports/2.0.0/enumerator/lazy.rb

The only other possibility that would work is to pass yield extra 'state' argument when required, like:

def drop(n)
Lazy.new(self, state: ->{{remain: n}}) do |yielder, state, *values|
if state[:remain] > 0
state[:remain] -= 1
else
yielder.yield(*values)
end
end
end

Maybe the :state option could be an object instead of a lmabda, which would be dupped before each enumerations:

def drop(n)
Lazy.new(self, state: {remain: n}) do |yielder, state, *values|
if state[:remain] > 0
state[:remain] -= 1
else
yielder.yield(*values)
end
end
end

Both solutions seem complex to me, but would definitely put the correct emphasis on how to deal with state.

So, does the patch look acceptable as far as MRI is concerned?
For the public api, should there be a public Yielder#memo and a guarantee that there is exactly one yielder object per iteration? or instead an extra state yielded when required?

Updated by shugo (Shugo Maeda) almost 12 years ago

marcandre (Marc-Andre Lafortune) wrote:

The same idea will work for zip too; the arguments must be converted to enumerators only when yielder.memo is not set, i.e. every new yielder.

The arguments are never really rewound, but the first yielder holding them will not be reused after enum.rewind; a new yielder is given.
I haven't checked if that's the current behavior in other implementations, but if yielder is to be the state holder, that's the way it should work.

So, the following behavior is intended, right?

$ ruby1.9.3 -I ~/src/backports/lib -r backports -r backports/2.0.0/enumerable -e "a = (1..3).lazy.zip(('a'..'z').each); p a.to_a; p a.to_a"
[[1, "a"], [2, "b"], [3, "c"]]
[[1, "d"], [2, "e"], [3, "f"]]

So, does the patch look acceptable as far as MRI is concerned?

If the above behavior is intended, the patch looks acceptable.

For the public api, should there be a public Yielder#memo and a guarantee that there is exactly one yielder object per iteration? or instead an extra state yielded when required?

It might be too late to introduce a new API for 2.0.0, so how about to move it to next minor?

Updated by marcandre (Marc-Andre Lafortune) almost 12 years ago

shugo (Shugo Maeda) wrote:

So, the following behavior is intended, right?

$ ruby1.9.3 -I ~/src/backports/lib -r backports -r backports/2.0.0/enumerable -e "a = (1..3).lazy.zip(('a'..'z').each); p a.to_a; p a.to_a"
[[1, "a"], [2, "b"], [3, "c"]]
[[1, "d"], [2, "e"], [3, "f"]]

That's a very good question.

It probably would be best to call to_enum instead of each. Calling next|rewind on an enumerator should really only affect other calls to next. With to_enum, we'll get the same result every time.

Similarly, we expect (1..3).zip(('a'..'z').each.tap(&:next)) to return [[1, 'a'], ..., and not [[1, 'b'], ... even though next was called on the given enumerator.

If the above behavior is intended, the patch looks acceptable.

Thanks for reviewing it. I'll commit it, changing the call to each to to_enum (unless there are objections). I'll use the same technique to fix the other 3 lazy enumerators with state.

For the public api, should there be a public Yielder#memo and a guarantee that there is exactly one yielder object per iteration? or instead an extra state yielded when required?

It might be too late to introduce a new API for 2.0.0, so how about to move it to next minor?

I understand. On the other hand, the API for Lazy.new was never decided upon and really should be finalized before 2.0.0!

If we opt for the Yielder#memo way (and agree on the name), maybe we can convince mame to accept such a trivial change. In that case, the biggest "change" is the explicit guarantee of a different yielder per iteration. It's already the case (also for JRuby and rubinius), but AFAIK it's never been official.

With the modified yielding way, it would be nice to include it in Lazy.new's api in this version, especially since it is still being finalized.

At the very least, a note in the documentation about handling state would be needed for 2.0.0.

Updated by shugo (Shugo Maeda) almost 12 years ago

  • Assignee set to matz (Yukihiro Matsumoto)

marcandre (Marc-Andre Lafortune) wrote:

shugo (Shugo Maeda) wrote:

So, the following behavior is intended, right?

$ ruby1.9.3 -I ~/src/backports/lib -r backports -r backports/2.0.0/enumerable -e "a = (1..3).lazy.zip(('a'..'z').each); p a.to_a; p a.to_a"
[[1, "a"], [2, "b"], [3, "c"]]
[[1, "d"], [2, "e"], [3, "f"]]

That's a very good question.

It probably would be best to call to_enum instead of each. Calling next|rewind on an enumerator should really only affect other calls to next. With to_enum, we'll get the same result every time.

Even if to_enum is called, IO instances never be rewound, but I guess IO instances need not be rewound.

It might be too late to introduce a new API for 2.0.0, so how about to move it to next minor?

I understand. On the other hand, the API for Lazy.new was never decided upon and really should be finalized before 2.0.0!

If we opt for the Yielder#memo way (and agree on the name), maybe we can convince mame to accept such a trivial change. In that case, the biggest "change" is the explicit guarantee of a different yielder per iteration. It's already the case (also for JRuby and rubinius), but AFAIK it's never been official.

`The explicit guarantee of a different yielder per iteration' sounds acceptable for me, but I'm not sure whether Yielder#memo is a good name.

With the modified yielding way, it would be nice to include it in Lazy.new's api in this version, especially since it is still being finalized.

At the very least, a note in the documentation about handling state would be needed for 2.0.0.

I believe Matz should decide it.

Updated by shugo (Shugo Maeda) almost 12 years ago

  • Status changed from Open to Assigned
Actions #11

Updated by marcandre (Marc-Andre Lafortune) almost 12 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r38920.
Marc-Andre, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • enumerator.c: Fix state handling for Lazy#take
    [bug #7696]

  • test/ruby/test_lazy_enumerator.rb: test for above

Updated by marcandre (Marc-Andre Lafortune) almost 12 years ago

  • Status changed from Closed to Open

Bugs in MRI are fixed, but keeping open so Matz can decide how users should handle state.

Updated by mame (Yusuke Endoh) over 11 years ago

  • Subject changed from Lazy enumerators with state can't be rewound to Lazy enumerators with state can&#x27;t be rewound
  • Status changed from Open to Assigned
  • Target version changed from 2.0.0 to 2.6

Updated by naruse (Yui NARUSE) about 11 years ago

  • Target version changed from 2.6 to 2.1.0

Updated by marcandre (Marc-Andre Lafortune) about 11 years ago

  • Status changed from Assigned to Closed

I created a new feature request #8840, so I'm closing this.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0