Bug #7696

Lazy enumerators with state can't be rewound

Added by Marc-Andre Lafortune over 1 year ago. Updated 8 months ago.

[ruby-core:51430]
Status:Closed
Priority:High
Assignee:Yukihiro Matsumoto
Category:core
Target version:2.1.0
ruby -v:r38800 Backport:

Description

The 4 lazy enumerators requiring internal state, i.e. {take|drop}{_while}, don't work as expected after a couple next and a call to rewind.

For example:

e=(1..42).lazy.take(2)
e.next # => 1
e.next # => 2
e.rewind
e.next # => 1
e.next # => StopIteration: iteration reached an end, expected 2

This is related to #7691; the current API does not give an easy way to handle state.

Either there's a dedicated callback to rewind, or data must be attached to the yielder.


Related issues

Related to ruby-trunk - Bug #7691: 3 bugs with Lazy enumerators with state Closed 01/14/2013
Related to ruby-trunk - Feature #8840: Yielder#state Feedback 08/31/2013

Associated revisions

Revision 38920
Added by Marc-Andre Lafortune about 1 year ago

  • enumerator.c: Fix state handling for Lazy#take
    [bug #7696]

  • test/ruby/testlazyenumerator.rb: test for above

Revision 38921
Added by Marc-Andre Lafortune about 1 year ago

  • enumerator.c: Fix state handling for Lazy#drop_while
    [bug #7696] [bug #7691]

  • test/ruby/testlazyenumerator.rb: test for above

Revision 38922
Added by Marc-Andre Lafortune about 1 year ago

  • enumerator.c: Fix state handling for Lazy#drop
    [bug #7696] [bug #7691]

  • test/ruby/testlazyenumerator.rb: test for above

Revision 38923
Added by Marc-Andre Lafortune about 1 year ago

  • enumerator.c: Fix state handling for Lazy#zip
    [bug #7696] [bug #7691]

  • test/ruby/testlazyenumerator.rb: test for above

History

#1 Updated by Shugo Maeda over 1 year ago

marcandre (Marc-Andre Lafortune) wrote:

The 4 lazy enumerators requiring internal state, i.e. {take|drop}{_while}, don't work as expected after a couple next and a call to rewind.

For example:

e=(1..42).lazy.take(2)
e.next # => 1
e.next # => 2
e.rewind
e.next # => 1
e.next # => StopIteration: iteration reached an end, expected 2

This is related to #7691; the current API does not give an easy way to handle state.

#7691 is the basically same issue as #6142.

Do you have a draft API spec in mind?
It might be too late to introduce a new API for 2.0.0, though.

#2 Updated by Marc-Andre Lafortune over 1 year ago

I had not see #6142. Yes, it's the same.

1) Add attr_accessor to Yielder and document the fact that the yielder is unique to a single enumeration run.

def drop(n)
  n = Backports::coerce_to_int(n)
  Lazy.new(self) do |yielder, *values|
    yielder.memo ||= n
    if yielder.memo > 0
      yielder.memo -= 1
    else
      yielder.yield(*values)
    end
  end
end

2) Optional named argument prepare for Lazy.new to be called before each:

def drop(n)
  remain = nil
  Lazy.new(self, prepare: ->{remain = n}) do |yielder, *values|
    if remain > 0
      remain -= 1
    else
      yielder.yield(*values)
    end
  end
end

3) Add a method 'prepare' method to Enumerator::Lazy and encourage subclassing. prepare would be called before each

class LazyDropper < Enumerator::Lazy
  def initialize(obj, n)
    @n = n
    super do |yielder, *values|
      if @remain > 0
        @remain -= 1
      else
        yielder.yield(*values)
      end
   end

   def prepare
     @remain = @n
   end
end

MRI should use the same kind of mechanism for the enums requiring state

#3 Updated by Marc-Andre Lafortune over 1 year ago

Oh oh.

Let me add a third example to problems of handling states:

e = (1..6).lazy.drop(3)
e.flat_map{e}.force # => [*(1..6)]*3, should be  [*(1..3)]*3

I believe only the first solution would work for this.

#4 Updated by Marc-Andre Lafortune over 1 year ago

Here's a patch for take. Does it look ok?

diff --git a/enumerator.c b/enumerator.c
index b65712f..1522a3f 100644
--- a/enumerator.c
+++ b/enumerator.c
@@ -105,7 +105,7 @@
VALUE rbcEnumerator;
VALUE rb
cLazy;
static ID idrewind, ideach, idnew, idinitialize, idyield, idcall, idsize;
-static ID id
eqq, idnext, idresult, idlazy, idreceiver, idarguments, idmethod, idforce;
+static ID id
eqq, idnext, idresult, idlazy, idreceiver, idarguments, idmemo, idmethod, idforce;
static VALUE symeach, symcycle;

VALUE rbeStopIteration;
@@ -1641,14 +1641,18 @@ lazy
zip(int argc, VALUE *argv, VALUE obj)
static VALUE
lazytakefunc(VALUE val, VALUE args, int argc, VALUE *argv)
{
- NODE *memo = RNODE(args);
+ long remain;
+ VALUE memo = rbivarget(argv[0], idmemo);
+ if (NIL
P(memo)) {
+ memo = args;
+ }

 rb_funcall2(argv[0], id_yield, argc - 1, argv + 1);
  • if (--memo->u3.cnt == 0) {
  • memo->u3.cnt = memo->u2.argc;
  • if ((remain = NUM2LONG(memo)-1) == 0) { return Qundef; } else {
  • rbivarset(argv[0], idmemo, LONG2NUM(remain)); return Qnil; } } @@ -1666,7 +1670,6 @@ lazytakesize(VALUE lazy) static VALUE lazytake(VALUE obj, VALUE n) {
  • NODE *memo; long len = NUM2LONG(n); int argc = 1; VALUE argv[3]; @@ -1680,9 +1683,8 @@ lazy_take(VALUE obj, VALUE n) argv[2] = INT2NUM(0); argc = 3; }
  • memo = NEWMEMO(0, len, len); return lazysetmethod(rbblockcall(rbcLazy, id_new, argc, argv,
  • lazytakefunc, (VALUE) memo),
  • lazytakefunc, n), rbarynew3(1, n), lazytakesize); }

@@ -1955,6 +1957,7 @@ InitEnumerator(void)
id
eqq = rbintern("===");
id
receiver = rbintern("receiver");
id
arguments = rbintern("arguments");
+ id
memo = rbintern("memo");
id
method = rbintern("method");
id
force = rbintern("force");
sym
each = ID2SYM(ideach);
diff --git a/test/ruby/test
lazyenumerator.rb b/test/ruby/testlazyenumerator.rb
index acd4843..35e92c9 100644
--- a/test/ruby/test
lazyenumerator.rb
+++ b/test/ruby/test
lazyenumerator.rb
@@ -243,6 +243,23 @@ class TestLazyEnumerator < Test::Unit::TestCase
assert
equal((1..5).to_a, take5.force, bug6428)
end

  • def testtakenested
  • bug7696 = ''
  • a = Step.new(1..10)
  • take5 = a.lazy.take(5)
  • assertequal([(1..5)]5, take5.flatmap{take5}.force, bug7696)
  • end +
  • def testtakerewound
  • bug7696 = ''
  • e=(1..42).lazy.take(2)
  • assert_equal 1, e.next
  • assert_equal 2, e.next
  • e.rewind
  • assert_equal 1, e.next
  • assert_equal 2, e.next
  • end + def testtakewhile a = Step.new(1..10) assertequal(1, a.takewhile {|i| i < 5}.first)

#5 Updated by Shugo Maeda over 1 year ago

marcandre (Marc-Andre Lafortune) wrote:

Here's a patch for take. Does it look ok?

Does it work well for zip?
I wonder how arguments are rewound.

#6 Updated by Marc-Andre Lafortune over 1 year ago

The same idea will work for zip too; the arguments must be converted to enumerators only when yielder.memo is not set, i.e. every new yielder.

The arguments are never really rewound, but the first yielder holding them will not be reused after enum.rewind; a new yielder is given.
I haven't checked if that's the current behavior in other implementations, but if yielder is to be the state holder, that's the way it should work.

enum = (1..2).lazy.zip(1..2)
enum.next # => yielder.memo was nil, so (1..2).each is called and stored in the memo
enum.rewind # => the original yielder is discarded
enum.next # => the second yielder.memo is nil, so (1..2).each is called again and stored in the memo

I'm using the same idea in my backports gem to implement this in pure Ruby:

https://github.com/marcandre/backports/blob/master/lib/backports/2.0.0/enumerator/lazy.rb

The only other possibility that would work is to pass yield extra 'state' argument when required, like:

def drop(n)
Lazy.new(self, state: ->{{remain: n}}) do |yielder, state, values|
if state[:remain] > 0
state[:remain] -= 1
else
yielder.yield(
values)
end
end
end

Maybe the :state option could be an object instead of a lmabda, which would be dupped before each enumerations:

def drop(n)
Lazy.new(self, state: {remain: n}) do |yielder, state, values|
if state[:remain] > 0
state[:remain] -= 1
else
yielder.yield(
values)
end
end
end

Both solutions seem complex to me, but would definitely put the correct emphasis on how to deal with state.

So, does the patch look acceptable as far as MRI is concerned?
For the public api, should there be a public Yielder#memo and a guarantee that there is exactly one yielder object per iteration? or instead an extra state yielded when required?

#7 Updated by Shugo Maeda over 1 year ago

marcandre (Marc-Andre Lafortune) wrote:

The same idea will work for zip too; the arguments must be converted to enumerators only when yielder.memo is not set, i.e. every new yielder.

The arguments are never really rewound, but the first yielder holding them will not be reused after enum.rewind; a new yielder is given.
I haven't checked if that's the current behavior in other implementations, but if yielder is to be the state holder, that's the way it should work.

So, the following behavior is intended, right?

$ ruby1.9.3 -I ~/src/backports/lib -r backports -r backports/2.0.0/enumerable -e "a = (1..3).lazy.zip(('a'..'z').each); p a.toa; p a.toa"
[[1, "a"], [2, "b"], [3, "c"]]
[[1, "d"], [2, "e"], [3, "f"]]

So, does the patch look acceptable as far as MRI is concerned?

If the above behavior is intended, the patch looks acceptable.

For the public api, should there be a public Yielder#memo and a guarantee that there is exactly one yielder object per iteration? or instead an extra state yielded when required?

It might be too late to introduce a new API for 2.0.0, so how about to move it to next minor?

#8 Updated by Marc-Andre Lafortune over 1 year ago

shugo (Shugo Maeda) wrote:

So, the following behavior is intended, right?

$ ruby1.9.3 -I ~/src/backports/lib -r backports -r backports/2.0.0/enumerable -e "a = (1..3).lazy.zip(('a'..'z').each); p a.toa; p a.toa"
[[1, "a"], [2, "b"], [3, "c"]]
[[1, "d"], [2, "e"], [3, "f"]]

That's a very good question.

It probably would be best to call to_enum instead of each. Calling next|rewind on an enumerator should really only affect other calls to next. With to_enum, we'll get the same result every time.

Similarly, we expect (1..3).zip(('a'..'z').each.tap(&:next)) to return [[1, 'a'], ..., and not [[1, 'b'], ... even though next was called on the given enumerator.

If the above behavior is intended, the patch looks acceptable.

Thanks for reviewing it. I'll commit it, changing the call to each to to_enum (unless there are objections). I'll use the same technique to fix the other 3 lazy enumerators with state.

For the public api, should there be a public Yielder#memo and a guarantee that there is exactly one yielder object per iteration? or instead an extra state yielded when required?

It might be too late to introduce a new API for 2.0.0, so how about to move it to next minor?

I understand. On the other hand, the API for Lazy.new was never decided upon and really should be finalized before 2.0.0!

If we opt for the Yielder#memo way (and agree on the name), maybe we can convince mame to accept such a trivial change. In that case, the biggest "change" is the explicit guarantee of a different yielder per iteration. It's already the case (also for JRuby and rubinius), but AFAIK it's never been official.

With the modified yielding way, it would be nice to include it in Lazy.new's api in this version, especially since it is still being finalized.

At the very least, a note in the documentation about handling state would be needed for 2.0.0.

#9 Updated by Shugo Maeda over 1 year ago

  • Assignee set to Yukihiro Matsumoto

marcandre (Marc-Andre Lafortune) wrote:

shugo (Shugo Maeda) wrote:

So, the following behavior is intended, right?

$ ruby1.9.3 -I ~/src/backports/lib -r backports -r backports/2.0.0/enumerable -e "a = (1..3).lazy.zip(('a'..'z').each); p a.toa; p a.toa"
[[1, "a"], [2, "b"], [3, "c"]]
[[1, "d"], [2, "e"], [3, "f"]]

That's a very good question.

It probably would be best to call to_enum instead of each. Calling next|rewind on an enumerator should really only affect other calls to next. With to_enum, we'll get the same result every time.

Even if to_enum is called, IO instances never be rewound, but I guess IO instances need not be rewound.

It might be too late to introduce a new API for 2.0.0, so how about to move it to next minor?

I understand. On the other hand, the API for Lazy.new was never decided upon and really should be finalized before 2.0.0!

If we opt for the Yielder#memo way (and agree on the name), maybe we can convince mame to accept such a trivial change. In that case, the biggest "change" is the explicit guarantee of a different yielder per iteration. It's already the case (also for JRuby and rubinius), but AFAIK it's never been official.

`The explicit guarantee of a different yielder per iteration' sounds acceptable for me, but I'm not sure whether Yielder#memo is a good name.

With the modified yielding way, it would be nice to include it in Lazy.new's api in this version, especially since it is still being finalized.

At the very least, a note in the documentation about handling state would be needed for 2.0.0.

I believe Matz should decide it.

#10 Updated by Shugo Maeda over 1 year ago

  • Status changed from Open to Assigned

#11 Updated by Marc-Andre Lafortune about 1 year ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r38920.
Marc-Andre, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • enumerator.c: Fix state handling for Lazy#take
    [bug #7696]

  • test/ruby/testlazyenumerator.rb: test for above

#12 Updated by Marc-Andre Lafortune about 1 year ago

  • Status changed from Closed to Open

Bugs in MRI are fixed, but keeping open so Matz can decide how users should handle state.

#13 Updated by Yusuke Endoh about 1 year ago

  • Subject changed from Lazy enumerators with state can't be rewound to Lazy enumerators with state can&#x27;t be rewound
  • Status changed from Open to Assigned
  • Target version changed from 2.0.0 to next minor

#14 Updated by Yui NARUSE 9 months ago

  • Target version changed from next minor to 2.1.0

#15 Updated by Marc-Andre Lafortune 8 months ago

  • Status changed from Assigned to Closed

I created a new feature request #8840, so I'm closing this.

Also available in: Atom PDF