Bug #7715

Lazy enumerators should want to stay lazy.

Added by Marc-Andre Lafortune over 1 year ago. Updated about 1 year ago.

[ruby-core:51510]
Status:Closed
Priority:Normal
Assignee:Marc-Andre Lafortune
Category:core
Target version:2.0.0
ruby -v:r38825 Backport:

Description

I'm just waking up to the fact that many methods turn a lazy enumerator in a non-lazy one.

Here's an example from Benoit Daloze in :

lines = File.foreach('averylargefile').lazy
.select {|line| line.length < 10 }
.map {|line| line.chomp!; line }
.each
slice(3)
.map {|lines| lines.join(';').downcase }
.take_while {|line| line.length > 20 }

That code will produce the right result but will read the whole file, which is not what is desired

Indeed, each_slice currently does not return a lazy enumerator :-(

To make the above code as intended, one must call .lazy right after the each_slice(3). I feel this is dangerous and counter intuitive.

Is there a valid reason for this behavior? Otherwise, I would like us to consider returning a lazy enumerator for the following methods:
(when called without a block)
eachwithobject
eachwithindex
eachslice
each
entry
eachcons
(always)
chunk
slice
before

The arguments are:
* fail early (much easier to realize one needs to call a final force, to_a or each than realizing that a lazy enumerator chain isn't actually lazy)
* easier to remember (every method normally returning an enumerator returns a lazy enumerator). basically this makes Lazy covariant
* I'd expect that if you get lazy at some point, you typically want to remain lazy until the very end

Associated revisions

Revision 39058
Added by Marc-Andre Lafortune about 1 year ago

  • enumerator.c: Use toenum for Enumerable methods returning Enumerators.
    This makes Lazy#cycle no longer needed, so it was removed.
    Make Enumerator#chunk and slice
    before return lazy Enumerators.
    [Bug #7715]

  • internal.h: Remove ref to rbenumcycle_size; no longer needed

  • enum.c: Make enumcyclesize static.

  • test/ruby/testlazyenumerator.rb: Test for above

History

#1 Updated by Koichi Sasada about 1 year ago

  • Target version set to 2.0.0

Who's ball?

#2 Updated by Marc-Andre Lafortune about 1 year ago

  • Status changed from Open to Assigned
  • Assignee set to Marc-Andre Lafortune

I can do it, unless there are objections.

#3 Updated by Shugo Maeda about 1 year ago

marcandre (Marc-Andre Lafortune) wrote:

I can do it, unless there are objections.

Your proposal sounds reasonable.
I guess these methods were forgotten to change when lazy was implemented.

#4 Updated by Yutaka HARA about 1 year ago

shugo (Shugo Maeda) wrote:

I guess these methods were forgotten to change when lazy was implemented.

That's right :-( I thought these methods does not need to be overriden
because they return Enumerator, but they should return Enumerator::Lazy for such cases.

#5 Updated by Marc-Andre Lafortune about 1 year ago

I believe I have found the key to resolve this issue, Lazy.new issue [#7248] and others.

We simply need to specialize to_enum/enum_for for lazy enumerators.

In the same way, RETURNSIZEDENUMERATOR should return a lazy enumerator, when called for a lazy enumerator.

With this in mind:
* Lazy.eachwithobject, etc..., will correctly return lazy enumerators [#7715] without being overriden.
* Lazy#cycle can be removed. It no longer needs to be overriden.
* Lazy.new really has no need to accept (method, *args) and can be modified as proposed in [#7248]
* Any user method of Enumerable that returns an Enumerator using to_enum will conserve laziness.

None of this could create a regression, since Lazy & RETURNSIZEDENUMERATOR are both new to 2.0.0

I'm working on a patch...

#6 Updated by Marc-Andre Lafortune about 1 year ago

Patch almost done, which also fixes #7248

https://github.com/marcandre/ruby/compare/marcandre:master...marcandre:lazy

Still missing:
- tweak inspect
- fix .lazy.size
- couple more tests

#7 Updated by Marc-Andre Lafortune about 1 year ago

Patch updated, rdoc improved too.

Makes for a clean API for Lazy#new also, and there's even less code (~20 loc).

I'll review the patch one last time before committing it (in about 5 hours).

#8 Updated by Marc-Andre Lafortune about 1 year ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r39058.
Marc-Andre, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • enumerator.c: Use toenum for Enumerable methods returning Enumerators.
    This makes Lazy#cycle no longer needed, so it was removed.
    Make Enumerator#chunk and slice
    before return lazy Enumerators.
    [Bug #7715]

  • internal.h: Remove ref to rbenumcycle_size; no longer needed

  • enum.c: Make enumcyclesize static.

  • test/ruby/testlazyenumerator.rb: Test for above

Also available in: Atom PDF