Feature #6183

Enumerator::Lazy performance issue

Added by Innokenty Mikhailov about 2 years ago. Updated 10 months ago.

[ruby-core:43529]
Status:Assigned
Priority:Low
Assignee:Nobuyoshi Nakada
Category:-
Target version:-

Description

I benchmarked Enumerator::Lazy and that's what I got:
user system total real
Lazy: 0.690000 0.010000 0.700000 ( 0.733160)
Normal: 0.160000 0.010000 0.170000 ( 0.186695)

It seems like even with 4 chain links and 3000 elements in initial array, Lazy enumerator is almost 4(!) times slower than the normal case.

Instead of performance benefit we've got 4 times performance drawback.

See test file attached.

lazy_test.rb Magnifier (428 Bytes) Innokenty Mikhailov, 03/21/2012 10:24 PM

bench.rb Magnifier - benchmark used for measuring (389 Bytes) Innokenty Mikhailov, 04/03/2012 05:43 PM

new_bench.rb Magnifier (957 Bytes) Innokenty Mikhailov, 05/16/2012 05:15 PM

16.05.2013.diff Magnifier (23.8 KB) Innokenty Mikhailov, 05/17/2013 05:48 AM

26_07_2013.diff Magnifier (24.6 KB) Innokenty Mikhailov, 06/26/2013 07:20 PM

26_06_2013_2.diff Magnifier (24.4 KB) Innokenty Mikhailov, 06/27/2013 05:24 AM


Related issues

Duplicated by ruby-trunk - Bug #6250: Enumerator::Lazy performance increased Rejected 04/03/2012

History

#1 Updated by Thomas Sawyer about 2 years ago

Granted that seems like a bit too much overhead, but the advantage to lazy is not in a 1-to-1 comparison with non-lazy. Try removing the .each call at the end and adding .take(10) instead.

#2 Updated by Yusuke Endoh about 2 years ago

  • Status changed from Open to Rejected

Hello,

Enumerator::Lazy is not a silver bullet; it removes the overhead for
creating an intermediate array, but brings the drawback for calling
a block. Unfortunately, the latter is much bigger than the former.
Thus, in general, Lazy does bring performance drawback.

The worth of Enumerator::Lazy is to extract first some elements from
big array, especially, infinite sequence. For example:

Prime.lazy.select {|x| x % 4 == 3 }.take(10).to_a

The code becomes much complex without Lazy:

a = []
Prime.each do |x|
next if x % 4 != 3
a << x
break if a.size == 10
end

Anyway, this is not a bug. If you have any concrete idea to "fix"
this issue, please reopen the ticket.

Thank you,

Yusuke Endoh mame@tsg.ne.jp

#3 Updated by Innokenty Mikhailov about 2 years ago

  • File bench.rbMagnifier added
  • File lazy_enumerator.diff added
  • File code added

=begin
Finally come up with a concrete idea how to "fix" (based on my first PR https://github.com/ruby/ruby/pull/100).

The idea is to keep all blocks (passed with lazy methods like map or select) as (({Proc})) objects inside the enumerator and apply them one by one when value requested ((({toa})), (({next})), etc)
This strategy avoids enumerator chaining on each lazy method call and eliminates fair amount of 'calling the block' with (({rb
block_call})) operations.
Here's benchmark results:

2.0.0| ~/projects/ruby(trunk)$ rvm ruby-head
2.0.0| ~/projects/ruby(trunk)$ ruby bench.rb
user system total real
Lazy enumerator 1.460000 0.000000 1.460000 ( 1.465739)
Simple array 0.420000 0.000000 0.420000 ( 0.421446)
0.287671 NaN NaN ( 0.287531)
2.0.0| ~/projects/ruby(trunk)$ rvm system
2.0.0| ~/projects/ruby(trunk)$ ruby bench.rb
user system total real
Lazy enumerator 0.770000 0.000000 0.770000 ( 0.764750)
Simple array 0.370000 0.000000 0.370000 ( 0.382653)
0.480519 NaN NaN ( 0.500364)

ruby-head is current trunk compiled, and system ruby - is the same trunk but with my patch applied.
Last row in results is ratio between 'Simple array' time and 'Lazy Enumerator' time.
So, as you can see, with this patch lazy enumerator becomes almost 2 times faster.

It's a 'proof of concept' patch (only map and select added) - let me know if it makes sense.
I believe that using this approach and with your help lazy enumerator performance can be improved significantly.

I'm attaching the diff along with the main part of the source code just in case it's hard to follow the diff.

=end

#4 Updated by Yukihiro Matsumoto about 2 years ago

  • Status changed from Rejected to Assigned
  • Assignee set to Nobuyoshi Nakada
  • Priority changed from Normal to Low

Nobu, could you review the patch? And if you don't see any problem, check it in?

Matz.

#5 Updated by Yusuke Endoh about 2 years ago

Hello,

matz (Yukihiro Matsumoto) wrote:

Nobu, could you review the patch? And if you don't see any problem, check it in?

I didn't read the patch, but I found a weird behavior.
I think the approach is interesting, though.

$ cat t.rb
a = (1..10).lazy
p a.map {|x| x + 1 }.toa
p a.map {|x| x + 1 }.to
a
p a.map {|x| x + 1 }.to_a

# without the patch
$ ./miniruby.org t.rb
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# with the patch applied
$ ./miniruby.new t.rb
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
[4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

Yusuke Endoh mame@tsg.ne.jp

#6 Updated by Innokenty Mikhailov about 2 years ago

That's because each time you mapping lazy enumerator another proc objected added to procs array, so in your example you effectively mapping 3 times.
I should return new enumerator object rather than modifying existing one while calling lazy map or select (or whatever lazy method).

A lot of work should be done to finish this patch: all other lazy methods should be added.
Also I'm getting an error while working with big arrays (> 104).
But if you are all positive about the approach I'll happily proceed and do my best to make this fully work.

#7 Updated by Thomas Sawyer about 2 years ago

In response to #6250, wouldn't the simplest implementation be to return a new Lazy Enumerator for each lazy call?

See https://github.com/rubyworks/facets/blob/master/lib/core/facets/denumerable.rb for the basic idea.

Maybe storing each block internally in an Enumerator might be a tad more efficient, but I would think the added complexity to Enumerator state would make it not worth it.

#8 Updated by Innokenty Mikhailov about 2 years ago

  • File new_lazy_enumerator.diff added

Here's the new patch attached - problem, mentioned by Yusuke Endoh, fixed - now I'm creating a new copy of enumerator on each lazy method call.
Also I fixed an error for big arrays - forgot to gc_mark procs array.

Thomas, that's the point - current implementation is very simple and hence very inefficient.
It mimics ruby implementations but as soon as we are in the C sources already - we can come up with something more efficient.

#9 Updated by Yusuke Endoh about 2 years ago

Hello,

2012/4/4 gregolsen (Innokenty Mikhailov) anotheroneman@yahoo.com:

Here's the new patch attached - problem, mentioned by Yusuke Endoh, fixed - now I'm creating a new copy of enumerator on each lazy method call.

Okay, the next problem :-)

(1..10).lazy.select {|x| false }.map {|x| p x }.to_a

should print nothing, but it actually prints 1, 2, ..., 10
with your patch applied. It can be fixed easily, though.

I glanced your patch. It will degrade functional modularity
in enumerator.c. Currently, it not so big problem because
it only implements #map and #select. But I guess implementing
other methods, especially, #cycle and #zip, will make some
functions (processelement and lazyiterator_block) complex
and hard to maintain.

Thus, until you create the final patch, it is hard to say
whether we can import your patch or not.

--
Yusuke Endoh mame@tsg.ne.jp

#10 Updated by Nobuyoshi Nakada about 2 years ago

Hi,

(12/04/04 20:59), Yusuke Endoh wrote:

I glanced your patch. It will degrade functional modularity
in enumerator.c. Currently, it not so big problem because
it only implements #map and #select. But I guess implementing
other methods, especially, #cycle and #zip, will make some
functions (processelement and lazyiterator_block) complex
and hard to maintain.

Agree.

Naturally, this approach which chains lazy enumerator processes
directly should be faster than current one. So I want to see this
being merged in an extensible way.

Thus, until you create the final patch, it is hard to say
whether we can import your patch or not.

And, do not comment out existing code with //. It unnecessarily
increases noise in the patch.

--
Nobu Nakada

#11 Updated by Innokenty Mikhailov about 2 years ago

  • File lazy_enumerator_hybrid.diff added

(Yusuke Endoh) wrote:

(1..10).lazy.select {|x| false }.map {|x| p x }.to_a
should print nothing, but it actually prints 1, 2, ..., 10

Fixed, thanks.

But I guess implementing
other methods, especially, #cycle and #zip, will make some
functions (processelement and lazyiterator_block) complex
and hard to maintain.

Agree, those methods (especially #cycle) will be hard to implement in terms of procs chaining approach.

(Nobu Nakada) wrote:

So I want to see this being merged in an extensible way.

I came up with a new hybrid patch.
It uses procs chaining to handle lazy #map and #select, and current enumerator chaining approach for other methods.
I believe this is an extensible way. We can move forward step by step and we can stop any time.
If those tricky #zip and #cycle methods optimization won't worth the code complexity added - we can leave it as it is: based on enumerator chaining.
See new lazyenumeratorhybrid.diff (tests are green except test_inspect).

Please, let me know you thoughts about this.

#12 Updated by Anonymous about 2 years ago

It's a 'proof of concept' patch (only map and select added) - let me know if it makes sense.
I believe that using this approach and with your help lazy enumerator performance can be improved significantly.

That would be awesome.
-r

#13 Updated by Innokenty Mikhailov almost 2 years ago

=begin
Here's an update.
All methods except (({Lazy#cycle})), (({Lazy#zip})) and (({Lazy#flat_map})) are optimized.
Benchmark results shown below.

I was working in (()).
Cycle, zip and flat_map are really tough to convert to procs chaining, however they are working fine in this hybrid solution and can be leave as it is.

All tests pass except testinspect.
If this implementation is acceptable then the next step will be to fix inspect and add more tests to cover different types of chaining:
like chaining of enumerator chained (cycle, zip, flat
map) methods with procs chained optimized methods.

Please, let me know your thoughts.

Thanks.

======================Lazy#map
user system total real
Trunk 1.420000 0.010000 1.430000 ( 1.461111)
Optimized 0.830000 0.000000 0.830000 ( 0.833560)
======================Lazy#select
user system total real
Trunk 1.270000 0.010000 1.280000 ( 1.274074)
Optimized 0.780000 0.000000 0.780000 ( 0.785303)
======================Lazy#grep
user system total real
Trunk 1.540000 0.010000 1.550000 ( 1.552651)
Optimized 0.780000 0.000000 0.780000 ( 0.783526)
======================Lazy#takewhile
user system total real
Trunk 1.260000 0.000000 1.260000 ( 1.257465)
Optimized 0.780000 0.010000 0.790000 ( 0.784682)
======================Lazy#drop
while
user system total real
Trunk 1.030000 0.000000 1.030000 ( 1.030987)
Optimized 0.400000 0.000000 0.400000 ( 0.394112)
======================Lazy#reject
user system total real
Trunk 1.240000 0.000000 1.240000 ( 1.243565)
Optimized 0.780000 0.000000 0.780000 ( 0.781825)
======================Lazy#drop
user system total real
Trunk 4.150000 0.000000 4.150000 ( 4.159758)
Optimized 1.590000 0.000000 1.590000 ( 1.598785)
======================Lazy#take
user system total real
Trunk 0.520000 0.000000 0.520000 ( 0.517902)
Optimized 0.010000 0.000000 0.010000 ( 0.003274)
=end

#14 Updated by Innokenty Mikhailov almost 2 years ago

  • File 2012-07-14.diff added

Here's an update:
Finally I've fixed the last test for Enumerator::Lazy#inspect - now it supports procs chaining too.
Nobuyoshi Nakada, please, see diff attached and let me know your thoughts about this.
Thanks.

#15 Updated by Yusuke Endoh over 1 year ago

Your patch makes the following code stuck.

p [1,2,3].toenum.lazy.cycle.take(10).toa

It should print [1, 2, 3, 1, 2, 3, 1, 2, 3, 1].

Yusuke Endoh mame@tsg.ne.jp

#16 Updated by Innokenty Mikhailov over 1 year ago

  • File 31_july.diff added

Yusuke Endoh, thanks a lot for pointing out on this issue.
Fixed, please see new diff attached.

#17 Updated by Zachary Scott over 1 year ago

  • Assignee changed from Nobuyoshi Nakada to Yusuke Endoh

Yusuke-san seems to be last to review this, what is your opinion?

#18 Updated by Innokenty Mikhailov over 1 year ago

  • File December_3.diff added

I've merged the patch with latest trunk (see latest diff attached), specifically with Enumerator lazy size feature.
Also I've removed the ugly case switch: now proc entry stores pointer to a function that is executed when iterating over the elements.
So now it even resembles the current implementation a bit.

Please, let me know your thoughts about all this.
Thanks in advance.

#19 Updated by Nobuyoshi Nakada over 1 year ago

  • Tracker changed from Bug to Feature

#20 Updated by Yusuke Endoh over 1 year ago

Sorry, maybe I have no time to review your patch. Anyone interested?

Yusuke Endoh mame@tsg.ne.jp

#21 Updated by Zachary Scott 12 months ago

  • Assignee changed from Yusuke Endoh to Zachary Scott

This patch is big, but I hope to review it soon as I will be working on Lazy soon

#22 Updated by Innokenty Mikhailov 12 months ago

Zachary Scott, thanks for your interest. I'm afraid it doesn't merge into trunk cleanly anymore after introducing lazy #size. I'll try to fix it and let you know when ready.

#23 Updated by Zachary Scott 12 months ago

@gregolsen Sure, np!

#24 Updated by Innokenty Mikhailov 11 months ago

Finally managed to merge. Please see latest diff attached.

#25 Updated by Zachary Scott 11 months ago

@gregolsen Thank you! I will try to review this soon, before you have to rebase again ;)

Mind if I delete the old patches? It might confuse someone looking at the ticket.

#26 Updated by Innokenty Mikhailov 11 months ago

Sure, feel free to clean old files. Thanks!

#27 Updated by Zachary Scott 11 months ago

  • File deleted (lazy_enumerator.diff)

#28 Updated by Zachary Scott 11 months ago

  • File deleted (code)

#29 Updated by Zachary Scott 11 months ago

  • File deleted (new_lazy_enumerator.diff)

#30 Updated by Zachary Scott 11 months ago

  • File deleted (lazy_enumerator_hybrid.diff)

#31 Updated by Zachary Scott 11 months ago

  • File deleted (16_may.diff)

#32 Updated by Zachary Scott 11 months ago

  • File deleted (2012-07-14.diff)

#33 Updated by Zachary Scott 11 months ago

  • File deleted (31_july.diff)

#34 Updated by Zachary Scott 11 months ago

  • File deleted (December_3.diff)

#35 Updated by Zachary Scott 11 months ago

@gregolsen Looks like some of this was applied, sorry I haven't had a chance to check it yet.

Could you check?

It may need a rebase, the patch didn't apply cleanly and some of the functions already exist (like append_method())

#36 Updated by Innokenty Mikhailov 11 months ago

Indeed appendmethod was exctracted by nobu 2 weeks ago as a refactoring of enumeratorinspect. But that's it, nothing was merged yet. I'm not sure I'll be able to rebase patch in next few weeks - got only android tablet with me. I'll let you know when ready. Thanks a lot for your interest.

#37 Updated by Innokenty Mikhailov 10 months ago

Rebased towards latest trunk.

#38 Updated by Nobuyoshi Nakada 10 months ago

When will generator::hybrid become other than Qfalse?

#39 Updated by Innokenty Mikhailov 10 months ago

@nobu thanks for pointing!
Indeed hybrid flag is already obsolete since I'm checking for proc_entry presence.
If no proc entries present - than it's effectively a hybrid case.
Fixed patch attached.

PS
I'm using this github branch - https://github.com/gregolsen/ruby/tree/enumerator_lazy_optimization
Should I still attach diffs here?

#40 Updated by Nobuyoshi Nakada 10 months ago

Your patch seems
* reverting inspect_enumerator()
* containing dead code

I pushed a branch which split and merged a few functions, and used function pointer tables.
https://github.com/nobu/ruby/tree/lazy_enum

#41 Updated by Innokenty Mikhailov 10 months ago

Indeed I messed up the patch with numerous rebases.
Thanks a lot for your refactoring branch - everything looks great.

So what our next steps?

#42 Updated by Zachary Scott 10 months ago

  • Assignee changed from Zachary Scott to Nobuyoshi Nakada

Also available in: Atom PDF