Project

General

Profile

Actions

Feature #6183

closed

Enumerator::Lazy performance issue

Added by gregolsen (Innokenty Mikhailov) over 12 years ago. Updated about 8 years ago.

Status:
Closed
Target version:
-
[ruby-core:43529]

Description

I benchmarked Enumerator::Lazy and that's what I got:
user system total real
Lazy: 0.690000 0.010000 0.700000 ( 0.733160)
Normal: 0.160000 0.010000 0.170000 ( 0.186695)

It seems like even with 4 chain links and 3000 elements in initial array, Lazy enumerator is almost 4(!) times slower than the normal case.

Instead of performance benefit we've got 4 times performance drawback.

See test file attached.


Files

lazy_test.rb (428 Bytes) lazy_test.rb gregolsen (Innokenty Mikhailov), 03/21/2012 10:24 PM
bench.rb (389 Bytes) bench.rb benchmark used for measuring gregolsen (Innokenty Mikhailov), 04/03/2012 05:43 PM
new_bench.rb (957 Bytes) new_bench.rb gregolsen (Innokenty Mikhailov), 05/16/2012 05:15 PM
16.05.2013.diff (23.8 KB) 16.05.2013.diff gregolsen (Innokenty Mikhailov), 05/17/2013 05:48 AM
26_07_2013.diff (24.6 KB) 26_07_2013.diff gregolsen (Innokenty Mikhailov), 06/26/2013 07:20 PM
26_06_2013_2.diff (24.4 KB) 26_06_2013_2.diff gregolsen (Innokenty Mikhailov), 06/27/2013 05:24 AM

Related issues 1 (0 open1 closed)

Has duplicate Ruby master - Bug #6250: Enumerator::Lazy performance increasedRejected04/03/2012Actions

Updated by trans (Thomas Sawyer) over 12 years ago

Granted that seems like a bit too much overhead, but the advantage to lazy is not in a 1-to-1 comparison with non-lazy. Try removing the .each call at the end and adding .take(10) instead.

Updated by mame (Yusuke Endoh) over 12 years ago

  • Status changed from Open to Rejected

Hello,

Enumerator::Lazy is not a silver bullet; it removes the overhead for
creating an intermediate array, but brings the drawback for calling
a block. Unfortunately, the latter is much bigger than the former.
Thus, in general, Lazy does bring performance drawback.

The worth of Enumerator::Lazy is to extract first some elements from
big array, especially, infinite sequence. For example:

Prime.lazy.select {|x| x % 4 == 3 }.take(10).to_a

The code becomes much complex without Lazy:

a = []
Prime.each do |x|
next if x % 4 != 3
a << x
break if a.size == 10
end

Anyway, this is not a bug. If you have any concrete idea to "fix"
this issue, please reopen the ticket.

Thank you,

--
Yusuke Endoh

Updated by gregolsen (Innokenty Mikhailov) over 12 years ago

=begin
Finally come up with a concrete idea how to "fix" (based on my first PR https://github.com/ruby/ruby/pull/100).

The idea is to keep all blocks (passed with lazy methods like map or select) as (({Proc})) objects inside the enumerator and apply them one by one when value requested ((({to_a})), (({next})), etc)
This strategy avoids enumerator chaining on each lazy method call and eliminates fair amount of 'calling the block' with (({rb_block_call})) operations.
Here's benchmark results:

2.0.0| ~/projects/ruby(trunk)$ rvm ruby-head
2.0.0| ~/projects/ruby(trunk)$ ruby bench.rb
user system total real
Lazy enumerator 1.460000 0.000000 1.460000 ( 1.465739)
Simple array 0.420000 0.000000 0.420000 ( 0.421446)
0.287671 NaN NaN ( 0.287531)
2.0.0| ~/projects/ruby(trunk)$ rvm system
2.0.0| ~/projects/ruby(trunk)$ ruby bench.rb
user system total real
Lazy enumerator 0.770000 0.000000 0.770000 ( 0.764750)
Simple array 0.370000 0.000000 0.370000 ( 0.382653)
0.480519 NaN NaN ( 0.500364)

ruby-head is current trunk compiled, and system ruby - is the same trunk but with my patch applied.
Last row in results is ratio between 'Simple array' time and 'Lazy Enumerator' time.
So, as you can see, with this patch lazy enumerator becomes almost 2 times faster.

It's a 'proof of concept' patch (only map and select added) - let me know if it makes sense.
I believe that using this approach and with your help lazy enumerator performance can be improved significantly.

I'm attaching the diff along with the main part of the source code just in case it's hard to follow the diff.

=end

Updated by matz (Yukihiro Matsumoto) over 12 years ago

  • Status changed from Rejected to Assigned
  • Assignee set to nobu (Nobuyoshi Nakada)
  • Priority changed from Normal to 3

Nobu, could you review the patch? And if you don't see any problem, check it in?

Matz.

Updated by mame (Yusuke Endoh) over 12 years ago

Hello,

matz (Yukihiro Matsumoto) wrote:

Nobu, could you review the patch? And if you don't see any problem, check it in?

I didn't read the patch, but I found a weird behavior.
I think the approach is interesting, though.

$ cat t.rb
a = (1..10).lazy
p a.map {|x| x + 1 }.to_a
p a.map {|x| x + 1 }.to_a
p a.map {|x| x + 1 }.to_a

without the patch

$ ./miniruby.org t.rb
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

with the patch applied

$ ./miniruby.new t.rb
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
[4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

--
Yusuke Endoh

Updated by gregolsen (Innokenty Mikhailov) over 12 years ago

That's because each time you mapping lazy enumerator another proc objected added to procs array, so in your example you effectively mapping 3 times.
I should return new enumerator object rather than modifying existing one while calling lazy map or select (or whatever lazy method).

A lot of work should be done to finish this patch: all other lazy methods should be added.
Also I'm getting an error while working with big arrays (> 10^4).
But if you are all positive about the approach I'll happily proceed and do my best to make this fully work.

Updated by trans (Thomas Sawyer) over 12 years ago

In response to #6250, wouldn't the simplest implementation be to return a new Lazy Enumerator for each lazy call?

See https://github.com/rubyworks/facets/blob/master/lib/core/facets/denumerable.rb for the basic idea.

Maybe storing each block internally in an Enumerator might be a tad more efficient, but I would think the added complexity to Enumerator state would make it not worth it.

Updated by gregolsen (Innokenty Mikhailov) over 12 years ago

  • File new_lazy_enumerator.diff added

Here's the new patch attached - problem, mentioned by Yusuke Endoh, fixed - now I'm creating a new copy of enumerator on each lazy method call.
Also I fixed an error for big arrays - forgot to gc_mark procs array.

Thomas, that's the point - current implementation is very simple and hence very inefficient.
It mimics ruby implementations but as soon as we are in the C sources already - we can come up with something more efficient.

Updated by mame (Yusuke Endoh) over 12 years ago

Hello,

2012/4/4 gregolsen (Innokenty Mikhailov) :

Here's the new patch attached - problem, mentioned by Yusuke Endoh, fixed - now I'm creating a new copy of enumerator on each lazy method call.

Okay, the next problem :-)

(1..10).lazy.select {|x| false }.map {|x| p x }.to_a

should print nothing, but it actually prints 1, 2, ..., 10
with your patch applied. It can be fixed easily, though.

I glanced your patch. It will degrade functional modularity
in enumerator.c. Currently, it not so big problem because
it only implements #map and #select. But I guess implementing
other methods, especially, #cycle and #zip, will make some
functions (process_element and lazy_iterator_block) complex
and hard to maintain.

Thus, until you create the final patch, it is hard to say
whether we can import your patch or not.

--
Yusuke Endoh

Updated by nobu (Nobuyoshi Nakada) over 12 years ago

Hi,

(12/04/04 20:59), Yusuke Endoh wrote:

I glanced your patch. It will degrade functional modularity
in enumerator.c. Currently, it not so big problem because
it only implements #map and #select. But I guess implementing
other methods, especially, #cycle and #zip, will make some
functions (process_element and lazy_iterator_block) complex
and hard to maintain.

Agree.

Naturally, this approach which chains lazy enumerator processes
directly should be faster than current one. So I want to see this
being merged in an extensible way.

Thus, until you create the final patch, it is hard to say
whether we can import your patch or not.

And, do not comment out existing code with //. It unnecessarily
increases noise in the patch.

--
Nobu Nakada

Updated by gregolsen (Innokenty Mikhailov) over 12 years ago

  • File lazy_enumerator_hybrid.diff added

(Yusuke Endoh) wrote:

(1..10).lazy.select {|x| false }.map {|x| p x }.to_a
should print nothing, but it actually prints 1, 2, ..., 10

Fixed, thanks.

But I guess implementing
other methods, especially, #cycle and #zip, will make some
functions (process_element and lazy_iterator_block) complex
and hard to maintain.

Agree, those methods (especially #cycle) will be hard to implement in terms of procs chaining approach.

(Nobu Nakada) wrote:

So I want to see this being merged in an extensible way.

I came up with a new hybrid patch.
It uses procs chaining to handle lazy #map and #select, and current enumerator chaining approach for other methods.
I believe this is an extensible way. We can move forward step by step and we can stop any time.
If those tricky #zip and #cycle methods optimization won't worth the code complexity added - we can leave it as it is: based on enumerator chaining.
See new lazy_enumerator_hybrid.diff (tests are green except test_inspect).

Please, let me know you thoughts about this.

Updated by Anonymous over 12 years ago

It's a 'proof of concept' patch (only map and select added) - let me know if it makes sense.
I believe that using this approach and with your help lazy enumerator performance can be improved significantly.

That would be awesome.
-r

Updated by gregolsen (Innokenty Mikhailov) over 12 years ago

=begin
Here's an update.
All methods except (({Lazy#cycle})), (({Lazy#zip})) and (({Lazy#flat_map})) are optimized.
Benchmark results shown below.

I was working in ((<this branch|URL:https://github.com/gregolsen/ruby/tree/lazy_enumerator_optimization>)).
Cycle, zip and flat_map are really tough to convert to procs chaining, however they are working fine in this hybrid solution and can be leave as it is.

All tests pass except test_inspect.
If this implementation is acceptable then the next step will be to fix inspect and add more tests to cover different types of chaining:
like chaining of enumerator chained (cycle, zip, flat_map) methods with procs chained optimized methods.

Please, let me know your thoughts.

Thanks.

======================Lazy#map
user system total real
Trunk 1.420000 0.010000 1.430000 ( 1.461111)
Optimized 0.830000 0.000000 0.830000 ( 0.833560)
======================Lazy#select
user system total real
Trunk 1.270000 0.010000 1.280000 ( 1.274074)
Optimized 0.780000 0.000000 0.780000 ( 0.785303)
======================Lazy#grep
user system total real
Trunk 1.540000 0.010000 1.550000 ( 1.552651)
Optimized 0.780000 0.000000 0.780000 ( 0.783526)
======================Lazy#take_while
user system total real
Trunk 1.260000 0.000000 1.260000 ( 1.257465)
Optimized 0.780000 0.010000 0.790000 ( 0.784682)
======================Lazy#drop_while
user system total real
Trunk 1.030000 0.000000 1.030000 ( 1.030987)
Optimized 0.400000 0.000000 0.400000 ( 0.394112)
======================Lazy#reject
user system total real
Trunk 1.240000 0.000000 1.240000 ( 1.243565)
Optimized 0.780000 0.000000 0.780000 ( 0.781825)
======================Lazy#drop
user system total real
Trunk 4.150000 0.000000 4.150000 ( 4.159758)
Optimized 1.590000 0.000000 1.590000 ( 1.598785)
======================Lazy#take
user system total real
Trunk 0.520000 0.000000 0.520000 ( 0.517902)
Optimized 0.010000 0.000000 0.010000 ( 0.003274)
=end

Updated by gregolsen (Innokenty Mikhailov) over 12 years ago

  • File 2012-07-14.diff added

Here's an update:
Finally I've fixed the last test for Enumerator::Lazy#inspect - now it supports procs chaining too.
Nobuyoshi Nakada, please, see diff attached and let me know your thoughts about this.
Thanks.

Updated by mame (Yusuke Endoh) over 12 years ago

Your patch makes the following code stuck.

p [1,2,3].to_enum.lazy.cycle.take(10).to_a

It should print [1, 2, 3, 1, 2, 3, 1, 2, 3, 1].

--
Yusuke Endoh

Updated by gregolsen (Innokenty Mikhailov) over 12 years ago

  • File 31_july.diff added

Yusuke Endoh, thanks a lot for pointing out on this issue.
Fixed, please see new diff attached.

Updated by zzak (zzak _) about 12 years ago

  • Assignee changed from nobu (Nobuyoshi Nakada) to mame (Yusuke Endoh)

Yusuke-san seems to be last to review this, what is your opinion?

Updated by gregolsen (Innokenty Mikhailov) almost 12 years ago

  • File December_3.diff added

I've merged the patch with latest trunk (see latest diff attached), specifically with Enumerator lazy size feature.
Also I've removed the ugly case switch: now proc entry stores pointer to a function that is executed when iterating over the elements.
So now it even resembles the current implementation a bit.

Please, let me know your thoughts about all this.
Thanks in advance.

Actions #19

Updated by nobu (Nobuyoshi Nakada) almost 12 years ago

  • Tracker changed from Bug to Feature

Updated by mame (Yusuke Endoh) almost 12 years ago

Sorry, maybe I have no time to review your patch. Anyone interested?

--
Yusuke Endoh

Updated by zzak (zzak _) over 11 years ago

  • Assignee changed from mame (Yusuke Endoh) to zzak (zzak _)

This patch is big, but I hope to review it soon as I will be working on Lazy soon

Actions #22

Updated by gregolsen (Innokenty Mikhailov) over 11 years ago

Zachary Scott, thanks for your interest. I'm afraid it doesn't merge into trunk cleanly anymore after introducing lazy #size. I'll try to fix it and let you know when ready.

Updated by gregolsen (Innokenty Mikhailov) over 11 years ago

Finally managed to merge. Please see latest diff attached.

Updated by zzak (zzak _) over 11 years ago

@gregolsen (Innokenty Mikhailov) Thank you! I will try to review this soon, before you have to rebase again ;)

Mind if I delete the old patches? It might confuse someone looking at the ticket.

Actions #26

Updated by gregolsen (Innokenty Mikhailov) over 11 years ago

Sure, feel free to clean old files. Thanks!

Actions #27

Updated by zzak (zzak _) over 11 years ago

  • File deleted (lazy_enumerator.diff)
Actions #28

Updated by zzak (zzak _) over 11 years ago

  • File deleted (code)
Actions #29

Updated by zzak (zzak _) over 11 years ago

  • File deleted (new_lazy_enumerator.diff)
Actions #30

Updated by zzak (zzak _) over 11 years ago

  • File deleted (lazy_enumerator_hybrid.diff)
Actions #31

Updated by zzak (zzak _) over 11 years ago

  • File deleted (16_may.diff)
Actions #32

Updated by zzak (zzak _) over 11 years ago

  • File deleted (2012-07-14.diff)
Actions #33

Updated by zzak (zzak _) over 11 years ago

  • File deleted (31_july.diff)
Actions #34

Updated by zzak (zzak _) over 11 years ago

  • File deleted (December_3.diff)

Updated by zzak (zzak _) over 11 years ago

@gregolsen (Innokenty Mikhailov) Looks like some of this was applied, sorry I haven't had a chance to check it yet.

Could you check?

It may need a rebase, the patch didn't apply cleanly and some of the functions already exist (like append_method())

Updated by gregolsen (Innokenty Mikhailov) over 11 years ago

Indeed append_method was exctracted by nobu 2 weeks ago as a refactoring of enumerator_inspect. But that's it, nothing was merged yet. I'm not sure I'll be able to rebase patch in next few weeks - got only android tablet with me. I'll let you know when ready. Thanks a lot for your interest.

Updated by gregolsen (Innokenty Mikhailov) over 11 years ago

Rebased towards latest trunk.

Updated by nobu (Nobuyoshi Nakada) over 11 years ago

When will generator::hybrid become other than Qfalse?

Updated by gregolsen (Innokenty Mikhailov) over 11 years ago

@nobu (Nobuyoshi Nakada) thanks for pointing!
Indeed hybrid flag is already obsolete since I'm checking for proc_entry presence.
If no proc entries present - than it's effectively a hybrid case.
Fixed patch attached.

PS
I'm using this github branch - https://github.com/gregolsen/ruby/tree/enumerator_lazy_optimization
Should I still attach diffs here?

Updated by nobu (Nobuyoshi Nakada) over 11 years ago

Your patch seems

  • reverting inspect_enumerator()
  • containing dead code

I pushed a branch which split and merged a few functions, and used function pointer tables.
https://github.com/nobu/ruby/tree/lazy_enum

Updated by gregolsen (Innokenty Mikhailov) over 11 years ago

Indeed I messed up the patch with numerous rebases.
Thanks a lot for your refactoring branch - everything looks great.

So what our next steps?

Updated by zzak (zzak _) over 11 years ago

  • Assignee changed from zzak (zzak _) to nobu (Nobuyoshi Nakada)

Updated by fbernier (François Bernier) about 8 years ago

As of today, Enumerable::Lazy is pretty much still unused because of the performance hit. Is there anything I could do to help getting this in? Is there a paper on that subject I could read about to make improvements?

Updated by nobu (Nobuyoshi Nakada) about 8 years ago

I rebased the branch (and fixed a bug in the trunk).
Seems 30~40% faster than the current implementation.

Actions #45

Updated by nobu (Nobuyoshi Nakada) about 8 years ago

  • Status changed from Assigned to Closed

Applied in changeset r56185.


enumerator.c: lazy enum improvement

  • enumerator.c (lazy_init_yielder): directly call stored functions.
    [Feature #6183]
  • enumerator.c (lazy_add_method): create lazy enumerator which
    uses lazy_init_yielder().
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0