Feature #3608

Enhancing Pathname#each_child to be lazy

Added by Tomasz Wegrzanowski almost 4 years ago. Updated over 1 year ago.

[ruby-core:31469]
Status: Assigned
Priority: Normal
Assignee: Akira Tanaka
Category: -
Target version: next minor

Description

=begin
Right now it lists the entire directory and only then yields
each element; that is, x.each_child(&b) means x.children.each(&b).

This is too slow for directories mounted over networked file systems etc.,
and there is currently no way to get lazy behaviour other than abandoning
the convenient #each_child/#children API and dropping to a lower level.

With this patch:
* #children is eager like before, no change here
* #each_child becomes lazy
* #each_child without a block returns a lazy enumerator,
  so it can be used like this: dir.each_child.find(&:symlink?)
  without losing laziness.
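
The behaviour listed above can be sketched roughly as follows. This is not the attached patch; the module name LazyChildren is hypothetical, and the sketch assumes Dir.foreach yields entries as it reads them rather than building the whole list first.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical stand-in for the proposed behaviour (not the attached patch).
module LazyChildren
  # Yields each child of +path+ as Dir.foreach produces it.
  # Without a block, returns an Enumerator; nothing is read until iteration.
  def self.each_child(path, with_directory = true)
    return to_enum(:each_child, path, with_directory) unless block_given?

    Dir.foreach(path.to_s) do |entry|
      next if entry == '.' || entry == '..'
      yield(with_directory ? path + entry : Pathname.new(entry))
    end
  end
end

Dir.mktmpdir do |tmp|
  dir = Pathname.new(tmp)
  File.write((dir + 'plain.txt').to_s, 'x')
  File.symlink((dir + 'plain.txt').to_s, (dir + 'link').to_s)

  # find stops iterating as soon as a symlink is seen.
  first_link = LazyChildren.each_child(dir).find(&:symlink?)
  puts first_link.basename
end
```

Keeping #children eager would then just be a matter of collecting the same iteration into an array.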

The patch is against trunk. pathname.rb used to be in lib/ rather than
ext/pathname/lib/, but the patch works either way.

The part that returns an enumerator when called without a block wouldn't
work in 1.8. If a backport is desired, that line would need to be dropped,
and #children would need to build the result array itself instead
of calling each_child(with_directory).to_a; this would be straightforward.
=end

lazy_each_child.diff (1.19 KB) Tomasz Wegrzanowski, 07/24/2010 10:27 AM

lazy_path_test.rb (1.06 KB) Tomasz Wegrzanowski, 08/02/2010 05:43 AM

History

#1 Updated by Akira Tanaka over 3 years ago

=begin
2010/7/24 Tomasz Wegrzanowski <redmine@ruby-lang.org>:

> Feature #3608: Enhancing Pathname#each_child to be lazy
> http://redmine.ruby-lang.org/issues/show/3608
>
> Right now it lists the entire directory and only then yields
> each element; that is, x.each_child(&b) means x.children.each(&b).
>
> This is too slow for directories mounted over networked file systems etc.,
> and there is currently no way to get lazy behaviour other than abandoning
> the convenient #each_child/#children API and dropping to a lower level.

One problem with the lazy behaviour is that it keeps a file descriptor
open while the block is called.

If the lazy each_child is used recursively, the limit on the number of
descriptors limits the recursion depth.

I'm not sure which problem is more important.
--
Tanaka Akira

=end

#2 Updated by Tomasz Wegrzanowski over 3 years ago

=begin

> One problem with the lazy behaviour is that it keeps a file descriptor
> open while the block is called.
>
> If the lazy each_child is used recursively, the limit on the number of
> descriptors limits the recursion depth.
>
> I'm not sure which problem is more important.

This won't normally be a problem, as the directory
handle isn't opened on to_enum, only once
iteration actually begins.

Unless you put these enumerators on different fibres or
something like that, your maximum number of open
files will be bounded by your file system depth
or by stack depth, whichever is lower.

You'd need to have hundreds of subdirectories
nested in one another, like 1/2/3/4/5/.../100,
with all of them nested on the Ruby stack at once.
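
To illustrate the depth argument, here is a minimal sketch (not the attached test code). The walk helper and its stats hash are hypothetical. It assumes Dir.foreach holds its directory handle open until the block returns, so the recursion depth recorded here stands in for the number of directories open at once.

```ruby
require 'pathname'
require 'tmpdir'
require 'fileutils'

# Hypothetical recursive walker. Each level of recursion sits inside one
# Dir.foreach block, so (under the assumption above) one handle per level.
def walk(path, depth = 1, stats = { max_open: 0, files: 0 })
  stats[:max_open] = [stats[:max_open], depth].max
  Dir.foreach(path.to_s) do |entry|
    next if entry == '.' || entry == '..'
    child = path + entry
    if child.directory? && !child.symlink?
      walk(child, depth + 1, stats)   # parent directory stays open meanwhile
    else
      stats[:files] += 1
    end
  end
  stats
end

Dir.mktmpdir do |tmp|
  root = Pathname.new(tmp)
  FileUtils.mkdir_p(root + '1/2/3')                  # deep chain
  FileUtils.touch(root + '1/2/3/leaf.txt')
  %w[a b c].each { |n| FileUtils.mkdir_p(root + 'wide' + n) }  # wide, shallow

  stats = walk(root)
  puts "deepest chain of nested directory blocks: #{stats[:max_open]}"
end
```

Sibling directories do not add to the count, since each one is finished before the next is entered; only nesting does.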

Take a look at attached test code
(also at http://pastebin.org/439336 )

Even with ulimit -n as low as 16 and
a lot of directories it works perfectly
(tested on 00/a - 99/z and on ruby source tree).

Test 1 shows that calling map(&:each_child) won't open
directory handles just yet.
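
The deferred open can be sketched without the patch, using Dir.to_enum(:foreach, ...) as a stand-in for a block-less each_child (the method name deferred_open_demo is hypothetical, and this is not the attached test). The enumerator is created for a directory that does not exist yet, which would fail immediately if creating it opened the directory.

```ruby
require 'tmpdir'

# Hypothetical demo: building the enumerator touches nothing on disk;
# the directory is only opened when iteration starts.
def deferred_open_demo
  Dir.mktmpdir do |root|
    target = File.join(root, 'later')
    enum = Dir.to_enum(:foreach, target)   # target does not exist yet

    # Iterating now fails, because this is the point where the open happens.
    raised_before_mkdir = begin
      enum.first
      false
    rescue Errno::ENOENT
      true
    end

    Dir.mkdir(target)
    File.write(File.join(target, 'x'), '')

    # The same enumerator works once the directory exists.
    names = enum.reject { |e| e == '.' || e == '..' }
    [raised_before_mkdir, names]
  end
end

p deferred_open_demo
```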

Test 2 shows that each_child works all right with recursion.

Test 3 just verifies that ulimit -n is applied.
=end

#3 Updated by Akira Tanaka almost 3 years ago

  • Project changed from Ruby to ruby-trunk
  • Assignee set to Akira Tanaka

#4 Updated by Shyouhei Urabe about 2 years ago

  • Status changed from Open to Assigned

#5 Updated by Yusuke Endoh over 1 year ago

  • Description updated (diff)
  • Target version set to next minor
