Feature #3608

open

Enhancing Pathname#each_child to be lazy

Added by taw (Tomasz Wegrzanowski) over 13 years ago. Updated about 6 years ago.

Status:
Assigned
Target version:
-
[ruby-core:31469]

Description

=begin
Right now it lists the entire directory, then yields
every element; that is, x.each_child(&b) means x.children.each(&b).

This is too slow for directories mounted over networked file systems etc.,
and there is currently no way to get lazy behaviour other than abandoning
the convenient #each_child/#children API and dropping to a lower level.

With this patch:

  • #children is eager like before, no change here
  • #each_child becomes lazy
  • #each_child without a block returns a lazy enumerator,
    so it can be used like this: dir.each_child.find(&:symlink?)
    without losing laziness.

The patch is against trunk. pathname.rb used to be in lib/ rather than
ext/pathname/lib/, but the patch works either way.

The part that returns an enumerator when called without a block wouldn't
work in 1.8. If a backport is desired, that line would need to be dropped,
and #children would need to build the result array itself instead
of calling each_child(with_directory).to_a; this would be straightforward.
=end
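
The behaviour described above can be sketched roughly as follows. This is not the attached patch: the module name LazyChildren and its two methods are hypothetical stand-ins, and the sketch assumes Dir.foreach reads entries one at a time while its block runs.

```ruby
require "pathname"

# Hypothetical helper for illustration only; not the attached patch.
module LazyChildren
  # Yields each child of +dir+ (a Pathname) as it is read from the
  # directory, instead of building the full array first.
  def self.each_child(dir, with_directory = true)
    # Without a block, hand back an enumerator; nothing is opened yet.
    return to_enum(:each_child, dir, with_directory) unless block_given?

    Dir.foreach(dir.to_s) do |entry|
      next if entry == "." || entry == ".."
      yield with_directory ? dir.join(entry) : Pathname.new(entry)
    end
  end

  # Eager variant, expressed in terms of the lazy one.
  def self.children(dir, with_directory = true)
    each_child(dir, with_directory).to_a
  end
end
```

With this shape, LazyChildren.each_child(Pathname.new(".")).find(&:symlink?) can stop reading the directory as soon as the first symlink is seen.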


Files

lazy_each_child.diff (1.19 KB) lazy_each_child.diff taw (Tomasz Wegrzanowski), 07/24/2010 10:27 AM
lazy_path_test.rb (1.06 KB) lazy_path_test.rb taw (Tomasz Wegrzanowski), 08/02/2010 05:43 AM
Actions #1

Updated by akr (Akira Tanaka) over 13 years ago

2010/7/24 Tomasz Wegrzanowski:

> Feature #3608: Enhancing Pathname#each_child to be lazy
> http://redmine.ruby-lang.org/issues/show/3608
>
> Right now it lists the entire directory, then yields
> every element; that is, x.each_child(&b) means x.children.each(&b).
>
> This is too slow for directories mounted over networked file systems etc.,
> and there is currently no way to get lazy behaviour other than abandoning
> the convenient #each_child/#children API and dropping to a lower level.

A problem with the lazy behaviour is that a file descriptor is open while
the block is called.

If the lazy each_child is used recursively, the limit on the number of
file descriptors limits the recursion depth.

I'm not sure which problem is more important.

--
Tanaka Akira

Actions #2

Updated by taw (Tomasz Wegrzanowski) over 13 years ago

> A problem with the lazy behaviour is that a file descriptor is open while
> the block is called.
>
> If the lazy each_child is used recursively, the limit on the number of
> file descriptors limits the recursion depth.
>
> I'm not sure which problem is more important.

This won't normally be a problem, as the directory
handle isn't opened on to_enum, only once
iteration actually begins.

Unless you put these enumerators on different fibres or
something like that, your maximum number of open
files will be limited by your file system depth
or by stack depth, whichever is lower.

You'd need to have hundreds of subdirectories
nested in one another like 1/2/3/4/5/.../100,
and have all of these nested on the Ruby stack.
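
To make the depth argument concrete, here is a minimal sketch (not the attached test code) of a recursive walk that counts how many directories are being iterated at the same time. The class name DepthProbe is hypothetical, and the sketch assumes Dir.foreach holds its directory open for as long as its block is running.

```ruby
# Hypothetical probe for illustration only; not the attached test code.
class DepthProbe
  attr_reader :peak

  def initialize
    @open = 0 # directories currently being iterated
    @peak = 0 # highest value @open has reached
  end

  # Walks +path+ recursively, descending into a subdirectory while the
  # parent directory is still being read.
  def walk(path)
    @open += 1
    @peak = [@peak, @open].max
    Dir.foreach(path) do |entry|
      next if entry == "." || entry == ".."
      child = File.join(path, entry)
      walk(child) if File.directory?(child) && !File.symlink?(child)
    end
  ensure
    @open -= 1
  end
end
```

For a tree containing 1/2/3 plus any number of sibling directories at the top level, the peak is 4 (the root plus three nested levels), regardless of how wide the tree is.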

Take a look at attached test code
(also at http://pastebin.org/439336 )

Even with ulimit -n as low as 16 and
a lot of directories it works perfectly
(tested on 00/a - 99/z and on ruby source tree).

Test 1 shows that calling map(&:each_child) won't open
directory handles just yet.

Test 2 shows that each_child works all right with recursion.

Test 3 just verifies that ulimit -n is applied.
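
The point of Test 1 can also be checked without ulimit, using a path that does not exist. This is a small sketch, not part of the attached test file; the method name open_deferred? is hypothetical. If creating the enumerator opened the directory, the error would be raised on the first line of the method rather than on iteration.

```ruby
# Hypothetical check for illustration only.
# Returns true when the failure to open +path+ shows up only once
# iteration starts, and false when iteration succeeds.
def open_deferred?(path)
  enum = Dir.to_enum(:foreach, path) # would raise here if the open were eager
  begin
    enum.first # iteration begins: the directory is opened now
    false
  rescue SystemCallError
    true
  end
end
```

Calling it with a missing path returns true, because the error surfaces on enum.first and not when the enumerator is created.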

Actions #3

Updated by akr (Akira Tanaka) over 12 years ago

  • Project changed from Ruby to Ruby master
  • Assignee set to akr (Akira Tanaka)
Actions #4

Updated by shyouhei (Shyouhei Urabe) about 12 years ago

  • Status changed from Open to Assigned

Updated by mame (Yusuke Endoh) over 11 years ago

  • Description updated (diff)
  • Target version set to 2.6
Actions #6

Updated by naruse (Yui NARUSE) about 6 years ago

  • Target version deleted (2.6)