Feature #21800

`Dir.foreach` and `Dir.each_child` to optionally yield `File::Stat` object alongside the children name

Added by byroot (Jean Boussier) 1 day ago. Updated 1 day ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:124342]

Description

When listing a directory, it's very common to need to know the type of each child, generally because you want to scan recursively.
The naive way to do this is to call stat(2) for each child, but this is quite costly.
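
For reference, the naive approach can be sketched on current Ruby as follows. The method name `count_ruby_files_naive` is made up for illustration; the point is only that every child costs one extra `File.lstat` call:

```ruby
# Naive recursive scan: one lstat(2) per directory entry.
# `count_ruby_files_naive` is a hypothetical name used for illustration.
def count_ruby_files_naive(root)
  count = 0
  queue = [root]
  while (dir = queue.pop)
    Dir.each_child(dir) do |name|
      next if name.start_with?(".")

      path = File.join(dir, name)
      stat = File.lstat(path) # extra syscall for every child
      if stat.directory?
        queue << path
      elsif stat.file?
        count += 1 if name.end_with?(".rb")
      end
    end
  end
  count
end
```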

This use case is common enough that readdir on most modern platforms exposes struct dirent.d_type, which makes it possible to know the type of the child without an extra syscall:

From the scandir manpage:

d_type: This field contains a value indicating the file type,
making it possible to avoid the expense of calling lstat(2)

I wrote a quick prototype, and relying on dirent.d_type instead of stat(2) makes recursively scanning Ruby's repository twice as fast on my machine: https://github.com/ruby/ruby/pull/15667

Given that recursively scanning directories is a common task across many popular Ruby tools (zeitwerk, rubocop, etc.), I think it would be very valuable to provide this more efficient interface.

In addition, @nobu (Nobuyoshi Nakada) noticed my prototype, and implemented a nicer version of it, where a File::Stat is yielded: https://github.com/ruby/ruby/commit/9acf67057b9bc6f855b2c37e41c1a2f91eae643a

In that version the File::Stat is lazy: the actual stat(2) call is only issued if you access something other than the file type.
I think this API is both more efficient and more convenient.
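
To make the lazy behaviour concrete, here is a rough pure-Ruby sketch of the idea. It is not the implementation from the linked commit: the `LazyStat` class, its `type_hint` argument (standing in for dirent.d_type) and the `stat_calls` counter are all hypothetical names used only for illustration.

```ruby
# Hypothetical illustration of a lazy stat object; not the code from the linked commit.
class LazyStat
  # Number of real lstat calls made so far (exposed only to demonstrate laziness).
  attr_reader :stat_calls

  # type_hint stands in for dirent.d_type: :directory, :file, :link, or nil if unknown.
  def initialize(path, type_hint)
    @path = path
    @type_hint = type_hint
    @stat = nil
    @stat_calls = 0
  end

  # Type predicates are answered from the hint when one is available.
  def directory?
    @type_hint ? @type_hint == :directory : full_stat.directory?
  end

  def file?
    @type_hint ? @type_hint == :file : full_stat.file?
  end

  def symlink?
    @type_hint ? @type_hint == :link : full_stat.symlink?
  end

  # Anything beyond the file type needs the real lstat(2) result.
  def size
    full_stat.size
  end

  private

  # Performs the lstat at most once and caches the result.
  def full_stat
    @stat ||= begin
      @stat_calls += 1
      File.lstat(@path)
    end
  end
end
```

With a known type hint, calling `file?` or `directory?` never touches the filesystem; only a call such as `size` triggers the single cached `File.lstat`.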

Proposed API

Dir.foreach(path) { |name| }
Dir.foreach(path) { |name, stat| }
Dir.each_child(path) { |name| }
Dir.each_child(path) { |name, stat| }
Dir.new(path).each_child { |name| }
Dir.new(path).each_child { |name, stat| }
Dir.new(path).each { |name| }
Dir.new(path).each { |name, stat| }

Also important to note: the File::Stat is expected to be equivalent to an lstat(2) call, so that the caller can choose whether or not to follow symlinks.
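
The difference matters for symlinks. A small sketch using only the existing `File.lstat` and `File.stat` (the helper name `symlink_to_dir_types` is made up for illustration) shows what lstat semantics report for a symlink pointing at a directory, and how a caller could still opt into following it:

```ruby
require "tmpdir"

# Builds a symlink to a directory in a temporary location and returns
# [lstat says symlink?, lstat says directory?, stat says directory?].
# `symlink_to_dir_types` is a hypothetical helper used for illustration.
def symlink_to_dir_types
  Dir.mktmpdir do |root|
    target = File.join(root, "target")
    link = File.join(root, "link")
    Dir.mkdir(target)
    File.symlink(target, link)

    entry = File.lstat(link)   # describes the link itself
    followed = File.stat(link) # describes what the link points to
    [entry.symlink?, entry.directory?, followed.directory?]
  end
end
```

With lstat semantics the entry is reported as a symlink rather than a directory, so a recursive scanner does not follow it unless it explicitly calls `File.stat` on the path.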

Basic use case:

def count_ruby_files(root)
  count = 0
  queue = [root]
  while dir = queue.pop
    Dir.each_child(dir) do |name, stat|
      next if name.start_with?(".")

      if stat.directory?
        queue << File.join(dir, name)
      elsif stat.file?
        count += 1 if name.end_with?(".rb")
      end
    end
  end
  count
end

Related issues 1 (0 open, 1 closed)

Related to Ruby - Feature #17001: [Feature] Dir.scan to yield dirent for efficient and composable recursive directory scaning (Closed)

Updated by byroot (Jean Boussier) 1 day ago Actions #1

  • Related to Feature #17001: [Feature] Dir.scan to yield dirent for efficient and composable recursive directory scaning added

Updated by byroot (Jean Boussier) 1 day ago Actions #2

  • Description updated (diff)

Updated by byroot (Jean Boussier) 1 day ago Actions #3

  • Description updated (diff)

Updated by byroot (Jean Boussier) 1 day ago Actions #4

  • Description updated (diff)