Feature #21800
Updated by byroot (Jean Boussier) 1 day ago
When listing a directory, it's very common to need to know the type of each child, generally because you want to scan recursively.
The naive way to do this is to call `stat(2)` for each child, but this is quite costly.
This use case is common enough that `readdir` on most modern platforms exposes `struct dirent.d_type`, which lets you know the type of the child without an extra syscall:
From the `scandir` manpage:
> d_type: This field contains a value indicating the file type,
> making it possible to avoid the expense of calling lstat(2)
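For reference, here is a minimal sketch of the naive approach on current Ruby, using only existing core methods (`Dir.each_child` and `File.lstat`). The method name `count_ruby_files_naive` is made up for illustration, and it uses `lstat` rather than `stat` so that symlinks are not followed. Every child costs one extra syscall just to learn its type:

```ruby
# Naive recursive scan: one lstat(2) call per directory entry.
# `count_ruby_files_naive` is a hypothetical name used for illustration.
def count_ruby_files_naive(root)
  count = 0
  queue = [root]
  while (dir = queue.pop)
    Dir.each_child(dir) do |name|
      next if name.start_with?(".")

      path = File.join(dir, name)
      stat = File.lstat(path) # extra syscall for every child
      if stat.directory?
        queue << path
      elsif stat.file?
        count += 1 if name.end_with?(".rb")
      end
    end
  end
  count
end
```

This is the per-entry cost that `d_type` is meant to avoid.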
I wrote a quick prototype, and relying on `dirent.d_type` instead of `stat(2)` makes recursively scanning Ruby's repository twice as fast on my machine: https://github.com/ruby/ruby/pull/15667
Given that recursively scanning directories is a common task across many popular Ruby tools (`zeitwerk`, `rubocop`, etc.), I think it would be very valuable to provide this more efficient interface.
In addition, @nobu noticed my prototype, and implemented a nicer version of it, where a `File::Stat` is yielded: https://github.com/ruby/ruby/commit/9acf67057b9bc6f855b2c37e41c1a2f91eae643a
In that case the `File::Stat` is lazy: the actual `stat(2)` call is only issued if you access something other than the file type.
I think this API is both more efficient and more convenient.
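To illustrate the lazy behavior described above, here is a rough pure-Ruby sketch. It is not the implementation from the linked commit: the class `LazyEntryStat`, its `type` symbols, and the `lstat_calls` counter are all hypothetical, and the type is passed in by hand where a real implementation would take it from `d_type`. Type predicates are answered from the known type, and anything else triggers a single memoized `File.lstat`:

```ruby
# Hypothetical illustration of a lazy stat object; not the real File::Stat change.
class LazyEntryStat
  attr_reader :lstat_calls

  # type: :file, :directory, :link, or :unknown (d_type may be DT_UNKNOWN
  # on some filesystems, in which case a real lstat is needed anyway).
  def initialize(path, type)
    @path = path
    @type = type
    @full = nil
    @lstat_calls = 0
  end

  def directory?
    @type == :unknown ? full.directory? : @type == :directory
  end

  def file?
    @type == :unknown ? full.file? : @type == :file
  end

  def symlink?
    @type == :unknown ? full.symlink? : @type == :link
  end

  # Anything beyond the file type needs the real lstat(2) result.
  def size
    full.size
  end

  def mtime
    full.mtime
  end

  private

  # Perform lstat(2) at most once and remember the result.
  def full
    @full ||= begin
      @lstat_calls += 1
      File.lstat(@path)
    end
  end
end
```

With this sketch, calling only `file?` or `directory?` on an entry of known type never touches the filesystem, while `size` or `mtime` pays for one `lstat` and reuses it afterwards.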
### Proposed API
```ruby
Dir.foreach(path) { |name| }
Dir.foreach(path) { |name, stat| }
Dir.each_child(path) { |name| }
Dir.each_child(path) { |name, stat| }
Dir.new(path).each_child { |name| }
Dir.new(path).each_child { |name, stat| }
Dir.new(path).each { |name| }
Dir.new(path).each { |name, stat| }
```
Also important to note: the `File::Stat` is expected to be equivalent to an `lstat(2)` call, so that the caller can choose whether or not to follow symlinks.
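To make the `lstat` semantics concrete, here is a small sketch using only the existing `File.lstat` and `File.stat` methods. The helper `entry_kind` and its `follow_symlinks:` keyword are hypothetical names for illustration. Because `lstat` describes the link itself, the caller learns that an entry is a symlink and can then decide whether to resolve it:

```ruby
# Hypothetical helper: classify a path, optionally following symlinks.
def entry_kind(path, follow_symlinks: false)
  stat = File.lstat(path) # describes the entry itself, never the link target
  # File.stat follows the link; it raises Errno::ENOENT for a dangling symlink.
  stat = File.stat(path) if stat.symlink? && follow_symlinks

  if stat.symlink?
    :symlink
  elsif stat.directory?
    :directory
  elsif stat.file?
    :file
  else
    :other
  end
end
```

If the yielded object behaved like `stat(2)` instead, a symlink to a directory would be indistinguishable from a real directory, and a recursive scan could not choose to skip it.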
Basic use case:
```ruby
def count_ruby_files(root)
  count = 0
  queue = [root]
  while dir = queue.pop
    # With the proposed API, `stat` is yielded alongside each child name.
    Dir.each_child(dir) do |name, stat|
      next if name.start_with?(".")
      if stat.directory?
        queue << File.join(dir, name)
      elsif stat.file?
        count += 1 if name.end_with?(".rb")
      end
    end
  end
  count
end
```