Feature #21800

Updated by byroot (Jean Boussier) 1 day ago

When listing a directory, you very often need to know the type of each child, typically because you want to scan recursively. 
 The naive way to do this is to call `stat(2)` on each child, but that is quite costly.  
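
 For reference, the naive approach can be sketched with today's API roughly as follows. This is only an illustration: `count_ruby_files_naive` is a hypothetical name, and `File.lstat` stands in for the per-child `stat(2)` call mentioned above.

```ruby
# Naive recursive scan: one lstat(2) per directory entry.
# `count_ruby_files_naive` is a hypothetical name used for illustration.
def count_ruby_files_naive(root)
  count = 0
  queue = [root]
  while (dir = queue.pop)
    Dir.each_child(dir) do |name|
      next if name.start_with?(".")

      path = File.join(dir, name)
      stat = File.lstat(path) # extra syscall for every child
      if stat.directory?
        queue << path
      elsif stat.file?
        count += 1 if name.end_with?(".rb")
      end
    end
  end
  count
end
```

 The `File.lstat` call inside the block is the per-entry cost that this proposal aims to avoid.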

 This use case is common enough that `readdir` on most modern platforms exposes `struct dirent.d_type`, which gives the type of the child without an extra syscall. 

 From the `scandir` manpage: 

 >    d_type: This field contains a value indicating the file type, 
               making it possible to avoid the expense of calling lstat(2) 

 I wrote a quick prototype, and relying on `dirent.d_type` instead of `stat(2)` makes recursively scanning Ruby's repository twice as fast on my machine: https://github.com/ruby/ruby/pull/15667 

 Given that recursively scanning directories is a common task across many popular Ruby tools (`zeitwerk`, `rubocop`, etc.), I think it would be very valuable to provide this more efficient interface. 

 In addition, @nobu noticed my prototype, and implemented a nicer version of it, where a `File::Stat` is yielded: https://github.com/ruby/ruby/commit/9acf67057b9bc6f855b2c37e41c1a2f91eae643a 

 In that case the `File::Stat` is lazy: the actual `stat(2)` call is only issued if you access something other than the file type. 
 I think this API is both more efficient and more convenient. 
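
 To make the lazy behaviour concrete, here is a pure-Ruby sketch of the idea. It is not the implementation from the linked commit: `LazyStat`, its `type` symbols and the `lstat_calls` counter are all hypothetical, and the type is passed in by hand where the real code would take it from `d_type`.

```ruby
# Hypothetical illustration of a lazy stat object; NOT the class from the
# linked commit. The entry type is supplied up front (as d_type would be),
# and lstat(2) only runs when another attribute is requested.
class LazyStat
  attr_reader :lstat_calls # exposed only so the laziness can be observed

  def initialize(path, type)
    @path = path
    @type = type # assumed symbols: :file, :directory, :link
    @stat = nil
    @lstat_calls = 0
  end

  # Type predicates are answered from the known type: no syscall.
  def file?
    @type == :file
  end

  def directory?
    @type == :directory
  end

  def symlink?
    @type == :link
  end

  # Anything else forces the real lstat(2), at most once.
  def size
    real_stat.size
  end

  def mtime
    real_stat.mtime
  end

  private

  def real_stat
    @stat ||= begin
      @lstat_calls += 1
      File.lstat(@path)
    end
  end
end
```

 With such an object, a scan that only asks `directory?` or `file?` never triggers the syscall, while code that needs `size` or `mtime` pays for it once per entry.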

 ### Proposed API 

 ```ruby 
 Dir.foreach(path) { |name| } 
 Dir.foreach(path) { |name, stat| } 
 Dir.each_child(path) { |name| } 
 Dir.each_child(path) { |name, stat| } 
 Dir.new(path).each_child { |name| } 
 Dir.new(path).each_child { |name, stat| } 
 Dir.new(path).each { |name| } 
 Dir.new(path).each { |name, stat| } 
 ``` 

 Also important to note: the `File::Stat` is expected to be equivalent to an `lstat(2)` call, so that the caller can choose whether or not to follow symlinks. 
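
 As a rough illustration of why `lstat(2)` semantics leave that choice to the caller, the sketch below uses only today's `File.lstat` and `File.stat`. The helper `entry_stat` and its `follow_symlinks:` keyword are hypothetical and not part of the proposal.

```ruby
# Hypothetical helper written against today's File API.
# Starting from lstat semantics, the caller decides whether to resolve links.
def entry_stat(path, follow_symlinks: false)
  stat = File.lstat(path) # describes the link itself
  return stat unless follow_symlinks && stat.symlink?

  begin
    File.stat(path) # describes the link's target
  rescue Errno::ENOENT
    stat # dangling link: fall back to the link's own stat
  end
end
```

 Had the yielded object used `stat(2)` semantics instead, a symlink to a directory would be indistinguishable from a real directory, and the caller could no longer opt out of following it.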

 Basic use case: 

 ```ruby 
 def count_ruby_files(root) 
   count = 0 
   queue = [root] 
   while dir = queue.pop 
     Dir.each_child(dir) do |name, stat| 
       next if name.start_with?(".") 

       if stat.directory? 
         queue << File.join(dir, name) 
       elsif stat.file? 
         count += 1 if name.end_with?(".rb") 
       end 
     end 
   end 
   count 
 end 
 ```
