Bug #21119
Programs containing `Dir.glob` with a thread executing a CPU-heavy task run very slowly.
Description
Executing the following code in Ruby 3.4.1 takes a very long time, especially when there are many files (around 100 or more) in the current directory.
This delay does not occur in Ruby 3.3.6.
Reproducible script
# hoge.rb
# Launch a thread to execute CPU-heavy task
Thread.new do
  loop do
    arr = []
    100.times do
      arr << rand(1...100)
    end
  end
end
# Execute a program containing `Dir.glob` in the main thread.
10.times do
  Dir.glob('*')
  puts "aaaa"
end
Execution Results
Executing the above code in Ruby 3.4.1 takes 119.43s.
$ ruby -v
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
$ time ruby hoge.rb
aaaa
aaaa
aaaa
aaaa
aaaa
aaaa
aaaa
aaaa
aaaa
aaaa
ruby hoge.rb 119.43s user 0.30s system 99% cpu 1:59.89 total
Executing it in Ruby master also takes 118.87s.
$ ~/opt-ruby/bin/ruby -v
ruby 3.5.0dev (2025-02-06T14:10:34Z master adbf9c5b36) +PRISM [arm64-darwin24]
$ time ~/opt-ruby/bin/ruby hoge.rb
aaaa
aaaa
aaaa
aaaa
aaaa
aaaa
aaaa
aaaa
aaaa
aaaa
~/opt-ruby/bin/ruby hoge.rb 118.87s user 0.46s system 99% cpu 2:00.45 total
Executing it in Ruby 3.3.6 takes only 2.22s.
$ ruby -v
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) [arm64-darwin24]
$ time ruby hoge.rb
aaaa
aaaa
aaaa
aaaa
aaaa
aaaa
aaaa
aaaa
aaaa
aaaa
ruby hoge.rb 2.22s user 0.03s system 98% cpu 2.286 total
So the slowdown is roughly 50x.
Possible Cause
From Ruby 3.4.0, Dir.glob releases the GVL frequently.
Due to this change, each time the CPU-heavy thread gives up the GVL, Dir.glob acquires it but releases it again almost immediately.
As a result, Dir.glob gets significantly delayed because it has to keep reacquiring the GVL, causing a major slowdown in execution.
Note about Execution Results
I measured the execution results under a stress condition, with 100 files in the current directory.
If there are fewer files, the slowdown may be less pronounced.
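To recreate the stress condition described above, a small setup script along these lines could be used. This is only a sketch: the helper name `time_glob` and the file names are hypothetical and not part of the original report, and it measures `Dir.glob` alone, without the CPU-heavy thread.

```ruby
require "benchmark"
require "fileutils"
require "tmpdir"

# Hypothetical helper (not from the report): create `files` empty files in a
# temporary directory, then time `count` calls to Dir.glob('*') from inside it.
def time_glob(files:, count: 10)
  Dir.mktmpdir do |dir|
    files.times { |i| FileUtils.touch(File.join(dir, "file#{i}.txt")) }
    Dir.chdir(dir) do
      Benchmark.realtime { count.times { Dir.glob("*") } }
    end
  end
end

elapsed = time_glob(files: 100)
puts format("10 x Dir.glob('*') over 100 files: %.4fs", elapsed)
```

Starting the CPU-heavy thread from hoge.rb before calling the helper should give numbers comparable to the ones reported above, although absolute timings will depend on the machine and filesystem.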
Updated by luke-gru (Luke Gruber) 16 days ago
· Edited
This might be an issue with Kernel#loop being defined now in Ruby itself, and it never calls a primitive to check interrupts. Checking interrupts and having a timer interrupt would switch threads. You can mimic this by calling Thread.pass at the bottom of the loop.
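For reference, the suggested experiment could look like the following variant of hoge.rb. It is a sketch of the suggestion only: the `worker` and `results` variables and the explicit `kill`/`join` are additions for illustration, and no timings were measured for this variant in the thread.

```ruby
# Variant of hoge.rb: the CPU-heavy thread calls Thread.pass on every iteration.
worker = Thread.new do
  loop do
    arr = []
    100.times do
      arr << rand(1...100)
    end
    Thread.pass # hint to the scheduler that another thread may run now
  end
end

# Same main-thread workload as the original script.
results = []
10.times do
  results << Dir.glob('*').size
  puts "aaaa"
end

# Stop the background thread explicitly (added for illustration).
worker.kill
worker.join
```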
Updated by jeremyevans0 (Jeremy Evans) 16 days ago
It is simple to revert the GVL-releasing, but then no other thread can run while accessing the filesystem (which may block for a long period of time for networked filesystems). GVL-releasing is a tradeoff. It mitigates damage if the filesystem access takes a long time, but it makes the common case slower. I think this issue is much more pronounced on Mac OS and other systems where getattrlist/fgetattrlist are used in order to determine whether normalization is needed, because then the GVL is released for every directory entry. I don't have any opinion on whether the tradeoff is worth it in this case.
Updated by byroot (Jean Boussier) 16 days ago
- Related to Bug #20587: dir.c calls blocking filesystem APIs/system calls while holding the GVL added
Updated by byroot (Jean Boussier) 16 days ago
I don't think we should revert the GVL freeing, but we should really start to think about a smarter scheduler that doesn't penalize threads that release the GVL. It's a longer project though.
Updated by luke-gru (Luke Gruber) 16 days ago
Yeah sorry, it is the GVL, like you guys are saying. There are many syscalls here. It would be nice to just release it at the top and get it back after all the syscalls, but then there's probably a lot of Ruby functions in between the syscalls that need the GVL...
I agree with @byroot (Jean Boussier) that we need a smarter scheduler for these cases. And as an alternative to not penalizing threads that release the GVL, we could do like Go and not release the GVL (the P in Go parlance) on potentially short blocking syscalls, and instead register the thread with a monitoring thread (maybe the timer thread?) before the syscall. That monitoring thread checks for Ruby threads that have been in this blocking state for too long and gives the GVL to another waiting thread if the limit is exceeded. If the time limit isn't exceeded, the Ruby thread never yields. This way we could use the GVL release for calls that we know will block a while and use the optimistic no-release case for calls we think will be fast.
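The bookkeeping behind this proposal can be illustrated with a toy model in plain Ruby. Everything here is hypothetical: `BlockingMonitor`, its methods, the 10 ms limit, and the thread names are made up for illustration, and this is not how CRuby's scheduler or timer thread is implemented. Timestamps are passed in explicitly so the example is deterministic.

```ruby
# Toy model of the proposal (hypothetical, not CRuby internals): threads
# register before a syscall they expect to be short, and a monitoring thread
# periodically asks which of them have been blocked longer than the limit.
class BlockingMonitor
  def initialize(limit_seconds)
    @limit = limit_seconds
    @entered_at = {}
    @lock = Mutex.new
  end

  # Called by a thread just before the syscall, instead of releasing the GVL.
  def enter(thread_id, now)
    @lock.synchronize { @entered_at[thread_id] = now }
  end

  # Called by the same thread right after the syscall returns.
  def leave(thread_id)
    @lock.synchronize { @entered_at.delete(thread_id) }
  end

  # Called by the monitoring thread: threads blocked longer than the limit,
  # i.e. the ones whose GVL would be handed to another waiting thread.
  def overdue(now)
    @lock.synchronize do
      @entered_at.select { |_id, entered| now - entered > @limit }.keys
    end
  end
end

monitor = BlockingMonitor.new(0.010)  # assumed limit of 10 ms
monitor.enter(:glob_thread, 0.000)
monitor.enter(:nfs_thread, 0.000)
monitor.leave(:glob_thread)           # fast syscall: this thread never yields

p monitor.overdue(0.005)  # => []
p monitor.overdue(0.050)  # => [:nfs_thread]
```

In this model a fast call costs only one registration and one removal, while a call that stays blocked past the limit is the only case in which the GVL would change hands.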
Updated by naruse (Yui NARUSE) 9 days ago
If there is a C Dir.glob implementation, we can run it in another pthread in parallel.