Bug #19378: Windows: Use less syscalls for faster require of big gems

Added by aidog (Andi Idogawa) over 3 years ago. Updated 1 day ago.

Status: Assigned
Assignee:
Target version: -
[ruby-core:112045]

Description

Hello 🙂

Problem

require is slow on Windows for big gems (example: require 'gtk3' takes 3+ seconds). This is a problem for people who want to make cross-platform GUI apps with Ruby.

Possible Reason

As touched on in #15797, it seems that require uses realpath, which is emulated on Windows. The emulation checks every parent directory, so the same syscalls run many times.

Testfile

C:\tmp\speedtest\testrequire.rb:

require __dir__ + "/helloworld1.rb"
require __dir__ + "/helloworld2.rb"

Run with:

ruby --disable-gems C:\tmp\speedtest\testrequire.rb

Syscalls per File/Directory:

  1. CreateFile
  2. QueryInformationVolume
  3. QueryIdInformation
  4. QueryAllInformationFile
  5. QueryNameInformationFile
  6. QueryNameInformationFile
  7. QueryNormalizedNameInformationFile
  8. CloseFile

Files/Directories checked

  1. C:\tmp
  2. C:\tmp\speedtest
  3. C:\tmp\speedtest\helloworld1.rb
  4. C:\tmp
  5. C:\tmp\speedtest
  6. C:\tmp\speedtest\helloworld2.rb

For two required files, Ruby had to make 8 * 6 = 48 syscalls.
The syscalls originate from rb_w32_reparse_symlink_p / lstat.

Gem files live in subfolders with 9+ parts: "C:\Ruby32-x64\lib\ruby\gems\3.2.0\gems\glib2-4.0.8\lib\glib2\variant.rb"
Each file takes 8 * 9 = 72+ calls. For variant.rb, which has 10 parts, it is 80 calls.
The results of these syscalls don't change in such a short time, so it should be possible to cache them.
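
The arithmetic above can be reproduced with a small Ruby sketch. The helper name estimated_syscalls is hypothetical, and the figure of 8 syscalls per checked component is simply the number observed in Procmon above, not a constant taken from the Ruby source:

```ruby
# Rough estimate of the syscalls issued while resolving one path,
# assuming 8 syscalls per checked component (the figure observed above).
SYSCALLS_PER_COMPONENT = 8

# Hypothetical helper: counts every directory plus the file itself.
# The drive letter (C:) is skipped because it was not in the checked list.
def estimated_syscalls(windows_path)
  components = windows_path.split("\\").reject(&:empty?)
  components.shift if components.first&.match?(/\A[A-Za-z]:\z/)
  components.size * SYSCALLS_PER_COMPONENT
end

gem_file = 'C:\Ruby32-x64\lib\ruby\gems\3.2.0\gems\glib2-4.0.8\lib\glib2\variant.rb'
puts estimated_syscalls(gem_file)                           # 80
puts estimated_syscalls('C:\tmp\speedtest\helloworld1.rb')  # 24, so 48 for both test files
```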

With require_relative it's twice as many calls.

Other testcases

Same result:

File.realpath __dir__ + "/helloworld1.rb"
File.realpath __dir__ + "/helloworld2.rb"
File.stat __dir__ + "/helloworld1.rb"
File.stat __dir__ + "/helloworld2.rb"

It does not happen with $LOAD_PATH.resolve_feature_path(__dir__ + "/helloworld1.rb")

Request

Would it be possible to cache the stat calls when using require?
I tried to implement a cache inside the ruby source code, but failed.
If not, is there now a way to combine ruby files into one?
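
To illustrate the kind of cache being asked for, here is a minimal Ruby-level sketch. It is not how require works internally and is not taken from any of the attached patches; the class StatCache and its methods are hypothetical names. It memoizes lstat results per path, so parent directories shared by many files hit the filesystem only once:

```ruby
require "pathname"
require "tmpdir"

# Hypothetical cache: remembers lstat results keyed by path.
# A real implementation would also need an invalidation strategy.
class StatCache
  attr_reader :misses

  def initialize
    @cache = {}
    @misses = 0
  end

  # Returns the cached File::Stat, touching the filesystem only on a miss.
  def lstat(path)
    @cache.fetch(path) do
      @misses += 1
      @cache[path] = File.lstat(path)
    end
  end

  # Stats the file and every parent directory up to the root,
  # similar to what an emulated realpath has to check.
  def check_all_components(path)
    Pathname.new(File.expand_path(path)).ascend { |p| lstat(p.to_s) }
  end
end

Dir.mktmpdir do |dir|
  first  = File.join(dir, "helloworld1.rb")
  second = File.join(dir, "helloworld2.rb")
  [first, second].each { |f| File.write(f, "") }

  cache = StatCache.new
  cache.check_all_components(first)
  before = cache.misses
  cache.check_all_components(second)
  puts cache.misses - before  # 1: only the second file itself is a miss
end
```

The trade-off is staleness: a cache like this assumes directories do not change while files are being required, which matches the observation above that the results don't change in such a short time.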

I previously talked about require here: YJIT: Windows support lacking.

How to reproduce

Ruby versions: At least 3.0+, most likely older ones too.
Tested using Ruby Installer 3.1 and 3.2.
Procmon Software by Sysinternals


Files

windows-no-realpath-require.patch (992 Bytes), test to avoid repeated syscalls, aidog (Andi Idogawa), 01/30/2023 03:10 AM
windows-revert-79a4484a.patch (5.42 KB), joshc (Josh C), 02/24/2023 01:40 AM
MINIMAL-stat-fastpath.patch (6.41 KB), aidog (Andi Idogawa), 06/16/2026 09:16 PM
recommendations.txt (4.01 KB), aidog (Andi Idogawa), 06/16/2026 09:17 PM

Updated by aidog (Andi Idogawa) over 3 years ago Actions #1 [ruby-core:112110]

Thanks to the new Windows build docs by ioquatix, I made a test patch to check how much faster require would be if some of the repeated syscalls on the folders (c:/tmp/, c:/tmp/speedtest, gems and so on) were avoided:

tzinfo: 0.8s to 0.3s
gtk3: 2.8s to 2.5s (I see another similar issue inside the gem C code)

Windows has GetFinalPathNameByHandleW since Vista, which some other projects use for realpath. Would it work for Ruby?

Updated by nobu (Nobuyoshi Nakada) over 3 years ago Actions #2 [ruby-core:112257]

  • Status changed from Open to Assigned
  • Assignee set to windows

Updated by joshc (Josh C) over 3 years ago Actions #3 [ruby-core:112566]

I've also noticed a significant increase in file IO events (as reported by procmon) due to https://github.com/ruby/ruby/commit/79a4484a072e9769b603e7b4fbdb15b1d7eccb15 introduced in Ruby 3.1.0. The code tries to prevent the same file from being loaded twice by calling rb_realpath_internal to see if the realpath has already been loaded. This is a problem on systems like Windows that use Ruby's emulated realpath, especially when there are deeply nested directories. I've attached a revert patch. It'd be great to use GetFinalPathNameByHandleW and avoid the emulation code.
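
The realpath check described above can be pictured with a short Ruby sketch. This is only an illustration of the idea, not the C code from that commit; load_once and LOADED_REALPATHS are hypothetical names, and creating symlinks may require extra privileges on Windows:

```ruby
require "set"
require "tmpdir"

# Hypothetical stand-in for the interpreter's table of loaded realpaths.
LOADED_REALPATHS = Set.new

# Loads a file only if its resolved path has not been loaded before.
# File.realpath is the expensive step when it is emulated.
def load_once(path)
  real = File.realpath(path)
  return false unless LOADED_REALPATHS.add?(real)  # add? returns nil if already present
  load(real)
  true
end

Dir.mktmpdir do |dir|
  target = File.join(dir, "helloworld1.rb")
  link   = File.join(dir, "alias.rb")
  File.write(target, "")
  File.symlink(target, link)

  p load_once(target)  # true: first load
  p load_once(link)    # false: different name, same realpath
end
```

Every call pays for one realpath resolution, which is why the cost grows with directory depth when realpath is emulated.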

Updated by jeremyevans0 (Jeremy Evans) over 3 years ago Actions #4 [ruby-core:112567]

joshc (Josh C) wrote in #note-3:

I've attached a revert patch.

I think the only way we would revert 79a4484a072e9769b603e7b4fbdb15b1d7eccb15 is if someone can come up with an alternative approach to fixing Bug #17885.

It'd be great to use GetFinalPathNameByHandleW and avoid the emulation code.

If you mean to use this on Windows for the internals of File#realpath, I think we would be open to a backwards compatible patch for that, but @usa (Usaku NAKAMURA) would need to decide as he maintains the mswin64 platform.

Updated by MSP-Greg (Greg L) over 3 years ago Actions #6 [ruby-core:112648]

Just to be clear, this issue affects all Windows MRI platforms, so both mswin64 and mingw32 (mingw & ucrt builds) are affected.

Updated by rlam (Robert Lam) 5 months ago · Edited Actions #7 [ruby-core:124488]

Hello, thank you for your attention to this issue. We have a large Ruby project running on Windows, and after upgrading from v2.1.4 to v3.4.7 we are noticing the slowness with require.

This is not an issue under v2.1.4. Looking at the syscalls, the major difference between the two versions is that instead of all 8 syscalls per file or directory that you mentioned above, v2.1.4 issues only these 3:

  1. CreateFile
  2. QueryNetworkOpenInformationFile
  3. CloseFile

For require bigdecimal we experience ~12000 syscalls under Ruby 3 and only ~3000 under Ruby 2.

Updated by aidog (Andi Idogawa) 1 day ago Actions #8 [ruby-core:125783]

3 years and a few RubyKaigis later, I've run some tests.

Using GetFileInformationByName (Windows 11 24H2+) in winnt_stat, I'm seeing 2.32x faster loading of active_support/all and 2.77x faster loading of nokogiri. File.stat of an existing file went from 112 us to 21 us (5.4x).

Benchmark results: minimal one-file patch vs unpatched, Ruby 4.0.5

(best of 5 cold-process runs)

┌──────────────────────────────┬───────────┬───────────────┬─────────┬──────────┐
│ Gem                          │ Unpatched │ Minimal patch │ Speedup │ Features │
├──────────────────────────────┼───────────┼───────────────┼─────────┼──────────┤
│ rubocop                      │ 3186 ms   │ 1651 ms       │ 1.93×   │ 1221     │
│ sinatra                      │ 546 ms    │ 296 ms        │ 1.85×   │ 137      │
│ faraday                      │ 340 ms    │ 204 ms        │ 1.67×   │ 112      │
│ sequel                       │ 254 ms    │ 140 ms        │ 1.81×   │ 60       │
│ rspec                        │ 213 ms    │ 100 ms        │ 2.13×   │ 62       │
│ bigdecimal                   │ 52 ms     │ 18 ms         │ 2.95×   │ 2        │
│ nokogiri (earlier)           │ 224 ms    │ 81 ms         │ 2.77×   │ 84       │
│ active_support/all (earlier) │ 1005 ms   │ 433 ms        │ 2.32×   │ 433      │
└──────────────────────────────┴───────────┴───────────────┴─────────┴──────────┘

I've attached a Claude patch of about 70 lines of code. I'm sure a lot of it is wrong, but it might be interesting regardless. It is not a big change.
Some other improvements were found, but nothing else made such a huge difference.
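
For anyone who wants to try a "best of 5 cold-process runs" measurement themselves, one possible approach is sketched below. This is an assumption about the method, not the script used for the table above; best_cold_require_ms is a hypothetical helper, and the timing includes interpreter startup, so it is only useful for comparing two builds on the same machine:

```ruby
require "rbconfig"

# Hypothetical helper: spawns a fresh interpreter for every run so that
# nothing is already loaded in-process, then keeps the fastest time.
def best_cold_require_ms(feature, runs: 5)
  times = Array.new(runs) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    ok = system(RbConfig.ruby, "-e", "require #{feature.inspect}")
    raise "require failed: #{feature}" unless ok
    (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
  end
  times.min
end

# "json" is used here only because it ships with Ruby; substitute any gem.
puts format("%.1f ms", best_cold_require_ms("json"))
```

Taking the minimum rather than the mean reduces noise from other processes and from the operating system's file cache warming up.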
