Feature #16848

Allow callables in $LOAD_PATH

Added by byroot (Jean Boussier) 3 months ago. Updated about 1 month ago.

Status: Feedback
Priority: Normal
Assignee: -
Target version: -
[ruby-core:98257]

Description

Make it easier to implement $LOAD_PATH caching, and speed up application boot time.

I benchmarked it on Redmine's master using bootsnap with only the optimization enabled:

if ENV['CACHE_LOAD_PATH']
  require 'bootsnap'
  Bootsnap.setup(
    cache_dir:            'tmp/cache',
    development_mode:     false,
    load_path_cache:      true,
    autoload_paths_cache: true,
    disable_trace:        false,
    compile_cache_iseq:   true,
    compile_cache_yaml:   false,
  )
end

$ RAILS_ENV=production time bin/rails runner 'p 1'
        2.66 real         1.99 user         0.66 sys
$ RAILS_ENV=production time bin/rails runner 'p 1'
        2.71 real         1.97 user         0.66 sys
$ CACHE_LOAD_PATH=1 RAILS_ENV=production time bin/rails runner 'p 1'
        1.41 real         1.12 user         0.28 sys
$ CACHE_LOAD_PATH=1 RAILS_ENV=production time bin/rails runner 'p 1'
        1.41 real         1.12 user         0.28 sys

That's roughly twice as fast for a relatively small application. And the improvement is not linear: the larger the application, the larger the gain.

How it works

require has O($LOAD_PATH.size) performance. The more gems you add to your Gemfile, the larger $LOAD_PATH becomes. require "foo.rb" will try to open the file in each of the $LOAD_PATH entries until it finds it. And since more gems usually also means more require calls, the total cost of loading Ruby code can grow quadratically.
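The linear scan described above can be sketched as follows. This is only an illustration, not Ruby's actual implementation: naive_resolve and its exist: parameter are hypothetical names, and the file system is simulated with an array so the sketch runs without touching the disk.

```ruby
# Hypothetical helper (not Ruby's real implementation): scan each load path
# entry in order and return the first existing file plus the number of probes.
def naive_resolve(feature, load_path, exist: File.method(:file?))
  probes = 0
  load_path.each do |dir|
    probes += 1
    candidate = File.join(dir, feature)
    return [candidate, probes] if exist.call(candidate)
  end
  [nil, probes]
end

# Simulated file system: only this one file "exists".
files = ["/gems/bar/lib/bar.rb"]
load_path = %w(/gems/foo/lib /gems/bar/lib)

path, probes = naive_resolve("bar.rb", load_path, exist: files.method(:include?))
puts "#{path} found after #{probes} probes"
```

Each require pays one probe per entry that precedes the match, so the probe count grows with both the number of entries and the number of require calls.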

To improve this, Bootsnap pre-computes a map of all the files in your $LOAD_PATH, and uses it to convert relative paths into absolute paths so that Ruby skips the $LOAD_PATH traversal.

$LOAD_PATH = %w(/gems/foo/lib /gems/bar/lib)

BOOTSNAP_CACHE = {
  "bar.rb" => "/gems/bar/lib/bar.rb",
}

This resolves a file with a single hash lookup, and reduces boot cost from roughly O($LOAD_PATH.size * number_of_files_to_require) to O(number_of_files_to_require).
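A minimal sketch of how such a map could be pre-computed, assuming the directory contents are already known. build_load_path_cache and the listing hash are hypothetical; this is not Bootsnap's actual code, and a real version would presumably walk each directory on disk instead.

```ruby
# Hypothetical sketch of the idea, not Bootsnap's actual code.
# `listing` maps each load path directory to the relative files under it.
def build_load_path_cache(load_path, listing)
  cache = {}
  load_path.each do |dir|
    listing.fetch(dir, []).each do |relative|
      # Earlier entries win, matching the order in which require scans $LOAD_PATH.
      cache[relative] ||= File.join(dir, relative)
    end
  end
  cache
end

listing = {
  "/gems/foo/lib" => ["foo.rb", "foo/version.rb"],
  "/gems/bar/lib" => ["bar.rb", "foo.rb"],
}
cache = build_load_path_cache(%w(/gems/foo/lib /gems/bar/lib), listing)

puts cache["bar.rb"] # /gems/bar/lib/bar.rb
puts cache["foo.rb"] # /gems/foo/lib/foo.rb (the earlier entry shadows /gems/bar/lib/foo.rb)
```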

This optimization is also used in Gel, a Rubygems/Bundler replacement.

Trade offs

Every time $LOAD_PATH is modified, the cache must be invalidated. While this is complex to do for Bootsnap, it would be fairly easy if it were implemented inside Ruby.

More importantly, you have to invalidate the cache whenever you add or delete a file to/from one of the $LOAD_PATH members; otherwise, if you shadow or unshadow another file farther down the $LOAD_PATH, Bootsnap will load the wrong file. For instance, if require "foo.rb" initially resolves to /some/gem/foo.rb, and you create lib/foo.rb, you'll need to flush the Bootsnap cache.
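The shadowing problem can be reproduced in a small simulation. Everything here is illustrative: stale_cache_demo is a hypothetical name, the file system is a plain hash, and lib is assumed to come before /some/gem in the load path.

```ruby
# Simulated scenario; all paths and names are illustrative.
def stale_cache_demo
  load_path = %w(lib /some/gem)
  files = { "lib" => [], "/some/gem" => ["foo.rb"] }

  # Scan in order, as an uncached require would.
  resolve = lambda do |feature|
    dir = load_path.find { |d| files[d].include?(feature) }
    dir && File.join(dir, feature)
  end

  cache = { "foo.rb" => resolve.call("foo.rb") } # snapshot taken at boot
  files["lib"] << "foo.rb"                       # lib/foo.rb is created later

  { cached: cache["foo.rb"], fresh: resolve.call("foo.rb") }
end

result = stale_cache_demo
puts result[:cached] # /some/gem/foo.rb (stale answer from the cache)
puts result[:fresh]  # lib/foo.rb (what an uncached scan finds now)
```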

The latter is trickier. Bootsnap has decided that it is rare enough not to cause actual problems, and so far that holds. But that is not a trade-off Ruby can make.

However, that's probably a trade-off Rubygems/Bundler can make. While it's common to edit your gems to debug something, it's really uncommon to add or remove files inside them. So in theory Rubygems/Bundler could, after installing a gem, compute a map of all the files in it that can be required. Then, when you activate the gem, you merge its map with those of the other activated gems.
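One way to picture the install-time map and the merge on activation, as a sketch only. MANIFESTS, its layout, and merge_manifests are hypothetical and do not correspond to any existing Rubygems or Bundler API.

```ruby
# Hypothetical per-gem manifests, as an installer might record them once.
MANIFESTS = {
  "foo" => { root: "/gems/foo/lib", files: ["foo.rb", "foo/version.rb"] },
  "bar" => { root: "/gems/bar/lib", files: ["bar.rb"] },
}

# Merge the manifests of the activated gems into a single lookup table.
def merge_manifests(activated, manifests)
  activated.each_with_object({}) do |name, map|
    manifest = manifests.fetch(name)
    manifest[:files].each do |relative|
      # The first activated gem providing a path wins.
      map[relative] ||= File.join(manifest[:root], relative)
    end
  end
end

index = merge_manifests(%w(foo bar), MANIFESTS)
puts index["foo/version.rb"] # /gems/foo/lib/foo/version.rb
puts index["bar.rb"]         # /gems/bar/lib/bar.rb
```

Because each manifest is computed once at install time, activation only costs a hash merge rather than a directory walk.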

Proposal

This could be reasonably easy to implement if $LOAD_PATH accepted callables in addition to paths. Something like this:

$LOAD_PATH = [
  'my_app/lib',
  BundlerOrRubygems.method(:lookup),
]

The contract would be that BundlerOrRubygems.lookup("some_relative/path.rb") would return either an absolute path or nil. With such an API, it would be easy to cache absolute paths only for gems and the stdlib, and preserve the current cache-less behavior for the application-specific load paths, which are usually much less numerous. It would also allow frameworks such as Rails to implement the same caching for application paths when running in an environment where the source files are immutable (typically production).
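To make the contract concrete, here is a sketch of how resolution over a mixed load path might behave. resolve_feature is a hypothetical stand-in for what require would do internally, not an existing Ruby API, and the exist: parameter is only there so the example runs against a simulated file system.

```ruby
# Hypothetical resolver illustrating the proposed contract. String entries are
# scanned as directories; callable entries are asked for a path or nil.
def resolve_feature(feature, load_path, exist: File.method(:file?))
  load_path.each do |entry|
    if entry.respond_to?(:call)
      path = entry.call(feature)
      return path if path
    else
      candidate = File.join(entry, feature)
      return candidate if exist.call(candidate)
    end
  end
  nil
end

# A pre-computed index standing in for BundlerOrRubygems.lookup.
gem_index = { "bar.rb" => "/gems/bar/lib/bar.rb" }
load_path = ["my_app/lib", gem_index.method(:[])]

# Simulated file system: the application directory contains only app.rb.
app_files = ["my_app/lib/app.rb"]
exist = app_files.method(:include?)

puts resolve_feature("app.rb", load_path, exist: exist) # my_app/lib/app.rb
puts resolve_feature("bar.rb", load_path, exist: exist) # /gems/bar/lib/bar.rb
```

The application path is still probed on every call, so newly created files are picked up, while everything served by the callable costs a single hash lookup.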
