Feature #16848

Updated by nobu (Nobuyoshi Nakada) over 4 years ago

This would make it much easier to implement `$LOAD_PATH` caching, a technique that can greatly speed up the boot time of Ruby applications.

 I just benchmarked it on Redmine's master, using bootsnap with only that optimization enabled: 

 ```ruby 
 if ENV['CACHE_LOAD_PATH'] 
   require 'bootsnap' 
   Bootsnap.setup( 
     cache_dir:              'tmp/cache', 
     development_mode:       false, 
     load_path_cache:        true, 
     autoload_paths_cache: true, 
     disable_trace:          false, 
     compile_cache_iseq:     true, 
     compile_cache_yaml:     false, 
   ) 
 end 
 ``` 

 ``` 
 $ RAILS_ENV=production time bin/rails runner 'p 1' 
         2.66 real           1.99 user           0.66 sys 
 $ RAILS_ENV=production time bin/rails runner 'p 1' 
         2.71 real           1.97 user           0.66 sys 
 $ CACHE_LOAD_PATH=1 RAILS_ENV=production time bin/rails runner 'p 1' 
         1.41 real           1.12 user           0.28 sys 
 $ CACHE_LOAD_PATH=1 RAILS_ENV=production time bin/rails runner 'p 1' 
         1.41 real           1.12 user           0.28 sys 
 ``` 

 That's roughly twice as fast, for a relatively small application. However, the improvement is not linear: the larger the application, the larger the improvement.

 ### How it works 

 The problem itself is simple: in Ruby, `require` has `O($LOAD_PATH.size)` performance. The more gems you add to your Gemfile, the bigger `$LOAD_PATH` gets. Then, when you `require "foo.rb"`, Ruby
 tries to open that path in each of the `$LOAD_PATH` entries in turn. And since more gems usually also mean more `require` calls, you can even consider that loading Ruby code has quadratic performance.
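To make the cost concrete, here is a minimal sketch of a linear lookup. `naive_resolve` is a hypothetical helper written for this illustration, not Ruby's actual `require` implementation; it only counts how many directories get probed before a file is found.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical model of a linear $LOAD_PATH scan (not Ruby's real implementation).
# Returns [absolute_path_or_nil, number_of_directories_probed].
def naive_resolve(feature, load_path)
  probes = 0
  load_path.each do |dir|
    probes += 1
    candidate = File.join(dir, feature)
    return [candidate, probes] if File.file?(candidate)
  end
  [nil, probes]
end

# Simulate 50 gem directories, with foo.rb living in the last one.
root = Dir.mktmpdir
load_path = Array.new(50) { |i| File.join(root, "gem#{i}", "lib") }
load_path.each { |dir| FileUtils.mkdir_p(dir) }
FileUtils.touch(File.join(load_path.last, "foo.rb"))

path, probes = naive_resolve("foo.rb", load_path)
puts "#{probes} probes to find #{File.basename(path)}"
FileUtils.remove_entry(root)
```

In this model every additional entry in the load path adds one more probe to the worst case of each `require`.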

 To improve this, Bootsnap precomputes a map of all the files in your `$LOAD_PATH` and uses it to translate relative paths into absolute paths, so that Ruby skips the `$LOAD_PATH` traversal.

 ```ruby
 $LOAD_PATH = %w(/gems/foo/lib /gems/bar/lib)

 BOOTSNAP_CACHE = { 
   "bar.rb" => "/gems/bar/lib/bar.rb", 
 } 
 ``` 

 This allows a file to be resolved with a single Hash lookup, which roughly brings boot performance down from `O($LOAD_PATH.size * number_of_files_to_require)` to `O(number_of_files_to_require)`.
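As a rough sketch of the idea, the map can be built by walking each load path entry once and letting the earliest entry win. `build_load_path_cache` is a hypothetical helper for illustration only; it is not Bootsnap's code and it ignores native extensions and cache persistence.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (not Bootsnap's implementation): map every requirable
# relative path to its absolute path. `||=` keeps the first match, so entries
# earlier in the load path shadow later ones, as a linear scan would.
def build_load_path_cache(load_path)
  cache = {}
  load_path.each do |dir|
    Dir.glob("**/*.rb", base: dir).sort.each do |relative|
      cache[relative] ||= File.join(dir, relative)
    end
  end
  cache
end

root = Dir.mktmpdir
foo_lib = File.join(root, "gems", "foo", "lib")
bar_lib = File.join(root, "gems", "bar", "lib")
FileUtils.mkdir_p([foo_lib, bar_lib])
FileUtils.touch(File.join(foo_lib, "foo.rb"))
FileUtils.touch(File.join(bar_lib, "bar.rb"))

cache = build_load_path_cache([foo_lib, bar_lib])
puts cache.fetch("bar.rb") # resolved with one Hash lookup, no directory scan
FileUtils.remove_entry(root)
```

The directory walk is paid once when the map is built; afterwards each lookup costs the same regardless of how many entries the load path has.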

 This optimization is also applied by [Gel](https://github.com/gel-rb/gel), a Rubygems/Bundler replacement.

 ### Tradeoffs 

 One difficulty with this is that any time `$LOAD_PATH` is modified, the cache must be considered invalid. While this is complex for Bootsnap to do, it would be fairly easy if it were implemented inside Ruby.

 The other, more important difficulty is that you also have to invalidate the cache whenever you add or delete a file in one of the `$LOAD_PATH` members. Otherwise, if you shadow or unshadow another file that is farther down the `$LOAD_PATH`, Bootsnap will load the wrong file.
 For instance, if `require "foo.rb"` used to resolve to `/some/gem/foo.rb`, but you just created `lib/foo.rb`, you'll need to flush Bootsnap's cache.
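The shadowing problem can be reproduced with a small sketch. `scan` and `stale_cache_demo` are hypothetical helpers written for this illustration, and the directory names simply mirror the example above inside a temporary directory.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical uncached lookup: first load path entry containing the file wins.
def scan(feature, load_path)
  load_path.map { |dir| File.join(dir, feature) }.find { |path| File.file?(path) }
end

# Builds a one-entry cache, then adds a shadowing file without flushing it.
def stale_cache_demo
  Dir.mktmpdir do |root|
    app_lib = File.join(root, "lib")
    gem_lib = File.join(root, "some", "gem")
    FileUtils.mkdir_p([app_lib, gem_lib])
    FileUtils.touch(File.join(gem_lib, "foo.rb"))
    load_path = [app_lib, gem_lib]

    cache = { "foo.rb" => scan("foo.rb", load_path) } # cached while only the gem had foo.rb
    FileUtils.touch(File.join(app_lib, "foo.rb"))     # new file now shadows the gem's copy

    {
      cached: cache["foo.rb"].delete_prefix(root),         # what a stale cache still answers
      fresh: scan("foo.rb", load_path).delete_prefix(root) # what an uncached lookup answers
    }
  end
end

p stale_cache_demo
```

The two answers differ until the cache is rebuilt, which is exactly the case where a cached resolver loads the wrong file.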

 That one is much trickier to handle correctly. Bootsnap has simply decided that it is unlikely enough to cause actual problems, and so far that has held true.
 But obviously that's not a tradeoff Ruby can make.

 However, that's probably a tradeoff Rubygems/Bundler can make. While it's common to edit your gems to debug something, it's really uncommon to add or remove files inside them. So in theory Rubygems/Bundler could compute a map of all requirable files in a gem after installing it.
 Then, when a gem is activated, its map is merged with the maps of the other activated gems.
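A minimal sketch of that merge step follows. The `GEM_INDEXES` data and the `merge_activated` helper are hypothetical; nothing like them exists in Rubygems or Bundler today. The sketch also assumes the gem names are passed in `$LOAD_PATH` precedence order, so that on a conflict the earlier gem keeps the entry.

```ruby
# Hypothetical per-gem maps, as an installer could record them at install time.
GEM_INDEXES = {
  "foo" => {
    "foo.rb" => "/gems/foo/lib/foo.rb",
    "foo/version.rb" => "/gems/foo/lib/foo/version.rb",
  },
  "bar" => {
    "bar.rb" => "/gems/bar/lib/bar.rb",
    "foo.rb" => "/gems/bar/lib/foo.rb", # conflicts with the "foo" gem
  },
}.freeze

# Merge the maps of the activated gems. `precedence_order` is assumed to list
# gem names from highest to lowest $LOAD_PATH precedence; the merge block keeps
# the existing (higher precedence) entry whenever two gems provide the same path.
def merge_activated(indexes, precedence_order)
  precedence_order.each_with_object({}) do |name, merged|
    merged.merge!(indexes.fetch(name)) { |_feature, existing, _incoming| existing }
  end
end

merged = merge_activated(GEM_INDEXES, %w(foo bar))
puts merged.fetch("foo.rb")
puts merged.fetch("bar.rb")
```

Because each per-gem map would be computed once at install time, activation would only pay for a Hash merge rather than a directory walk.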

 ### Proposal 

 This could be reasonably easy to implement if `$LOAD_PATH` accepted callables in addition to paths. Something like this: 

 ```ruby 
 $LOAD_PATH = [ 
   'my_app/lib', 
   BundlerOrRubygems.method(:lookup), 
 ] 
 ``` 

 The contract would be that `BundlerOrRubygems.lookup("some_relative/path.rb")` returns either an absolute path or `nil`. With such an API, it would be easy to provide absolute path caching only for gems and the stdlib,
 and preserve the current uncached behavior for the application-specific load paths, which are usually much less numerous. It would also allow frameworks such as Rails to implement the same caching for the application paths when running in an environment
 where the source files are considered immutable (typically production).
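To illustrate the contract, here is a sketch of how a resolver could treat a load path that mixes directories and callables. `resolve_feature` and `GemIndex` are hypothetical names invented for this example (`GemIndex` stands in for `BundlerOrRubygems`); this is not how `require` is implemented, and the sketch uses a plain array rather than the real `$LOAD_PATH`.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for the `BundlerOrRubygems` object in the proposal.
module GemIndex
  FILES = { "bar.rb" => "/gems/bar/lib/bar.rb" }.freeze

  # Contract from the proposal: return an absolute path or nil.
  def self.lookup(feature)
    FILES[feature]
  end
end

# Walk the load path in order. Callable entries are asked directly; plain
# directory entries are probed on disk, as with an uncached lookup.
def resolve_feature(feature, load_path)
  load_path.each do |entry|
    found =
      if entry.respond_to?(:call)
        entry.call(feature)
      else
        candidate = File.join(entry, feature)
        candidate if File.file?(candidate)
      end
    return found if found
  end
  nil
end

Dir.mktmpdir do |app_lib|
  FileUtils.touch(File.join(app_lib, "app.rb"))
  load_path = [app_lib, GemIndex.method(:lookup)]

  p resolve_feature("app.rb", load_path)     # found on disk in the application directory
  p resolve_feature("bar.rb", load_path)     # answered by the callable
  p resolve_feature("missing.rb", load_path) # nobody knows it, so nil
end
```

Keeping application directories as plain strings preserves today's uncached behavior for them, while the single callable entry replaces what would otherwise be one directory probe per activated gem.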
