Feature #16848

Updated by sawa (Tsuyoshi Sawada) 3 months ago

This would make it much easier to implement `$LOAD_PATH` caching, a technique that can greatly speed up the boot time of Ruby applications.

I just benchmarked it on Redmine's master, using bootsnap with only that optimization enabled:

 ```ruby 
 if ENV['CACHE_LOAD_PATH'] 
   require 'bootsnap' 
   Bootsnap.setup( 
     cache_dir:              'tmp/cache', 
     development_mode:       false, 
     load_path_cache:        true, 
     autoload_paths_cache:   true,
     disable_trace:          false, 
     compile_cache_iseq:     false,
     compile_cache_yaml:     false, 
   ) 
 end 
 ``` 

 ``` 
 $ RAILS_ENV=production time bin/rails runner 'p 1' 
         2.66 real           1.99 user           0.66 sys 
 $ RAILS_ENV=production time bin/rails runner 'p 1' 
         2.71 real           1.97 user           0.66 sys 
 $ CACHE_LOAD_PATH=1 RAILS_ENV=production time bin/rails runner 'p 1' 
         1.41 real           1.12 user           0.28 sys 
 $ CACHE_LOAD_PATH=1 RAILS_ENV=production time bin/rails runner 'p 1' 
         1.41 real           1.12 user           0.28 sys 
 ``` 

That's roughly twice as fast, for a relatively small application. And the performance improvement is not linear: the larger the application, the larger the improvement.

 ### How it works 

The problem itself is simple: in Ruby, `require` has `O($LOAD_PATH.size)` performance. The more gems you add to your `Gemfile`, the larger `$LOAD_PATH` becomes. Then when you `require "foo.rb"`, Ruby will try to open that path in each of the `$LOAD_PATH` entries. And since more gems usually also means more `require` calls, you can even consider that loading Ruby code has quadratic performance.
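
To make the cost concrete, here is a minimal sketch of that linear scan. `naive_resolve` is a hypothetical helper, not Ruby's actual implementation: it ignores native extensions, `$LOADED_FEATURES` and extension guessing, and only models the "try each entry in order" part.

```ruby
# Hypothetical model of how `require` resolves a relative path:
# probe each load path entry in order until the file is found.
def naive_resolve(feature, load_path, probes: [])
  load_path.each do |dir|
    candidate = File.join(dir, feature)
    probes << candidate # record every filesystem check performed
    return candidate if File.file?(candidate)
  end
  nil
end
```

In this model, a file that lives in the last entry of a 300-entry load path costs 300 filesystem probes, and a missing file always costs one probe per entry.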

To improve this, Bootsnap pre-computes a map of all the files in your `$LOAD_PATH`, and uses it to convert relative paths into absolute paths so that Ruby skips the `$LOAD_PATH` traversal.

 ```ruby 
 $LOAD_PATH = %w(/gems/foo/lib /gems/bar/lib) # illustrative only: the real $LOAD_PATH cannot be reassigned

 BOOTSNAP_CACHE = { 
   "bar.rb" => "/gems/bar/lib/bar.rb", 
 } 
 ``` 

This resolves a file lookup with a single Hash lookup, and brings the boot cost down from roughly `O($LOAD_PATH.size * number_of_files_to_require)` to `O(number_of_files_to_require)`.
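
As a rough sketch of how such a map could be built (this is not Bootsnap's actual code; `build_load_path_cache` is a hypothetical name and only `.rb` files are indexed), one can walk every load path entry once and let earlier entries win, which matches the order in which `require` scans:

```ruby
# Hypothetical helper: index every .rb file reachable from the load path.
# Earlier entries take precedence, like a regular $LOAD_PATH scan.
def build_load_path_cache(load_path)
  cache = {}
  load_path.each do |dir|
    # `base:` makes glob return paths relative to the load path entry.
    Dir.glob("**/*.rb", base: dir).sort.each do |relative|
      cache[relative] ||= File.join(dir, relative)
    end
  end
  cache
end
```

Resolving a feature is then a plain `cache["bar.rb"]`, whatever the size of the load path.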

This optimization is also used in [Gel](https://github.com/gel-rb/gel), a Rubygems/Bundler replacement.

### Tradeoffs

Every time `$LOAD_PATH` is modified, the cache must be invalidated. While this is complex to do for Bootsnap, it would be fairly easy if it is implemented inside Ruby.

More importantly, you also have to invalidate the cache whenever you add or delete a file in one of the `$LOAD_PATH` members; otherwise, if you shadow or unshadow another file that is farther in the `$LOAD_PATH`, Bootsnap will load the wrong file.
For instance, if `require "foo.rb"` initially resolves to `/some/gem/foo.rb`, and you create `lib/foo.rb`, you'll need to flush Bootsnap cache.

  

That latter one is much trickier to handle right, and Bootsnap has simply decided that it is rare enough not to cause actual problems, and so far that holds.
But that is obviously not a tradeoff Ruby can make.
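
The shadowing case can be illustrated with a small sketch. Both helpers below are hypothetical and exist only for illustration: `scan_resolve` stands for a fresh `$LOAD_PATH` scan, and `stale_entries` lists the cached entries that a fresh scan would now resolve differently.

```ruby
# Hypothetical helper: what a fresh scan of the load path would return.
def scan_resolve(feature, load_path)
  dir = load_path.find { |d| File.file?(File.join(d, feature)) }
  dir && File.join(dir, feature)
end

# Hypothetical helper: cached entries that no longer match a fresh scan,
# e.g. because a new file now shadows the cached one.
def stale_entries(cache, load_path)
  cache.select { |feature, path| scan_resolve(feature, load_path) != path }
end
```

With a load path of `["lib", "/some/gem"]` and a cache mapping `"foo.rb"` to `/some/gem/foo.rb`, creating `lib/foo.rb` makes that entry stale. Note that detecting it this way requires the very traversal the cache is meant to avoid, which is why the problem is hard to handle cheaply.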

However that's probably a tradeoff Rubygems/Bundler can make. While it's common to edit your gems to debug something, it's really uncommon to add or remove files inside them. So in theory Rubygems/Bundler could compute, when it installs a gem, a map of all the files in that gem that can be required.
Then when you activate the gem, you merge its map together with those of the other activated gems.
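
A minimal sketch of that merge step, assuming each installed gem ships a precomputed Hash from requirable path to absolute path (the method name and the data layout are assumptions, not an existing Rubygems or Bundler API):

```ruby
# Hypothetical merge of per-gem maps, given in $LOAD_PATH order.
# When two gems provide the same requirable path, the earlier one wins,
# mirroring what a regular load path scan would find first.
def merge_activated(maps_in_load_path_order)
  maps_in_load_path_order.each_with_object({}) do |gem_map, merged|
    gem_map.each { |feature, path| merged[feature] ||= path }
  end
end
```

Since the per-gem maps are computed once at install time, activation only pays for the Hash merge and never touches the filesystem.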

 ### Proposal 

 This could be reasonably easy to implement if `$LOAD_PATH` accepted callables in addition to paths. Something like this: 

 ```ruby 
 $LOAD_PATH = [ 
   'my_app/lib', 
   BundlerOrRubygems.method(:lookup), 
 ] 
 ``` 

The contract would be that `BundlerOrRubygems.lookup("some_relative/path.rb")` would return either an absolute path or `nil`. With such an API, it would be easy to cache absolute paths only for gems and the stdlib, and preserve the current cache-less behavior for the application specific load paths, which are usually much less numerous. It would also allow frameworks such as Rails to implement the same caching for the application paths when running in an environment where the source files are considered immutable (typically production).
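
To illustrate the contract, here is a sketch of how a resolver could treat mixed entries. It is written in plain Ruby for readability and is only an assumption about the semantics: `resolve_feature` is a hypothetical name, and the real change would live inside Ruby's `require` implementation.

```ruby
# Hypothetical resolver: each entry is either a directory (String) or a
# callable taking a relative path and returning an absolute path or nil.
def resolve_feature(feature, load_path)
  load_path.each do |entry|
    path =
      if entry.respond_to?(:call)
        entry.call(feature) # cached lookup, no filesystem access
      else
        candidate = File.join(entry, feature)
        candidate if File.file?(candidate) # regular, cache-less lookup
      end
    return path if path
  end
  nil
end

# Usage: application paths stay uncached, gems go through a lookup table.
gem_index = { "bar.rb" => "/gems/bar/lib/bar.rb" }
load_path = ["my_app/lib", ->(feature) { gem_index[feature] }]
resolve_feature("bar.rb", load_path)
```

Entries are still consulted in order, so a file in `my_app/lib` keeps shadowing a gem file of the same name, exactly as it does today.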
