Feature #11148

Add a way to require files, but not raise an exception when the file isn't found

Added by tenderlovemaking (Aaron Patterson) over 4 years ago. Updated over 4 years ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
[ruby-core:69157]

Description

Hi,

I'm trying to make it so that RubyGems doesn't need to put directories on $LOAD_PATH (which is why I submitted Feature #11140). I would like the require implemented in RubyGems to look up the file from a cache generated when the gem is installed, then pass a full file path to require.

The problem is that the user may have manipulated the load path somehow, and RubyGems needs to detect if the file is in the load path. Today, the algorithm inside RubyGems looks something like this:

def require file
  if file_is_from_a_default_gem?(file) # this is so you can install new versions of default gems
    add_default_gem_to_loadpath
  end
  real_require file
rescue LoadError
  gem = find_gem_that_contains_file(file)
  add_gem_to_loadpath gem
  real_require file
end

Instead of adding the directory to the load path, I would like to look up the full file path from a cache that is generated when the gem is installed. If we had a cache, that means the new implementation would look like this:

def require file
  if file_is_from_a_default_gem?(file) # this is so you can install new versions of default gems
    add_default_gem_to_loadpath
  end
  real_require file # gets slower as paths are added to LOAD_PATH
rescue LoadError
  gem = find_gem_that_contains_file(file) # use a cache so lookup is O(1)
  fully_qualified_path = gem.full_path file
  real_require fully_qualified_path # send a fully qualified path, so LOAD_PATH isn't searched
end

Unfortunately, that means that every call to require in the system would raise an exception. I'd like to add a version of require that we can call that doesn't raise an exception. Then I could write the code like this:

def require file
  if file_is_from_a_default_gem?(file) # this is so you can install new versions of default gems
    add_default_gem_to_loadpath
  end
  found = try_require file
  if nil == found
    gem = find_gem_that_contains_file(file) # use a cache so lookup is O(1)
    fully_qualified_path = gem.full_path file
    found = real_require fully_qualified_path # send a fully qualified path, so LOAD_PATH isn't searched
  end
  found
end

This would keep the load path small, and prevent exceptions from happening during the "normal" case.

I've attached a patch that implements try_require, but I'm not set on the name. Maybe doing require(file, exception: false) would work too.
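
To make the intended contract concrete, here is a minimal pure-Ruby emulation of try_require. It is only a sketch of the semantics described above (nil when the file is not found, otherwise whatever require returns). It still raises and rescues internally, so it has none of the performance benefit the patch is after, and the check on LoadError#path is an assumption about how to avoid swallowing unrelated load errors, not something taken from the patch.

```ruby
# Emulation of the proposed try_require: nil if the feature cannot be found,
# otherwise the usual true/false from require.
def try_require(file)
  require file
rescue LoadError => e
  # Re-raise errors about some *other* file (for example a failed require
  # inside the file being loaded); only a miss on `file` itself becomes nil.
  raise unless e.path == file
  nil
end

try_require "set"                        # true or false, never nil
try_require "no_such_feature_for_11148"  # nil
```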


Files

try_require.patch (2.44 KB), tenderlovemaking (Aaron Patterson), 05/12/2015 10:47 PM

History

Updated by nobu (Nobuyoshi Nakada) over 4 years ago

  • Description updated

Although I had an idea to separate require into "search" and "load", this may be simpler.
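
For comparison, the search/load split mentioned here might look roughly like the following. This is a hypothetical sketch, not an existing Ruby API: search_feature and search_then_require are made-up names, and the search only considers .rb files (real require also handles extension libraries and already-loaded features).

```ruby
require "tmpdir"

# Hypothetical "search" step: return the absolute path of the first
# name.rb found in load_path, or nil. Never raises for a missing file.
def search_feature(name, load_path = $LOAD_PATH)
  load_path.each do |dir|
    candidate = File.expand_path("#{name}.rb", dir.to_s)
    return candidate if File.file?(candidate)
  end
  nil
end

# "Load" step: only performed once a full path is known.
def search_then_require(name, load_path = $LOAD_PATH)
  path = search_feature(name, load_path)
  path && require(path)
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "demo_feature_11148.rb"), "DEMO_FEATURE_11148 = true\n")
  search_then_require("demo_feature_11148", [dir])    # finds and loads the file
  search_then_require("missing_feature_11148", [dir]) # nil, no exception raised
end
```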

Updated by Eregon (Benoit Daloze) over 4 years ago

Why is that exception problematic?
For performance (the cost of the search is already large I suppose)
or to only catch the LoadError from require and not accidentally from somewhere else? (this could potentially affect compatibility)

Updated by tenderlovemaking (Aaron Patterson) over 4 years ago

nobu (Nobuyoshi Nakada) I was thinking the same, but this was the smallest patch that would accomplish what I need

Benoit (Benoit BENEZECH) yes, for performance, and to avoid catching load errors. If my plan is successful, rubygems would stop adding directories to the load path. That means searching should be relatively fast (since the load path would be relatively small). With the current algorithm, the first require that "activates" a gem will always raise an exception, then the gem gets loaded, and all of the requires inside the gem will not raise an exception. So say 98% of the time, require doesn't raise an exception. If I stop adding directories to the load path, then 98% of requires will raise an exception. I think that would incur a non-trivial overhead (though I don't have numbers for you right now).
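
One rough way to gather such numbers is sketched below. It compares a failed require that is rescued against a plain $LOAD_PATH scan that simply yields nil on a miss. The scan is a hypothetical stand-in for try_require, not the patched implementation, and measure_miss_cost is a made-up helper, so the results would only indicate the order of magnitude of the exception cost.

```ruby
require "benchmark"

# Times `iterations` failed lookups two ways and returns the elapsed seconds.
def measure_miss_cost(iterations, missing = "no_such_feature_for_11148")
  raising = Benchmark.realtime do
    iterations.times do
      begin
        require missing
      rescue LoadError
        nil # the miss surfaces as an exception
      end
    end
  end

  scanning = Benchmark.realtime do
    iterations.times do
      # Hypothetical stand-in for try_require: scan $LOAD_PATH, nil on a miss.
      $LOAD_PATH.find { |dir| File.file?(File.join(dir.to_s, "#{missing}.rb")) }
    end
  end

  { raising: raising, scanning: scanning }
end

result = measure_miss_cost(200)
puts format("raise + rescue: %.4fs, plain scan: %.4fs", result[:raising], result[:scanning])
```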

Updated by Eregon (Benoit Daloze) over 4 years ago

Aaron Patterson wrote:

Benoit (Benoit BENEZECH) yes, for performance, and to avoid catching load errors. If my plan is successful, rubygems would stop adding directories to the load path. That means searching should be relatively fast (since the load path would be relatively small). With the current algorithm, the first require that "activates" a gem will always raise an exception, then the gem gets loaded, and all of the requires inside the gem will not raise an exception. So say 98% of the time, require doesn't raise an exception. If I stop adding directories to the load path, then 98% of requires will raise an exception. I think that would incur a non-trivial overhead (though I don't have numbers for you right now).

Right, makes sense. It would be great to have some data though :)

How would the cache deal with duplicated keys, that is, when multiple gems have the same relative path inside their lib/, etc. directories? I think some gems may expect to have their lib/, etc. directories in $LOAD_PATH.

Is the first find_gem_that_contains_file(file) O(number of installed gems) or is there some heuristic matching the first component of file with a gem name?
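
The thread does not say how the cache would handle either point. Purely to illustrate the question, here is one hypothetical shape for such a cache: a Hash from require name to the list of gems shipping that file, so duplicates are kept rather than overwritten and lookup does not depend on the number of installed gems. CachedGem, register_gem, the gem names and paths, and the "first registered wins" rule are all assumptions made for this sketch, not RubyGems behaviour.

```ruby
# Hypothetical install-time cache: require name => gems that ship that file.
FILE_CACHE = {}

CachedGem = Struct.new(:name, :version, :lib_dir) do
  # Full path of a require name inside this gem's lib directory.
  def full_path(file)
    File.join(lib_dir, "#{file}.rb")
  end
end

# Called at install time: record every file the gem provides.
def register_gem(gem, files)
  files.each { |file| (FILE_CACHE[file] ||= []) << gem }
end

# Hash lookup, independent of the number of installed gems. When several
# gems ship the same file, the first one registered wins (an assumption).
def find_gem_that_contains_file(file)
  FILE_CACHE.fetch(file, []).first
end

register_gem(CachedGem.new("foo", "1.0.0", "/gems/foo-1.0.0/lib"), ["foo", "foo/util"])
register_gem(CachedGem.new("bar", "2.1.0", "/gems/bar-2.1.0/lib"), ["bar", "foo/util"])

gem = find_gem_that_contains_file("foo/util")
puts gem.full_path("foo/util") # prints /gems/foo-1.0.0/lib/foo/util.rb
```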