Project

General

Profile

Actions

Bug #7158

closed

require is slow in its bookkeeping; can make Rails startup 2.2x faster

Added by gregprice (Greg Price) almost 12 years ago. Updated over 11 years ago.

Status:
Closed
Target version:
ruby -v:
ruby 1.9.3p194 (2012-04-20 revision 35409) [i686-linux]
Backport:
[ruby-core:47970]

Description

=begin
Starting a large application in Ruby is slow. Most of the startup
time is not spent in the actual work of loading files and running Ruby
code, but in bookkeeping in the 'require' implementation. I've
attached a patch series which makes that bookkeeping much faster.
These patches speed up a large Rails application's startup by 2.2x,
and a pure-'require' benchmark by 3.4x.

These patches fix two ways in which 'require' is slow. Both problems
have been discussed before, but these patches solve the problems with
less code and stricter compatibility than previous patches I've seen.

  • Currently we iterate through $LOADED_FEATURES to see if anything
    matches the newly required feature. Further, each iteration
    iterates in turn through $LOAD_PATH. Xavier Shay spotted this
    problem last year and a series of patches were discussed
    (in Issue #3924) to add a Hash index alongside $LOADED_FEATURES,
    but for 1.9.3 none were merged; Masaya Tarui committed Revision r31875,
    which mitigated the problem. This series adds a Hash index,
    and keeps it up to date even if the user modifies $LOADED_FEATURES.
    This is worth a 40% speedup on one large Rails application,
    and 2.3x on a pure-'require' benchmark.

  • Currently each 'require' call runs through $LOAD_PATH and calls
    rb_file_expand_path() on each element. Yura Sokolov (funny_falcon)
    proposed caching this last December in Issue #5767, but it wasn't
    merged. This series also caches $LOAD_PATH, and keeps the cache up
    to date with a different, less invasive technique. The cache takes
    34 lines of code, and is worth an additional 57% speedup in
    starting a Rails app and a 46% speedup in pure 'require'.

== Staying Compatible

With both the $LOADED_FEATURES index and the $LOAD_PATH cache,

  • we exactly preserve the semantics of the user modifying $LOAD_PATH
    or $LOADED_FEATURES;

  • both $LOAD_PATH and $LOADED_FEATURES remain ordinary Arrays, with
    no singleton methods;

  • we make just one semantic change: each element of $LOAD_PATH and
    $LOADED_FEATURES is made into a frozen string. This doesn't limit
    the flexibility Ruby offers to the programmer in any way; to alter
    an element of either array, one simply reassigns it to the new
    value. Further, normal path-munging code which only adds and
    removes elements shouldn't have to change at all.

These patches use the following technique to keep the cache and the
index up to date without modifying the methods of $LOADED_FEATURES or
$LOAD_PATH: we take advantage of the sharing mechanism in the Array
implementation to detect, in O(1) time, whether either array has been
mutated. We cause $LOADED_FEATURES to be shared with an Array we keep
privately in load.c; if anything modifies it, it will break the
sharing and we will know to rebuild the index. Similarly for
$LOAD_PATH.

== Benchmarks

First, on my company's Rails application, where $LOAD_PATH.size is 207
and $LOADED_FEATURES.size is 2126. I measured the time taken by
'bundle exec rails runner "p 1"'.

. Rails startup time,
version best of 5 speedup
v1_9_3_194 12.197s
v1_9_3_194+index 8.688s 1.40x
v1_9_3_194+index+cache 5.538s 2.20x

And now isolating the performance of 'require', by requiring
16000 empty files.

version time, best of 5 speedup
trunk (at r36920) 10.115s
trunk+index 4.363s 2.32x
trunk+index+cache 2.984s 3.39x

(The timings for the Rails application are based on the latest release
rather than trunk because a number of gems failed to compile against
trunk for me.)

== The Patches

I've attached four patches:

(1) Patch 1 changes no behavior at all. It adds comments and
simplifies a bit of code to help in understanding why patch 3 is
correct. 42 lines, most of them comments.

(2) Patch 2 adds a function to array.c which will help us tell when
$LOAD_PATH or $LOADED_FEATURES has been modified. 17 lines.

(3) Patch 3 adds the $LOADED_FEATURES index. 150 lines.

(4) Patch 4 adds the $LOAD_PATH cache. 34 lines.

Reviews and comments welcome -- I'm sure there's something I could do
to make these patches better. I hope we can get some form of them
into trunk before the next release. My life has been happier since I
switched to this version because commands in my Rails application all
run faster now, and I want every Ruby programmer to be happier in the
same way with 2.0 and ideally with 1.9.4.

=end


Files

0001-Clarify-and-explain-loaded_feature_path-and-rb_featu.patch (4.54 KB) 0001-Clarify-and-explain-loaded_feature_path-and-rb_featu.patch Patch 1 - comments and clarifications only gregprice (Greg Price), 10/14/2012 01:56 PM
0002-Expose-whether-two-arrays-are-shared-read-only-C-onl.patch (2.15 KB) 0002-Expose-whether-two-arrays-are-shared-read-only-C-onl.patch Patch 2 - add rb_ary_shared_with_p() gregprice (Greg Price), 10/14/2012 01:56 PM
0003-Index-LOADED_FEATURES-so-require-isn-t-so-slow.patch (11.2 KB) 0003-Index-LOADED_FEATURES-so-require-isn-t-so-slow.patch Patch 3 - index for $LOADED_FEATURES gregprice (Greg Price), 10/14/2012 01:56 PM
0004-Cache-the-expanded-load-path.patch (4.33 KB) 0004-Cache-the-expanded-load-path.patch Patch 4 - cache for $LOAD_PATH gregprice (Greg Price), 10/14/2012 01:56 PM

Related issues 1 (0 open1 closed)

Related to Ruby master - Bug #3924: Performance bug (in require?)Closedyugui (Yuki Sonoda)10/10/2010Actions

Updated by gregprice (Greg Price) almost 12 years ago

=begin
I've also made these patches available at https://github.com/gnprice/ruby , in the "fast-require" branch against a recent trunk and in "fast-require-1.9.3p194" against the latest release.

If you use ((<rbenv|URL:https://github.com/sstephenson/rbenv>)) and its ((<ruby-build|URL:https://github.com/sstephenson/ruby-build>)) plugin, one convenient way to try out the patched version is to create a file "fast-require" containing the one line

install_git 1.9.3-p194-price-fast-require https://github.com/gnprice/ruby fast-require-1.9.3p194 autoconf standard

and run "rbenv install /path/to/fast-require". Then "rbenv shell fast-require" will select the new version in the current shell, and "rbenv global fast-require" will select it globally.

=end

Updated by trans (Thomas Sawyer) almost 12 years ago

I believe a great deal of additional speed could be gained by optimizing #require_relative (and making use of it, of course). From what I understand, #require_relative ends up calling ordinary #require code, which is inefficient since #require_relative already knows which path to find the script, so why have require search the $LOAD_PATH for it?

In all these efforts to speed up load time it would be nice to see someone tackle this issue as well.

Updated by gregprice (Greg Price) almost 12 years ago

trans (Thomas Sawyer) wrote:

I believe a great deal of additional speed could be gained by optimizing
#require_relative (and making use of it, of course).

I'd like to keep this thread focused on speeding up existing code which uses #require. If you're interested in making changes to #require_relative (which is a function that is not involved in the startup time of most existing libraries or applications), a separate issue would be the best place to discuss that.

From what I understand, #require_relative ends up calling ordinary
#require code, which is inefficient since #require_relative already
knows which path to find the script, so why have require search
the $LOAD_PATH for it?

Note that most of the time #require spends is not in searching $LOAD_PATH -- it's in deciding whether the requested library needs to be loaded at all, or refers to a file that has already been loaded. That's what this patch series addresses. (Even the expanded $LOAD_PATH, which this Patch 4 caches, is used in making that decision before it is used to search to find the script.)

Greg

Updated by wycats (Yehuda Katz) almost 12 years ago

Yehuda Katz
(ph) 718.877.1325

On Tue, Oct 16, 2012 at 12:21 AM, gregprice (Greg Price) wrote:

Issue #7158 has been updated by gregprice (Greg Price).

trans (Thomas Sawyer) wrote:

I believe a great deal of additional speed could be gained by optimizing
#require_relative (and making use of it, of course).

I'd like to keep this thread focused on speeding up existing code which
uses #require. If you're interested in making changes to #require_relative
(which is a function that is not involved in the startup time of most
existing libraries or applications), a separate issue would be the best
place to discuss that.

I would agree. Also, I consider require_relative an antipattern, so I hope
people don't start insisting that libraries use it in order to speed things
up instead of just speeding up require.

From what I understand, #require_relative ends up calling ordinary
#require code, which is inefficient since #require_relative already
knows which path to find the script, so why have require search
the $LOAD_PATH for it?

Note that most of the time #require spends is not in searching $LOAD_PATH
-- it's in deciding whether the requested library needs to be loaded at
all, or refers to a file that has already been loaded. That's what this
patch series addresses. (Even the expanded $LOAD_PATH, which this Patch 4
caches, is used in making that decision before it is used to search to find
the script.)

Greg


Bug #7158: require is slow in its bookkeeping; can make Rails startup 2.2x
faster
https://bugs.ruby-lang.org/issues/7158#change-30820

Author: gregprice (Greg Price)
Status: Open
Priority: Normal
Assignee:
Category: core
Target version:
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35409) [i686-linux]

=begin
Starting a large application in Ruby is slow. Most of the startup
time is not spent in the actual work of loading files and running Ruby
code, but in bookkeeping in the 'require' implementation. I've
attached a patch series which makes that bookkeeping much faster.
These patches speed up a large Rails application's startup by 2.2x,
and a pure-'require' benchmark by 3.4x.

These patches fix two ways in which 'require' is slow. Both problems
have been discussed before, but these patches solve the problems with
less code and stricter compatibility than previous patches I've seen.

  • Currently we iterate through $LOADED_FEATURES to see if anything
    matches the newly required feature. Further, each iteration
    iterates in turn through $LOAD_PATH. Xavier Shay spotted this
    problem last year and a series of patches were discussed
    (in Issue #3924) to add a Hash index alongside $LOADED_FEATURES,
    but for 1.9.3 none were merged; Masaya Tarui committed Revision r31875,
    which mitigated the problem. This series adds a Hash index,
    and keeps it up to date even if the user modifies $LOADED_FEATURES.
    This is worth a 40% speedup on one large Rails application,
    and 2.3x on a pure-'require' benchmark.

  • Currently each 'require' call runs through $LOAD_PATH and calls
    rb_file_expand_path() on each element. Yura Sokolov (funny_falcon)
    proposed caching this last December in Issue #5767, but it wasn't
    merged. This series also caches $LOAD_PATH, and keeps the cache up
    to date with a different, less invasive technique. The cache takes
    34 lines of code, and is worth an additional 57% speedup in
    starting a Rails app and a 46% speedup in pure 'require'.

== Staying Compatible

With both the $LOADED_FEATURES index and the $LOAD_PATH cache,

  • we exactly preserve the semantics of the user modifying $LOAD_PATH
    or $LOADED_FEATURES;

  • both $LOAD_PATH and $LOADED_FEATURES remain ordinary Arrays, with
    no singleton methods;

  • we make just one semantic change: each element of $LOAD_PATH and
    $LOADED_FEATURES is made into a frozen string. This doesn't limit
    the flexibility Ruby offers to the programmer in any way; to alter
    an element of either array, one simply reassigns it to the new
    value. Further, normal path-munging code which only adds and
    removes elements shouldn't have to change at all.

These patches use the following technique to keep the cache and the
index up to date without modifying the methods of $LOADED_FEATURES or
$LOAD_PATH: we take advantage of the sharing mechanism in the Array
implementation to detect, in O(1) time, whether either array has been
mutated. We cause $LOADED_FEATURES to be shared with an Array we keep
privately in load.c; if anything modifies it, it will break the
sharing and we will know to rebuild the index. Similarly for
$LOAD_PATH.

== Benchmarks

First, on my company's Rails application, where $LOAD_PATH.size is 207
and $LOADED_FEATURES.size is 2126. I measured the time taken by
'bundle exec rails runner "p 1"'.

. Rails startup time,
version best of 5 speedup
v1_9_3_194 12.197s
v1_9_3_194+index 8.688s 1.40x
v1_9_3_194+index+cache 5.538s 2.20x

And now isolating the performance of 'require', by requiring
16000 empty files.

version time, best of 5 speedup
trunk (at r36920) 10.115s
trunk+index 4.363s 2.32x
trunk+index+cache 2.984s 3.39x

(The timings for the Rails application are based on the latest release
rather than trunk because a number of gems failed to compile against
trunk for me.)

== The Patches

I've attached four patches:

(1) Patch 1 changes no behavior at all. It adds comments and
simplifies a bit of code to help in understanding why patch 3 is
correct. 42 lines, most of them comments.

(2) Patch 2 adds a function to array.c which will help us tell when
$LOAD_PATH or $LOADED_FEATURES has been modified. 17 lines.

(3) Patch 3 adds the $LOADED_FEATURES index. 150 lines.

(4) Patch 4 adds the $LOAD_PATH cache. 34 lines.

Reviews and comments welcome -- I'm sure there's something I could do
to make these patches better. I hope we can get some form of them
into trunk before the next release. My life has been happier since I
switched to this version because commands in my Rails application all
run faster now, and I want every Ruby programmer to be happier in the
same way with 2.0 and ideally with 1.9.4.

=end

--
http://bugs.ruby-lang.org/

Updated by usa (Usaku NAKAMURA) almost 12 years ago

  • Status changed from Open to Assigned
  • Assignee set to ko1 (Koichi Sasada)

Updated by ko1 (Koichi Sasada) almost 12 years ago

I heard that taru-san's analysis of for your patch (1) to (4).

(1) to (3) are good (no compatibility issue). "freezing" strings are acceptable (no impact, I think). I want to introduce them.

However, (4) has compatibility issue because the patch assume that the current working directory does not changed. Please correct me if our understanding is wrong.

Updated by funny_falcon (Yura Sokolov) almost 12 years ago

Checking for "shares same array" is a great catch :) I like it.

But my patch #5767 for $LOAD_PATH so "invasive" cause there is markable benefits from overriding "$:.push", so that whole $LOAD_PATH will not be re-scaped after each $: << gem_path .
Also, which way do you handle changing of current directory? Cache should be invalidated on Dir.chdir .
Also, which way do you handle changing filesystem encoding? One test from 'make check' fails, if you do not invalidate cache on changing filesystem encoding.

load_features_index will be recalculated after each require for new feature also? It is very strange, that you patch for $LOADED_FEATURES gains so much speedup. My patch #5427 gains only 6%, not 40%, and I doubt that hash lookup on ~2000 elements so much faster than crafted binary search (As well as you, I compare only "basename" of a feature). And I have no additional objects: no hashes, no arrays, no substrings, only $LOADED_FEATURES itself sorted in a rignt way.
... but maybe I'm mistacken.

About numbers: speedup 1.40x - is 29% speedup, and 2.20x is 55% speedup. My patches combined gains same 55% speedup as yours.

Can you test with "cache" patch but without "index" patch also?

Updated by h.shirosaki (Hiroshi Shirosaki) almost 12 years ago

Indeed in Greg's patch (4) load path cache remains when current working directory was changed. But so far I cannot find any test cases which fail with Greg's patch.
In rb_find_file_safe() in file.c load path is expanded without cache, so cached load path in rb_feature_p() might be recovered by rb_find_file_safe().

I tried to create patches for current working directory change using same approach as Yura Sokolov's #5767.

https://gist.github.com/3964679

These patches are applied on the Greg's patches.

Patch 1 expands relative load path always. (If load path has many relative path, require slows down a lot.)
Patch 2 caches expanded relative load path and invalidate cache if current working directory is changed.
Patch 3 uses cached expanded load path in rb_find_file_ext_safe() and rb_find_file_safe().

From my benchmark, performance is almost same as Greg's. Benchmark is in the gist.
make test, test-all, test-rubyspec seem fine.

Updated by ko1 (Koichi Sasada) almost 12 years ago

Hi,

(2012/10/27 23:40), h.shirosaki (Hiroshi Shirosaki) wrote:

Indeed in Greg's patch (4) load path cache remains when current working directory was changed. But so far I cannot find any test cases which fail with Greg's patch.
In rb_find_file_safe() in file.c load path is expanded without cache, so cached load path in rb_feature_p() might be recovered by rb_find_file_safe().

I tried to create patches for current working directory change using same approach as Yura Sokolov's #5767.

https://gist.github.com/3964679

These patches are applied on the Greg's patches.

Patch 1 expands relative load path always. (If load path has many relative path, require slows down a lot.)
Patch 2 caches expanded relative load path and invalidate cache if current working directory is changed.
Patch 3 uses cached expanded load path in rb_find_file_ext_safe() and rb_find_file_safe().

From my benchmark, performance is almost same as Greg's. Benchmark is in the gist.
make test, test-all, test-rubyspec seem fine.

Okay, there are patch of Gregs (1) to (4) and yours (1) to (3). Let's
say (G1) to (G4) and (S1) to (S3) respectively.

Could you give us the explanation of compatibility impact?
I doubt that current test cases have enough tests.

BTW, on your benchmark results, I can't find any performance improvement
with (S1) to (S3). What's the purpose?

--
// SASADA Koichi at atdot dot net

Updated by h.shirosaki (Hiroshi Shirosaki) almost 12 years ago

ko1 (Koichi Sasada) wrote:

Could you give us the explanation of compatibility impact?
I doubt that current test cases have enough tests.

BTW, on your benchmark results, I can't find any performance improvement
with (S1) to (S3). What's the purpose?

Hi,

I think (S1) is more compatible trunk. It expands only relative load path always. Trunk expands all load path always.
If load path has many relative load path, (G1) to (G4) + (S1) performance is more close to (G1) to (G3) since relative paths are no cache.
I didn't show benchmark result of that.

(S2) is similar approach as Yura's #5767. It caches also relative load path. If current working directory was changed, it invalidates load path cache when getting expanded load path.
I can't find test cases which fail so far. Yura's patch is widely used (I'm also using it with 1.9.3) without issues. I think it's enough compatible.

I thought (S3) would give performance improvement in theory since it reduces expand costs to use expanded path. But my benchmark shows very small improvement. Yura's #5767 also uses this. Another benchmark might show more meaningful improvement result. If (S2) is enough compatible, (S3) would be also enough compatible because using same expanded load path cache.

Updated by ko1 (Koichi Sasada) almost 12 years ago

(2012/10/28 7:10), h.shirosaki (Hiroshi Shirosaki) wrote:

I think (S1) is more compatible trunk. It expands only relative load path always. Trunk expands all load path always.
If load path has many relative load path, (G1) to (G4) + (S1) performance is more close to (G1) to (G3) since relative paths are no cache.
I didn't show benchmark result of that.

(S2) is similar approach as Yura's #5767. It caches also relative load path. If current working directory was changed, it invalidates load path cache when getting expanded load path.
I can't find test cases which fail so far. Yura's patch is widely used (I'm also using it with 1.9.3) without issues. I think it's enough compatible.

I thought (S3) would give performance improvement in theory since it reduces expand costs to use expanded path. But my benchmark shows very small improvement. Yura's #5767 also uses this. Another benchmark might show more meaningful improvement result. If (S2) is enough compatible, (S3) would be also enough compatible because using same expanded load path cache.

Thanks a lot for your explanation.

I'm very happy if you describe the "changed behavior/assumption".
Your explanation is "how to do it".
But we need to know "what changed".
Sorry if I misunderstood your explanation.

Or no behavior changed except "freeze $LOAD_PATH" by (G3)?
(Your words "enough compatible" meant that?)

Thanks,
Koichi

--
// SASADA Koichi at atdot dot net

Updated by h.shirosaki (Hiroshi Shirosaki) almost 12 years ago

On Sun, Oct 28, 2012 at 7:36 AM, SASADA Koichi wrote:

I'm very happy if you describe the "changed behavior/assumption".
Your explanation is "how to do it".
But we need to know "what changed".
Sorry if I misunderstood your explanation.

Or no behavior changed except "freeze $LOAD_PATH" by (G3)?
(Your words "enough compatible" meant that?)

Yes. I think no behavior changed except "freeze $LOAD_PATH" by (G3)
if I didn't overlook anything.

--
Hiroshi Shirosaki

Updated by ko1 (Koichi Sasada) almost 12 years ago

(2012/10/28 8:03), Hiroshi Shirosaki wrote:

Yes. I think no behavior changed except "freeze $LOAD_PATH" by (G3)
if I didn't overlook anything.

Thank you for your clarification.

Maybe Tarui-san will review them.
Please wait for it.

Thanks,
Koichi

--
// SASADA Koichi at atdot dot net

Updated by gregprice (Greg Price) almost 12 years ago

Sasada-san, thank you for the review. You're right, patch (4) should (and doesn't) invalidate the cache when the working directory changes. I believe Yura is right that it should also invalidate the cache when the filesystem encoding changes.

Yura, thanks for your existing patches and your detailed comments. At first look, I suspect that the reason a sorted-list approach to $LOADED_FEATURES was not a major speedup is that insertion is slow -- it takes time O(n), on the same order as the lookup takes in trunk. So it should still be helpful when one does many 'require' calls on the same files, but not as much as if insertion is also fast. With the hash-table-based index in (3), insertion takes time O(p), where p ~= 10 is the number of path components in the library filenames, and lookup is O(1) except in pathological cases.

I'm not too worried about recomputing the $LOAD_PATH cache from scratch after it's modified. For one thing, we do so only in an actual 'require' call, and it's only as expensive as the computation we do now in trunk on every 'require' call, so there's no case in which it should cause a regression. I suspect it's also the case that in common usage (with e.g. Bundler) most $LOAD_PATH modifications happen all at once with no intervening 'require' calls, and in that case we will only recompute it once for all of those changes. (But I haven't checked that that's the case.) So I think the trade-off of recomputing from scratch with less code complexity is a good one.

Shirosaki-san, thanks for your additional patches. I'm reading them now.

Updated by funny_falcon (Yura Sokolov) almost 12 years ago

Which way insertion into sorted array takes same O(n) time? Do you mean
cost of memcpy after finding position for insert? Despite memcpy (which I
be leave cheap on this amounts) cost of insertion is O(log n).
Can you show result with caching LOAD_PATH and without indexing
LOADED_FEATURES? I believe that indexing will add small improvement after
caching.

28.10.2012 5:30 пользователь "gregprice (Greg Price)"
написал:

Issue #7158 has been updated by gregprice (Greg Price).

Sasada-san, thank you for the review. You're right, patch (4) should
(and doesn't) invalidate the cache when the working directory changes. I
believe Yura is right that it should also invalidate the cache when the
filesystem encoding changes.

Yura, thanks for your existing patches and your detailed comments. At
first look, I suspect that the reason a sorted-list approach to
$LOADED_FEATURES was not a major speedup is that insertion is slow -- it
takes time O(n), on the same order as the lookup takes in trunk. So it
should still be helpful when one does many 'require' calls on the same
files, but not as much as if insertion is also fast. With the
hash-table-based index in (3), insertion takes time O(p), where p ~= 10 is
the number of path components in the library filenames, and lookup is O(1)
except in pathological cases.

I'm not too worried about recomputing the $LOAD_PATH cache from scratch
after it's modified. For one thing, we do so only in an actual 'require'
call, and it's only as expensive as the computation we do now in trunk on
every 'require' call, so there's no case in which it should cause a
regression. I suspect it's also the case that in common usage (with e.g.
Bundler) most $LOAD_PATH modifications happen all at once with no
intervening 'require' calls, and in that case we will only recompute it
once for all of those changes. (But I haven't checked that that's the
case.) So I think the trade-off of recomputing from scratch with less code
complexity is a good one.

Shirosaki-san, thanks for your additional patches. I'm reading them now.


Bug #7158: require is slow in its bookkeeping; can make Rails startup
2.2x faster
https://bugs.ruby-lang.org/issues/7158#change-31844

Author: gregprice (Greg Price)
Status: Assigned
Priority: Normal
Assignee: ko1 (Koichi Sasada)
Category: core
Target version:
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35409) [i686-linux]

=begin
Starting a large application in Ruby is slow. Most of the startup
time is not spent in the actual work of loading files and running Ruby
code, but in bookkeeping in the 'require' implementation. I've
attached a patch series which makes that bookkeeping much faster.
These patches speed up a large Rails application's startup by 2.2x,
and a pure-'require' benchmark by 3.4x.

These patches fix two ways in which 'require' is slow. Both problems
have been discussed before, but these patches solve the problems with
less code and stricter compatibility than previous patches I've seen.

  • Currently we iterate through $LOADED_FEATURES to see if anything
    matches the newly required feature. Further, each iteration
    iterates in turn through $LOAD_PATH. Xavier Shay spotted this
    problem last year and a series of patches were discussed
    (in Issue #3924) to add a Hash index alongside $LOADED_FEATURES,
    but for 1.9.3 none were merged; Masaya Tarui committed Revision r31875,
    which mitigated the problem. This series adds a Hash index,
    and keeps it up to date even if the user modifies $LOADED_FEATURES.
    This is worth a 40% speedup on one large Rails application,
    and 2.3x on a pure-'require' benchmark.

  • Currently each 'require' call runs through $LOAD_PATH and calls
    rb_file_expand_path() on each element. Yura Sokolov (funny_falcon)
    proposed caching this last December in Issue #5767, but it wasn't
    merged. This series also caches $LOAD_PATH, and keeps the cache up
    to date with a different, less invasive technique. The cache takes
    34 lines of code, and is worth an additional 57% speedup in
    starting a Rails app and a 46% speedup in pure 'require'.

== Staying Compatible

With both the $LOADED_FEATURES index and the $LOAD_PATH cache,

  • we exactly preserve the semantics of the user modifying $LOAD_PATH
    or $LOADED_FEATURES;

  • both $LOAD_PATH and $LOADED_FEATURES remain ordinary Arrays, with
    no singleton methods;

  • we make just one semantic change: each element of $LOAD_PATH and
    $LOADED_FEATURES is made into a frozen string. This doesn't limit
    the flexibility Ruby offers to the programmer in any way; to alter
    an element of either array, one simply reassigns it to the new
    value. Further, normal path-munging code which only adds and
    removes elements shouldn't have to change at all.

These patches use the following technique to keep the cache and the
index up to date without modifying the methods of $LOADED_FEATURES or
$LOAD_PATH: we take advantage of the sharing mechanism in the Array
implementation to detect, in O(1) time, whether either array has been
mutated. We cause $LOADED_FEATURES to be shared with an Array we keep
privately in load.c; if anything modifies it, it will break the
sharing and we will know to rebuild the index. Similarly for
$LOAD_PATH.

== Benchmarks

First, on my company's Rails application, where $LOAD_PATH.size is 207
and $LOADED_FEATURES.size is 2126. I measured the time taken by
'bundle exec rails runner "p 1"'.

. Rails startup time,
version best of 5 speedup
v1_9_3_194 12.197s
v1_9_3_194+index 8.688s 1.40x
v1_9_3_194+index+cache 5.538s 2.20x

And now isolating the performance of 'require', by requiring
16000 empty files.

version time, best of 5 speedup
trunk (at r36920) 10.115s
trunk+index 4.363s 2.32x
trunk+index+cache 2.984s 3.39x

(The timings for the Rails application are based on the latest release
rather than trunk because a number of gems failed to compile against
trunk for me.)

== The Patches

I've attached four patches:

(1) Patch 1 changes no behavior at all. It adds comments and
simplifies a bit of code to help in understanding why patch 3 is
correct. 42 lines, most of them comments.

(2) Patch 2 adds a function to array.c which will help us tell when
$LOAD_PATH or $LOADED_FEATURES has been modified. 17 lines.

(3) Patch 3 adds the $LOADED_FEATURES index. 150 lines.

(4) Patch 4 adds the $LOAD_PATH cache. 34 lines.

Reviews and comments welcome -- I'm sure there's something I could do
to make these patches better. I hope we can get some form of them
into trunk before the next release. My life has been happier since I
switched to this version because commands in my Rails application all
run faster now, and I want every Ruby programmer to be happier in the
same way with 2.0 and ideally with 1.9.4.

=end

--
http://bugs.ruby-lang.org/

Updated by tarui (Masaya Tarui) almost 12 years ago

Hi,

I re-checked behavior, and found some compatibility issues.

first,
rb_file_expand_path_fast depends on not only current working directory but also home directories too.
it uses ENV["HOME"] and getpwnam (normally).

second,
if non String Object is included in LOAD_PATH, it will be replaced to result of it's to_str.

we have some choice.

  1. support both ENV and getpwnam.
  2. support only ENV.
  3. no support home dirs change.
  4. accept (G1)~(G3) and reject others.
  5. reject this ticket :-(

what do you think about this issue?
in my mind, 4 > 1 = 2 >>>>>>> 5 > 3

Updated by h.shirosaki (Hiroshi Shirosaki) almost 12 years ago

Thank you for review.

  1. support both ENV and getpwnam.

My intention is 1) since I would like see (G4) merged.

Additional patch.
https://gist.github.com/3964679#file_0004_fix_compatibility_of_require.patch

No cache for '~' or '~xxxx' and not replace load path.
If $LOAD_PATH contains relative path or '~' or '~user', require would slow down.

I tried another benchmark. It uses many relative load path instead of expanded load path.
In this benchmark, (G1) to (G4) slows down a lot. But I think usually load path contains absolute path.
https://gist.github.com/3964679#file_0000_bench_requrie_empty_relative.rb

relative load path

Greg's patch (G1) to (G4)

ruby 2.0.0dev (2012-10-27 trunk 37341) [x86_64-darwin12.2.0]
user system total real
0.810000 1.350000 2.160000 ( 2.171822)
LOAD_PATH SIZE = 209
LOADED_FEATURES SIZE = 1010

Greg's patch (G1) to (G4) + my patch (S1) to (S4)

ruby 2.0.0dev (2012-10-27 trunk 37341) [x86_64-darwin12.2.0]
user system total real
0.400000 0.360000 0.760000 ( 0.755793)

Updated by ko1 (Koichi Sasada) almost 12 years ago

  • Assignee changed from ko1 (Koichi Sasada) to tarui (Masaya Tarui)
  • Priority changed from Normal to 5
  • Target version set to 2.0.0

Updated by tarui (Masaya Tarui) almost 12 years ago

  • Assignee changed from tarui (Masaya Tarui) to h.shirosaki (Hiroshi Shirosaki)

shirosaki-san,
Thank you for fast patch.

it seems almost good.
one point, freezed NonStringObject can return different result by redefining to_str method in class scope.
if expand it everytime as same as home dirs, It is not necessary to freeze it.
And rb_ary_shared_with_p mechanism will continue operating well about StringObject.

Would you make the above-mentioned change?
In either case, please commit all patches.

Best regards.

Updated by h.shirosaki (Hiroshi Shirosaki) almost 12 years ago

tarui (Masaya Tarui) wrote:

it seems almost good.
one point, freezed NonStringObject can return different result by redefining to_str method in class scope.
if expand it everytime as same as home dirs, It is not necessary to freeze it.
And rb_ary_shared_with_p mechanism will continue operating well about StringObject.

Would you make the above-mentioned change?
In either case, please commit all patches.

Thank you. I agreed.
I will change to expand NonStringObject(TYPE(v) != T_STRING) every time and not freeze it.

Updated by h.shirosaki (Hiroshi Shirosaki) almost 12 years ago

I noticed one compatibility issue.

rb_file_expand_path_fast calls to_path method before to_str, but to_path was ignored.

I fixed to_path is called before to_str. If StringObject has to_path, the object are expended every time because frozen StringObject's to_path can return different result.

NonStringObject is not frozen and is expended every times.

I extracted functions from rb_get_path_check() and call it in load.c for to_path and to_str.

tarui-san, what do you think?

I rebased against latest trunk and pushed to github.

https://github.com/shirosaki/ruby/commits/optimize_require

latest patch:
https://github.com/shirosaki/ruby/commit/480fcd43ce844fef456b958352c9b8e605a6ff0e

diff againt trunk:
https://github.com/shirosaki/ruby/compare/trunk...optimize_require

Updated by tenderlovemaking (Aaron Patterson) almost 12 years ago

Just wanted to update with some test results. I tried Hiroshi's patch along with my DTrace patches to time require searches against a small Rails application. Without GC running, before Hiroshi's patch, require searching took about 30% of the app require time. After applying the patch, I saw it was only 16%.

Here is the application I used:

https://github.com/tenderlove/lolwut

This is the test I ran:

$ ruby -Ilib:. -rnotrace -rconfig/environment -e 0

"notrace.rb" just contains "GC.disable". This test loads bundler and the rails environment, then exits.

Here is the DTrace script I used:

https://gist.github.com/3997732#file_find_require.d

CSV of the times (left column is the current time, center column is the time it took for searching, right column is the file name),

trunk: https://gist.github.com/3997732#file_trunk_req.csv
with patch: https://gist.github.com/3997732#file_fast_req.csv

Here are graphs of the data. X is the file number required (1 is the first file, 2 is the second, and so on), Y is the time it took to search in nanoseconds. Black is trunk, green is with this patch:

http://i.imgur.com/22hf4.png

Another graph, but the times are sorted. Black is trunk, red is with the patch:

http://i.imgur.com/yvAWn.png

Updated by tarui (Masaya Tarui) almost 12 years ago

Thank you for the careful work.

Although I saw patch, compatibility is maintained now or it cannot
have confidence.
ex) OK even if String#to_path is defined after constructing expand_path?

I think reasonable to change specification to stop calling to_path
when it is T_STRING.

shirosaki-san, sasada-san, endoh-san, what do you think?

Best regards,

Updated by ko1 (Koichi Sasada) almost 12 years ago

(2012/11/01 21:10), Masaya TARUI wrote:

shirosaki-san, sasada-san, endoh-san, what do you think?

I don't have any idea on it.

Matz:
Please decide it.

--
// SASADA Koichi at atdot dot net

Updated by Anonymous almost 12 years ago

Hi,

In message "Re: [ruby-core:48738] Re: [ruby-trunk - Bug #7158] require is slow in its bookkeeping; can make Rails startup 2.2x faster"
on Fri, 2 Nov 2012 14:42:16 +0900, SASADA Koichi writes:
|
|(2012/11/01 21:10), Masaya TARUI wrote:
|> shirosaki-san, sasada-san, endoh-san, what do you think?
|
|I don't have any idea on it.
|
|Matz:
|Please decide it.

Then, go ahead and see if we will face any problem.

						matz.

Updated by h.shirosaki (Hiroshi Shirosaki) almost 12 years ago

tarui (Masaya Tarui) wrote:

I think reasonable to change specification to stop calling to_path
when it is T_STRING.

I agree with that. to_path of T_STRING may be less-useful.
Then I'm going to stop calling to_path of T_STRING in rb_get_path_check() for consistency.

Here is the change.
https://github.com/shirosaki/ruby/commit/bc7652840d6ad03db4fb713c4142e1464a695e0c

test, test-all, rubyspec results look fine. So I'll commit if there are no other opinions.

Actions #27

Updated by Anonymous almost 12 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r37477.
Greg, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


Clarify and explain loaded_feature_path and rb_feature_p

  • load.c (loaded_feature_path): clarify and briefly comment
    function. These clarifications have no effect on the behavior
    of the function.

  • load.c (rb_feature_p): explain the search loop. Especially
    useful because the logic is complicated as described in the
    second paragraph.
    Patch by Greg Price.
    [ruby-core:47970] [Bug #7158]

Updated by lzap (Lukas Zapletal) over 11 years ago

Hello,

I am trying Ruby 2.0 RC1 which should contain this fix, but I think I still see many stat and open syscalls with ENOENT result. The number is slightly lower.

$ ruby -v
ruby 1.9.3p327 (2012-11-10 revision 37606) [x86_64-linux]
$ strace irb < /dev/null 2>&1 | grep ENOENT | wc -l
5195

$ ruby -v
ruby 2.0.0dev (2013-01-07 trunk 38733) [x86_64-linux]
$ strace irb < /dev/null 2>&1 | grep ENOENT | wc -l
5112

Here is the full strace output with all these ENOENT lines for 2.0 RC1. I guess something is wrong with locale trying to find all rb/so files with my cs_CZ and cs locales first, which essentialy triple amount of requires. I can confirm the same behavior for 1.9.3 ruby.

Edit: Forgot the add the link - http://sprunge.us/gffS

Updated by lzap (Lukas Zapletal) over 11 years ago

Ok the above looks like bug in lib/irb/locale.rb:

LC_ALL=C strace irb < /dev/null 2>&1 | grep ENOENT | wc -l
293

It not only triples amount of stat/open calls, it is like 17.5x faster. I am filling new bugreport for this.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0