Project

General

Profile

Bug #10015

Performance regression in Dir#[]

Added by tenderlovemaking (Aaron Patterson) over 3 years ago. Updated over 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.3.0 [x86_64-darwin13.0]
[ruby-core:63591]

Description

r44802 seems to have introduced a performance regression in Dir#[].

Here is the test program:

require 'benchmark'

puts Benchmark.realtime {
  glob = "minitest/*_plugin.rb{,.rb,.bundle}"
  $LOAD_PATH.map { |load_path|
    Dir["#{File.expand_path glob, load_path}"]
  }.flatten.select { |file| File.file? file.untaint }
}

Here is the test time for me:

$ ruby -v test.rb
ruby 2.2.0dev (2014-02-04 trunk 44801) [x86_64-darwin13.0]
0.000341
$ ruby -v test.rb
ruby 2.2.0dev (2014-02-04 trunk 44802) [x86_64-darwin13.0]
0.009333

r44801 seems much faster than r44802.


Related issues

Related to Ruby trunk - Feature #5994: Dir.glob without wildcards returns pattern, not filenameClosed2012-02-10
Related to Ruby trunk - Bug #13167: Dir.glob is 25x slower since Ruby 2.2Open

Associated revisions

Revision 48973
Added by nobu (Nobuyoshi Nakada) almost 3 years ago

ChangeLog: fix ref of r48972. [Bug #10015]

Revision 48973
Added by nobu (Nobuyoshi Nakada) almost 3 years ago

ChangeLog: fix ref of r48972. [Bug #10015]

Revision 48973
Added by nobu (Nobuyoshi Nakada) almost 3 years ago

ChangeLog: fix ref of r48972. [Bug #10015]

Revision 48975
Added by nobu (Nobuyoshi Nakada) almost 3 years ago

dir.c: shortcut for case-insensitive name

  • dir.c (glob_helper): shortcut for case-insensitive name by stopping reading after a matching name found. [Bug #10015]

Revision 48975
Added by nobu (Nobuyoshi Nakada) almost 3 years ago

dir.c: shortcut for case-insensitive name

  • dir.c (glob_helper): shortcut for case-insensitive name by stopping reading after a matching name found. [Bug #10015]

Revision 48975
Added by nobu (Nobuyoshi Nakada) almost 3 years ago

dir.c: shortcut for case-insensitive name

  • dir.c (glob_helper): shortcut for case-insensitive name by stopping reading after a matching name found. [Bug #10015]

Revision 48975
Added by nobu (Nobuyoshi Nakada) almost 3 years ago

dir.c: shortcut for case-insensitive name

  • dir.c (glob_helper): shortcut for case-insensitive name by stopping reading after a matching name found. [Bug #10015]

Revision 48990
Added by nobu (Nobuyoshi Nakada) almost 3 years ago

dir.c: replace_real_basename

  • dir.c (replace_real_basename): get the real name and replace the base name with it by getattrlist(2) if available. suggested by Matthew Draper at . [Bug #10015]
  • dir.c (glob_helper): get the real name of the whole path, not only the last name.

Revision 48990
Added by nobu (Nobuyoshi Nakada) almost 3 years ago

dir.c: replace_real_basename

  • dir.c (replace_real_basename): get the real name and replace the base name with it by getattrlist(2) if available. suggested by Matthew Draper at . [Bug #10015]
  • dir.c (glob_helper): get the real name of the whole path, not only the last name.

Revision 48990
Added by nobu (Nobuyoshi Nakada) almost 3 years ago

dir.c: replace_real_basename

  • dir.c (replace_real_basename): get the real name and replace the base name with it by getattrlist(2) if available. suggested by Matthew Draper at . [Bug #10015]
  • dir.c (glob_helper): get the real name of the whole path, not only the last name.

Revision 51106
Added by nobu (Nobuyoshi Nakada) over 2 years ago

dir.c: update path type

  • dir.c (replace_real_basename): update path type by the target attributes if possible, to improve the performance. [Bug #10015]

Revision 51106
Added by nobu (Nobuyoshi Nakada) over 2 years ago

dir.c: update path type

  • dir.c (replace_real_basename): update path type by the target attributes if possible, to improve the performance. [Bug #10015]

Revision 51106
Added by nobu (Nobuyoshi Nakada) over 2 years ago

dir.c: update path type

  • dir.c (replace_real_basename): update path type by the target attributes if possible, to improve the performance. [Bug #10015]

Revision 51109
Added by nobu (Nobuyoshi Nakada) over 2 years ago

dir.c: show warnings

  • dir.c (replace_real_basename): show warnings at errors. [Bug #10015]

Revision 51109
Added by nobu (Nobuyoshi Nakada) over 2 years ago

dir.c: show warnings

  • dir.c (replace_real_basename): show warnings at errors. [Bug #10015]

Revision 51109
Added by nobu (Nobuyoshi Nakada) over 2 years ago

dir.c: show warnings

  • dir.c (replace_real_basename): show warnings at errors. [Bug #10015]

Revision 51111
Added by nobu (Nobuyoshi Nakada) over 2 years ago

dir.c: set errno

  • dir.c (replace_real_basename): Win32 API does not set errno, get the last error by GetLastError() and map to errno. [Bug #10015]

Revision 51111
Added by nobu (Nobuyoshi Nakada) over 2 years ago

dir.c: set errno

  • dir.c (replace_real_basename): Win32 API does not set errno, get the last error by GetLastError() and map to errno. [Bug #10015]

Revision 51111
Added by nobu (Nobuyoshi Nakada) over 2 years ago

dir.c: set errno

  • dir.c (replace_real_basename): Win32 API does not set errno, get the last error by GetLastError() and map to errno. [Bug #10015]

History

#1 [ruby-core:63599] Updated by nobu (Nobuyoshi Nakada) over 3 years ago

  • Description updated (diff)

Yes.
Or use case-sensitive platforms.

#2 [ruby-core:63606] Updated by tenderlovemaking (Aaron Patterson) over 3 years ago

nobu (Nobuyoshi Nakada) this change causes a significant increase in Rails boot time on my system.

My application pays a 120ms price for this change. hsbt (Hiroshi SHIBATA) will pay 520ms:

http://twitter.com/hsbt/status/486378979138367488

Is there a way we can keep the performance on case-insensitive file systems? If there really is no work-around, then we need to make some changes in RubyGems, bundler, minitest, and Rails before trunk is released because many people will complain about slow boot times. :(

#3 [ruby-core:63607] Updated by nobu (Nobuyoshi Nakada) over 3 years ago

It might affect unintentionally.
I'll check the bottleneck.

#4 [ruby-core:63608] Updated by tenderlovemaking (Aaron Patterson) over 3 years ago

Thanks nobu, I really appreciate it! If it can't be fixed, I have ideas for avoiding the calls to Dir#[], but it will take time to get merged and released.

#5 [ruby-core:63610] Updated by normalperson (Eric Wong) over 3 years ago

Btw, you're already paying significant costs for a case-insensitive FS
(not just Ruby, but things like git, too).

#6 [ruby-core:67102] Updated by nobu (Nobuyoshi Nakada) almost 3 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

Applied in changeset r48973.


ChangeLog: fix ref of r48972. [Bug #10015]

#8 [ruby-core:67116] Updated by matthewd (Matthew Draper) almost 3 years ago

I don't know what I'm talking about, but this seems like it could use fcntl(2) + F_GETPATH, or getattrlist(2) + ATTR_CMN_NAME.. or something along those lines -- we shouldn't need the loop+fnmatch at all, should we?

#9 [ruby-core:67119] Updated by nobu (Nobuyoshi Nakada) almost 3 years ago

Thanks, that's it.

#10 Updated by Intrepidd (Adrien Siami) over 2 years ago

I've noticed a significant slowdown of a rails app when using ruby 2.2.1 rather than 2.1.5.

A slow page was running in 16 seconds with 2.2.1 where it was running in less than 3 seconds with 2.1.5

I tested this very simple benchmark :

require 'benchmark'

puts Benchmark.realtime {
  800000.times do
    Dir['/tmp']
  end
}

it runs in 3.1289695860032225 secs with 2.2.1 and 2.403 secs with 2.1.5

Is this issue still present ?

#11 Updated by nobu (Nobuyoshi Nakada) over 2 years ago

  • Status changed from Closed to Feedback

Adrien Siami wrote:

A slow page was running in 16 seconds with 2.2.1 where it was running in less than 3 seconds with 2.1.5
it runs in 3.1289695860032225 secs with 2.2.1 and 2.403 secs with 2.1.5

What's your platform and filesystem?
As the former seems pretty bigger than the later, they may be caused by different reasons.

#12 Updated by Intrepidd (Adrien Siami) over 2 years ago

Mac OS X Yosemite with the standard filesystem, I believe HFS+ Insensitive

#13 [ruby-core:69492] Updated by mkarklins (Magnuss Karklins) over 2 years ago

Experiencing similar results with Adrien's benchmark:
With ruby-2.2.2: 2.9347980790189467
WIth ruby-2.1.5: 1.848318

OS X Yosemite 10.10.3, Journaled HFS+ file system, case insensitive.

#14 [ruby-core:69832] Updated by daniel-rikowski (Daniel Rikowski) over 2 years ago

I ran Adrien's benchmark on Windows:

Besides showing a similar performance regression it looks like Dir[] is dramatically slower on Windows:

ruby-2.2.2p95: 161.4
ruby-2.1.6p336: 31.8

(Windows 8.1, SSD, NTFS)

I guess that is because of the GetFileAttributes vs. stat performance difference.

Using Process Monitor I also noticed that in 2.2.2 the (expensive) GetFileAttributesEx function is performed exactly twice as often as in 2.1.6, so part of the problems seems to be repeated (unnecessary?) calls to GetFileAttributes / stat.
(Judging from runtime behaviour only)

#15 Updated by nobu (Nobuyoshi Nakada) over 2 years ago

  • Status changed from Feedback to Closed

Applied in changeset r51106.


dir.c: update path type

  • dir.c (replace_real_basename): update path type by the target attributes if possible, to improve the performance. [Bug #10015]

#16 [ruby-core:74832] Updated by l.ferreira (Layon Ferreira) over 1 year ago

  • ruby -v changed from ruby 2.2.0dev (2014-02-04 trunk 44802) [x86_64-darwin13.0] to ruby 2.3.0 [x86_64-darwin13.0]
  • Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN to 2.0.0: UNKNOWN, 2.3: UNKNOWN

The problem seems to persist Mac OS X, although is better on 2.3.0, is still a lot slower than 2.1.x.
I have run the benchmark

require "benchmark"
time = []
5.times do
  result = Benchmark.realtime { 800000.times do Dir['/tmp'] end }
  puts result
  time << result
end
average = time.inject{ |sum, el| sum + el }.to_f / time.size

puts "average of: #{average}"

On this machine:

ProductName:    Mac OS X
ProductVersion: 10.11.4
BuildVersion:   15E65

File System Personality:  Case-sensitive Journaled HFS+
Type (Bundle):            hfs
Name (User Visible):      Mac OS Extended (Case-sensitive, Journaled)
Journal:                  Journal size 40960 KB at offset 0xe8e000
Owners:                   Enabled

On the all releases after 2.1.7 and got this results:

=====================================================
2.1.7: benchmarking
=====================================================
Using /Users/admin/.rvm/gems/ruby-2.1.7
2.86774
3.012784
2.867388
2.873973
3.127474
average of: 2.9498717999999995
=====================================================
2.1.8: benchmarking
=====================================================
Using /Users/admin/.rvm/gems/ruby-2.1.8
3.095918
2.981513
2.998208
3.142286
3.048603
average of: 3.0533056000000003
=====================================================
2.1.9: benchmarking
=====================================================
Using /Users/admin/.rvm/gems/ruby-2.1.9
3.022291
2.983054
2.931454
2.944659
3.051114
average of: 2.9865144
=====================================================
2.2.0: benchmarking
=====================================================
Using /Users/admin/.rvm/gems/ruby-2.2.0
4.985971766000148
5.047520838999844
5.066739754000082
5.005606599001112
5.0079711159996805
average of: 5.022762014800174
=====================================================
2.2.1: benchmarking
=====================================================
Using /Users/admin/.rvm/gems/ruby-2.2.1
5.0500176069999725
5.031266175999917
5.011832148000394
5.043594308999673
5.0333263849988725
average of: 5.034007324999766
=====================================================
2.2.2: benchmarking
=====================================================
Using /Users/admin/.rvm/gems/ruby-2.2.2
4.983856856000784
4.975439867999739
5.112457198998527
4.985339404000115
4.961809472000823
average of: 5.003780559799997
=====================================================
2.2.3: benchmarking
=====================================================
Using /Users/admin/.rvm/gems/ruby-2.2.3
4.832416973000363
4.970395954000196
5.007825999999113
4.891315167000357
4.913950877999014
average of: 4.923180994399809
=====================================================
2.2.4: benchmarking
=====================================================
Using /Users/admin/.rvm/gems/ruby-2.2.4
5.083285558999705
5.07147571600035
5.153286901999309
5.110161799999332
5.192000732999077
average of: 5.122042141999555
=====================================================
2.3.0: benchmarking
=====================================================
Using /Users/admin/.rvm/gems/ruby-2.3.0
3.785227087999374
3.9434133479990123
4.052104171998508
3.844644614999197
3.88892919600039
average of: 3.9028636837992963

#17 Updated by nobu (Nobuyoshi Nakada) over 1 year ago

  • Related to Feature #5994: Dir.glob without wildcards returns pattern, not filename added

#18 Updated by nobu (Nobuyoshi Nakada) 10 months ago

  • Related to Bug #13167: Dir.glob is 25x slower since Ruby 2.2 added

Also available in: Atom PDF