Project

General

Profile

Bug #1696

Updated by jeremyevans0 (Jeremy Evans) over 4 years ago

=begin 
  
  I've been using a a bit of code which use open-uri and http urls to download files and have found this is unuseably slow. 
  Example: 
 
  require 'open-uri' 
  open( <http uri> ) 
 
  I did some benchmarking and the download of a 157MB file from the local machine was taking over 20 seconds and using 100% where as wget for the same file only took 0.7seconds. 
 
  Digging some more and profiling the code with RubyProf revealed that during the execution of the download over 11,000 threads where being created. This I've tracked down to the net/protocol module and the Net::BufferIO::rbuf_fill method which is using a timeout block to wrap the @io.sysread(1024) call. This is clearly an extremely bad way to do this, totally crippling the performance. 
 
  Below are benchmarks and some profiles of this issue and various quick fixes I tested here 
  == 1.8.6 core == 
        user       system        total          real 
  open 15.664062     6.164062    21.828125 ( 22.810859) 
 
   %self       total       self       wait      child      calls    name 
   14.04        5.27       3.00       2.27       0.00     109426    <Class::Thread>#start (ruby_runtime:0} 
   10.97       13.14       2.34       0.00      10.80     109427    <Module::Timeout>#timeout (/usr/local/lib/ruby/1.8/timeout.rb:52} 
    6.65        2.52       1.42       1.10       0.00     109426    Thread#kill (ruby_runtime:0} 
    5.78       21.36       1.23       0.00      20.12          1    Net::BufferedIO#read (/usr/local/lib/ruby/1.8/net/protocol.rb:79} 
    5.27       14.27       1.12       0.00      13.14     109427    Object#timeout (/usr/local/lib/ruby/1.8/timeout.rb:92} 
    5.12        1.11       1.09       0.02       0.00     109426    IO#sysread (ruby_runtime:0} 
    4.75        2.61       1.02       0.00       1.59     109426    OpenURI::Buffer#<< (/usr/local/lib/ruby/1.8/open-uri.rb:297} 
    3.40        3.62       0.73       0.00       2.89     109426    Net::ReadAdapter#call_block (/usr/local/lib/ruby/1.8/net/protocol.rb:380} 
    2.52        0.54       0.54       0.00       0.00     109435    String#slice! (ruby_runtime:0} 
    2.30        1.33       0.49       0.00       0.84     109415    #<Class:0x801ac6f30>#<< ((eval):1} 
    2.08        0.50       0.45       0.00       0.05     109427    Class#initialize (ruby_runtime:0} 
    2.08        0.98       0.45       0.00       0.54     109435    Net::BufferedIO#rbuf_consume (/usr/local/lib/ruby/1.8/net/protocol.rb:137} 
    2.01        4.05       0.43       0.00       3.62     109426    Net::ReadAdapter#<< (/usr/local/lib/ruby/1.8/net/protocol.rb:371} 
    1.94        0.41       0.41       0.00       0.00     109416    IO#write (ruby_runtime:0} 
    1.72        1.03       0.37       0.00       0.66     109435    Class#new (ruby_runtime:0} 
    1.50        0.32       0.32       0.00       0.00     328286    String#size (ruby_runtime:0} 
    1.32       14.55       0.28       0.00      14.27     109426    Net::BufferedIO#rbuf_fill (/usr/local/lib/ruby/1.8/net/protocol.rb:131} 
    1.21        0.84       0.26       0.00       0.58     109417    Kernel#__send__ (ruby_runtime:0} 
    1.13        0.24       0.24       0.00       0.00     109429    String#<< (ruby_runtime:0} 
    0.99        0.30       0.21       0.00       0.09     109428    Fixnum#== (ruby_runtime:0} 
    0.84        0.18       0.18       0.00       0.00     328289    Fixnum#+ (ruby_runtime:0} 
    0.77        0.16       0.16       0.00       0.00     109427    <Class::Class>#allocate (ruby_runtime:0} 
    0.77        0.58       0.16       0.00       0.41     109415    IO#<< (ruby_runtime:0} 
    0.73        0.28       0.16       0.00       0.12     109475    Hash#[] (ruby_runtime:0} 
  ... 
 
  == 1.8.6 core + no timeout block == 
        user       system        total          real 
  open    1.757812     0.765625     2.523438 (    2.873194) 
 
   %self       total       self       wait      child      calls    name 
   21.62        2.63       2.63       0.00       0.00     109426    IO#sysread (ruby_runtime:0} 
   15.72        1.91       1.91       0.00       0.00     109429    String#<< (ruby_runtime:0} 
   13.98        1.70       1.70       0.00       0.00     109416    IO#write (ruby_runtime:0} 
    9.56       12.16       1.16       0.00      10.99          1    Net::BufferedIO#read (/usr/local/lib/ruby/1.8/net/protocol.rb:79} 
    7.83        3.73       0.95       0.00       2.77     109426    OpenURI::Buffer#<< (/usr/local/lib/ruby/1.8/open-uri.rb:297} 
    5.58        4.68       0.68       0.00       4.00     109426    Net::ReadAdapter#call_block (/usr/local/lib/ruby/1.8/net/protocol.rb:385} 
    3.66        4.99       0.45       0.00       4.55     109426    Net::BufferedIO#rbuf_fill (/usr/local/lib/ruby/1.8/net/protocol.rb:138} 
    3.34        2.38       0.41       0.00       1.97     109415    #<Class:0x801ac82b8>#<< ((eval):1} 
    3.27        0.53       0.40       0.00       0.13     109435    Net::BufferedIO#rbuf_consume (/usr/local/lib/ruby/1.8/net/protocol.rb:142} 
    2.95        5.04       0.36       0.00       4.68     109426    Net::ReadAdapter#<< (/usr/local/lib/ruby/1.8/net/protocol.rb:376} 
    2.25        0.27       0.27       0.00       0.00     328289    Fixnum#+ (ruby_runtime:0} 
    1.54        0.19       0.19       0.00       0.00     328286    String#size (ruby_runtime:0} 
    1.28        0.27       0.16       0.00       0.12     109475    Hash#[] (ruby_runtime:0} 
    1.22        0.15       0.15       0.00       0.00     109431    Module#=== (ruby_runtime:0} 
    1.22        1.85       0.15       0.00       1.70     109415    IO#<< (ruby_runtime:0} 
    1.09        0.13       0.13       0.00       0.00     109435    String#slice! (ruby_runtime:0} 
    0.96        1.97       0.12       0.00       1.85     109417    Kernel#__send__ (ruby_runtime:0} 
    0.96        0.12       0.12       0.00       0.00     109426    String#length (ruby_runtime:0} 
    0.96        0.12       0.12       0.00       0.00     109435    Hash#default (ruby_runtime:0} 
    0.71        0.09       0.09       0.00       0.00     109439    Fixnum#< (ruby_runtime:0} 
  ... 
 
  == 1.8.6 core + no timeout block + 10K sysread == 
        user       system        total          real 
  open    0.570312     1.406250     1.976562 (    2.521614) 
 
   %self       total       self       wait      child      calls    name 
   27.32        0.39       0.39       0.00       0.00      10953    String#slice! (ruby_runtime:0} 
   18.58        0.27       0.27       0.00       0.00      10943    IO#write (ruby_runtime:0} 
   12.02        0.17       0.17       0.00       0.00      10944    IO#sysread (ruby_runtime:0} 
    8.74        1.42       0.12       0.00       1.30          1    Net::BufferedIO#read (/usr/local/lib/ruby/1.8/net/protocol.rb:79} 
    7.10        0.50       0.10       0.00       0.40      10944    OpenURI::Buffer#<< (/usr/local/lib/ruby/1.8/open-uri.rb:297} 
    5.46        0.37       0.08       0.00       0.29      10942    #<Class:0x801ac8b28>#<< ((eval):1} 
    3.28        0.05       0.05       0.00       0.00      32840    String#size (ruby_runtime:0} 
    2.73        0.55       0.04       0.00       0.52      10944    Net::ReadAdapter#call_block (/usr/local/lib/ruby/1.8/net/protocol.rb:385} 
    2.73        0.04       0.04       0.00       0.00      32843    Fixnum#+ (ruby_runtime:0} 
    2.73        0.59       0.04       0.00       0.55      10944    Net::ReadAdapter#<< (/usr/local/lib/ruby/1.8/net/protocol.rb:376} 
    1.64        0.21       0.02       0.00       0.19      10944    Net::BufferedIO#rbuf_fill (/usr/local/lib/ruby/1.8/net/protocol.rb:138} 
    1.64        0.41       0.02       0.00       0.39      10953    Net::BufferedIO#rbuf_consume (/usr/local/lib/ruby/1.8/net/protocol.rb:142} 
    1.09        0.02       0.02       0.00       0.00      10944    String#length (ruby_runtime:0} 
    1.09        0.02       0.02       0.00       0.00      10947    String#<< (ruby_runtime:0} 
    1.09        0.02       0.02       0.00       0.00      10953    Hash#default (ruby_runtime:0} 
    1.09        0.28       0.02       0.00       0.27      10942    IO#<< (ruby_runtime:0} 
  ... 
 
  == 1.8.6 core + no timeout block + 100K sysread == 
        user       system        total          real 
  open    0.390625     0.484375     0.875000 (    1.013516) 
 
   %self       total       self       wait      child      calls    name 
   45.45        0.43       0.43       0.00       0.00       1098    IO#sysread (ruby_runtime:0} 
   33.88        0.32       0.32       0.00       0.00       1097    IO#write (ruby_runtime:0} 
    8.26        0.08       0.08       0.00       0.00       1107    String#slice! (ruby_runtime:0} 
    1.65        0.45       0.02       0.00       0.44       1098    Net::BufferedIO#rbuf_fill (/usr/local/lib/ruby/1.8/net/protocol.rb:138} 
    1.65        0.02       0.02       0.00       0.00       3302    String#size (ruby_runtime:0} 
    1.65        0.35       0.02       0.00       0.34       1098    OpenURI::Buffer#<< (/usr/local/lib/ruby/1.8/open-uri.rb:297} 
    1.65        0.93       0.02       0.00       0.91          1    Net::BufferedIO#read (/usr/local/lib/ruby/1.8/net/protocol.rb:79} 
    0.83        0.01       0.01       0.00       0.00          1    URI::HTTP#initialize (/usr/local/lib/ruby/1.8/uri/http.rb:77} 
    0.83        0.09       0.01       0.00       0.08       1107    Net::BufferedIO#rbuf_consume (/usr/local/lib/ruby/1.8/net/protocol.rb:142} 
    0.83        0.01       0.01       0.00       0.00        357    Module#method_added (ruby_runtime:0} 
    0.83        0.01       0.01       0.00       0.00         37    <Class::Dir>#[] (ruby_runtime:0} 
    0.83        0.01       0.01       0.00       0.00       1101    String#<< (ruby_runtime:0} 
    0.83        0.36       0.01       0.00       0.35       1098    Net::ReadAdapter#call_block (/usr/local/lib/ruby/1.8/net/protocol.rb:385} 
    0.83        0.33       0.01       0.00       0.32       1096    IO#<< (ruby_runtime:0} 
  ... 
 
  == 1.8.6 core + no timeout block + 1M sysread == 
        user       system        total          real 
  open    0.445312     0.335938     0.781250 (    0.913736) 
 
   %self       total       self       wait      child      calls    name 
   29.66        0.27       0.27       0.00       0.00        157    IO#sysread (ruby_runtime:0} 
   28.81        0.27       0.27       0.00       0.00        166    String#slice! (ruby_runtime:0} 
   28.81        0.27       0.27       0.00       0.00        156    IO#write (ruby_runtime:0} 
    7.63        0.07       0.07       0.00       0.00        160    String#<< (ruby_runtime:0} 
    1.69        0.30       0.02       0.00       0.28        157    Net::ReadAdapter#call_block (/usr/local/lib/ruby/1.8/net/protocol.rb:385} 
    0.85        0.01       0.01       0.00       0.00         37    <Class::Dir>#[] (ruby_runtime:0} 
    0.85        0.01       0.01       0.00       0.00         98    Class#inherited (ruby_runtime:0} 
    0.85        0.02       0.01       0.00       0.02          2    Kernel#gem_original_require (ruby_runtime:0} 
    0.85        0.28       0.01       0.00       0.27        157    OpenURI::Buffer#<< (/usr/local/lib/ruby/1.8/open-uri.rb:297} 
  ... 
 
  == 1.8.6 core + no timeout block + 1M sysread + slice bypass == 
        user       system        total          real 
  open    0.093750     0.523438     0.617188 (    0.717748) 
 
   %self       total       self       wait      child      calls    name 
   66.67        0.23       0.23       0.00       0.00         52    IO#write (ruby_runtime:0} 
   13.33        0.05       0.05       0.00       0.00         52    IO#sysread (ruby_runtime:0} 
    4.44        0.02       0.02       0.00       0.00         55    String#<< (ruby_runtime:0} 
    4.44        0.02       0.02       0.00       0.00          1    TCPSocket#initialize (ruby_runtime:0} 
    2.22        0.01       0.01       0.00       0.00        101    Hash#[] (ruby_runtime:0} 
    2.22        0.01       0.01       0.00       0.00         10    String#slice! (ruby_runtime:0} 
    2.22        0.02       0.01       0.00       0.01          2    Kernel#gem_original_require (ruby_runtime:0} 
    2.22        0.25       0.01       0.00       0.24         52    OpenURI::Buffer#<< (/usr/local/lib/ruby/1.8/open-uri.rb:297} 
    2.22        0.01       0.01       0.00       0.00         37    <Class::Dir>#[] (ruby_runtime:0} 
  ... 
 
  Note: The change does also raises questions about the core slice method not performing as well as it might for the no-op case. 
 
  == 1.8.6 core + select timeout + 1M sysread + slice bypass == 
        user       system        total          real 
  open    0.070312     0.226562     0.296875 (    0.898285) 
 
   %self       total       self       wait      child      calls    name 
   64.10        0.20       0.20       0.00       0.00         49    IO#write (ruby_runtime:0} 
   15.38        0.05       0.05       0.00       0.00         49    IO#sysread (ruby_runtime:0} 
    7.69        0.02       0.02       0.00       0.00         52    String#<< (ruby_runtime:0} 
    5.13        0.23       0.02       0.00       0.21         49    OpenURI::Buffer#<< (/usr/local/lib/ruby/1.8/open-uri.rb:297} 
    2.56        0.02       0.01       0.00       0.02          2    Kernel#gem_original_require (ruby_runtime:0} 
    2.56        0.01       0.01       0.00       0.00         37    <Class::Dir>#[] (ruby_runtime:0} 
    2.56        0.02       0.01       0.00       0.01          3    Array#each (ruby_runtime:0} 
  ... 
 
  So overall these changes improve the speed from 22seconds to 0.7seconds in my test case making ruby comparable with wget's performance. 
 
  The patch for this last run is attached and should be compatible with existing functionality through the use of select + raise Timeout::Error instead of timeout( @read_timeout ) { .. } 
 
  Hope this helps, any questions let me know 
 
 =end 
 

Back