Bug #1332

closed

Reading file on Windows is 500x slower than with previous Ruby version

Added by ther (Damjan Rems) almost 15 years ago. Updated almost 13 years ago.

Status:
Closed
Target version:
ruby -v:
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-mswin32]
Backport:
[ruby-core:23063]

Description

=begin
time = [Time.new]
c = ''
'aaaa'.upto('zzzz') {|e| c << e}
3.times { c << c }
time << Time.new
File.open('out.file','w') { |f| f.write(c) }
time << Time.new
c = File.open('out.file','r') { |f| f.read }
time << Time.new
0.upto(time.size - 2) {|i| p "#{i} #{time[i+1]-time[i]}" }

ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-mswin32]
"0 0.537075"
"1 0.696244"
"2 40.188834"

ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]
"0 0.551"
"1 0.133"
"2 0.087"

That is about a 5x slower write and a 500x slower read. The times are the
same if I do:
f = File.new('out.file','r')
c = f.read
f.close

Tried on two machines. Vista SP1 and XP SP3. Same results.

Tried with virus scanner disabled. Same results.

Tried on old Win2K P4 2.4Ghz machine without virus scanner
"0 1.0625"
"1 1.09375"
"2 111.171875"

That's 111 seconds to read a 14,623,232-byte file, which is probably read from cache anyway.

The problem doesn't seem to exist on Linux, although I have only tried Ruby 1.9.0 there.

by
TheR
=end
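
As a quick sanity check on the 14,623,232-byte figure in the description, the file size can be derived from the test script itself. This is only a sketch; the constant names below are illustrative and do not appear in the original report.

```ruby
# 'aaaa'.upto('zzzz') yields every 4-letter lowercase string: 26**4 of them.
STRING_COUNT = 26**4

# Each of those strings is 4 bytes long.
BASE_BYTES = STRING_COUNT * 4

# `3.times { c << c }` appends the buffer to itself, doubling it three times.
TOTAL_BYTES = BASE_BYTES * 2**3

puts TOTAL_BYTES
```

The result matches the size quoted in the report, so the slow read really does involve only about 14 MB of data.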


Related issues 2 (0 open, 2 closed)

Related to Ruby master - Bug #2742: IO#read/gets can be very slow in doze (Closed, 02/13/2010)
Related to Ruby master - Feature #3228: speedup File.read (Rejected, 05/01/2010)
Actions #1

Updated by yugui (Yuki Sonoda) over 14 years ago

  • Target version set to 1.9.2
Actions #2

Updated by yugui (Yuki Sonoda) over 14 years ago

  • Status changed from Open to Assigned
  • Assignee set to akr (Akira Tanaka)
  • Priority changed from Normal to 3
Actions #3

Updated by rogerdpack (Roger Pack) over 14 years ago

I believe this is related to other issues about reading files in non-binary mode being slow in 1.9.

require 'benchmark'
a = File.open('l', 'w'); 10000000.times { a.write "abc\n" }; a.close
Benchmark.measure { a = File.open('l', 'r'); a.readlines; a.close }.real
=> 11.890625
Benchmark.measure { a = File.open('l', 'rb'); a.readlines; a.close }.real
=> 3.59375

I believe that it is doing a string conversion from one encoding ["\r\n"] to another ["\n"].

Perhaps there is a way to speed this up? (ex: special case it somehow)?

-r

refs:
http://www.ruby-forum.com/topic/182691
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/24824

Actions #4

Updated by usa (Usaku NAKAMURA) over 14 years ago

=begin
Hello,

In message "[ruby-core:26505] [Bug #1332] Reading file on Windows is 500x slower then with previous Ruby version"
on Nov.04,2009 04:50:49, wrote:

I believe that it is doing a string conversion from one encoding ["\r\n"] to another ["\n"].

right.

Perhaps there is a way to speed this up? (ex: special case it somehow)?

Currently, we have implemented the newline conversion as a
transcode converter, just like encoding conversion.
But we have found that the design of transcode is too general
to use for such a simple operation.
We want to find a better mechanism which doesn't deviate
from the current design of IO...

Regards,

U.Nakamura

=end
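
To illustrate what "newline conversion as a transcode converter" looks like from the Ruby side, here is a minimal sketch using the `universal_newline:` option of `String#encode`, which performs the same kind of "\r\n" to "\n" conversion through the transcoding machinery. The helper name `normalize_newlines` is hypothetical, and this is not the C code path that `IO#read` takes internally.

```ruby
# Hypothetical helper: convert "\r\n" (and lone "\r") to "\n" using the
# newline option of String#encode rather than a hand-written substitution.
def normalize_newlines(text)
  text.encode(universal_newline: true)
end

p normalize_newlines("abc\r\ndef\r\n")
```

The point of the sketch is only that newline handling is expressed as an option of the general-purpose converter, which is the generality the comment above describes as costly for such a simple operation.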

Actions #5

Updated by jonforums (Jon Forums) over 14 years ago

=begin

Currently, we have implemented the newline conversion as a
transcode converter, just like encoding conversion.
But we have found that the design of transcode is too general
to use for such a simple operation.
We want to find a better mechanism which doesn't deviate
from the current design of IO...

Do you think the current transcode design is also the cause of

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/24839

Jon

=end

Actions #6

Updated by rogerdpack (Roger Pack) over 14 years ago

A temporary workaround [though not actually binary compatible] appears to be:

Index: ruby.c
===================================================================
--- ruby.c      (revision 25830)
+++ ruby.c      (working copy)
@@ -1484,6 +1484,7 @@
        int fd, mode = O_RDONLY;
 #if defined DOSISH || defined __CYGWIN__
        {
+           mode |= O_BINARY;
            const char *ext = strrchr(fname, '.');
            if (ext && STRCASECMP(ext, ".exe") == 0)
                mode |= O_BINARY;

This causes all Ruby script files to be loaded as binary. The drawback is that if you have a Ruby script that was saved with "\r\n" line endings and contains strings that wrap lines, those strings will have an extra "\r" in them, e.g.:

File.open('stringy.rb', 'wb') { |f| f.write "a=\"abc\r\ndef\"; puts a.inspect" }

normal ruby:

C:\>ruby stringy.rb
"abc\ndef"

patched ruby:

C:\>ruby stringy.rb
"abc\r\ndef"

But if your files were saved with "\n" line endings, the result will be the same.
And the slowdown is gone for now.
Hopefully a better fix can be created.
Thanks.
-r

Actions #7

Updated by usa (Usaku NAKAMURA) over 14 years ago

Hello,

In message "[ruby-core:26840] [Bug #1332] Reading file on Windows is 500x slower then with previous Ruby version"
on Nov.21,2009 08:10:45, wrote:

This causes all Ruby script files to be loaded as binary. The drawback is that if you have a Ruby script that was saved with "\r\n" line endings and contains strings that wrap lines, those strings will have an extra "\r" in them, e.g.:

The pseudo-IO DATA treats the script file as a data file.
So, changing the default mode breaks compatibility for such
scripts.

Regards,

U.Nakamura

Actions #8

Updated by rogerdpack (Roger Pack) about 14 years ago

Appears that

  1. the writes have slowed down "only" by about 100% (they take twice as long in text mode on 1.9 as on 1.8). Not terrible.

  2. the reads have slowed down by something like 40000% (!)

I think to avoid the slowdown with reads you can "hack a workaround" like:

c = File.open('out.file','rb') { |f| f.read }
c.gsub!("\r\n", "\n")

But it seems like there might be a bug in there, too.
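
A slightly more defensive version of this workaround can be sketched as follows. The helper name `read_text_via_binary` and the UTF-8 default are assumptions made for illustration; the sketch only handles "\r\n" pairs and is not a full replacement for text mode.

```ruby
require 'tempfile'

# Hypothetical helper: read in binary mode, then normalize newlines by hand.
def read_text_via_binary(path, encoding = Encoding::UTF_8)
  data = File.open(path, 'rb') { |f| f.read }
  data.force_encoding(encoding) # 'rb' returns an ASCII-8BIT string
  data.gsub!("\r\n", "\n")      # gsub! returns nil if nothing matched,
  data                          # so return the string explicitly
end

# Small self-contained demo using a temporary file with CRLF line endings.
def demo_read
  Tempfile.create('out.file') do |tmp|
    tmp.binmode
    tmp.write("abc\r\ndef\r\n")
    tmp.flush
    read_text_via_binary(tmp.path)
  end
end

p demo_read
```

Two details are worth keeping in mind when using this approach: a binary-mode read returns an ASCII-8BIT string, so the encoding has to be set explicitly, and `gsub!` returns `nil` when no replacement was made, so its return value should not be used as the result.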

-rp

Actions #9

Updated by mame (Yusuke Endoh) almost 14 years ago

  • Status changed from Assigned to Closed

Hi,

This was fixed at r27340.

The buffer was extended (realloc'ed) by a fixed increment each time,
which resulted in O(n^2) copying. Now it is extended using the "double
the memory when you run out" rule, like String. So the problem is solved, I think.
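
The difference between the two growth strategies can be illustrated with a small simulation. This is a sketch under stated assumptions, not the code changed in r27340: the 1 KB chunk size, the 1 MB total, and the worst-case assumption that every reallocation copies the whole buffer are all hypothetical.

```ruby
# Count the bytes copied while appending `total` bytes in `chunk`-sized
# pieces. The block decides the new capacity whenever the buffer is full.
# Assumption: every growth step copies the entire existing contents.
def bytes_copied(total, chunk)
  capacity = chunk
  used = 0
  copied = 0
  while used < total
    if used + chunk > capacity
      copied += used                     # old contents move to the new buffer
      capacity = yield(capacity, chunk)  # growth policy supplied by the caller
    end
    used += chunk
  end
  copied
end

TOTAL = 1 << 20 # 1 MB appended in total
CHUNK = 1024    # 1 KB per append

LINEAR_COPIED   = bytes_copied(TOTAL, CHUNK) { |cap, chunk| cap + chunk }
DOUBLING_COPIED = bytes_copied(TOTAL, CHUNK) { |cap, _chunk| cap * 2 }

puts "linear growth copied:   #{LINEAR_COPIED} bytes"
puts "doubling growth copied: #{DOUBLING_COPIED} bytes"
```

With fixed-increment growth the copied volume grows quadratically with the final size, while with doubling it stays below the final size, which is why the read time dropped so sharply after the fix.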

Thanks,

--
Yusuke Endoh

Actions #10

Updated by rogerdpack (Roger Pack) almost 14 years ago

Appears to be much better in trunk.

1.9.1:

"0 0.396039"
"1 0.352035"
"2 43.111311"

1.9.2:

"0 0.369037"
"1 0.513051"
"2 1.626163" # still 10x as slow as 1.8.6, but probably because of a different reason.

Thanks!
-rp

Actions #11

Updated by mame (Yusuke Endoh) almost 14 years ago

Hi,

2010/4/16 Roger Pack :

1.9.2:

"0 0.369037"
"1 0.513051"
"2 1.626163" # still 10x as slow as 1.8.6, but probably because of a different reason.

Yes, text mode is still 10x to 30x slower than binary mode.
This reproduces not only on Windows but also on Linux.
It is probably a symptom of the cause explained
in [ruby-core:26515].

--
Yusuke ENDOH
