Bug #1332

Reading file on Windows is 500x slower than with previous Ruby version

Added by ther (Damjan Rems) over 10 years ago. Updated over 8 years ago.

Status: Closed
Priority: Normal
Target version:
ruby -v: ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-mswin32]
Backport:
[ruby-core:23063]

Description

=begin
time = [Time.new]
c = ''
'aaaa'.upto('zzzz') {|e| c << e}
3.times { c << c }
time << Time.new
File.open('out.file','w') { |f| f.write(c) }
time << Time.new
c = File.open('out.file','r') { |f| f.read }
time << Time.new
0.upto(time.size - 2) {|i| p "#{i} #{time[i+1]-time[i]}" }

ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-mswin32]
"0 0.537075"
"1 0.696244"
"2 40.188834"

ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]
"0 0.551"
"1 0.133"
"2 0.087"

That is about 5x slower for the write and about 500x slower for the read.
Times are the same if I do:
f = File.new('out.file','r')
c = f.read
f.close

Tried on two machines. Vista SP1 and XP SP3. Same results.

Tried with virus scanner disabled. Same results.

Tried on an old Win2K P4 2.4 GHz machine without a virus scanner:
"0 1.0625"
"1 1.09375"
"2 111.171875"

That's 111 seconds to read a 14,623,232-byte file, which is probably read from cache anyway.

The problem doesn't seem to exist on Linux, although I have only tried Ruby 1.9.0 there.

by
TheR
=end


Related issues

Related to Ruby master - Bug #2742: IO#read/gets can be very slow in doze (Closed, 02/13/2010)
Related to Ruby master - Feature #3228: speedup File.read (Rejected, 05/01/2010)

Associated revisions

Revision 51801
Added by hsbt (Hiroshi SHIBATA) about 4 years ago

History

#1

Updated by yugui (Yuki Sonoda) over 10 years ago

  • Target version set to 1.9.2
#2

Updated by yugui (Yuki Sonoda) over 10 years ago

  • Status changed from Open to Assigned
  • Assignee set to akr (Akira Tanaka)
  • Priority changed from Normal to 3
#3

Updated by rogerdpack (Roger Pack) almost 10 years ago

I believe this is related to other issues regarding reading files in non-binary mode being slow in 1.9

require 'benchmark'
a = File.open('l', 'w'); 10000000.times { a.write "abc\n" }; a.close
Benchmark.measure { a = File.open('l', 'r'); a.readlines; a.close }.real
=> 11.890625
Benchmark.measure { a = File.open('l', 'rb'); a.readlines; a.close }.real
=> 3.59375

I believe that it is doing a string conversion from one encoding ["\r\n"] to another ["\n"].

Perhaps there is a way to speed this up? (ex: special case it somehow)?

-r

refs:
http://www.ruby-forum.com/topic/182691
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/24824

#4

Updated by usa (Usaku NAKAMURA) almost 10 years ago

=begin
Hello,

In message "[ruby-core:26505] [Bug #1332] Reading file on Windows is 500x slower then with previous Ruby version"
on Nov.04,2009 04:50:49, redmine@ruby-lang.org wrote:

> I believe that it is doing a string conversion from one encoding ["\r\n"] to another ["\n"].

right.

> Perhaps there is a way to speed this up? (ex: special case it somehow)?

Currently, we have implemented newline conversion as a
transcode converter, just like encoding conversion.
But we have found that the design of transcode is too general
for such a simple operation.
We want to find a better mechanism which doesn't deviate
from the current design of IO...

Regards,
--
U.Nakamura usa@garbagecollect.jp

=end
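
To illustrate the point above, here is a minimal sketch showing that Ruby exposes newline conversion through the same transcoding entry point as encoding conversion. It uses only the public String#encode options (universal_newline and crlf_newline); it does not show the internal IO read path discussed in this ticket, and the sample strings are made up for the example.

```ruby
# Sample data with Windows-style line endings (illustrative only).
crlf = "abc\r\ndef\r\n"

# Newline conversion requested through the transcoding API.
lf   = crlf.encode(universal_newline: true)  # CRLF (and lone CR) -> LF
back = lf.encode(crlf_newline: true)         # LF -> CRLF

# Character encoding conversion goes through the same method.
latin1 = "caf\u00e9".encode("ISO-8859-1")

p lf
p back
p latin1.bytes
```

Because both kinds of conversion share one general mechanism, a plain "\r\n" to "\n" replacement pays for machinery designed for arbitrary encodings, which is the overhead described above.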

#5

Updated by jonforums (Jon Forums) almost 10 years ago

=begin

> Currently, we have implemented newline conversion as a
> transcode converter, just like encoding conversion.
> But we have found that the design of transcode is too general
> for such a simple operation.
> We want to find a better mechanism which doesn't deviate
> from the current design of IO...

Do you think the current transcode design is also the cause of

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/24839

Jon

=end

#6

Updated by rogerdpack (Roger Pack) almost 10 years ago

A temporary workaround (though not actually binary compatible) appears to be:

Index: ruby.c
===================================================================
--- ruby.c      (revision 25830)
+++ ruby.c      (working copy)
@@ -1484,6 +1484,7 @@
        int fd, mode = O_RDONLY;
 #if defined DOSISH || defined __CYGWIN__
        {
+           mode |= O_BINARY;
            const char *ext = strrchr(fname, '.');
            if (ext && STRCASECMP(ext, ".exe") == 0)
                mode |= O_BINARY;

This causes all Ruby script files to be loaded as binary. The drawback is that if you have a Ruby script that was saved with "\r\n" line endings and contains strings that wrap lines, those strings will have an extra "\r" in them, e.g.:

File.write 'stringy.rb', "a=\"abc\r\ndef\"; puts a.inspect"

normal ruby:

C:\>ruby stringy.rb
"abc\ndef"

patched ruby:

C:\>ruby stringy.rb
"abc\r\ndef"

But if your files were saved in binary mode it will be the same.
And the slowdown is gone for now.
Hopefully a better fix can be created.
Thanks.
-r

#7

Updated by usa (Usaku NAKAMURA) almost 10 years ago

Hello,

In message "[ruby-core:26840] [Bug #1332] Reading file on Windows is 500x slower then with previous Ruby version"
on Nov.21,2009 08:10:45, redmine@ruby-lang.org wrote:

> This causes all Ruby script files to be loaded as binary. The drawback is that if you have a Ruby script that was saved with "\r\n" line endings and contains strings that wrap lines, those strings will have an extra "\r" in them, e.g.:

The pseudo-IO DATA reads the script file as a data file.
So changing the default mode breaks compatibility for such
scripts.

Regards,

U.Nakamura usa@garbagecollect.jp

#8

Updated by rogerdpack (Roger Pack) over 9 years ago

It appears that:
1) the writes have slowed down, "only" by about 100% (they take twice as long in text mode in 1.9 as in 1.8). Not terrible.

2) the reads have slowed down by something like 40000% (!)

I think to avoid the slowdown with reads you can "hack a workaround" like:

c = File.open('out.file','rb') { |f| f.read }
c.gsub!("\r\n", "\n")

But this seems like there might be a bug in there, too.

-rp
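
A minimal sketch of that workaround wrapped as a helper, under some assumptions: the name read_text_fast and the UTF-8 default are hypothetical, and the temporary file exists only to make the example self-contained. Two details are easy to miss. A binary-mode read returns an ASCII-8BIT string, so the encoding has to be set by hand, and gsub! returns nil when nothing was replaced, so its return value should not be used as the result.

```ruby
require 'tempfile'

# Hypothetical helper: read in binary mode, then normalize newlines manually.
def read_text_fast(path, encoding = Encoding::UTF_8)
  data = File.open(path, 'rb') { |f| f.read }  # no newline conversion here
  data.force_encoding(encoding)                # binary reads are ASCII-8BIT
  data.gsub!("\r\n", "\n")                     # returns nil if nothing changed
  data                                         # so return the string itself
end

result = nil
Tempfile.create('crlf') do |tmp|
  tmp.binmode
  tmp.write("abc\r\ndef\r\n")
  tmp.flush
  result = read_text_fast(tmp.path)
end

p result
p result.encoding
```

Unlike a text-mode read, this does not apply any other text-mode handling, so it is a stopgap rather than a drop-in replacement.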

#9

Updated by mame (Yusuke Endoh) over 9 years ago

  • Status changed from Assigned to Closed

Hi,

This was fixed at r27340.

The buffer was extended (realloc'ed) linearly, which resulted
in O(n^2) behavior. Now it is extended using the "double memory if
you run out" rule, like String. So the problem was solved, I think.

Thanks,

--
Yusuke Endoh mame@tsg.ne.jp
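
The difference between the two growth rules can be sketched with a small cost model. This is an illustration only, not the code changed in r27340: it assumes every reallocation copies the existing contents, and the 8192-byte increment is a hypothetical value chosen for the example. The total size is the file size from the original report.

```ruby
# Count bytes copied while a buffer grows to hold `total` bytes, assuming each
# reallocation copies everything already stored (a worst-case model).
def bytes_copied(total, initial)
  capacity = initial
  copied = 0
  while capacity < total
    copied += capacity          # existing contents move to the new block
    capacity = yield(capacity)  # the growth policy picks the next capacity
  end
  copied
end

total = 14_623_232  # file size from the report
chunk = 8192        # hypothetical increment, not taken from the Ruby source

linear   = bytes_copied(total, chunk) { |cap| cap + chunk }  # fixed increment
doubling = bytes_copied(total, chunk) { |cap| cap * 2 }      # double on overflow

puts "linear growth copies   #{linear} bytes"
puts "doubling growth copies #{doubling} bytes"
```

With a fixed increment the copied volume grows with the square of the final size, while doubling keeps it within a small constant multiple of the final size.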

#10

Updated by rogerdpack (Roger Pack) over 9 years ago

appears to be much better in trunk.

1.9.1:

"0 0.396039"
"1 0.352035"
"2 43.111311"

1.9.2:

"0 0.369037"
"1 0.513051"
"2 1.626163" # still 10x as slow as 1.8.6, but probably because of a different reason.

Thanks!
-rp

#11

Updated by mame (Yusuke Endoh) over 9 years ago

Hi,

2010/4/16 Roger Pack redmine@ruby-lang.org:

> 1.9.2:
>
> "0 0.369037"
> "1 0.513051"
> "2 1.626163" # still 10x as slow as 1.8.6, but probably because of a different reason.

Yes, text mode is still 10x to 30x slower than binary mode.
This reproduces not only on Windows but also on Linux.
Perhaps this is a symptom of the cause explained
in [ruby-core:26515].

--
Yusuke ENDOH mame@tsg.ne.jp
