Bug #1332

Reading file on Windows is 500x slower than with previous Ruby version

Added by ther (Damjan Rems) over 10 years ago. Updated about 8 years ago.

Status: Closed
Priority: Normal
Target version:
ruby -v: ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-mswin32]
Backport:
[ruby-core:23063]

Description

=begin
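time = [Time.new]                              # start
c = ''
'aaaa'.upto('zzzz') {|e| c << e}               # 26**4 four-letter strings = 1,827,904 bytes
3.times { c << c }                             # double three times = 14,623,232 bytes
time << Time.new                               # after building the string
File.open('out.file','w') { |f| f.write(c) }   # text-mode write
time << Time.new
c = File.open('out.file','r') { |f| f.read }   # text-mode read
time << Time.new
# Print the elapsed seconds per step: 0 = build, 1 = write, 2 = read
0.upto(time.size - 2) {|i| p "#{i} #{time[i+1]-time[i]}" }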

ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-mswin32]
"0 0.537075"
"1 0.696244"
"2 40.188834"

ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]
"0 0.551"
"1 0.133"
"2 0.087"

That is about 5x slower for the write and roughly 500x slower for the read.
Times are the same if I do:
f = File.new('out.file','r')
c = f.read
f.close

Tried on two machines. Vista SP1 and XP SP3. Same results.

Tried with virus scanner disabled. Same results.

Tried on an old Win2K P4 2.4 GHz machine without a virus scanner:
"0 1.0625"
"1 1.09375"
"2 111.171875"

That's 111 seconds to read a 14,623,232-byte file, which is probably read from cache anyway.

The problem doesn't seem to exist on Linux, although I have only tried Ruby 1.9.0 there.

by
TheR
=end


Related issues

Related to Ruby master - Bug #2742: IO#read/gets can be very slow in doze (Closed, 02/13/2010)
Related to Ruby master - Feature #3228: speedup File.read (Rejected, 05/01/2010)

Associated revisions

Revision 59991b6a
Added by hsbt (Hiroshi SHIBATA) almost 4 years ago

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51801 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 51801
Added by hsbt (Hiroshi SHIBATA) almost 4 years ago

History

#1

Updated by yugui (Yuki Sonoda) about 10 years ago

  • Target version set to 1.9.2
#2

Updated by yugui (Yuki Sonoda) about 10 years ago

  • Status changed from Open to Assigned
  • Assignee set to akr (Akira Tanaka)
  • Priority changed from Normal to 3
#3

Updated by rogerdpack (Roger Pack) over 9 years ago

I believe this is related to other reports that reading files in non-binary (text) mode is slow in 1.9:

require 'benchmark'
a = File.open('l', 'w'); 10000000.times { a.write "abc\n" }; a.close
Benchmark.measure { a = File.open('l', 'r'); a.readlines; a.close }.real
=> 11.890625
Benchmark.measure { a = File.open('l', 'rb'); a.readlines; a.close }.real
=> 3.59375

I believe that it is doing a string conversion from one encoding ["\r\n"] to another ["\n"].

Perhaps there is a way to speed this up? (ex: special case it somehow)?

-r

refs:
http://www.ruby-forum.com/topic/182691
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/24824

#4

Updated by usa (Usaku NAKAMURA) over 9 years ago

=begin
Hello,

In message "[ruby-core:26505] [Bug #1332] Reading file on Windows is 500x slower then with previous Ruby version"
on Nov.04,2009 04:50:49, redmine@ruby-lang.org wrote:

I believe that it is doing a string conversion from one encoding ["\r\n"] to another ["\n"].

right.

Perhaps there is a way to speed this up? (ex: special case it somehow)?

Currently, we have implemented the newline conversion as a
transcode converter, just like encoding conversion.
But we have found that the design of transcode is too general
for such a simple operation.
We want to find a better mechanism which doesn't deviate
from the current design of IO...

Regards,
--
U.Nakamura usa@garbagecollect.jp

=end
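
To illustrate what "newline conversion as a transcode converter" means from the Ruby side, here is a minimal sketch using String#encode with the universal_newline option. It only shows that newline handling is exposed as an option of the same conversion machinery used for encodings; it does not reproduce the IO read path or its performance, and the sample string is made up for the example.

```ruby
# Newline conversion requested through the transcoding API (Ruby 1.9+).
crlf_text = "first line\r\nsecond line\r\n"

# universal_newline: true asks the converter to turn "\r\n" into "\n".
# No target encoding is given, so only the newline decoration applies here.
converted = crlf_text.encode(universal_newline: true)

p crlf_text   # original string, still contains "\r\n"
p converted   # same text with "\n" line endings
```

Because the conversion runs through the general-purpose converter, a simple byte substitution pays for machinery designed for full character-set transcoding, which is the overhead described above.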

#5

Updated by jonforums (Jon Forums) over 9 years ago

=begin

Currently, we have implemented the newline conversion as a
transcode converter, just like encoding conversion.
But we have found that the design of transcode is too general
for such a simple operation.
We want to find a better mechanism which doesn't deviate
from the current design of IO...

Do you think the current transcode design is also the cause of

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/24839

Jon

=end

#6

Updated by rogerdpack (Roger Pack) over 9 years ago

A temporary workaround [though not actually binary compatible] appears to be:

Index: ruby.c
===================================================================
--- ruby.c      (revision 25830)
+++ ruby.c      (working copy)
@@ -1484,6 +1484,7 @@
        int fd, mode = O_RDONLY;
 #if defined DOSISH || defined __CYGWIN__
        {
+           mode |= O_BINARY;
            const char *ext = strrchr(fname, '.');
            if (ext && STRCASECMP(ext, ".exe") == 0)
                mode |= O_BINARY;

This causes all Ruby script files to be loaded in binary mode. The drawback is that if a script was saved with "\r\n" line endings and contains strings that wrap lines, those strings will have an extra "\r" in them, e.g.:

File.write 'stringy.rb', "a=\"abc\r\ndef\"; puts a.inspect"

normal ruby:

C:\>ruby stringy.rb
"abc\ndef"

patched ruby:

C:\>ruby stringy.rb
"abc\r\ndef"

But if your files were saved with "\n"-only line endings, the result will be the same.
And the slowdown is gone for now.
Hopefully a better fix can be created.
Thanks.
-r

#7

Updated by usa (Usaku NAKAMURA) over 9 years ago

Hello,

In message "[ruby-core:26840] [Bug #1332] Reading file on Windows is 500x slower then with previous Ruby version"
on Nov.21,2009 08:10:45, redmine@ruby-lang.org wrote:

This causes all Ruby script files to be loaded in binary mode. The drawback is that if a script was saved with "\r\n" line endings and contains strings that wrap lines, those strings will have an extra "\r" in them, e.g.:

The pseudo-IO DATA treats the script file as a data file.
So changing the default mode breaks compatibility for scripts
that rely on it.

Regards,

U.Nakamura usa@garbagecollect.jp

#8

Updated by rogerdpack (Roger Pack) over 9 years ago

It appears that:
1) the writes have slowed down "only" by about 100% (writing in text mode takes twice as long in 1.9 as in 1.8). Not terrible.

2) the reads have slowed down by something like 40000% (!)

To avoid the slowdown with reads, I think you can hack a workaround like:

c = File.open('out.file','rb') { |f| f.read }
c.gsub!("\r\n", "\n")

But it seems there might be a bug in there, too.

-rp
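
The workaround above can be wrapped in a small helper. This is a sketch only: read_text_via_binary is a hypothetical name, and the temp file exists just to make the example self-contained. One caveat worth noting is that a string read in 'rb' mode is tagged as binary (ASCII-8BIT), so callers that care about the encoding would need to call force_encoding themselves.

```ruby
require 'tempfile'

# Hypothetical helper: read in binary mode, then normalize newlines by hand.
def read_text_via_binary(path)
  data = File.open(path, 'rb') { |f| f.read }
  # gsub! returns nil when nothing was replaced, so return data explicitly.
  data.gsub!("\r\n", "\n")
  data
end

# Build a small file with Windows line endings, written byte-for-byte.
sample = Tempfile.new('crlf_sample')
sample.binmode
sample.write("abc\r\ndef\r\n")
sample.close

text = read_text_via_binary(sample.path)
p text            # contents with "\n" line endings
p text.encoding   # binary-tagged, because the file was opened with 'rb'
sample.unlink
```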

#9

Updated by mame (Yusuke Endoh) over 9 years ago

  • Status changed from Assigned to Closed

Hi,

This was fixed at r27340.

The buffer was extended (realloc'ed) in linear increments, which resulted
in O(n^2) behavior. Now it is extended using the "double the memory when you
run out" rule, like String. So the problem is solved, I think.

Thanks,

--
Yusuke Endoh mame@tsg.ne.jp
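
The difference between the two growth rules can be seen with a small simulation. This is a rough model, not the interpreter's code: it assumes every growth step copies the existing contents (the worst case for realloc), and the 8192-byte chunk size is an arbitrary assumption. The total is the file size from the original report.

```ruby
# Count the bytes copied while filling a buffer chunk by chunk, assuming each
# growth step moves the existing contents. The block decides the new capacity.
def bytes_copied(total_bytes, chunk)
  capacity = chunk
  size = 0
  copied = 0
  while size < total_bytes
    if size + chunk > capacity
      copied += size                       # existing data is moved on growth
      capacity = yield(capacity, chunk)
    end
    size += chunk
  end
  copied
end

total = 14_623_232   # file size from the original report
chunk = 8192         # assumed read chunk size (hypothetical)

linear   = bytes_copied(total, chunk) { |cap, step| cap + step }  # grow by one chunk
doubling = bytes_copied(total, chunk) { |cap, _step| cap * 2 }    # double when full

puts "linear growth:   #{linear} bytes copied"
puts "doubling growth: #{doubling} bytes copied"
```

Under this model, linear growth copies an amount proportional to the square of the number of chunks, while doubling stays below twice the final size.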

#10

Updated by rogerdpack (Roger Pack) over 9 years ago

appears to be much better in trunk.

1.9.1:

"0 0.396039"
"1 0.352035"
"2 43.111311"

1.9.2:

"0 0.369037"
"1 0.513051"
"2 1.626163" # still 10x as slow as 1.8.6, but probably because of a different reason.

Thanks!
-rp

#11

Updated by mame (Yusuke Endoh) over 9 years ago

Hi,

2010/4/16 Roger Pack redmine@ruby-lang.org:

1.9.2:

"0 0.369037"
"1 0.513051"
"2 1.626163" # still 10x as slow as 1.8.6, but probably because of a different reason.

Yes, text mode is still 10x to 30x slower than binary mode.
This reproduces not only on Windows but also on Linux.
It is probably a symptom of the cause explained
in [ruby-core:26515].

--
Yusuke ENDOH mame@tsg.ne.jp
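
A quick way to check the remaining text-versus-binary gap on a given Ruby build is a comparison like the following. It is only a sketch: the file size and contents are arbitrary, and the ratio will depend heavily on Ruby version and platform, so no timings are shown.

```ruby
require 'benchmark'
require 'tempfile'

# Write about 4 MB of "\n"-terminated lines, byte-for-byte.
sample = Tempfile.new('mode_bench')
sample.binmode
sample.write("abc\n" * 1_000_000)
sample.close

text_data = nil
binary_data = nil

# Same file, read once in text mode and once in binary mode.
text_time   = Benchmark.realtime { text_data   = File.open(sample.path, 'r')  { |f| f.read } }
binary_time = Benchmark.realtime { binary_data = File.open(sample.path, 'rb') { |f| f.read } }

puts "text mode:   #{text_time} s"
puts "binary mode: #{binary_time} s"
sample.unlink
```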
