Feature #5995

calling io_advise_internal() in read_all()

Added by Masaki Matsushita about 2 years ago. Updated over 1 year ago.

[ruby-core:42471]
Status:Assigned
Priority:Normal
Assignee:Yusuke Endoh
Category:core
Target version:next minor

Description

=begin
I propose to call ioadviseinternal() in read_all().
It will increase performance.

I created a dummy file:
dd if=/dev/zero of=dummy bs=1M count=100

Then, I ran the following:

require 'benchmark'

Benchmark.bm do |x|
x.report do
f = File.open("dummy") # dummy file(about 100MB )
f.read
end
end

I freed page cache before each test:
sudo sysctl -w vm.drop_caches=1

results on Ubuntu 11.10(3.0.0-15-server):

r34462:

user system total real
0.050000 0.220000 0.270000 ( 0.356033)

   user     system      total        real

0.050000 0.190000 0.240000 ( 0.332243)

   user     system      total        real

0.060000 0.210000 0.270000 ( 0.347758)

patched ruby:

user system total real
0.030000 0.130000 0.160000 ( 0.225866)

   user     system      total        real

0.040000 0.170000 0.210000 ( 0.250172)

   user     system      total        real

0.040000 0.150000 0.190000 ( 0.254654)

It shows the patch increases performance.
=end

patch.diff Magnifier (1.25 KB) Masaki Matsushita, 02/10/2012 02:14 PM

patch2.diff Magnifier (1003 Bytes) Masaki Matsushita, 02/14/2012 12:48 PM

History

#1 Updated by Masaki Matsushita about 2 years ago

I modified the patch to use doioadvise() not ioadviseinternal().

#2 Updated by Motohiro KOSAKI about 2 years ago

Huh? fadvise() is a hint for future io access. but usually read_all() don't have any future access. I don't think this patch makes platform independent improvement. How much OSs do you tested?

#3 Updated by Masaki Matsushita about 2 years ago

=begin

How much OSs do you tested?

I have tested on the only platform because I don't have root authority to execute "sudo sysctl -w vm.drop_caches=1" except the one.
I'm sorry.

I ran the test again and the following is average real time of 10 times:

r34599: 0.3800858
proposal: 0.2377475

Infomation about my environment.

% uname -mrsvo
Linux 3.0.0-15-server #26-Ubuntu SMP Fri Jan 20 19:07:39 UTC 2012 x86_64 GNU/Linux

% lsb_release -a
LSB Version: core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch:core-4.0-amd64:core-4.0-noarch
Distributor ID: Ubuntu
Description: Ubuntu 11.10
Release: 11.10
Codename: oneiric

=end

#4 Updated by Yusuke Endoh about 2 years ago

  • Status changed from Open to Assigned
  • Assignee set to Motohiro KOSAKI

Hello,

2012/2/14 Motohiro KOSAKI kosaki.motohiro@gmail.com:

Huh? fadvise() is a hint for future io access. but usually read_all() don't have any future access.

The patch calls posixfadvise before the main part of readall.
Doesn't it make sense?

Quoted from the manpage of POSIX_FADVISE(2):

Under Linux, POSIXFADVNORMAL sets the readahead window to
the default size for the backing device; POSIXFADVSEQUENTIAL
doubles this size,

I don't wonder that the patch works well. Actually I can
reproduce the speed up.

$ sudo sysctl -w vm.dropcaches=1
vm.drop
caches = 1

$ ./ruby.old t.rb
user system total real
0.100000 1.150000 1.250000 ( 3.776475)

$ sudo sysctl -w vm.dropcaches=1
vm.drop
caches = 1

$ ./ruby.new t.rb
user system total real
0.090000 0.750000 0.840000 ( 3.766539)

$ uname -a
Linux inch 3.0.0-16-generic-pae #28-Ubuntu SMP Fri Jan 27 19:24:01 UTC 2012 i686 i686 i386 GNU/Linux

Yusuke Endoh mame@tsg.ne.jp

#5 Updated by Motohiro KOSAKI about 2 years ago

  • Status changed from Assigned to Rejected

The patch calls posixfadvise before the main part of readall.
Doesn't it make sense?

It doesn't. Because of, when we read whole file contents, we only need one read syscall if the file is regular. In other words, current readall() suck. It read a few kilo bytes and append them to a string. That's said it create tons realloc. That doesn't make sense. we need fix the root cause. btw, readall() also abuse BUFSIZ.

So, No. We are not welcome a bandaid.

#6 Updated by Masaki Matsushita about 2 years ago

=begin

In other words, current read_all() suck. It read a few kilo bytes and append them to a string.

I modified io.c to show how many bytes read_all() reads on each syscall.

io.c:1834

static long
iobufread(char *ptr, long len, rbio_t *fptr)
{
long offset = 0;
long n = len;
long c;

if (READ_DATA_PENDING(fptr) == 0) {
    while (n > 0) {
      again:
        c = rb_read_internal(fptr->fd, ptr+offset, n);
        if (c == 0) break;
        printf("%ld/%ld\n", c, len); /* how many bytes? */
        if (c < 0) {
            if (rb_io_wait_readable(fptr->fd))
                goto again;
            return -1;
        }
        offset += c;
        if ((n -= c) <= 0) break;
        rb_thread_wait_fd(fptr->fd);
    }
    return len - n;
}

io.c:2137
for (;;) {
READCHECK(fptr);
n = io
fread(str, bytes, siz - bytes, fptr);
if (n == 0 && bytes == 0) {
rbstrsetlen(str, 0);
break;
}
bytes += n;
rb
strsetlen(str, bytes);
if (cr != ENCCODERANGEBROKEN)
pos += rbstrcoderangescanrestartable(RSTRINGPTR(str) + pos, RSGPTR(str) + bytes, enc, &cr);
if (bytes < siz) break;
printf("%ld/%ld\n", n, siz); /* how many bytes? */
siz += BUFSIZ;
rbstrmodify_expand(str, BUFSIZ);
}

Then, I ran the test same as and I got:
user system total real
102400000/102400000
102400000/102400000
0.020000 0.170000 0.190000 ( 0.254729)

It shows current read_all() reads file at a time.
=end

#7 Updated by Yusuke Endoh about 2 years ago

Hello,

2012/2/15 Motohiro KOSAKI kosaki.motohiro@gmail.com:

It doesn't. Because of, when we read whole file contents, we only need one read syscall if the file is regular. In other words, current read_all() suck.  It read a few kilo bytes and append them to a string. That's said it create tons realloc.

Really? I could be wrong, but as far as I know, IO#read first
uses fstat to estimate a buffer length enough to load the whole
content of the file. Masaki also showed the behavior.
I think there is something wrong.

--
Yusuke Endoh mame@tsg.ne.jp

#8 Updated by Shota Fukumori about 2 years ago

  • Status changed from Rejected to Assigned

Hi,

Yusuke Endoh wrote at :

I think there is something wrong.

Agreed. Reopening this ticket.

#9 Updated by Yusuke Endoh about 2 years ago

  • Status changed from Assigned to Rejected

Hello,

There is indeed something wrong, but anyway, I agree with
kosaki's point; we cannot import the patch until we know
the exact reason why it brings performance improvement.
So please reopen this ticket if you find the reason.
(I expect kosaki-san to consider this!)

I wrote a simple C code to experiment this. The result is
as kosaki said; when calling only one read syscall, posix_
fadvise does NOT bring performance improvement (even slower).
I really wonder why File#read becomes faster.

using posix_fadvise

$ sudo sysctl -w vm.dropcaches=1 && time ./t dummy T
vm.drop
caches = 1
314572800

real 0m5.401s
user 0m0.000s
sys 0m0.740s

NOT using posix_fadvise

$ sudo sysctl -w vm.dropcaches=1 && time ./t dummy F
vm.drop
caches = 1
314572800

real 0m3.967s
user 0m0.000s
sys 0m0.896s

#include
#include
#include
#include
#include
#include

int main(int argc, char *argv[])
{
int fd = open(argv[1], O_RDONLY);
char *buf;
struct stat s;

fstat(fd, &s);
buf = malloc(s.st_size);

if (argv[2][0] == 'T') {
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
}

printf("%d\n", read(fd, buf, s.st_size));

return 0;

}

Yusuke Endoh mame@tsg.ne.jp

#10 Updated by Masaki Matsushita about 2 years ago

I ran Endoh-san's code.

average time of 10 times:
F: 0.2892
T(use posix_fadvise()): 0.2226

I think posix_fadvise() makes sense at least on my environment.
Can anyone reproduce this?

Anyway, I agree with Endoh-san because I have no idea about why the patch makes read_all() faster now.

#11 Updated by Martin Bosslet about 2 years ago

Motohiro KOSAKI wrote:

Because of, when we read whole file contents, we only need one read syscall if the file is regular. In other words, current readall() suck. It read a few kilo bytes and append them to a string. That's said it create tons realloc. That doesn't make sense. we need fix the root cause. btw, readall() also abuse BUFSIZ.

I noticed that, too, lately, when I had to implement something similar. read_all[1] resizes the buffer in a linear fashion. This means a linear number of reallocs - wouldn't it make sense to grow the buffer exponentially by multiplying with a constant factor (1.5 or 2 maybe) instead? That way, we would only have a logarithmic number of reallocs, which would probably already give better performance. It's a bit more wasteful on memory usage, but I assume that's tolerable because in the end we would resize to the exact total size anyway, no?

[1] https://github.com/ruby/ruby/blob/trunk/io.c#L2152

#12 Updated by Yusuke Endoh about 2 years ago

Hello, Martin

I guess that your point is off topic from this ticket.
As I and Masaki showed, in normal cases, File#read does
not reallocate a memory. (Let me know if I'm wrong)

But I think your point is valid for the general IO#read
(especially, reading from a socket). I recommend you to
create a patch and a benchmark, and discuss in another
thread.

Yusuke Endoh mame@tsg.ne.jp

#13 Updated by Martin Bosslet about 2 years ago

Yusuke Endoh wrote:

Hello, Martin

I guess that your point is off topic from this ticket.
As I and Masaki showed, in normal cases, File#read does
not reallocate a memory. (Let me know if I'm wrong)

But I think your point is valid for the general IO#read
(especially, reading from a socket). I recommend you to
create a patch and a benchmark, and discuss in another
thread.

Hi Yusuke,

you're right, my apologies! I just read 'read_all' and
'a lot of reallocs' and was immediately reminded of what
I noticed on a different note :) I opened Issue #6047 for
this separate topic!

-Martin

#14 Updated by Masaki Matsushita about 2 years ago

Some file systems(e.g. ext3, ext4) use dosyncread() for general read.
http://lxr.linux.no/#linux+v3.2.6/fs/ext3/file.c#L55
http://lxr.linux.no/#linux+v3.2.6/fs/ext4/file.c#L231

In read process, dogenericfileread() is called finally.
http://lxr.linux.no/#linux+v3.2.6/fs/read
write.c#L338 ( dosyncread() )
In ext3 and ext4, fop->aioread is genericfileaioread().
http://lxr.linux.no/#linux+v3.2.6/mm/filemap.c#L1395 ( It calls do
genericfileread(). )

Then, dogenericfileread() calls pagecachesyncreadahead() or pagecacheasync_readahead().
http://lxr.linux.no/#linux+v3.2.6/mm/filemap.c#L1118

Both pagecachesyncreadahead() and pagecacheasyncreadahead() call ondemandreadahead() and its readahead size is limited by ra->rapages.
http://lxr.linux.no/#linux+v3.2.6/mm/readahead.c#L401

posixfadvise() expands ra->rapages( http://lxr.linux.no/#linux+v3.2.6/mm/fadvise.c#L90 ) and it reduces the number of times of actual read.
Therefore, I think the patch makes sense on some file systems as stated above.

#15 Updated by Yusuke Endoh about 2 years ago

  • Status changed from Rejected to Assigned

Kosaki-san, can you check ?

Matsushita-san,
I'm not sure if the mechanism you said is right because just
using posixfadvise did not bring any speed improvement in my
experiment of . Did you run my program?
I'm afraid there is another reason why posix
fadvise brings
improvement to Ruby.

Yusuke Endoh mame@tsg.ne.jp

#16 Updated by Masaki Matsushita about 2 years ago

Yusuke Endoh wrote:

Did you run my program?

Yes. I ran your program in and I really experienced performance improvement on my environment.
Results are in and they can be reproduced.

I'm wondering if the fact that my environment is VPS hosted by customized KVM affects the results.
However, I have not found any grounded reasons so far.

#17 Updated by Yusuke Endoh about 2 years ago

Hello,

2012/2/25 Masaki Matsushita glass.saga@gmail.com:

Yusuke Endoh wrote:

Did you run my program?

Yes. I ran your program in and I really experienced performance improvement on my environment.
Results are in and they can be reproduced.

Oops. I was missing. Sorry.

--
Yusuke Endoh mame@tsg.ne.jp

#18 Updated by Motohiro KOSAKI almost 2 years ago

  • Assignee changed from Motohiro KOSAKI to Yusuke Endoh

Endoh-san,

I really dislike this patch because this patch abuse fadvise() and don't guarantee to positive effect on other environment (dirrerent os, different storage type). But if you strongly prefer it, I give up to opposite.

Please check-in by yourself.

#19 Updated by Yusuke Endoh over 1 year ago

  • Target version set to next minor

Well, I wonder what I should do.
... I procrastinate the decision to next minor.

Yusuke Endoh mame@tsg.ne.jp

Also available in: Atom PDF