Project

General

Profile

Actions

Feature #11339

closed

[PATCH] io.c: avoid kwarg parsing in C API

Added by normalperson (Eric Wong) over 9 years ago. Updated almost 9 years ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:69892]

Description

rb_scan_args and hash lookups for kwargs in the C API are clumsy and
slow.  Instead of improving the C API for performance, use Ruby
instead :)

Implement IO#read_nonblock and IO#write_nonblock in prelude.rb
to avoid argument parsing via rb_scan_args and hash lookups.

This speeds up IO#write_nonblock and IO#read_nonblock benchmarks
in both cases, including the original non-idiomatic case where
the `exception: false' hash is pre-allocated to avoid GC pressure.

Now, writing the kwargs in natural, idiomatic Ruby is fastest.
I've added the noex2 benchmark to show this.

target 0: a (ruby 2.3.0dev (2015-07-08 trunk 51190) [x86_64-linux]) at "a/ruby"
target 1: b (ruby 2.3.0dev (2015-07-08 nonblock-kwarg 51190) [x86_64-linux]) at "b/ruby"
-----------------------------------------------------------
raw data:

[["io_nonblock_noex",
  [[2.5436805468052626, 2.5724728293716908, 2.4915440678596497],
   [2.478000810369849, 2.4285155069082975, 2.462410459294915]]],
 ["io_nonblock_noex2",
  [[3.012514788657427, 3.034533655270934, 2.9972082190215588],
   [2.135501991957426, 2.146781364455819, 2.0429874528199434]]]]

Elapsed time: 30.348340944 (sec)
-----------------------------------------------------------
benchmark results:
minimum results in each 3 measurements.
Execution time (sec)
name    a       b
io_nonblock_noex        2.492   2.429
io_nonblock_noex2       2.997   2.043

Speedup ratio: compare with the result of `a' (greater is better)
name    b
io_nonblock_noex        1.026
io_nonblock_noex2       1.467

Note: I plan to followup commits for other *_nonblock methods
Eventually, I even wish to deprecate rb_scan_args :D

For what it's worth, I'm more excited about this change than usual
and hope to use prelude.rb more.

Files

0001-io.c-avoid-kwarg-parsing-in-C-API.patch (6.88 KB) 0001-io.c-avoid-kwarg-parsing-in-C-API.patch normalperson (Eric Wong), 07/07/2015 09:59 PM

Updated by normalperson (Eric Wong) about 9 years ago

wrote:

Feature #11339: [PATCH] io.c: avoid kwarg parsing in C API
https://bugs.ruby-lang.org/issues/11339

Note: I plan to followup commits for other *_nonblock methods
Eventually, I even wish to deprecate rb_scan_args :D

For what it's worth, I'm more excited about this change than usual
and hope to use prelude.rb more.

ko1/nobu/akr/others: any comments on this?

My main concern is increased parse time from prelude during startup;
but we may translate prelude to iseq array and use rb_iseq_load, too.
The parser seems to be the worst offender for startup performance
nowadays.

Updated by ko1 (Koichi Sasada) about 9 years ago

On 2015/07/16 4:41, Eric Wong wrote:

wrote:

Feature #11339: [PATCH] io.c: avoid kwarg parsing in C API
https://bugs.ruby-lang.org/issues/11339

Note: I plan to followup commits for other *_nonblock methods
Eventually, I even wish to deprecate rb_scan_args :D

For what it's worth, I'm more excited about this change than usual
and hope to use prelude.rb more.

ko1/nobu/akr/others: any comments on this?

My main concern is increased parse time from prelude during startup;
but we may translate prelude to iseq array and use rb_iseq_load, too.
The parser seems to be the worst offender for startup performance
nowadays.

We have some ideas to solve this issue. We discussed about solutions.

Known problems about C-methods parameters:
(P1) slow to parse kwargs with Hash
(P2) difficult to write scan_args
(P3) C-methods can't support Method#parameters

Solutions:

(1) Introduce wrapping Ruby methods into prelude.rb (your idea)
Pros. Easy to introduce.
Solves (P1-3).
Cons. Increase parse time at Ruby launch.

(2) Introduce new API to declare Ruby-like parameters for C-APIs

like: rb_define_method(klass, "xyzzy", klass_xyzzy, -1)

(2-1)
-> rb_defnie_method_??(klass, "xyzzy", klass_xyzzy,
"(m1, m2, o1=nil, o2=nil,
*r, p1, p2, k1: 1, k2: 2)")

VALUE klass_xyzzy(VALUE self, VALUE m1, VALUE m2, VALUE o1, VALUE o2,
VALUE r, VALUE p1, VALUE p2, VALUE k1, VALUE k2)

or
(2-2)
-> rb_defnie_method_??(klass, "xyzzy", klass_xyzzy,
2 /* mandatory num /,
2 /
optional num /,
1 /
rest num /,
2 /
post num /,
2 /
kw num */,
"m1", "m2",
"o1", Qnil, "o2", Qnil,
"r", "p1", "p2",
"k1", Qnil, "k2", Qnil);

(2-3)
-> something = rb_define_method(klass, "xyzzy", klass_xyzzy, 9);
rb_define_method_argument(something, ...);

(or something like that)

Implementation: Make new rb_iseq only to call C func (klass_xyzzy, in
this case). We have also need several issues.

Pros. Easy to specify parameters.
Solves (P1-3).
Cons. Difficult to design API (it should be compatible in future).
(2-1) introduces parse time at definition.

(3) Introduce new IDL (Interface Definition Language)


File klass.??

class Klass
def xyzzy(m1, m2, o1=nil, o2=nil, *r, p1, p2, k1: 1, k2: 2)
# This decl. calls C func klass_xyzzy with parameters m1 to k2.
# We can't write any code here.
end
end

Translate klass.?? to something like (2).
We don't touch such APIs. No compatibility issues.

Pros. We don't need to design cool API.
Solves (P1-P3).
Cons. Need to design new langauge (IDL).

(4) Introduce new IDL like Ricsin

I made a system calls Ricsin, which enable to embed C code into Ruby code.

http://www.atdot.net/~ko1/activities/ricsin2009_pro.pdf
(sorry, written in Japanese)


File klass.??

class Klass
def xyzzy(m1, m2, o1=nil, o2=nil, r, p1, p2, k1: 1, k2: 2)
# you can write any Ruby code here.
C %Q{
/
Given string argument for C is C code. */
klass_xyzzy(RV(m1), RV(m2), RV(o1), RV(o2),
RV(r), RV(p1), RV(p2), RV(k1), RV(k2));
}
end
end

Compile this file into something C-extension.

Pros. Easy to write Extensions.
Easy (and efficient) to write exception handling code
without rb_protect(). rb_iterate() is same.
(callback is difficult for C)
Solves (P1-P3).
Cons. Allowing everything can make other issues.


Matz likes the middle of (3) and (4) (not allow everything, but allow
restricted). I like (4).


I'm okay to introduce (1) because it is easy and practical.
If we can make (2)-(4), then we can replace from (1) to new mechanism.

BTW, I'm working on making AOT compilation support (it will be continued
to (3) or (4)). Recent rb_iseq_t changes were for this purpose. So that
prelude.rb is nice benchmark for me.

Thanks,
Koichi

--
// SASADA Koichi at atdot dot net

Updated by normalperson (Eric Wong) about 9 years ago

SASADA Koichi wrote:

On 2015/07/16 4:41, Eric Wong wrote:

wrote:

Feature #11339: [PATCH] io.c: avoid kwarg parsing in C API
https://bugs.ruby-lang.org/issues/11339

Note: I plan to followup commits for other *_nonblock methods
Eventually, I even wish to deprecate rb_scan_args :D

For what it's worth, I'm more excited about this change than usual
and hope to use prelude.rb more.

ko1/nobu/akr/others: any comments on this?

My main concern is increased parse time from prelude during startup;
but we may translate prelude to iseq array and use rb_iseq_load, too.
The parser seems to be the worst offender for startup performance
nowadays.

We have some ideas to solve this issue. We discussed about solutions.

Known problems about C-methods parameters:
(P1) slow to parse kwargs with Hash
(P2) difficult to write scan_args
(P3) C-methods can't support Method#parameters

Thank you for response.

Solutions:

(1) Introduce wrapping Ruby methods into prelude.rb (your idea)
Pros. Easy to introduce.
Solves (P1-3).
Cons. Increase parse time at Ruby launch.

We cannot avoid parsing Ruby :) So I want to try to make parsing faster.
Unfortunately, my parser knowledge is not much right now.

(2) Introduce new API to declare Ruby-like parameters for C-APIs

like: rb_define_method(klass, "xyzzy", klass_xyzzy, -1)

(2-1)
-> rb_defnie_method_??(klass, "xyzzy", klass_xyzzy,
"(m1, m2, o1=nil, o2=nil,
*r, p1, p2, k1: 1, k2: 2)")

OK, I had the same idea like this, too.
But I do not want to introduce a new C API. IMHO, C API should be
smaller, not bigger.

(3) Introduce new IDL (Interface Definition Language)

This may be OK... I don't see a big advantage over (1).

(4) Introduce new IDL like Ricsin

I made a system calls Ricsin, which enable to embed C code into Ruby code.

I think this is too ugly. One reason I like Ruby + (limited) C use is
relatively good separation between the different languages.

Working on C-ext is mostly normal C, and not some weird in-between
thing like Perl XS (gross!). Existing C programmers do not need to
learn a lot of new things to work with current CRuby.

I think it is important that we can use C tools (gdb, ctags, sparse,
etc...) can work without modification. But I still want to reduce C
and use more Ruby[1].

I'm okay to introduce (1) because it is easy and practical.

OK, thank you. I will commit (1) this week and work on more prelude.rb
for other IO/Socket kwargs methods.

If we can make (2)-(4), then we can replace from (1) to new mechanism.

BTW, I'm working on making AOT compilation support (it will be continued
to (3) or (4)). Recent rb_iseq_t changes were for this purpose. So that
prelude.rb is nice benchmark for me.

I want to speed up Ruby parsing + startup in general, too.

Along the lines of AOT: I also consider having something like ccache
(self-managing size, only in $HOME, hashing-based) using rb_iseq_load.

I don't want to pollute users disk with too many compiled files; and it
should use hashing so we may tweak formats/architectures and not worry
about path conflicts with concurrently-installed Ruby versions.

We already have too many bug reports because C-exts/objs get shared.

[1] Fwiw, I like Rubinius philosophy a lot. However, the non-Free
contribution platform and eventual implementation
(slow startup time, "Ruby environment" vs being "another *nix tool"
which CRuby/Perl are) put me off.

Updated by ko1 (Koichi Sasada) about 9 years ago

On 2015/07/16 16:01, Eric Wong wrote:

I'm okay to introduce (1) because it is easy and practical.
OK, thank you. I will commit (1) this week and work on more prelude.rb
for other IO/Socket kwargs methods.

Ah, sorry. I think it is okay.
Matz, what do you think about?

--
// SASADA Koichi at atdot dot net

Updated by matz (Yukihiro Matsumoto) almost 9 years ago

I am OK with the change. I don't really like something like __read_nonblock, but acceptable.

Matz.

Actions #7

Updated by Anonymous almost 9 years ago

  • Status changed from Open to Closed

Applied in changeset r52541.


io.c: avoid kwarg parsing in C API

  • benchmark/bm_io_nonblock_noex2.rb: new benchmark based
    on bm_io_nonblock_noex.rb
  • io.c (io_read_nonblock): move documentation to prelude.rb
    (io_write_nonblock): ditto
    (Init_io): private, internal methods for prelude.rb use only
  • prelude.rb (IO#read_nonblock): wrapper + documentation
    (IO#write_nonblock): ditto
    [ruby-core:71439] [Feature #11339]

rb_scan_args and hash lookups for kwargs in the C API are clumsy and
slow. Instead of improving the C API for performance, use Ruby
instead :)

Implement IO#read_nonblock and IO#write_nonblock in prelude.rb
to avoid argument parsing via rb_scan_args and hash lookups.

This speeds up IO#write_nonblock and IO#read_nonblock benchmarks
in both cases, including the original non-idiomatic case where
the `exception: false' hash is pre-allocated to avoid GC pressure.

Now, writing the kwargs in natural, idiomatic Ruby is fastest.
I've added the noex2 benchmark to show this.

2015-11-12 01:41:12 +0000
target 0: a (ruby 2.3.0dev (2015-11-11 trunk 52540) [x86_64-linux])
target 1: b (ruby 2.3.0dev (2015-11-11 avoid-kwarg-capi 52540)

benchmark results:
minimum results in each 10 measurements.
Execution time (sec)
name a b
io_nonblock_noex 2.508 2.382
io_nonblock_noex2 2.950 1.882

Speedup ratio: compare with the result of `a' (greater is better)
name b
io_nonblock_noex 1.053
io_nonblock_noex2 1.567

Updated by normalperson (Eric Wong) almost 9 years ago

wrote:

I am OK with the change. I don't really like something like
__read_nonblock, but acceptable.

OK, committed as r52541 for now. I don't have a better idea
for hiding other than private + "__" prefix...

Will followup with other IO/Socket/SSL methods

Updated by normalperson (Eric Wong) almost 9 years ago

Eric Wong wrote:

wrote:

I am OK with the change. I don't really like something like
__read_nonblock, but acceptable.

OK, committed as r52541 for now. I don't have a better idea
for hiding other than private + "__" prefix...

Will followup with other IO/Socket/SSL methods

Work-in-progress for rsock_s_recvfrom_nonblock-based methods.
Likely to commit tomorrow...

Minor nit, this may reduce startup performance slightly more than
prelude.rb changes because the RDoc comment stays in socket.rb
instead of getting dropped at compile time:

http://80x24.org/spew/20151112-avoid-arg-parsing-in-rsock_s_recvfrom_nonblock@1/raw

I also noticed I forgot to update some RDoc for r50912;
will adjust in a separate commit.

Updated by normalperson (Eric Wong) almost 9 years ago

Entire series for sockets
http://80x24.org/spew/20151113041012.27235-1-e%4080x24.org/t.mbox.gz

ref: [ruby-core:71439] [Feature #11339]

benchmark/bm_accept_nonblock.rb | 17 ++
benchmark/bm_connect_nonblock.rb | 18 ++
benchmark/bm_recvmsg_nonblock.rb | 16 ++
benchmark/bm_sendmsg_nonblock.rb | 16 ++
ext/socket/ancdata.c | 179 +++------------
ext/socket/basicsocket.c | 73 ++----
ext/socket/init.c | 23 +-
ext/socket/lib/socket.rb | 479 +++++++++++++++++++++++++++++++++++++++
ext/socket/rubysocket.h | 26 +--
ext/socket/socket.c | 203 ++---------------
ext/socket/tcpserver.c | 48 +---
ext/socket/udpsocket.c | 63 +----
ext/socket/unixserver.c | 47 +---
13 files changed, 659 insertions(+), 549 deletions(-)

Will commit soon

Updated by ko1 (Koichi Sasada) almost 9 years ago

On 2015/11/13 13:18, Eric Wong wrote:

benchmark/bm_accept_nonblock.rb | 17 ++
benchmark/bm_connect_nonblock.rb | 18 ++
benchmark/bm_recvmsg_nonblock.rb | 16 ++
benchmark/bm_sendmsg_nonblock.rb | 16 ++

could you consider to add some prefix like "bm_io" prefix?
We can understand what purpose is.

--
// SASADA Koichi at atdot dot net

Updated by normalperson (Eric Wong) almost 9 years ago

SASADA Koichi wrote:

On 2015/11/13 13:18, Eric Wong wrote:

benchmark/bm_accept_nonblock.rb | 17 ++
benchmark/bm_connect_nonblock.rb | 18 ++
benchmark/bm_recvmsg_nonblock.rb | 16 ++
benchmark/bm_sendmsg_nonblock.rb | 16 ++

could you consider to add some prefix like "bm_io" prefix?
We can understand what purpose is.

Ah, sorry, I was actually going to remove those before committing and
only leave the code in the commit message. I don't want to cause
portability problems for people running the suite.

Updated by normalperson (Eric Wong) almost 9 years ago

Eric Wong wrote:

Will followup with other IO/Socket/SSL methods

Done for normal socket, asked about SSL in [ruby-core:71538]

I might ignore optimizing ARGF.read_nonblock(... exception: false)
for now since it (AFAIK) is not used frequently and the extra
methods+parsing time isn't worth it. 2.3 will be the first
version to support "exception: false" on ARGF, even...

Updated by normalperson (Eric Wong) almost 9 years ago

wrote:

https://bugs.ruby-lang.org/issues/11339

For OpenSSL connect_nonblock/accept_nonblock, it seems to be not worth
the effort for a 1% improvement given the overheads of various parts of
OpenSSL. But I'm also not knowledgeable in OpenSSL, so my benchmark is
also likely bogus:

http://80x24.org/spew/20151202020654.18328-1-e%4080x24.org/

Additionally, for read_nonblock; calling read_nonblock/sysread
via rb_funcall from inside ossl_ssl_read_internal would also require
more work to avoid the hash allocation. Not sure if it's worth
the effort at the moment.

Updated by headius (Charles Nutter) almost 9 years ago

I don't usually jump in to grouse about CRuby changes, but this is really gross. We shouldn't be mucking up the core classes like this just to work around a missing optimization in the runtime.

I'd strongly recommend making a better rb_define_method system that allows you to pass keyword arguments directly to C code rather than having to make changes like this all over the place. Here's hoping that happens sooner rather than later and this change is only temporary.

We won't be making this change in JRuby, so our copy of socket.rb will diverge from MRI's. JRuby does not currently do allocation-free keyword arguments, but we will implement it soon for both Ruby targets and native targets.

Updated by headius (Charles Nutter) almost 9 years ago

A suggestion for how to make kwarg-passing to C functions allocation-free: have a thread-local (or perhaps global, since CRuby doesn't run Ruby code in parallel) array you can populate with the key/value pairs coming out of the VM. Methods that want to receive opts directly can specify it through a new rb_define_method form, and the requirement is that they must process those keyword arguments before doing anything else, so that global store can be re-used.

That is just a quick and dirty way to do it, but it would eliminate the need for hacks like this and make kwarg passing to C functions nearly free, as it is from Ruby to Ruby.

Updated by normalperson (Eric Wong) almost 9 years ago

wrote:

A suggestion for how to make kwarg-passing to C functions
allocation-free: have a thread-local (or perhaps global, since CRuby
doesn't run Ruby code in parallel) array you can populate with the
key/value pairs coming out of the VM. Methods that want to receive
opts directly can specify it through a new rb_define_method form, and
the requirement is that they must process those keyword arguments
before doing anything else, so that global store can be re-used.

Using globals or TLS would introduce subtle reentrancy problems
when calls are nested. I don't want to create more C-APIs
we need to support long-term, either.

Likely we will introduce something like prelude.rb into the
extension build system (and/or use the new iseq loader features).
It would speed up load times, too.

JRuby doesn't use our current prelude.rb, I hope.

Fwiw, I like the Rubinius philosophy of using Ruby as much as
possible a lot and tried to contribute there back in the day.

I just do not like their non-Free-service-based development,
C++, the isolated "Ruby environment" model, or slow startup
times.

Unsubscribe:
http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core

Updated by headius (Charles Nutter) almost 9 years ago

Eric Wong wrote:

Using globals or TLS would introduce subtle reentrancy problems
when calls are nested. I don't want to create more C-APIs
we need to support long-term, either.

The global would only be used between the runtime making a call with keyword arguments and the consumption of those arguments for variable initialization in a C method. There's no nesting of calls, and once the C method has consumed the values in the global (i.e. immediately before doing anything else), it is not touched until the next call with keywords from Ruby to C.

JRuby doesn't use our current prelude.rb, I hope.

We use parts of it, to stay in alignment with the load order of things like RubyGems and did_you_mean. We won't use the IO parts.

The socket changes are in socket.rb, which is part of stdlib and we do use it. I'll have to maintain another patch to remove this stuff.

Fwiw, I like the Rubinius philosophy of using Ruby as much as
possible a lot and tried to contribute there back in the day.

Yes, except that this doesn't buy anything because 99% of the logic of these methods still lives within C code. If you were also moving the body of these nonblock methods into Ruby, I'd see some value. You've removed a couple lines of C and added a couple lines of Ruby. And now there's going to be an extra method in stack traces and an extra Ruby frame allocated for every call.

Updated by headius (Charles Nutter) almost 9 years ago

Comparing stack traces:

[] ~/projects/ruby $ ruby23 -rsocket -e "t = TCPSocket.new('google.com', 80); t.read_nonblock(1)"
<internal:prelude>:76:in `__read_nonblock': Resource temporarily unavailable - read would block (IO::EAGAINWaitReadable)
	from <internal:prelude>:76:in `read_nonblock'
	from -e:1:in `<main>'

[] ~/projects/ruby $ ruby22 -rsocket -e "t = TCPSocket.new('google.com', 80); t.read_nonblock(1)"
-e:1:in `read_nonblock': Resource temporarily unavailable - read would block (IO::EAGAINWaitReadable)
	from -e:1:in `<main>'

Updated by normalperson (Eric Wong) almost 9 years ago

wrote:

Eric Wong wrote:

Using globals or TLS would introduce subtle reentrancy problems
when calls are nested. I don't want to create more C-APIs
we need to support long-term, either.

The global would only be used between the runtime making a call with
keyword arguments and the consumption of those arguments for variable
initialization in a C method. There's no nesting of calls, and once
the C method has consumed the values in the global (i.e. immediately
before doing anything else), it is not touched until the next call
with keywords from Ruby to C.

The current rb_get_kwargs() calls may be delayed until the keyword is
actually needed (to avoid unnecessary hash lookups).

Anyways, if somebody can design a good API for internal use,
we can use it. Current rb_get_kwargs() and even rb_scan_args()
are inefficient and even error-prone:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/68507

Fwiw, I like the Rubinius philosophy of using Ruby as much as
possible a lot and tried to contribute there back in the day.

Yes, except that this doesn't buy anything because 99% of the logic of
these methods still lives within C code. If you were also moving the
body of these nonblock methods into Ruby, I'd see some value. You've
removed a couple lines of C and added a couple lines of Ruby. And now
there's going to be an extra method in stack traces and an extra Ruby
frame allocated for every call.

It's still a work-in-progress, obviously; but the current state
already speeds things up for `exception: false' users.
Implementation details will change, of course; but we need
freedom to change them by having a smaller public C API.

Unsubscribe:
http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core

Updated by headius (Charles Nutter) almost 9 years ago

Eric Wong wrote:

Anyways, if somebody can design a good API for internal use,
we can use it. Current rb_get_kwargs() and even rb_scan_args()
are inefficient and even error-prone:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/68507

Something along the lines of vm_get_current_kwarg(name) could simply use existing interpreter structures for managing in-flight keyword arguments.

It's still a work-in-progress, obviously; but the current state
already speeds things up for `exception: false' users.
Implementation details will change, of course; but we need
freedom to change them by having a smaller public C API.

I just hope this isn't a trend, since it's a pretty ugly way to work around keyword arguments not optimizing well for calls from Ruby to C. Imagine a world where every core method that accepts keywords went the same route. I think we want a better option before that happens.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0