Project

General

Profile

Feature #15010

Reduce allocation for rest parameters

Added by chopraanmol1 (Anmol Chopra) about 1 year ago. Updated about 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:88555]

Description

Currently multiple arrays are allocated while making a call to method with rest parameter.

E.g.

def rest_method(*args) #-> This will create 2 arrays
end

def post_method(*args,last) #-> This will create 3 arrays
end

Applying following set of changes will reduce creation of array to 1

https://github.com/ruby/ruby/pull/1935

Benchmark Result:

trunk

                                  user     system      total        real
benchmark_method              0.340000   0.000000   0.340000 (  0.337035)
rest_method                   0.964000   0.000000   0.964000 (  0.964660)
lead_method                   0.976000   0.000000   0.976000 (  0.976011)
post_method                   2.424000   0.000000   2.424000 (  2.421732)
lead_post_method              1.800000   0.000000   1.800000 (  1.799500)
rest_with_named_parameter     2.040000   0.000000   2.040000 (  2.040323)
lead_proc underflow_args      1.224000   0.000000   1.224000 (  1.225237)
opt_post_proc overflow_args   1.056000   0.000000   1.056000 (  1.057402)

modified

                                  user     system      total        real
benchmark_method              0.336000   0.000000   0.336000 (  0.336911)
rest_method                   0.708000   0.000000   0.708000 (  0.706142)
lead_method                   0.720000   0.000000   0.720000 (  0.717971)
post_method                   1.896000   0.000000   1.896000 (  1.894426)
lead_post_method              1.560000   0.000000   1.560000 (  1.560495)
rest_with_named_parameter     1.464000   0.000000   1.464000 (  1.467313)
lead_proc underflow_args      0.864000   0.000000   0.864000 (  0.863980)
opt_post_proc overflow_args   0.772000   0.000000   0.772000 (  0.770364)

Files

bench_method_arg.rb (1.32 KB) bench_method_arg.rb chopraanmol1 (Anmol Chopra), 08/20/2018 10:14 AM
Reduce-allocation-for-rest-parameters-v2.patch (3.57 KB) Reduce-allocation-for-rest-parameters-v2.patch Patch 2 chopraanmol1 (Anmol Chopra), 08/21/2018 07:34 AM
bench_method_arg_v2.rb (3.15 KB) bench_method_arg_v2.rb chopraanmol1 (Anmol Chopra), 08/21/2018 07:34 AM
Reduce-allocation-for-rest-parameters-v1.patch (5.43 KB) Reduce-allocation-for-rest-parameters-v1.patch Patch 1 chopraanmol1 (Anmol Chopra), 08/27/2018 09:36 AM

Associated revisions

Revision 1f4efb9a
Added by ko1 (Koichi Sasada) about 1 year ago

rest parameter optimization [Feature #15010]

  • vm_args.c: rb_ary_dup(args->rest) to be used at most once during
    parameter setup. [Feature #15010]
    A patch by chopraanmol1 (Anmol Chopra) chopraanmol1@gmail.com.

  • array.c (rb_ary_behead): added to remove first n elements.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64583 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 64583
Added by ko1 (Koichi Sasada) about 1 year ago

rest parameter optimization [Feature #15010]

  • vm_args.c: rb_ary_dup(args->rest) to be used at most once during
    parameter setup. [Feature #15010]
    A patch by chopraanmol1 (Anmol Chopra) chopraanmol1@gmail.com.

  • array.c (rb_ary_behead): added to remove first n elements.

Revision 64583
Added by ko1 (Koichi Sasada) about 1 year ago

rest parameter optimization [Feature #15010]

  • vm_args.c: rb_ary_dup(args->rest) to be used at most once during
    parameter setup. [Feature #15010]
    A patch by chopraanmol1 (Anmol Chopra) chopraanmol1@gmail.com.

  • array.c (rb_ary_behead): added to remove first n elements.

History

Updated by mame (Yusuke Endoh) about 1 year ago

Looks good to me. Though destructive operation to the rest array may make the source code unclear, performance is more important in this case, I think.

Some other functions in vm_args.c also use rb_ary_dup. There may be more room to optimize.

Updated by chopraanmol1 (Anmol Chopra) about 1 year ago

mame (Yusuke Endoh) wrote:

Some other functions in vm_args.c also use rb_ary_dup. There may be more room to optimize.

Yes, it can be further optimized for keyword argument and argument setup for the block. I'll modify the patch in a day or two.

Updated by normalperson (Eric Wong) about 1 year ago

chopraanmol1@gmail.com wrote:

Yes, it can be further optimized for keyword argument and argument setup for the block. I'll modify the patch in a day or two.

Cool! It's probably worth implementing something like
rb_ary_shift_m (but without the return value) to avoid looping
on rb_ary_shift.

#4

Updated by chopraanmol1 (Anmol Chopra) about 1 year ago

Updated by chopraanmol1 (Anmol Chopra) about 1 year ago

normalperson (Eric Wong) wrote:

Cool! It's probably worth implementing something like
rb_ary_shift_m (but without the return value) to avoid looping
on rb_ary_shift.

Added rb_ary_clear_m (suggestion for a better name will be appreciated) with suggested changes.

mame (Yusuke Endoh) wrote:

Some other functions in vm_args.c also use rb_ary_dup. There may be more room to optimize.

Modified patch to ensure rb_ary_dup is called at most once.

#6

Updated by chopraanmol1 (Anmol Chopra) about 1 year ago

  • File deleted (improve_rest_parameters_setup_20_8_2018.patch)
  • File deleted (improve_rest_parameters_setup.patch)
  • File 0001-Reduce-allocation-for-rest-parameters.patch added

Updated by normalperson (Eric Wong) about 1 year ago

chopraanmol1@gmail.com wrote:

normalperson (Eric Wong) wrote:

Cool! It's probably worth implementing something like
rb_ary_shift_m (but without the return value) to avoid looping
on rb_ary_shift.

Added rb_ary_clear_m (suggestion for a better name will be
appreciated) with suggested changes.

Thanks; for internal functions the name isn't as important :)

New functions prototypes should go into internal.h, though.
ruby/intern.h ended up being part of the public API and external
libraries depend on it :<

There's no reason for arg_rest_dup to be a macro instead of a
static inline function. Static inlines are preferred because
they make life easier for the compiler and debugger.

Also, multi-line macros without "do {} while (0)" is dangerous
to control flow.

Thanks again.

Updated by mame (Yusuke Endoh) about 1 year ago

Thank you, too. Two points:

First, the prefix _m is often used for an entry function of Ruby-level method that is passed to rb_define_method. Though it is just an internal function, it would be better to avoid the prefix. How about rb_ary_remove_first?

Second, I agree with ensuring rb_ary_dup is called at most once. But I'm afraid if rewriting the array without dup may cause obscure incompatibility. It is difficult for me to review your patch.

ko1 (Koichi Sasada), could you review this?

Updated by nobu (Nobuyoshi Nakada) about 1 year ago

normalperson (Eric Wong) wrote:

Added rb_ary_clear_m (suggestion for a better name will be
appreciated) with suggested changes.

Thanks; for internal functions the name isn't as important :)

What about rb_ary_clear_head? (or rb_ary_behead :)

#10

Updated by chopraanmol1 (Anmol Chopra) about 1 year ago

  • File deleted (0001-Reduce-allocation-for-rest-parameters.patch)
  • File 0001-Reduce-allocation-for-rest-parameters.patch added

Updated by chopraanmol1 (Anmol Chopra) about 1 year ago

normalperson (Eric Wong) wrote:

New functions prototypes should go into internal.h, though.
ruby/intern.h ended up being part of the public API and external
libraries depend on it :<

There's no reason for arg_rest_dup to be a macro instead of a
static inline function. Static inlines are preferred because
they make life easier for the compiler and debugger.

Updated.

mame (Yusuke Endoh) wrote:

First, the prefix _m is often used for an entry function of Ruby-level method that is passed to rb_define_method. Though it is just an internal function, it would be better to avoid the prefix. How about rb_ary_remove_first?

For now, I'm renaming the method to rb_ary_behead (suggested by nobu)

Updated by chopraanmol1 (Anmol Chopra) about 1 year ago

I'm also thinking of an alternate solution which will avoid passing the skip_dup_flag variable around, If we can ensure that args->rest is not used/assigned until args_copy is called. To do this when VM_CALL_ARGS_SPLAT flag is on instead of assigning args->rest we could expand the splat arg to locals / args->argv.

Unless it breaks test beyond repair, I'll add this alternate patch with the respective benchmark(It probably will be slower for the large array), so it can be compared side by side. In this solution, args_setup_post_parameters can be further modified to use args->argv instead of args->rest which makes zero allocation for the following example:

def opt_post(a,b,c=1,d=2,e,f); end

Updated by chopraanmol1 (Anmol Chopra) about 1 year ago

normalperson (Eric Wong) wrote:

New functions prototypes should go into internal.h, though.
ruby/intern.h ended up being part of the public API and external
libraries depend on it :<

Moving method to internal.h breaks jit https://travis-ci.org/ruby/ruby/builds/418529271 , I'm not sure how to fix this failure.

Updated by nobu (Nobuyoshi Nakada) about 1 year ago

chopraanmol1 (Anmol Chopra) wrote:

Moving method to internal.h breaks jit https://travis-ci.org/ruby/ruby/builds/418529271 , I'm not sure how to fix this failure.

Define the function with MJIT_FUNC_EXPORTED.

Updated by chopraanmol1 (Anmol Chopra) about 1 year ago

chopraanmol1 (Anmol Chopra) wrote:

I'm also thinking of an alternate solution which will avoid passing the skip_dup_flag variable around, If we can ensure that args->rest is not used/assigned until args_copy is called. To do this when VM_CALL_ARGS_SPLAT flag is on instead of assigning args->rest we could expand the splat arg to locals / args->argv.

Implementation: https://github.com/ruby/ruby/compare/trunk...chopraanmol1:improve_rest_parameters_setup_v2

Benchmark result

                               trunk        patch 1       patch 2
benchmark_method               0.196346     0.197841      0.196466
rest_method                    0.788287     0.539768      0.535512
lead_method                    0.792892     0.547752      0.533818
post_method                    1.133035     0.636972      0.540609
lead_post_method               0.867869     0.709440      0.370182
benchmark_method *args         0.227389     0.230066      0.227671
rest_method *args              0.826490     0.559881      0.563779
lead_method *args              0.821036     0.602590      0.565583
post_method *args              1.157621     0.649459      0.570189
lead_post_method *args         1.064632     0.687248      0.387054
rest_method *long_args         0.985696     0.766369      0.779729
lead_method *long_args         0.997824     0.870107      0.794615
post_method *long_args         1.703731     0.863923      0.813282
lead_post_method *long_args    1.707543     0.989116      0.802757
rest_with_named_parameter      1.862414     1.293406      1.255951
bench proc                     0.275176     0.263893      0.260555
lead_proc underflow_args       1.149043     0.801893      0.363017
opt_post_proc overflow_args    1.025754     0.717966      0.312920

chopraanmol1 (Anmol Chopra) wrote:

In this solution, args_setup_post_parameters can be further modified to use args->argv instead of args->rest which makes zero allocation for the following example:

args->argv and locals are pointing to same address so it is not feasible.

Note: This second patch is not the final implementation, there are few more changes. In second patch args_check_block_arg0/args_setup_opt_parameters function can still assign args->rest / modify args->rest_index, I'll look into this later (only if the second patch seems more acceptable over first) if it can be completely avoided. In case it can be avoided most of the method handling args->rest can be cleaned after that, which will also ensure that args->rest_index is never modified. As a result, we could even avoid calling rb_ary_behead.

#16

Updated by chopraanmol1 (Anmol Chopra) about 1 year ago

Updated by chopraanmol1 (Anmol Chopra) about 1 year ago

Limitation of patch 2.

  1. Patch 2 gets slower than Patch 1 for a large array. Array with length 100 - 200 have similar performance but beyond that patch 1 is faster in most of the case.

  2. Patch 2 results in segmentation fault for following: https://github.com/ruby/ruby/blob/8e66ffc1d756c42ee025a56672ad71f2200ca6be/test/ruby/test_method.rb#L951

Ignoring above limitation patch 2 do perform better for the small array. One Hack Solution can be to check the length of splat arg against some arbitrary number to decide if splat arg should be expanded to args->argv or should be assigned to args->rest. But it doesn't sound like a nice solution.

I'm not able to reproduce benchmark result for Patch 2. Even for splat args with size under 100 patch 2 has similar performance to patch 1. Patch 2 is only significantly faster on lead_proc underflow_args benchmark. Given the above limitation patch 2 is not worth it.

Updated by ko1 (Koichi Sasada) about 1 year ago

sorry, which patch should I review?

Updated by chopraanmol1 (Anmol Chopra) about 1 year ago

ko1 (Koichi Sasada) wrote:

sorry, which patch should I review?

Reduce-allocation-for-rest-parameters-v1.patch

#20

Updated by chopraanmol1 (Anmol Chopra) about 1 year ago

  • File Reduce-allocation-for-rest-parameters-v1.patch added
#21

Updated by chopraanmol1 (Anmol Chopra) about 1 year ago

  • File deleted (Reduce-allocation-for-rest-parameters-v1.patch)

Updated by ko1 (Koichi Sasada) about 1 year ago

Sorry for late response.

Idea (as my understanding)

~a rest parameter" is dup multiple times because of current implementation. Only 1 "dup" is needed. They should be eliminate.
The patch try to manage "dup'ed or not" by passing skip_rest_ary_dup, and if it is true, then we don't need to dup the rest parameter again.

Comment

I'm fine to introduce your idea.
Why don't you put a new field in args_info?

Updated by chopraanmol1 (Anmol Chopra) about 1 year ago

ko1 (Koichi Sasada) wrote:

Idea (as my understanding)

~a rest parameter" is dup multiple times because of current implementation. Only 1 "dup" is needed. They should be eliminate.
The patch try to manage "dup'ed or not" by passing skip_rest_ary_dup, and if it is true, then we don't need to dup the rest parameter again.

Yes, and once a rest parameter is duped it mutates the array in case if rest_index is modified (Previously, only args_setup_post_parameters used to mutate rest parameter).

Comment

I'm fine to introduce your idea.
Why don't you put a new field in args_info?

This suggestion makes a lot of sense as it will simplify this patch, I'll update the patch soon to reflect this.

#25

Updated by chopraanmol1 (Anmol Chopra) about 1 year ago

Updated by ko1 (Koichi Sasada) about 1 year ago

It seems fine.
I'll commit it.

#28

Updated by ko1 (Koichi Sasada) about 1 year ago

  • Status changed from Open to Closed

Applied in changeset trunk|r64583.


rest parameter optimization [Feature #15010]

  • vm_args.c: rb_ary_dup(args->rest) to be used at most once during
    parameter setup. [Feature #15010]
    A patch by chopraanmol1 (Anmol Chopra) chopraanmol1@gmail.com.

  • array.c (rb_ary_behead): added to remove first n elements.

Also available in: Atom PDF