Project

General

Profile

Feature #16119

Optimize Array#flatten and flatten! for already flattened arrays

Added by dylants (Dylan Thacker-Smith) 25 days ago. Updated 24 days ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:94501]

Description

Problem

When doing an object profile from stackprof, I noticed object allocations being made from Array#flatten! which was unlike other in-place methods like Array#uniq! and Array#compact!. In this case, I wanted to optimize for the array already being flattened and found that ary.flatten! if ary.any?(Array) was significantly faster. The code confirmed my suspicion that ary.flatten! was almost equivalent to ary.replace(ary.flatten) in implementation. The object allocations I noticed were from a temporary result array, a stack array to handle nesting and a hash table to prevent infinite recursion, all of which can be avoided in the simple case of an already flattened array.

Solution

Iterate over the array to find the first nested array. If no nested array is found, then flatten! just returns nil without creating additional objects and flatten returns a shared copy of the original array. If a nested array is found, then it creates and initializes the temporary objects to resume with the existing algorithm for flattening the array.

Benchmark

require 'benchmark'

N = 100000

def report(x, name)
  x.report(name) do
    N.times do
      yield
    end
  end
end

arrays = {
  small_flat_ary: 5.times.to_a,
  large_flat_ary: 100.times.to_a,
  small_pairs_ary: [[1, 2]] * 5,
  large_pairs_ary: [[1, 2]] * 100,
}

Benchmark.bmbm do |x|
  arrays.each do |name, ary|
    report(x, "#{name}.flatten!") do
      ary.flatten!
    end
    report(x, "#{name}.flatten") do
      ary.flatten
    end
  end
end

results on the latest master (ruby 2.7.0dev (2019-08-22T14:10:55Z master fd20b32130) [x86_64-darwin18])

                               user     system      total        real
small_flat_ary.flatten!    0.082001   0.000294   0.082295 (  0.082685)
small_flat_ary.flatten     0.078655   0.000211   0.078866 (  0.079176)
large_flat_ary.flatten!    0.552551   0.001643   0.554194 (  0.556166)
large_flat_ary.flatten     0.551520   0.001327   0.552847 (  0.554091)
small_pairs_ary.flatten!   0.100073   0.000302   0.100375 (  0.100687)
small_pairs_ary.flatten    0.109440   0.000232   0.109672 (  0.109850)
large_pairs_ary.flatten!   1.021200   0.001650   1.022850 (  1.024227)
large_pairs_ary.flatten    1.049046   0.002938   1.051984 (  1.055228)

results with the attached patch

                               user     system      total        real
small_flat_ary.flatten!    0.034868   0.000150   0.035018 (  0.035180)
small_flat_ary.flatten     0.040504   0.000148   0.040652 (  0.040914)
large_flat_ary.flatten!    0.458273   0.000786   0.459059 (  0.460005)
large_flat_ary.flatten     0.453846   0.000833   0.454679 (  0.455324)
small_pairs_ary.flatten!   0.055211   0.000205   0.055416 (  0.055673)
small_pairs_ary.flatten    0.060157   0.000226   0.060383 (  0.060540)
large_pairs_ary.flatten!   0.901698   0.001650   0.903348 (  0.905246)
large_pairs_ary.flatten    0.917180   0.001370   0.918550 (  0.920008)

Files

0001-Optimize-Array-flatten-and-flatten-for-already-fl.diff.txt (2.79 KB) 0001-Optimize-Array-flatten-and-flatten-for-already-fl.diff.txt [PATCH] Optimize Array#flatten and flatten! for already flattened arrays dylants (Dylan Thacker-Smith), 08/22/2019 11:54 PM

History

Updated by duerst (Martin Dürst) 25 days ago

I was afraid that this would be an optimization for flat arrays, but increase time for nested arrays. But that's not the case, because at the bottom, there will be flat arrays, and flattening these will be faster.

However, I expect this to be slower on arrays that are 'almost flat', i.e.

almost_flat_ary = 100.times.to_a + [1, 2]

Putting the nested array at the end will make sure all elements of the big array are checked, only to discover that actual flattening work is needed. The time needed for checking will not be offset by the allocation savings on the small array at the end.

Still I think that in general, this should be faster, and so it should be worth accepting this patch.

Updated by shevegen (Robert A. Heiler) 25 days ago

I use .flatten and .flatten! quite a lot in my ruby code, often setting the
initial input from ARGV but also from other sources, such as used from other
ruby classes, w hen I need to have an array - often something like:

def foobar(input)
  new_value = [input].flatten.compact

Or something like that. If this patch indeed improves this then that would
be quite nice to have.

Updated by Hanmac (Hans Mackowiak) 25 days ago

i see in your patch that you not only check for Array but for to_ary too, which is nice

shevegen (Robert A. Heiler) :

instead of [input] i would use Array(input) that doesn't create an extra array if input is already one

Updated by dylants (Dylan Thacker-Smith) 24 days ago

duerst (Martin Dürst) wrote:

However, I expect this to be slower on arrays that are 'almost flat', i.e.

almost_flat_ary = 100.times.to_a + [1, 2]

Putting the nested array at the end will make sure all elements of the big array are checked, only to discover that actual flattening work is needed. The time needed for checking will not be offset by the allocation savings on the small array at the end.

The optimization handles this case by doing a quick ary_memcpy of everything up to the first nested array to avoid redundantly rechecking if those preceding elements were an array.

I benchmarked it by replacing arrays and Benchmark.bmbm in my above bechmark script with

ary = 100.times.to_a.push([101, 102])

Benchmark.bmbm do |x|
  report(x, "flatten!") do
    ary.flatten!
  end
  report(x, "flatten") do
    ary.flatten
  end
end

and got the following results on master

               user     system      total        real
flatten!   0.577976   0.002275   0.580251 (  0.582700)
flatten    0.542454   0.001994   0.544448 (  0.546523)

and the following results with this patch

               user     system      total        real
flatten!   0.420922   0.001052   0.421974 (  0.422957)
flatten    0.430728   0.001173   0.431901 (  0.433274)

so I actually noticed improved performance for arrays with a large flat prefix.

Updated by dylants (Dylan Thacker-Smith) 24 days ago

Hanmac (Hans Mackowiak) wrote:

i see in your patch that you not only check for Array but for to_ary too, which is nice

Right, but that is just preserving the current behaviour. So the ary.flatten! if ary.any?(Array) workaround could actually be a breaking change if the given array had elements that responded to to_ary other than Array.

Also available in: Atom PDF