Project

General

Profile

Actions

Bug #19470

open

Frequent small range-reads from and then writes to a large array are very slow

Added by giner (Stanislav German-Evtushenko) about 1 year ago. Updated about 1 year ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 3.2.1 (2023-02-08 revision 31819e82c8) [x86_64-linux]
[ruby-core:112638]

Description

Write to a large array gets very slow when done after range-reading more than 3 items. In such case the original array gets marked as shared which triggers CoW on a small change afterwards. This leads to a significant performance impact and high memory utilization in cases when we need to range-read/write from/to the same array many times. While this issue can be avoided by reading <= 3 elements at a time the main problem is that this behaviour is not obvious and hard to catch on on-trivial projects.

times = []
arr = [0] * 100000

times.push 0
100000.times do
  time_start = Time.now
  arr[5] = 100  # takes 0.01662315899999512
  times[-1] += Time.now - time_start
end

times.push 0
100000.times do
  arr[0..2]
  time_start = Time.now
  arr[5] = 100  # takes 0.01826406799999659
  times[-1] += Time.now - time_start
end

times.push 0
100000.times do
  arr[0..3]
  time_start = Time.now
  arr[5] = 100  # takes 7.757753919000069
  times[-1] += Time.now - time_start
end

times.push 0
100000.times do
  arr.dup
  time_start = Time.now
  arr[5] = 100  # takes 7.626929300999957
  times[-1] += Time.now - time_start
end

times.push 0
100000.times do
  arr.clone
  time_start = Time.now
  arr[5] = 100  # takes 8.216933763000046
  times[-1] += Time.now - time_start
end

p times

Updated by mame (Yusuke Endoh) about 1 year ago

We the core team know the potential problem with the memory sharing technique. We could be wrong, but we currently believe its advantage (performance improvement in many typical and major cases) is larger enough than its disadvantage (potential performance degeneration in some rare cases).

The question is whether the potential problem is really rare. Do you have any evidence that this problem actually occurs frequently?

Updated by ioquatix (Samuel Williams) about 1 year ago

It's interesting to me that reading items can mark the array as shared, that seems non-trivial/non-obvious behaviour to me. What's the rational for it?

Updated by ioquatix (Samuel Williams) about 1 year ago

Sorry, I read the example more closely, it seems we are taking a small slice, which marks the original array as shared. So both the original array and the slice point at the same memory allocation, and on mutation, a copy is made. I think @mame (Yusuke Endoh) is right, in general this is a good optimisation. But I see the problem, there is a degenerate case here.

I wonder if we can propose some way to avoid this degenerate case if the mutation is frequent or the slice should definitely be a copy rather than invoke the CoW. Maybe a percentage based approach, or maybe we can detect the CoW frequently triggering a full copy = don't do further CoW.

Updated by Hanmac (Hans Mackowiak) about 1 year ago

i think maybe the GC could optimize something?

100000.times do
  arr[0..3] #=> GC notices that the result is unused?
  GC.stress # would the GC delete the shared copy?
  time_start = Time.now
  arr[5] = 100  # takes ?
  times[-1] += Time.now - time_start
end

if the GC could recognize that the slice of the array isn't assigned somewhere and would remove it
is the structure of the array object smart enough that there isn't any shared pieces anymore?

or would we need a shared count for this instead?

if the main array can notice that it isn't shared anymore because all the ones that shared it are gone, would it be smart enough to not make a copy?

Updated by giner (Stanislav German-Evtushenko) about 1 year ago

Do you have any evidence that this problem actually occurs frequently?

I cannot say whether this affects many projects. I've encountered this twice so far when optimizing code to reduce memory allocations (by writing to the same array instead of allocating new ones). In both cases it took quite a while to figure out why instead of being faster and using less memory it did the opposite, i.e. became slower and used more memory.

Updated by headius (Charles Nutter) about 1 year ago

This looks like a pretty typical effect of slicing, so I'm not sure there's something to "fix" here. JRuby has similar behavior.

Improving the sharing heuristic might help, but it's hard to know where to draw the line; should a slice of 5 elements trigger copy? Ten elements? Perhaps only share when slice size is greater than some percentage of the total size?

Perhaps we should make COW less hidden? Add something like Array#fresh_slice to opt into non-COW slicing when you know you're still mutating the original?

Updated by giner (Stanislav German-Evtushenko) about 1 year ago

Perhaps we should make COW less hidden? Add something like Array#fresh_slice to opt into non-COW slicing when you know you're still mutating the original?

Sounds like a good compromise

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0