Bug #19470: Frequent small range-reads from and then writes to a large array are very slow - Ruby - Ruby Issue Tracking System

Actions

Updated by mame (Yusuke Endoh) over 2 years ago

We the core team know the potential problem with the memory sharing technique. We could be wrong, but we currently believe its advantage (performance improvement in many typical and major cases) is larger enough than its disadvantage (potential performance degeneration in some rare cases).

The question is whether the potential problem is really rare. Do you have any evidence that this problem actually occurs frequently?

Actions

Copy link

#2 [ruby-core:112640]

Updated by ioquatix (Samuel Williams) over 2 years ago

It's interesting to me that reading items can mark the array as shared, that seems non-trivial/non-obvious behaviour to me. What's the rational for it?

Actions

Copy link

#3 [ruby-core:112641]

Updated by ioquatix (Samuel Williams) over 2 years ago

Sorry, I read the example more closely, it seems we are taking a small slice, which marks the original array as shared. So both the original array and the slice point at the same memory allocation, and on mutation, a copy is made. I think @mame (Yusuke Endoh) is right, in general this is a good optimisation. But I see the problem, there is a degenerate case here.

I wonder if we can propose some way to avoid this degenerate case if the mutation is frequent or the slice should definitely be a copy rather than invoke the CoW. Maybe a percentage based approach, or maybe we can detect the CoW frequently triggering a full copy = don't do further CoW.

Actions

Copy link

#4 [ruby-core:112644]

Updated by Hanmac (Hans Mackowiak) over 2 years ago

i think maybe the GC could optimize something?

100000.times do
  arr[0..3] #=> GC notices that the result is unused?
  GC.stress # would the GC delete the shared copy?
  time_start = Time.now
  arr[5] = 100  # takes ?
  times[-1] += Time.now - time_start
end

if the GC could recognize that the slice of the array isn't assigned somewhere and would remove it
is the structure of the array object smart enough that there isn't any shared pieces anymore?

or would we need a shared count for this instead?

if the main array can notice that it isn't shared anymore because all the ones that shared it are gone, would it be smart enough to not make a copy?

Actions

Copy link

#5 [ruby-core:112651]

Updated by giner (Stanislav German-Evtushenko) over 2 years ago

Do you have any evidence that this problem actually occurs frequently?

I cannot say whether this affects many projects. I've encountered this twice so far when optimizing code to reduce memory allocations (by writing to the same array instead of allocating new ones). In both cases it took quite a while to figure out why instead of being faster and using less memory it did the opposite, i.e. became slower and used more memory.

Actions

Copy link

#6 [ruby-core:112670]

Updated by headius (Charles Nutter) over 2 years ago

This looks like a pretty typical effect of slicing, so I'm not sure there's something to "fix" here. JRuby has similar behavior.

Improving the sharing heuristic might help, but it's hard to know where to draw the line; should a slice of 5 elements trigger copy? Ten elements? Perhaps only share when slice size is greater than some percentage of the total size?

Perhaps we should make COW less hidden? Add something like Array#fresh_slice to opt into non-COW slicing when you know you're still mutating the original?

Actions

Copy link

#7 [ruby-core:112709]

Updated by giner (Stanislav German-Evtushenko) over 2 years ago

Perhaps we should make COW less hidden? Add something like Array#fresh_slice to opt into non-COW slicing when you know you're still mutating the original?

Sounds like a good compromise

Project

General

Profile

Ruby

Tags

Custom queries

Bug #19470

Frequent small range-reads from and then writes to a large array are very slow

Updated by mame (Yusuke Endoh) over 2 years ago

Updated by ioquatix (Samuel Williams) over 2 years ago

Updated by ioquatix (Samuel Williams) over 2 years ago

Updated by Hanmac (Hans Mackowiak) over 2 years ago

Updated by giner (Stanislav German-Evtushenko) over 2 years ago

Updated by headius (Charles Nutter) over 2 years ago

Updated by giner (Stanislav German-Evtushenko) over 2 years ago