Project

General

Profile

Actions

Bug #20255

closed

Embedded arrays aren't moved correctly across ractors

Added by luke-gru (Luke Gruber) over 1 year ago. Updated about 1 month ago.


Description

ractor.send(ary, move: true) works incorrectly because if ary is embedded, the new moved object doesn't populate its own embedded space, it uses the MovedObject's embedded space.

example:

r = Ractor.new {
  inner_ary = receive
  values = {}
  values[:equal] = (inner_ary == ["",{},2,3,4,5,6])
  values[:string] = inner_ary.to_s
  values
}
ary = [String.new,Hash.new,2,3,4,5,6]
r.send(ary, move: true)
r_values = r.take
p r_values[:equal]
p r_values[:string]

# => false
# => "[\"\", {}, 2, 2.0, 21747991570, String, 3]"


Related issues 1 (0 open1 closed)

Related to Ruby - Bug #20165: Ractors moving a Struct breaks beyond the first 3 fieldsClosedractorActions

Updated by ko1 (Koichi Sasada) over 1 year ago

  • Assignee set to ko1 (Koichi Sasada)
Actions #3

Updated by hsbt (Hiroshi SHIBATA) about 1 year ago

  • Status changed from Open to Assigned
Actions #4

Updated by byroot (Jean Boussier) about 2 months ago

  • Status changed from Assigned to Closed

Applied in changeset git|0350290262ea0fbc4e1807901797ee8a6970c2b9.


Ractor: Fix moving embedded objects

[Bug #20271]
[Bug #20267]
[Bug #20255]

rb_obj_alloc(RBASIC_CLASS(obj)) will always allocate from the basic
40B pool, so if obj is larger than 40B, we'll create a corrupted
object when we later copy the shape_id.

Instead we can use the same logic than ractor copy, which is
to use rb_obj_clone, and later ask the GC to free the original
object.

We then must turn it into a T_OBJECT, because otherwise
just changing its class to RactorMoved leaves a lot of
ways to keep using the object, e.g.:

a = [1, 2, 3]
Ractor.new{}.send(a, move: true)
[].concat(a) # Should raise, but wasn't.

If it turns out that rb_obj_clone isn't performant enough
for some uses, we can always have carefully crafted specialized
paths for the types that would benefit from it.

Updated by luke-gru (Luke Gruber) about 2 months ago · Edited

Hi Jean, thanks for taking care of this issue. Your change makes sense to me but I have a small concern regarding the potential call to initialize_clone. When allowing user-defined hooks like this in Ractor move logic, the user can add a non-shareable object. For example:

a = [1, 2, 3]
NON_SHAREABLE = Object.new
p "outside: #{NON_SHAREABLE}"
def a.initialize_clone(other)
  super
  instance_variable_set("@non_shareable", NON_SHAREABLE)
end
r = Ractor.new do
  obj = Ractor.receive
  p "inside: #{obj.instance_variable_get("@non_shareable")}"
end
r.send(a, move: true)
r.take

One fix could be to check if there is a user-defined clone method for the object, and if so we need to traverse it again to either move any new objects or raise an error. We could disallow user-defined clone methods, but I don't know if that would be okay for users.

Updated by byroot (Jean Boussier) about 2 months ago

@luke-gru indeed.

Updated by luke-gru (Luke Gruber) about 2 months ago · Edited

This change seems to have a regression in this case as well.

a = Object.new
b = Object.new
p "outside: #{b}"
a.instance_variable_set("@b", b)

r = Ractor.new do
  obj = Ractor.receive
  p "inside: #{obj}"
  p "inside: #{obj.instance_variable_get("@b")}" # we get an error here now
end
r.send(a, move: true)
r.take

Updated by byroot (Jean Boussier) about 2 months ago

For context, even though I didn't precisely predict these two cases, I was expecting initialize_cone to be problematic. In the move: true case we certainly shouldn't invoke it.

But also, this patch isn't really meant as a final solution. Longer term I'd like to rely on the GC compaction routine to do the moving, but short term fixing this memory corruption issue, even in a non-ideal way, allow to unblock further efforts.

I'll see about cloning without invoking the callbacks as another short term fix.

Updated by byroot (Jean Boussier) about 2 months ago

Also I don't quite get why this is an issue with move: true but isn't when doing a copy.

Updated by luke-gru (Luke Gruber) about 2 months ago

Yes, copy as well is a problem. The best solution is to not allow these callbacks, because even if they run and you try to traverse the graph again to error or copy/move it, the callback could have spawned a thread that sets an unshareable on the object and then we're back to square one.

Updated by byroot (Jean Boussier) about 1 month ago

The best solution is to not allow these callbacks,

So it's not actually possible, because many types rely on it. e.g:

class Array
  def initialize_clone(o)
    # noop
  end
end

p [1, 2, 3, 4, 5].clone # => []

Updated by luke-gru (Luke Gruber) about 1 month ago

Yeah I was afraid that would be the case, but we could differentiate between user-defined clone callbacks and builtin callbacks. Either way we can probably worry about this later, I was just bringing it up in case you hadn't thought about it already.

Updated by byroot (Jean Boussier) about 1 month ago

Yeah, it's really tricky. There has been some talks of just removing the move: true capability.

Given: https://bugs.ruby-lang.org/issues/21208, I'm not sure it can ever be made correct, hence it might not be worth trying to improve it.

Updated by luke-gru (Luke Gruber) about 1 month ago

That would be fine with me, because actually move: true doesn't really "move" the object anymore, it just deep copies it since your change. It's not a less expensive operation anymore, so we may as well keep the original object around if invalidating it could cause issues.

Updated by byroot (Jean Boussier) about 1 month ago

doesn't really "move" the object anymore, it just deep copies it since your change. It's not a less expensive operation anymor

As mentioned previously, this isn't meant as an end state, but as a quick fix to allow further experimentation.

But either way, even with a real "move" we still need to allocate new object slots and traverse the object graph to update the references, so I don't think it ever was significantly cheaper than move: false.

Updated by luke-gru (Luke Gruber) about 1 month ago

I reread my message and I didn't mean to come across like I disagreed with your changes, because I don't :)

Updated by byroot (Jean Boussier) about 1 month ago

No offense taken.

Updated by byroot (Jean Boussier) about 1 month ago

For the record, I went back to a lower level copying code, but made it size pool aware: https://github.com/ruby/ruby/pull/13070

Actions #20

Updated by byroot (Jean Boussier) 8 days ago

  • Related to Bug #20165: Ractors moving a Struct breaks beyond the first 3 fields added
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0