Project

General

Profile

Actions

Bug #20255

closed

Embedded arrays aren't moved correctly across ractors

Added by luke-gru (Luke Gruber) about 1 year ago. Updated about 8 hours ago.


Description

ractor.send(ary, move: true) works incorrectly because if ary is embedded, the new moved object doesn't populate its own embedded space, it uses the MovedObject's embedded space.

example:

r = Ractor.new {
  inner_ary = receive
  values = {}
  values[:equal] = (inner_ary == ["",{},2,3,4,5,6])
  values[:string] = inner_ary.to_s
  values
}
ary = [String.new,Hash.new,2,3,4,5,6]
r.send(ary, move: true)
r_values = r.take
p r_values[:equal]
p r_values[:string]

# => false
# => "[\"\", {}, 2, 2.0, 21747991570, String, 3]"

Updated by ko1 (Koichi Sasada) about 1 year ago

  • Assignee set to ko1 (Koichi Sasada)
Actions #3

Updated by hsbt (Hiroshi SHIBATA) about 1 year ago

  • Status changed from Open to Assigned
Actions #4

Updated by byroot (Jean Boussier) 3 days ago

  • Status changed from Assigned to Closed

Applied in changeset git|0350290262ea0fbc4e1807901797ee8a6970c2b9.


Ractor: Fix moving embedded objects

[Bug #20271]
[Bug #20267]
[Bug #20255]

rb_obj_alloc(RBASIC_CLASS(obj)) will always allocate from the basic
40B pool, so if obj is larger than 40B, we'll create a corrupted
object when we later copy the shape_id.

Instead we can use the same logic than ractor copy, which is
to use rb_obj_clone, and later ask the GC to free the original
object.

We then must turn it into a T_OBJECT, because otherwise
just changing its class to RactorMoved leaves a lot of
ways to keep using the object, e.g.:

a = [1, 2, 3]
Ractor.new{}.send(a, move: true)
[].concat(a) # Should raise, but wasn't.

If it turns out that rb_obj_clone isn't performant enough
for some uses, we can always have carefully crafted specialized
paths for the types that would benefit from it.

Updated by luke-gru (Luke Gruber) 3 days ago · Edited

Hi Jean, thanks for taking care of this issue. Your change makes sense to me but I have a small concern regarding the potential call to initialize_clone. When allowing user-defined hooks like this in Ractor move logic, the user can add a non-shareable object. For example:

a = [1, 2, 3]
NON_SHAREABLE = Object.new
p "outside: #{NON_SHAREABLE}"
def a.initialize_clone(other)
  super
  instance_variable_set("@non_shareable", NON_SHAREABLE)
end
r = Ractor.new do
  obj = Ractor.receive
  p "inside: #{obj.instance_variable_get("@non_shareable")}"
end
r.send(a, move: true)
r.take

One fix could be to check if there is a user-defined clone method for the object, and if so we need to traverse it again to either move any new objects or raise an error. We could disallow user-defined clone methods, but I don't know if that would be okay for users.

Updated by byroot (Jean Boussier) 3 days ago

@luke-gru indeed.

Updated by luke-gru (Luke Gruber) 3 days ago · Edited

This change seems to have a regression in this case as well.

a = Object.new
b = Object.new
p "outside: #{b}"
a.instance_variable_set("@b", b)

r = Ractor.new do
  obj = Ractor.receive
  p "inside: #{obj}"
  p "inside: #{obj.instance_variable_get("@b")}" # we get an error here now
end
r.send(a, move: true)
r.take

Updated by byroot (Jean Boussier) 3 days ago

For context, even though I didn't precisely predict these two cases, I was expecting initialize_cone to be problematic. In the move: true case we certainly shouldn't invoke it.

But also, this patch isn't really meant as a final solution. Longer term I'd like to rely on the GC compaction routine to do the moving, but short term fixing this memory corruption issue, even in a non-ideal way, allow to unblock further efforts.

I'll see about cloning without invoking the callbacks as another short term fix.

Updated by byroot (Jean Boussier) 3 days ago

Also I don't quite get why this is an issue with move: true but isn't when doing a copy.

Updated by luke-gru (Luke Gruber) 3 days ago

Yes, copy as well is a problem. The best solution is to not allow these callbacks, because even if they run and you try to traverse the graph again to error or copy/move it, the callback could have spawned a thread that sets an unshareable on the object and then we're back to square one.

Updated by byroot (Jean Boussier) about 18 hours ago

The best solution is to not allow these callbacks,

So it's not actually possible, because many types rely on it. e.g:

class Array
  def initialize_clone(o)
    # noop
  end
end

p [1, 2, 3, 4, 5].clone # => []

Updated by luke-gru (Luke Gruber) about 11 hours ago

Yeah I was afraid that would be the case, but we could differentiate between user-defined clone callbacks and builtin callbacks. Either way we can probably worry about this later, I was just bringing it up in case you hadn't thought about it already.

Updated by byroot (Jean Boussier) about 11 hours ago

Yeah, it's really tricky. There has been some talks of just removing the move: true capability.

Given: https://bugs.ruby-lang.org/issues/21208, I'm not sure it can ever be made correct, hence it might not be worth trying to improve it.

Updated by luke-gru (Luke Gruber) about 8 hours ago

That would be fine with me, because actually move: true doesn't really "move" the object anymore, it just deep copies it since your change. It's not a less expensive operation anymore, so we may as well keep the original object around if invalidating it could cause issues.

Updated by byroot (Jean Boussier) about 8 hours ago

doesn't really "move" the object anymore, it just deep copies it since your change. It's not a less expensive operation anymor

As mentioned previously, this isn't meant as an end state, but as a quick fix to allow further experimentation.

But either way, even with a real "move" we still need to allocate new object slots and traverse the object graph to update the references, so I don't think it ever was significantly cheaper than move: false.

Updated by luke-gru (Luke Gruber) about 8 hours ago

I reread my message and I didn't mean to come across like I disagreed with your changes, because I don't :)

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0