Bug #20255
closedEmbedded arrays aren't moved correctly across ractors
Description
ractor.send(ary, move: true)
works incorrectly because if ary
is embedded, the new moved object doesn't populate its own embedded space, it uses the MovedObject's embedded space.
example:
r = Ractor.new {
inner_ary = receive
values = {}
values[:equal] = (inner_ary == ["",{},2,3,4,5,6])
values[:string] = inner_ary.to_s
values
}
ary = [String.new,Hash.new,2,3,4,5,6]
r.send(ary, move: true)
r_values = r.take
p r_values[:equal]
p r_values[:string]
# => false
# => "[\"\", {}, 2, 2.0, 21747991570, String, 3]"
Updated by luke-gru (Luke Gruber) about 1 year ago
I sent a PR here: https://github.com/ruby/ruby/pull/9918
Updated by ko1 (Koichi Sasada) about 1 year ago
- Assignee set to ko1 (Koichi Sasada)
Updated by hsbt (Hiroshi SHIBATA) about 1 year ago
- Status changed from Open to Assigned
Updated by byroot (Jean Boussier) 3 days ago
- Status changed from Assigned to Closed
Applied in changeset git|0350290262ea0fbc4e1807901797ee8a6970c2b9.
Ractor: Fix moving embedded objects
[Bug #20271]
[Bug #20267]
[Bug #20255]
rb_obj_alloc(RBASIC_CLASS(obj))
will always allocate from the basic
40B pool, so if obj
is larger than 40B
, we'll create a corrupted
object when we later copy the shape_id.
Instead we can use the same logic than ractor copy, which is
to use rb_obj_clone
, and later ask the GC to free the original
object.
We then must turn it into a T_OBJECT
, because otherwise
just changing its class to RactorMoved
leaves a lot of
ways to keep using the object, e.g.:
a = [1, 2, 3]
Ractor.new{}.send(a, move: true)
[].concat(a) # Should raise, but wasn't.
If it turns out that rb_obj_clone
isn't performant enough
for some uses, we can always have carefully crafted specialized
paths for the types that would benefit from it.
Updated by luke-gru (Luke Gruber) 3 days ago
· Edited
Hi Jean, thanks for taking care of this issue. Your change makes sense to me but I have a small concern regarding the potential call to initialize_clone
. When allowing user-defined hooks like this in Ractor move logic, the user can add a non-shareable object. For example:
a = [1, 2, 3]
NON_SHAREABLE = Object.new
p "outside: #{NON_SHAREABLE}"
def a.initialize_clone(other)
super
instance_variable_set("@non_shareable", NON_SHAREABLE)
end
r = Ractor.new do
obj = Ractor.receive
p "inside: #{obj.instance_variable_get("@non_shareable")}"
end
r.send(a, move: true)
r.take
One fix could be to check if there is a user-defined clone method for the object, and if so we need to traverse it again to either move any new objects or raise an error. We could disallow user-defined clone methods, but I don't know if that would be okay for users.
Updated by byroot (Jean Boussier) 3 days ago
@luke-gru indeed.
Updated by luke-gru (Luke Gruber) 3 days ago
· Edited
This change seems to have a regression in this case as well.
a = Object.new
b = Object.new
p "outside: #{b}"
a.instance_variable_set("@b", b)
r = Ractor.new do
obj = Ractor.receive
p "inside: #{obj}"
p "inside: #{obj.instance_variable_get("@b")}" # we get an error here now
end
r.send(a, move: true)
r.take
Updated by byroot (Jean Boussier) 3 days ago
For context, even though I didn't precisely predict these two cases, I was expecting initialize_cone
to be problematic. In the move: true
case we certainly shouldn't invoke it.
But also, this patch isn't really meant as a final solution. Longer term I'd like to rely on the GC compaction routine to do the moving, but short term fixing this memory corruption issue, even in a non-ideal way, allow to unblock further efforts.
I'll see about cloning without invoking the callbacks as another short term fix.
Updated by byroot (Jean Boussier) 3 days ago
Also I don't quite get why this is an issue with move: true
but isn't when doing a copy.
Updated by luke-gru (Luke Gruber) 3 days ago
Yes, copy as well is a problem. The best solution is to not allow these callbacks, because even if they run and you try to traverse the graph again to error or copy/move it, the callback could have spawned a thread that sets an unshareable on the object and then we're back to square one.
Updated by byroot (Jean Boussier) 3 days ago
Updated by byroot (Jean Boussier) about 18 hours ago
The best solution is to not allow these callbacks,
So it's not actually possible, because many types rely on it. e.g:
class Array
def initialize_clone(o)
# noop
end
end
p [1, 2, 3, 4, 5].clone # => []
Updated by luke-gru (Luke Gruber) about 11 hours ago
Yeah I was afraid that would be the case, but we could differentiate between user-defined clone callbacks and builtin callbacks. Either way we can probably worry about this later, I was just bringing it up in case you hadn't thought about it already.
Updated by byroot (Jean Boussier) about 11 hours ago
Yeah, it's really tricky. There has been some talks of just removing the move: true
capability.
Given: https://bugs.ruby-lang.org/issues/21208, I'm not sure it can ever be made correct, hence it might not be worth trying to improve it.
Updated by luke-gru (Luke Gruber) about 8 hours ago
That would be fine with me, because actually move: true
doesn't really "move" the object anymore, it just deep copies it since your change. It's not a less expensive operation anymore, so we may as well keep the original object around if invalidating it could cause issues.
Updated by byroot (Jean Boussier) about 8 hours ago
doesn't really "move" the object anymore, it just deep copies it since your change. It's not a less expensive operation anymor
As mentioned previously, this isn't meant as an end state, but as a quick fix to allow further experimentation.
But either way, even with a real "move" we still need to allocate new object slots and traverse the object graph to update the references, so I don't think it ever was significantly cheaper than move: false
.
Updated by luke-gru (Luke Gruber) about 8 hours ago
I reread my message and I didn't mean to come across like I disagreed with your changes, because I don't :)