Bug #20255
closedEmbedded arrays aren't moved correctly across ractors
Added by luke-gru (Luke Gruber) about 1 year ago. Updated about 17 hours ago.
Description
ractor.send(ary, move: true)
works incorrectly because if ary
is embedded, the new moved object doesn't populate its own embedded space, it uses the MovedObject's embedded space.
example:
r = Ractor.new {
inner_ary = receive
values = {}
values[:equal] = (inner_ary == ["",{},2,3,4,5,6])
values[:string] = inner_ary.to_s
values
}
ary = [String.new,Hash.new,2,3,4,5,6]
r.send(ary, move: true)
r_values = r.take
p r_values[:equal]
p r_values[:string]
# => false
# => "[\"\", {}, 2, 2.0, 21747991570, String, 3]"
Updated by luke-gru (Luke Gruber) about 1 year ago
I sent a PR here: https://github.com/ruby/ruby/pull/9918
Updated by ko1 (Koichi Sasada) about 1 year ago
- Assignee set to ko1 (Koichi Sasada)
Updated by hsbt (Hiroshi SHIBATA) about 1 year ago
- Status changed from Open to Assigned
Updated by byroot (Jean Boussier) 4 days ago
- Status changed from Assigned to Closed
Applied in changeset git|0350290262ea0fbc4e1807901797ee8a6970c2b9.
Ractor: Fix moving embedded objects
[Bug #20271]
[Bug #20267]
[Bug #20255]
rb_obj_alloc(RBASIC_CLASS(obj))
will always allocate from the basic
40B pool, so if obj
is larger than 40B
, we'll create a corrupted
object when we later copy the shape_id.
Instead we can use the same logic than ractor copy, which is
to use rb_obj_clone
, and later ask the GC to free the original
object.
We then must turn it into a T_OBJECT
, because otherwise
just changing its class to RactorMoved
leaves a lot of
ways to keep using the object, e.g.:
a = [1, 2, 3]
Ractor.new{}.send(a, move: true)
[].concat(a) # Should raise, but wasn't.
If it turns out that rb_obj_clone
isn't performant enough
for some uses, we can always have carefully crafted specialized
paths for the types that would benefit from it.
Updated by luke-gru (Luke Gruber) 3 days ago
· Edited
Hi Jean, thanks for taking care of this issue. Your change makes sense to me but I have a small concern regarding the potential call to initialize_clone
. When allowing user-defined hooks like this in Ractor move logic, the user can add a non-shareable object. For example:
a = [1, 2, 3]
NON_SHAREABLE = Object.new
p "outside: #{NON_SHAREABLE}"
def a.initialize_clone(other)
super
instance_variable_set("@non_shareable", NON_SHAREABLE)
end
r = Ractor.new do
obj = Ractor.receive
p "inside: #{obj.instance_variable_get("@non_shareable")}"
end
r.send(a, move: true)
r.take
One fix could be to check if there is a user-defined clone method for the object, and if so we need to traverse it again to either move any new objects or raise an error. We could disallow user-defined clone methods, but I don't know if that would be okay for users.
Updated by byroot (Jean Boussier) 3 days ago
@luke-gru indeed.
Updated by luke-gru (Luke Gruber) 3 days ago
· Edited
This change seems to have a regression in this case as well.
a = Object.new
b = Object.new
p "outside: #{b}"
a.instance_variable_set("@b", b)
r = Ractor.new do
obj = Ractor.receive
p "inside: #{obj}"
p "inside: #{obj.instance_variable_get("@b")}" # we get an error here now
end
r.send(a, move: true)
r.take
Updated by byroot (Jean Boussier) 3 days ago
For context, even though I didn't precisely predict these two cases, I was expecting initialize_cone
to be problematic. In the move: true
case we certainly shouldn't invoke it.
But also, this patch isn't really meant as a final solution. Longer term I'd like to rely on the GC compaction routine to do the moving, but short term fixing this memory corruption issue, even in a non-ideal way, allow to unblock further efforts.
I'll see about cloning without invoking the callbacks as another short term fix.
Updated by byroot (Jean Boussier) 3 days ago
Also I don't quite get why this is an issue with move: true
but isn't when doing a copy.
Updated by luke-gru (Luke Gruber) 3 days ago
Yes, copy as well is a problem. The best solution is to not allow these callbacks, because even if they run and you try to traverse the graph again to error or copy/move it, the callback could have spawned a thread that sets an unshareable on the object and then we're back to square one.
Updated by byroot (Jean Boussier) 3 days ago
Updated by byroot (Jean Boussier) 1 day ago
The best solution is to not allow these callbacks,
So it's not actually possible, because many types rely on it. e.g:
class Array
def initialize_clone(o)
# noop
end
end
p [1, 2, 3, 4, 5].clone # => []
Updated by luke-gru (Luke Gruber) 1 day ago
Yeah I was afraid that would be the case, but we could differentiate between user-defined clone callbacks and builtin callbacks. Either way we can probably worry about this later, I was just bringing it up in case you hadn't thought about it already.
Updated by byroot (Jean Boussier) 1 day ago
Yeah, it's really tricky. There has been some talks of just removing the move: true
capability.
Given: https://bugs.ruby-lang.org/issues/21208, I'm not sure it can ever be made correct, hence it might not be worth trying to improve it.
Updated by luke-gru (Luke Gruber) 1 day ago
That would be fine with me, because actually move: true
doesn't really "move" the object anymore, it just deep copies it since your change. It's not a less expensive operation anymore, so we may as well keep the original object around if invalidating it could cause issues.
Updated by byroot (Jean Boussier) 1 day ago
doesn't really "move" the object anymore, it just deep copies it since your change. It's not a less expensive operation anymor
As mentioned previously, this isn't meant as an end state, but as a quick fix to allow further experimentation.
But either way, even with a real "move" we still need to allocate new object slots and traverse the object graph to update the references, so I don't think it ever was significantly cheaper than move: false
.
Updated by luke-gru (Luke Gruber) 1 day ago
I reread my message and I didn't mean to come across like I disagreed with your changes, because I don't :)
Updated by byroot (Jean Boussier) about 17 hours ago
No offense taken.