Bug #15968
closedCustom marshal_load methods allow object instance variables to "leak" into other objects
Description
While working on a Rails app, I noticed some odd behavior where after marshalling and demarshalling an array of ActiveRecord objects, some elements were replaced with symbols and empty hashes (original Rails bug report).
It appears some of Rails' custom marshallization methods modify allow an object's unset instance variables to be set during marshallization. However, since these instance variables weren't counted at the start of marshallization, they overflow into subsequent array elements upon demarshallization.
Here is a test case (written in plain Ruby) demonstrating this behavior:
require 'test/unit'
class Foo
attr_accessor :bar, :baz
def initialize
self.bar = Bar.new(self)
end
end
class Bar
attr_accessor :foo
def initialize(foo)
self.foo = foo
end
def marshal_dump
self.foo.baz = :problem
{foo: self.foo}
end
def marshal_load(data)
self.foo = data[:foo]
end
end
class BugTest < Test::Unit::TestCase
def test_marshalization
foo = Foo.new
array = [foo, nil]
marshalled_array = Marshal.dump(array)
demarshalled_array = Marshal.load(marshalled_array)
assert_nil demarshalled_array[1]
end
end
I'm not positive this qualifies as a bug - if a programmer writes custom marshal_dump
and marshal_load
methods, perhaps it's their responsibility to avoid unintended side-effects like those demonstrated in my test case.
However, I think this issue might be altogether avoided by adding a reserved delimiter character to Ruby's core marshallization functionality (in marshal.c) representing the "end" of a serialized object. For instance, in the above test case, marshalled_array
comes out to:
\x04\b[\ao:\bFoo\x06:\t@barU:\bBar{\x06:\bfoo@\x06:\t@baz:\fproblem0
Suppose Ruby used a z
character to represent the end of a serialized object - in this case, marshalled_array
would come out to something like:
\x04\b[\ao:\bFoo\x06:\t@barU:\bBar{\x06:\bfoo@\x06:\t@baz:\fproblemz0
(Note the second-to-last character - z
.)
This way, when demarshalling an object, even if additional instance variables had somehow snuck in during marshallization process, the z
character could be used to mark the end of a serialized object, ensuring that the extra instance variables don't overflow into the next segment of serialized data.
I don't write much C, and I haven't fully grokked Ruby's marshal.c - so there may be dozens of reasons why this won't work. But I think a serialization strategy along those lines may help avoid unexpected behavior.