Bug #15968
Updated by nobu (Nobuyoshi Nakada) over 5 years ago
While working on a Rails app, I noticed some odd behavior: after marshalling and demarshalling an array of ActiveRecord objects, some elements were replaced with symbols and empty hashes ([original Rails bug report](https://github.com/rails/rails/issues/36522)). It appears some of Rails' custom marshalling methods allow an object's unset instance variables to be set during marshalling. However, since these instance variables weren't counted at the start of marshalling, they overflow into subsequent array elements upon demarshalling.

Here is a test case (written in plain Ruby) demonstrating this behavior:

```ruby
require 'test/unit'

class Foo
  attr_accessor :bar, :baz

  def initialize
    self.bar = Bar.new(self)
  end
end

class Bar
  attr_accessor :foo

  def initialize(foo)
    self.foo = foo
  end

  def marshal_dump
    # Side effect: sets a previously unset instance variable on foo
    # while foo itself is still being dumped.
    self.foo.baz = :problem
    {foo: self.foo}
  end

  def marshal_load(data)
    self.foo = data[:foo]
  end
end

class BugTest < Test::Unit::TestCase
  def test_marshalization
    foo = Foo.new
    array = [foo, nil]
    marshalled_array = Marshal.dump(array)
    demarshalled_array = Marshal.load(marshalled_array)
    assert_nil demarshalled_array[1]
  end
end
```

I'm not positive this qualifies as a bug - if a programmer writes custom `marshal_dump` and `marshal_load` methods, perhaps it's their responsibility to avoid unintended side-effects like those demonstrated in my test case. However, I think this issue might be altogether avoided by adding a reserved delimiter character to Ruby's core marshalling functionality (in marshal.c) representing the "end" of a serialized object.
For instance, in the above test case, `marshalled_array` comes out to:

```
\x04\b[\ao:\bFoo\x06:\t@barU:\bBar{\x06:\bfoo@\x06:\t@baz:\fproblem0
```

Suppose Ruby used a `z` character to represent the end of a serialized object - in this case, `marshalled_array` would come out to something like:

```
\x04\b[\ao:\bFoo\x06:\t@barU:\bBar{\x06:\bfoo@\x06:\t@baz:\fproblemz0
```

(Note the second-to-last character - `z`.) This way, when demarshalling an object, even if additional instance variables had somehow snuck in during the marshalling process, the `z` character could be used to mark the end of a serialized object, ensuring that the extra instance variables don't overflow into the next segment of serialized data.

I don't write much C, and I haven't fully grokked Ruby's marshal.c - so there may be dozens of reasons why this won't work. But I think a serialization strategy along those lines may help avoid unexpected behavior.
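To make the overflow concrete, here is a minimal sketch that loads the exact bytes quoted above and inspects the result. It assumes stripped-down stand-ins for `Foo` and `Bar` (only what `Marshal.load` needs); the `DUMP` constant and the byte offset `10` are specific to this illustration, not part of the original report.

```ruby
# Minimal stand-ins for the classes in the report, just enough to load the dump.
class Foo
  attr_accessor :bar, :baz
end

class Bar
  attr_accessor :foo

  def marshal_load(data)
    self.foo = data[:foo]
  end
end

# The exact bytes quoted in the report, as a binary string.
DUMP = "\x04\b[\ao:\bFoo\x06:\t@barU:\bBar{\x06:\bfoo@\x06:\t@baz:\fproblem0".b

# Marshal stores small positive integers as (n + 5). Byte 10 follows the
# class name "Foo" and holds the instance variable count that was written
# before Bar#marshal_dump had a chance to add @baz.
declared_ivars = DUMP.getbyte(10) - 5

restored = Marshal.load(DUMP)

puts "declared ivars on Foo: #{declared_ivars}"
puts "element 0: #{restored[0].class}, baz=#{restored[0].baz.inspect}"
puts "element 1: #{restored[1].inspect}"
```

Because the header declares a single instance variable, the loader stops reading `Foo` after `@bar`. It then takes the leftover `:@baz` symbol as the array's second element, so `restored[1]` comes back as `:@baz` instead of `nil`, and the remaining bytes are ignored.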