Feature #20669
closedAdd Marshal::MarshalError class to differentiate ArgumentErrors
Description
Make it possible to differentiate general non-marshal errors from failures to decode Marshal data using a specific exception class.
Background¶
There are a variety of error conditions that can cause Marshal to fail, including both corrupt data, and between versions of Ruby/Marshal binary formats, e.g.
> Marshal.load("foobar")
<internal:marshal>:34:in `load': incompatible marshal file format (can't be read) (TypeError)
> Marshal.load(Marshal.dump(Object.new).slice(0, 10))
<internal:marshal>:34:in `load': marshal data too short (ArgumentError)
> MyThing = Struct.new(:name, :age)
=> MyThing
> Marshal.dump(MyThing.new("Alice", 20))
=> "\x04\bS:\fMyThing\a:\tnameI\"\nAlice\x06:\x06ET:\bagei\x19"
(in a separate session)
> Marshal.load "\x04\bS:\fMyThing\a:\tnameI\"\nAlice\x06:\x06ET:\bagei\x19"
<internal:marshal>:34:in `load': undefined class/module MyThing (ArgumentError)
When data is corrupt, or incompatible, Marshal may raise either ArgumentError
or TypeError
with various messages. Not all errors are from marshal.c
.
Generally, TypeError
and ArgumentError
are used to indicate some semantic problem with the program, rather than a runtime issue with the data. For example, ArgumentError
may be used to indicate less than the required number of arguments passed to a method, and TypeError
may indicate than the wrong data type was given as an argument.
Dalli is a library for accessing memcached servers. By default it uses Marshal to store serialised Ruby objects. However, when updating major versions of Ruby, or application code, Marshal.load
may start raising TypeError
or ArgumentError
. This class of error needs to be handled differently from "normal" TypeError
and ArgumentError
. As such, the only way to differentiate right now is to pattern match the exception message, which is done here:
# TODO: Some of these error messages need to be validated. It's not obvious
# that all of them are actually generated by the invoked code
# in current systems
# rubocop:disable Layout/LineLength
TYPE_ERR_REGEXP = %r{needs to have method `_load'|exception class/object expected|instance of IO needed|incompatible marshal file format}.freeze
ARGUMENT_ERR_REGEXP = /undefined class|marshal data too short/.freeze
NAME_ERR_STR = 'uninitialized constant'
# rubocop:enable Layout/LineLength
def retrieve(value, bitflags)
serialized = (bitflags & FLAG_SERIALIZED) != 0
serialized ? serializer.load(value) : value
rescue TypeError => e
filter_type_error(e)
rescue ArgumentError => e
filter_argument_error(e)
rescue NameError => e
filter_name_error(e)
end
def filter_type_error(err)
raise err unless TYPE_ERR_REGEXP.match?(err.message)
raise UnmarshalError, "Unable to unmarshal value: #{err.message}"
end
def filter_argument_error(err)
raise err unless ARGUMENT_ERR_REGEXP.match?(err.message)
raise UnmarshalError, "Unable to unmarshal value: #{err.message}"
end
def filter_name_error(err)
raise err unless err.message.include?(NAME_ERR_STR)
raise UnmarshalError, "Unable to unmarshal value: #{err.message}"
end
Unfortunately, there have been several incidents where the complexity of the current behaviour has lead to outages in production.
Proposal¶
Ideally, the above code could use a specific (one or more) error class for detecting failed Marshal.load
.
def retrieve(value, bitflags)
serialized = (bitflags & FLAG_SERIALIZED) != 0
serialized ? serializer.load(value) : value
rescue Marshal::LoadError => error
raise UnmarshalError, "Unable to unmarshal value!"
end
One difficulty is retaining backwards compatibility. We may be able to create a sub-class of class LoadError < ArgumentError
but it fundamentally seems wrong to me (should really be a RuntimeError
).
See also¶
- Dalli PR where another error message would be added
-
https://github.com/search?q=Marshal.load+ArgumentError&type=code - Git code search showing some variations of the problem (
rescue ArgumentError
and similar).