Feature #20669
closedAdd Marshal::MarshalError class to differentiate ArgumentErrors
Description
Make it possible to differentiate general non-marshal errors from failures to decode Marshal data using a specific exception class.
Background¶
There are a variety of error conditions that can cause Marshal to fail, including both corrupt data, and between versions of Ruby/Marshal binary formats, e.g.
> Marshal.load("foobar")
<internal:marshal>:34:in `load': incompatible marshal file format (can't be read) (TypeError)
> Marshal.load(Marshal.dump(Object.new).slice(0, 10))
<internal:marshal>:34:in `load': marshal data too short (ArgumentError)
> MyThing = Struct.new(:name, :age)
=> MyThing
> Marshal.dump(MyThing.new("Alice", 20))
=> "\x04\bS:\fMyThing\a:\tnameI\"\nAlice\x06:\x06ET:\bagei\x19"
(in a separate session)
> Marshal.load "\x04\bS:\fMyThing\a:\tnameI\"\nAlice\x06:\x06ET:\bagei\x19"
<internal:marshal>:34:in `load': undefined class/module MyThing (ArgumentError)
When data is corrupt, or incompatible, Marshal may raise either ArgumentError
or TypeError
with various messages. Not all errors are from marshal.c
.
Generally, TypeError
and ArgumentError
are used to indicate some semantic problem with the program, rather than a runtime issue with the data. For example, ArgumentError
may be used to indicate less than the required number of arguments passed to a method, and TypeError
may indicate than the wrong data type was given as an argument.
Dalli is a library for accessing memcached servers. By default it uses Marshal to store serialised Ruby objects. However, when updating major versions of Ruby, or application code, Marshal.load
may start raising TypeError
or ArgumentError
. This class of error needs to be handled differently from "normal" TypeError
and ArgumentError
. As such, the only way to differentiate right now is to pattern match the exception message, which is done here:
# TODO: Some of these error messages need to be validated. It's not obvious
# that all of them are actually generated by the invoked code
# in current systems
# rubocop:disable Layout/LineLength
TYPE_ERR_REGEXP = %r{needs to have method `_load'|exception class/object expected|instance of IO needed|incompatible marshal file format}.freeze
ARGUMENT_ERR_REGEXP = /undefined class|marshal data too short/.freeze
NAME_ERR_STR = 'uninitialized constant'
# rubocop:enable Layout/LineLength
def retrieve(value, bitflags)
serialized = (bitflags & FLAG_SERIALIZED) != 0
serialized ? serializer.load(value) : value
rescue TypeError => e
filter_type_error(e)
rescue ArgumentError => e
filter_argument_error(e)
rescue NameError => e
filter_name_error(e)
end
def filter_type_error(err)
raise err unless TYPE_ERR_REGEXP.match?(err.message)
raise UnmarshalError, "Unable to unmarshal value: #{err.message}"
end
def filter_argument_error(err)
raise err unless ARGUMENT_ERR_REGEXP.match?(err.message)
raise UnmarshalError, "Unable to unmarshal value: #{err.message}"
end
def filter_name_error(err)
raise err unless err.message.include?(NAME_ERR_STR)
raise UnmarshalError, "Unable to unmarshal value: #{err.message}"
end
Unfortunately, there have been several incidents where the complexity of the current behaviour has lead to outages in production.
Proposal¶
Ideally, the above code could use a specific (one or more) error class for detecting failed Marshal.load
.
def retrieve(value, bitflags)
serialized = (bitflags & FLAG_SERIALIZED) != 0
serialized ? serializer.load(value) : value
rescue Marshal::LoadError => error
raise UnmarshalError, "Unable to unmarshal value!"
end
One difficulty is retaining backwards compatibility. We may be able to create a sub-class of class LoadError < ArgumentError
but it fundamentally seems wrong to me (should really be a RuntimeError
).
See also¶
- Dalli PR where another error message would be added
-
https://github.com/search?q=Marshal.load+ArgumentError&type=code - Git code search showing some variations of the problem (
rescue ArgumentError
and similar).
Updated by kjtsanaktsidis (KJ Tsanaktsidis) 5 months ago
I agree it might be nice if Marshal had its own error class like MarshalError
or some such. However, I don't really see why we need to differentiate between different kinds of unmarshaling errors? Any one of these things from dalli ("marshal data too short", "incompatible marshal file format", ...) are in the realm of "a human needs to look at the data and figure out what went wrong", I would have thought?
I.e. it's useful to know that an exception did come from Marshal, but I don't think I quite see the value in knowing what exception came from Marshal? Maybe I'm missing something obvious and you can enlighten me :)
Updated by olleolleolle (Olle Jonsson) 5 months ago
kjtsanaktsidis (KJ Tsanaktsidis) wrote in #note-1:
I agree it might be nice if Marshal had its own error class like
MarshalError
or some such. However, I don't really see why we need to differentiate between different kinds of unmarshaling errors? Any one of these things from dalli ("marshal data too short", "incompatible marshal file format", ...) are in the realm of "a human needs to look at the data and figure out what went wrong", I would have thought?I.e. it's useful to know that an exception did come from Marshal, but I don't think I quite see the value in knowing what exception came from Marshal? Maybe I'm missing something obvious and you can enlighten me :)
Thanks! Alright, that'd scope it down, making the patch smaller and easier to deal with. I will make an edit to my proposal to be about adding a "is the data you are trying to load broken?" ArgumentError subclass called MarshalError.
Updated by olleolleolle (Olle Jonsson) 5 months ago
- Subject changed from Add error classes to differentiate Marshal ArgumentErrors to Add Marshal::MarshalError class to differentiate ArgumentErrors
Updated by olleolleolle (Olle Jonsson) 5 months ago
- Description updated (diff)
Changed the proposal to be about one error class. Thanks, KJ!
Updated by byroot (Jean Boussier) 5 months ago
We may be able to create a sub-class of class LoadError < ArgumentError but it fundamentally seems wrong to me (should really be a RuntimeError).
There's little way around it, lots of code today expects ArgumentError
, if you start raising something that doesn't inherit from ArgumentError
you'll break that code.
Also I don't think it's wrong for it to be an argument error. You supply a string argument that isn't valid marshal data, so argument error make sense, just like how Integer("abcd")
raises ArgumentError
.
Updated by Eregon (Benoit Daloze) 4 months ago
Isn't it fair to assume any Exception from calling Marshal.load
is an error while de-marshaling the data?
I think it is.
If it's about serializer.load(value)
and serializer
not being always Marshal, that's easy to deal with: just do a if serializer == Marshal
and rescue
things differently there if you want to wrap in a UnmarshalError
.
Updated by olleolleolle (Olle Jonsson) 3 months ago ยท Edited
In the meantime, Dalli merged a change that sounds a lot like Eregon's comment. https://github.com/petergoldstein/dalli/pull/1011
My immediate needs are met, and I will close this feature request. EDIT: I can't find a button to close this Feature Request. Perhaps I don't have permissions to.
Updated by Eregon (Benoit Daloze) 3 months ago
- Status changed from Open to Closed