Feature #20669
Updated by ioquatix (Samuel Williams) 5 months ago
Make it possible to differentiate general non-marshal errors from failures to decode Marshal data using a specific exception class. ## Background There are a variety of error conditions that can cause Marshal to fail, including both corrupt data, and between versions of Ruby/Marshal binary formats, e.g. ``` ```ruby > Marshal.load("foobar") <internal:marshal>:34:in `load': incompatible marshal file format (can't be read) (TypeError) > Marshal.load(Marshal.dump(Object.new).slice(0, 10)) <internal:marshal>:34:in `load': marshal data too short (ArgumentError) > MyThing = Struct.new(:name, :age) => MyThing > Marshal.dump(MyThing.new("Alice", 20)) => "\x04\bS:\fMyThing\a:\tnameI\"\nAlice\x06:\x06ET:\bagei\x19" (in a separate session) > Marshal.load "\x04\bS:\fMyThing\a:\tnameI\"\nAlice\x06:\x06ET:\bagei\x19" <internal:marshal>:34:in `load': undefined class/module MyThing (ArgumentError) ``` When data is corrupt, or incompatible, Marshal may raise either `ArgumentError` or `TypeError` with various messages. Not all errors are from `marshal.c`. Generally, `TypeError` and `ArgumentError` are used to indicate some semantic problem with the program, rather than a runtime issue with the data. For example, `ArgumentError` may be used to indicate less than the required number of arguments passed to a method, and `TypeError` may indicate than the wrong data type was given as an argument. Dalli is a library for accessing memcached servers. By default it uses Marshal to store serialised Ruby objects. However, when updating major versions of Ruby, or application code, `Marshal.load` may start raising `TypeError` or `ArgumentError`. This class of error needs to be handled differently from "normal" `TypeError` and `ArgumentError`. As such, the only way to differentiate right now is to pattern match the exception message, [which is done here](https://github.com/petergoldstein/dalli/blob/1d4cbfc78e6470ad57a883801a783bf30890c400/lib/dalli/protocol/value_serializer.rb#L36-L72): ```ruby # TODO: Some of these error messages need to be validated. It's not obvious # that all of them are actually generated by the invoked code # in current systems # rubocop:disable Layout/LineLength TYPE_ERR_REGEXP = %r{needs to have method `_load'|exception class/object expected|instance of IO needed|incompatible marshal file format}.freeze ARGUMENT_ERR_REGEXP = /undefined class|marshal data too short/.freeze NAME_ERR_STR = 'uninitialized constant' # rubocop:enable Layout/LineLength def retrieve(value, bitflags) serialized = (bitflags & FLAG_SERIALIZED) != 0 serialized ? serializer.load(value) : value rescue TypeError => e filter_type_error(e) rescue ArgumentError => e filter_argument_error(e) rescue NameError => e filter_name_error(e) end def filter_type_error(err) raise err unless TYPE_ERR_REGEXP.match?(err.message) raise UnmarshalError, "Unable to unmarshal value: #{err.message}" end def filter_argument_error(err) raise err unless ARGUMENT_ERR_REGEXP.match?(err.message) raise UnmarshalError, "Unable to unmarshal value: #{err.message}" end def filter_name_error(err) raise err unless err.message.include?(NAME_ERR_STR) raise UnmarshalError, "Unable to unmarshal value: #{err.message}" end ``` ## Proposal Ideally, the above code could use a specific (one or more) error class for detecting failed `Marshal.load`. ```ruby def retrieve(value, bitflags) serialized = (bitflags & FLAG_SERIALIZED) != 0 serialized ? serializer.load(value) : value rescue Marshal::LoadError => error raise UnmarshalError, "Unable to unmarshal value!" end ``` One difficulty is retaining backwards compatibility. We may be able to create a sub-class of `class LoadError < ArgumentError` but it fundamentally seems wrong to me (should really be a `RuntimeError`). ## See also - [Dalli PR where another error message would be added](https://github.com/petergoldstein/dalli/pull/1008) - https://github.com/search?q=Marshal.load+ArgumentError&type=code - Git code search showing some variations of the problem (`rescue ArgumentError` and similar).