Project

General

Profile

Actions

Feature #20669

open

Add Marshal::MarshalError class to differentiate ArgumentErrors

Added by olleolleolle (Olle Jonsson) 5 days ago. Updated 4 days ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:118811]

Description

Make it possible to differentiate general non-marshal errors from failures to decode Marshal data using a specific exception class.

Background

There are a variety of error conditions that can cause Marshal to fail, including both corrupt data, and between versions of Ruby/Marshal binary formats, e.g.

> Marshal.load("foobar")
<internal:marshal>:34:in `load': incompatible marshal file format (can't be read) (TypeError)

> Marshal.load(Marshal.dump(Object.new).slice(0, 10))
<internal:marshal>:34:in `load': marshal data too short (ArgumentError)

> MyThing = Struct.new(:name, :age)
=> MyThing
> Marshal.dump(MyThing.new("Alice", 20))
=> "\x04\bS:\fMyThing\a:\tnameI\"\nAlice\x06:\x06ET:\bagei\x19"

(in a separate session)
> Marshal.load "\x04\bS:\fMyThing\a:\tnameI\"\nAlice\x06:\x06ET:\bagei\x19"
<internal:marshal>:34:in `load': undefined class/module MyThing (ArgumentError)

When data is corrupt, or incompatible, Marshal may raise either ArgumentError or TypeError with various messages. Not all errors are from marshal.c.

Generally, TypeError and ArgumentError are used to indicate some semantic problem with the program, rather than a runtime issue with the data. For example, ArgumentError may be used to indicate less than the required number of arguments passed to a method, and TypeError may indicate than the wrong data type was given as an argument.

Dalli is a library for accessing memcached servers. By default it uses Marshal to store serialised Ruby objects. However, when updating major versions of Ruby, or application code, Marshal.load may start raising TypeError or ArgumentError. This class of error needs to be handled differently from "normal" TypeError and ArgumentError. As such, the only way to differentiate right now is to pattern match the exception message, which is done here:

      # TODO: Some of these error messages need to be validated.  It's not obvious
      # that all of them are actually generated by the invoked code
      # in current systems
      # rubocop:disable Layout/LineLength
      TYPE_ERR_REGEXP = %r{needs to have method `_load'|exception class/object expected|instance of IO needed|incompatible marshal file format}.freeze
      ARGUMENT_ERR_REGEXP = /undefined class|marshal data too short/.freeze
      NAME_ERR_STR = 'uninitialized constant'
      # rubocop:enable Layout/LineLength

      def retrieve(value, bitflags)
        serialized = (bitflags & FLAG_SERIALIZED) != 0
        serialized ? serializer.load(value) : value
      rescue TypeError => e
        filter_type_error(e)
      rescue ArgumentError => e
        filter_argument_error(e)
      rescue NameError => e
        filter_name_error(e)
      end

      def filter_type_error(err)
        raise err unless TYPE_ERR_REGEXP.match?(err.message)

        raise UnmarshalError, "Unable to unmarshal value: #{err.message}"
      end

      def filter_argument_error(err)
        raise err unless ARGUMENT_ERR_REGEXP.match?(err.message)

        raise UnmarshalError, "Unable to unmarshal value: #{err.message}"
      end

      def filter_name_error(err)
        raise err unless err.message.include?(NAME_ERR_STR)

        raise UnmarshalError, "Unable to unmarshal value: #{err.message}"
      end

Unfortunately, there have been several incidents where the complexity of the current behaviour has lead to outages in production.

Proposal

Ideally, the above code could use a specific (one or more) error class for detecting failed Marshal.load.

      def retrieve(value, bitflags)
        serialized = (bitflags & FLAG_SERIALIZED) != 0
        serialized ? serializer.load(value) : value
      rescue Marshal::LoadError => error
        raise UnmarshalError, "Unable to unmarshal value!"
      end

One difficulty is retaining backwards compatibility. We may be able to create a sub-class of class LoadError < ArgumentError but it fundamentally seems wrong to me (should really be a RuntimeError).

See also

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0