Project

General

Profile

Actions

Feature #20669

open

Add Marshal::MarshalError class to differentiate ArgumentErrors

Added by olleolleolle (Olle Jonsson) 5 days ago. Updated 4 days ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:118811]

Description

Make it possible to differentiate general non-marshal errors from failures to decode Marshal data using a specific exception class.

Background

There are a variety of error conditions that can cause Marshal to fail, including both corrupt data, and between versions of Ruby/Marshal binary formats, e.g.

> Marshal.load("foobar")
<internal:marshal>:34:in `load': incompatible marshal file format (can't be read) (TypeError)

> Marshal.load(Marshal.dump(Object.new).slice(0, 10))
<internal:marshal>:34:in `load': marshal data too short (ArgumentError)

> MyThing = Struct.new(:name, :age)
=> MyThing
> Marshal.dump(MyThing.new("Alice", 20))
=> "\x04\bS:\fMyThing\a:\tnameI\"\nAlice\x06:\x06ET:\bagei\x19"

(in a separate session)
> Marshal.load "\x04\bS:\fMyThing\a:\tnameI\"\nAlice\x06:\x06ET:\bagei\x19"
<internal:marshal>:34:in `load': undefined class/module MyThing (ArgumentError)

When data is corrupt, or incompatible, Marshal may raise either ArgumentError or TypeError with various messages. Not all errors are from marshal.c.

Generally, TypeError and ArgumentError are used to indicate some semantic problem with the program, rather than a runtime issue with the data. For example, ArgumentError may be used to indicate less than the required number of arguments passed to a method, and TypeError may indicate than the wrong data type was given as an argument.

Dalli is a library for accessing memcached servers. By default it uses Marshal to store serialised Ruby objects. However, when updating major versions of Ruby, or application code, Marshal.load may start raising TypeError or ArgumentError. This class of error needs to be handled differently from "normal" TypeError and ArgumentError. As such, the only way to differentiate right now is to pattern match the exception message, which is done here:

      # TODO: Some of these error messages need to be validated.  It's not obvious
      # that all of them are actually generated by the invoked code
      # in current systems
      # rubocop:disable Layout/LineLength
      TYPE_ERR_REGEXP = %r{needs to have method `_load'|exception class/object expected|instance of IO needed|incompatible marshal file format}.freeze
      ARGUMENT_ERR_REGEXP = /undefined class|marshal data too short/.freeze
      NAME_ERR_STR = 'uninitialized constant'
      # rubocop:enable Layout/LineLength

      def retrieve(value, bitflags)
        serialized = (bitflags & FLAG_SERIALIZED) != 0
        serialized ? serializer.load(value) : value
      rescue TypeError => e
        filter_type_error(e)
      rescue ArgumentError => e
        filter_argument_error(e)
      rescue NameError => e
        filter_name_error(e)
      end

      def filter_type_error(err)
        raise err unless TYPE_ERR_REGEXP.match?(err.message)

        raise UnmarshalError, "Unable to unmarshal value: #{err.message}"
      end

      def filter_argument_error(err)
        raise err unless ARGUMENT_ERR_REGEXP.match?(err.message)

        raise UnmarshalError, "Unable to unmarshal value: #{err.message}"
      end

      def filter_name_error(err)
        raise err unless err.message.include?(NAME_ERR_STR)

        raise UnmarshalError, "Unable to unmarshal value: #{err.message}"
      end

Unfortunately, there have been several incidents where the complexity of the current behaviour has lead to outages in production.

Proposal

Ideally, the above code could use a specific (one or more) error class for detecting failed Marshal.load.

      def retrieve(value, bitflags)
        serialized = (bitflags & FLAG_SERIALIZED) != 0
        serialized ? serializer.load(value) : value
      rescue Marshal::LoadError => error
        raise UnmarshalError, "Unable to unmarshal value!"
      end

One difficulty is retaining backwards compatibility. We may be able to create a sub-class of class LoadError < ArgumentError but it fundamentally seems wrong to me (should really be a RuntimeError).

See also

Updated by kjtsanaktsidis (KJ Tsanaktsidis) 4 days ago

I agree it might be nice if Marshal had its own error class like MarshalError or some such. However, I don't really see why we need to differentiate between different kinds of unmarshaling errors? Any one of these things from dalli ("marshal data too short", "incompatible marshal file format", ...) are in the realm of "a human needs to look at the data and figure out what went wrong", I would have thought?

I.e. it's useful to know that an exception did come from Marshal, but I don't think I quite see the value in knowing what exception came from Marshal? Maybe I'm missing something obvious and you can enlighten me :)

Updated by olleolleolle (Olle Jonsson) 4 days ago

kjtsanaktsidis (KJ Tsanaktsidis) wrote in #note-1:

I agree it might be nice if Marshal had its own error class like MarshalError or some such. However, I don't really see why we need to differentiate between different kinds of unmarshaling errors? Any one of these things from dalli ("marshal data too short", "incompatible marshal file format", ...) are in the realm of "a human needs to look at the data and figure out what went wrong", I would have thought?

I.e. it's useful to know that an exception did come from Marshal, but I don't think I quite see the value in knowing what exception came from Marshal? Maybe I'm missing something obvious and you can enlighten me :)

Thanks! Alright, that'd scope it down, making the patch smaller and easier to deal with. I will make an edit to my proposal to be about adding a "is the data you are trying to load broken?" ArgumentError subclass called MarshalError.

Actions #3

Updated by olleolleolle (Olle Jonsson) 4 days ago

  • Subject changed from Add error classes to differentiate Marshal ArgumentErrors to Add Marshal::MarshalError class to differentiate ArgumentErrors

Updated by olleolleolle (Olle Jonsson) 4 days ago

  • Description updated (diff)

Changed the proposal to be about one error class. Thanks, KJ!

Actions #5

Updated by olleolleolle (Olle Jonsson) 4 days ago

  • Description updated (diff)
Actions #6

Updated by olleolleolle (Olle Jonsson) 4 days ago

  • Description updated (diff)
Actions #7

Updated by ioquatix (Samuel Williams) 4 days ago

  • Description updated (diff)
Actions #8

Updated by ioquatix (Samuel Williams) 4 days ago

  • Description updated (diff)
Actions #9

Updated by ioquatix (Samuel Williams) 4 days ago

  • Description updated (diff)

Updated by byroot (Jean Boussier) 4 days ago

We may be able to create a sub-class of class LoadError < ArgumentError but it fundamentally seems wrong to me (should really be a RuntimeError).

There's little way around it, lots of code today expects ArgumentError, if you start raising something that doesn't inherit from ArgumentError you'll break that code.

Also I don't think it's wrong for it to be an argument error. You supply a string argument that isn't valid marshal data, so argument error make sense, just like how Integer("abcd") raises ArgumentError.

Updated by Eregon (Benoit Daloze) 4 days ago

Isn't it fair to assume any Exception from calling Marshal.load is an error while de-marshaling the data?
I think it is.

If it's about serializer.load(value) and serializer not being always Marshal, that's easy to deal with: just do a if serializer == Marshal and rescue things differently there if you want to wrap in a UnmarshalError.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0