Bug #20009

Marshal.load raises an exception when loading a dumped class whose name includes non-ASCII characters

Added by ippachi (Kazuya Hatanaka) 3 months ago. Updated 3 months ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]
[ruby-core:115422]

Description

Reproduction code

class Cクラス; end
Marshal.load(Marshal.dump(Cクラス))

Actual result

<internal:marshal>:34:in `load': undefined class/module C\xE3\x82\xAF\xE3\x83\xA9\xE3\x82\xB9 (ArgumentError)
        from marshal.rb:2:in `<main>'

Expected result

Returns Cクラス

Impacted area

An exception is raised in Rails under the following conditions:

  • minitest is used with default settings
  • tests are run in parallel with parallelize
  • test class names contain non-ASCII characters

The default parallelization uses DRb, and Marshal is used inside DRb.

Other

After trying various things, I thought I could fix it by making rb_path_to_class support strings containing non-ASCII characters, but I couldn't get any further than that.

Updated by byroot (Jean Boussier) 3 months ago

I dug into this bug, and I'm not sure if it's possible to fix it.

Classes are serialized this way:

          case T_CLASS:
            w_byte(TYPE_CLASS, arg);
            {
                VALUE path = class2path(obj);
                w_bytes(RSTRING_PTR(path), RSTRING_LEN(path), arg);
                RB_GC_GUARD(path);
            }
            break;

We write the TYPE_CLASS prefix, and then write the bytes of the class name, without any encoding indication.

Then on load, we just read the bytes and try to lookup the class:

      case TYPE_CLASS:
        {
            VALUE str = r_bytes(arg);

            v = path2class(str);

So on load we're looking for "Cクラス".b.to_sym, which doesn't match :"Cクラス".
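To illustrate the mismatch, here is a small sketch that does not touch Marshal at all. It only compares the UTF-8 name with a binary-tagged copy of the same bytes. The variable names are mine, and the class name is written with \u escapes so the string is UTF-8 regardless of the source encoding.

```ruby
# "C\u30AF\u30E9\u30B9" is the class name from the report, written with
# \u escapes so the string is UTF-8 regardless of the source encoding.
utf8_name   = "C\u30AF\u30E9\u30B9"
binary_name = utf8_name.b            # same bytes, tagged ASCII-8BIT

# The byte content is identical...
same_bytes  = utf8_name.bytes == binary_name.bytes
# ...but symbols with non-ASCII content and different encodings are distinct.
same_symbol = utf8_name.to_sym == binary_name.to_sym

puts "same bytes:  #{same_bytes}"
puts "same symbol: #{same_symbol}"
```

This prints true for the bytes and false for the symbols, which is why a lookup by the binary name cannot find the constant defined under the UTF-8 name.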

To fix this we'd need to include the encoding in the format, but that would mean breaking backward and forward compatibility, which is a huge deal.

Half-way solution

A possible half-way solution would be:

  • Assume non-ASCII class names are UTF-8
  • Raise on dump for class names that are not UTF-8 compatible.

It's far from ideal though.
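For discussion, the policy above could be sketched in plain Ruby roughly as follows. This is not how marshal.c works and not a proposed patch. ClassPathCodec, encode and decode are hypothetical names that only mimic the two rules: refuse non-UTF-8 compatible names on the dump side, and assume UTF-8 on the load side.

```ruby
# Hypothetical helpers, not part of Marshal: they only mimic the proposed policy.
module ClassPathCodec
  module_function

  # "Dump" side: return the raw bytes of the class path, refusing names
  # that are neither ASCII-only nor valid UTF-8.
  def encode(klass)
    path = klass.name
    raise ArgumentError, "can't dump anonymous class" if path.nil?

    utf8_ok = path.encoding == Encoding::UTF_8 && path.valid_encoding?
    unless path.ascii_only? || utf8_ok
      raise ArgumentError, "class name is not UTF-8 compatible: #{path.inspect}"
    end

    path.b # bytes only, no encoding information, like the current format
  end

  # "Load" side: assume the bytes are UTF-8, then resolve the constant path.
  def decode(bytes)
    path = bytes.dup.force_encoding(Encoding::UTF_8)
    raise ArgumentError, "invalid UTF-8 class path" unless path.valid_encoding?

    Object.const_get(path)
  end
end

# Define the class from the report; the name is spelled with \u escapes so
# the example does not depend on the source file encoding.
klass = Object.const_set("C\u30AF\u30E9\u30B9", Class.new)

bytes    = ClassPathCodec.encode(klass)
restored = ClassPathCodec.decode(bytes)
puts restored.equal?(klass)
```

With these assumptions the round trip returns the original class, while a class whose name is in some other non-ASCII encoding would be rejected on the dump side rather than failing later on load.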
