Project

General

Profile

Actions

Bug #19439

closed

Marshal.load doesn't load Regexp instance variables

Added by andrykonchin (Andrew Konchin) about 1 year ago. Updated 10 months ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:112433]

Description

Hello, I've noticed this strange behaviour:

source_object = Regexp.new("a")
source_object.instance_variable_set(:@foo, :bar)

regexp = Marshal.load(Marshal.dump(source_object))
regexp.instance_variables # => []

I would expect instance variables to be loaded

Updated by Eregon (Benoit Daloze) about 1 year ago

As a note, if CRuby plans to make all Regexp immutable & no subclass instances (like TruffleRuby already does), then that's another way to fix this.

Updated by byroot (Jean Boussier) about 1 year ago

After a quick check, it's because the IVAR on Regexps is used to store the encoding:

      case TYPE_REGEXP:
        {
            VALUE str = r_bytes(arg);
            int options = r_byte(arg);
            int has_encoding = FALSE;
            st_index_t idx = r_prepare(arg);

            if (ivp) {
                r_ivar(str, &has_encoding, arg);
                *ivp = FALSE;
            }

I'll dig a bit more, but it may be a fundamental limitation of the format.

Updated by byroot (Jean Boussier) about 1 year ago

Ah! the explanation is quite funny.

So the ivars are loaded fine, however, since the goal is to store the encoding, and that the encoding isn't on the regexp itself, but on the regexp#source which is a string, the ivars are assigned on that internal, inacessible string.

require 'objspace'
ObjectSpace.trace_object_allocations_start
GC.start
gen = GC.count
r = Regexp.new("a".b)
r.instance_variable_set(:@ivar, [])
r2 = Marshal.load(Marshal.dump(r))
puts ObjectSpace.dump_all(output: :string, shapes: false, since: gen)
puts '-' * 40
puts ObjectSpace.dump(r2)

Filtered output:


{"address":"0x10323fd58", "type":"REGEXP", "shape_id":0, "slot_size":40, "class":"0x102fd2a98", "references":["0x10323fdf8"], "file":"<internal:marshal>", "line":35, "method":"load", "generation":6, "memsize":500, "flags":{"wb_protected":true}}
{"address":"0x10323fdf8", "type":"STRING", "shape_id":237, "slot_size":40, "class":"0x102fded98", "frozen":true, "embedded":true, "bytesize":1, "value":"a", "encoding":"US-ASCII", "coderange":"unknown", "references":["0x10323fd80"], "file":"<internal:marshal>", "line":35, "method":"load", "generation":6, "memsize":56, "flags":{"wb_protected":true}}
{"address":"0x10323fd80", "type":"ARRAY", "shape_id":0, "slot_size":40, "class":"0x102fd33f8", "length":0, "embedded":true, "file":"<internal:marshal>", "line":35, "method":"load", "generation":6, "memsize":40, "flags":{"wb_protected":true}}
----------------------------------------
{"address":"0x10323fd58", "type":"REGEXP", "shape_id":0, "slot_size":40, "class":"0x102fd2a98", "references":["0x10323fdf8"], "file":"<internal:marshal>", "line":35, "method":"load", "generation":6, "memsize":500, "flags":{"wb_protected":true}}

If you follow the references you see Regexp -> String -> Array(length: 1).

I think this may be fixable, but the fix may be dirty.

Updated by byroot (Jean Boussier) about 1 year ago

Ok, I have a patch for it, but it's ugly: https://github.com/ruby/ruby/pull/7323

Actions #5

Updated by byroot (Jean Boussier) about 1 year ago

  • Status changed from Open to Closed

Applied in changeset git|d2520b7b76759118071a16e6bca22726a5de9fb4.


Marshal.load: restore instance variables on Regexp

[Bug #19439]

The instance variables were restore on the Regexp source,
not the regexp itself.

Unfortunately we have a bit of a chicken and egg problem.

The source holds the encoding, and the encoding need to be set on
the source to be able to instantiate the Regexp.

So the instance variables have to be read on the source.
To correct this we transfert the instance variables after
instantiating the Regexp.

The only way to avoid this would be to read the instance variable
twice and rewind.

Actions #6

Updated by byroot (Jean Boussier) about 1 year ago

  • Backport changed from 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN to 2.7: WONTFIX, 3.0: WONTFIX, 3.1: REQUIRED, 3.2: REQUIRED

Updated by naruse (Yui NARUSE) about 1 year ago

  • Backport changed from 2.7: WONTFIX, 3.0: WONTFIX, 3.1: REQUIRED, 3.2: REQUIRED to 2.7: WONTFIX, 3.0: WONTFIX, 3.1: REQUIRED, 3.2: DONE

ruby_3_2 4e4a4e42b284d9309a7e51c97058093539e7a843 merged revision(s) d2520b7b76759118071a16e6bca22726a5de9fb4.

Updated by nagachika (Tomoyuki Chikanaga) about 1 year ago

When I backported d2520b7b76759118071a16e6bca22726a5de9fb4 into ruby_3_1 on my environment, it fails test with crash report.

[ 7489/21711] TestMarshal#test_regexp2<internal:marshal>:34: [BUG] unreachable
ruby 3.1.4p222 (2023-03-26 revision 19af12ff19) [x86_64-darwin21]

-- Crash Report log information --------------------------------------------
   See Crash Report log file in one of the following locations:             
     * ~/Library/Logs/DiagnosticReports                                     
     * /Library/Logs/DiagnosticReports                                      
   for more details.                                                        
Don't forget to include the above Crash Report log file in bug reports.     

C level backtrace


-- C level backtrace information -------------------------------------------
/Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_print_backtrace+0xf) [0x10948177b] ../ruby_3_1/vm_dump.c:759
/Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_vm_bugreport) ../ruby_3_1/vm_dump.c:1045
/Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_vm_bugreport) (null):0
/Users/nagachika/opt/ruby-3.1/src/build/ruby(bug_report_end+0x0) [0x10928a748] ../ruby_3_1/error.c:798
/Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_bug_without_die) ../ruby_3_1/error.c:798
/Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_bug+0x6f) [0x1094b6319] ../ruby_3_1/error.c:806
/Users/nagachika/opt/ruby-3.1/src/build/ruby(0x1094b87b2) [0x1094b87b2]
/Users/nagachika/opt/ruby-3.1/src/build/ruby(ROBJECT_IV_INDEX_TBL_inline+0x0) [0x10944935f] ../ruby_3_1/variable.c:1648
/Users/nagachika/opt/ruby-3.1/src/build/ruby(obj_ivar_each) ../ruby_3_1/variable.c:1657
/Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_ivar_foreach) ../ruby_3_1/variable.c:1796
/Users/nagachika/opt/ruby-3.1/src/build/ruby(r_entry0+0x0) [0x1092fd845] ../ruby_3_1/marshal.c:1943
/Users/nagachika/opt/ruby-3.1/src/build/ruby(r_object_for) ../ruby_3_1/marshal.c:1945
/Users/nagachika/opt/ruby-3.1/src/build/ruby(r_object0+0x1f) [0x1092fc32b] ../ruby_3_1/marshal.c:1729
/Users/nagachika/opt/ruby-3.1/src/build/ruby(r_object_for) ../ruby_3_1/marshal.c:1769
/Users/nagachika/opt/ruby-3.1/src/build/ruby(r_object0+0x1c) [0x1092f8417] ../ruby_3_1/marshal.c:1729
/Users/nagachika/opt/ruby-3.1/src/build/ruby(r_object) ../ruby_3_1/marshal.c:2197
/Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_marshal_load_with_proc) ../ruby_3_1/marshal.c:2272
/Users/nagachika/opt/ruby-3.1/src/build/ruby(invoke_bf+0x18) [0x109458a6f] ../ruby_3_1/vm_insnhelper.c:5909
/Users/nagachika/opt/ruby-3.1/src/build/ruby(vm_invoke_builtin_delegate) ../ruby_3_1/vm_insnhelper.c:5936
/Users/nagachika/opt/ruby-3.1/src/build/ruby(vm_exec_core) ../ruby_3_1/insns.def:1487
/Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_vm_exec+0xae5) [0x10946a8e5]
(snip)

I believe there may be additional bugs related to instance variable management that need to be backport.
However, this area of code has undergone significant changes with the introduction of Object Shapes in version 3.2/master. While I plan to continue investigating, I have decided to give up backporting this ticket for the time being.

Actions #10

Updated by usa (Usaku NAKAMURA) 10 months ago

  • Backport changed from 2.7: WONTFIX, 3.0: WONTFIX, 3.1: REQUIRED, 3.2: DONE to 2.7: WONTFIX, 3.0: WONTFIX, 3.1: DONE, 3.2: DONE
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0