Bug #19439
closedMarshal.load doesn't load Regexp instance variables
Description
Hello, I've noticed this strange behaviour:
source_object = Regexp.new("a")
source_object.instance_variable_set(:@foo, :bar)
regexp = Marshal.load(Marshal.dump(source_object))
regexp.instance_variables # => []
I would expect instance variables to be loaded
Updated by Eregon (Benoit Daloze) over 1 year ago
As a note, if CRuby plans to make all Regexp immutable & no subclass instances (like TruffleRuby already does), then that's another way to fix this.
Updated by byroot (Jean Boussier) over 1 year ago
After a quick check, it's because the IVAR on Regexps is used to store the encoding:
case TYPE_REGEXP:
{
VALUE str = r_bytes(arg);
int options = r_byte(arg);
int has_encoding = FALSE;
st_index_t idx = r_prepare(arg);
if (ivp) {
r_ivar(str, &has_encoding, arg);
*ivp = FALSE;
}
I'll dig a bit more, but it may be a fundamental limitation of the format.
Updated by byroot (Jean Boussier) over 1 year ago
Ah! the explanation is quite funny.
So the ivars are loaded fine, however, since the goal is to store the encoding, and that the encoding isn't on the regexp itself, but on the regexp#source
which is a string, the ivars are assigned on that internal, inacessible string.
require 'objspace'
ObjectSpace.trace_object_allocations_start
GC.start
gen = GC.count
r = Regexp.new("a".b)
r.instance_variable_set(:@ivar, [])
r2 = Marshal.load(Marshal.dump(r))
puts ObjectSpace.dump_all(output: :string, shapes: false, since: gen)
puts '-' * 40
puts ObjectSpace.dump(r2)
Filtered output:
{"address":"0x10323fd58", "type":"REGEXP", "shape_id":0, "slot_size":40, "class":"0x102fd2a98", "references":["0x10323fdf8"], "file":"<internal:marshal>", "line":35, "method":"load", "generation":6, "memsize":500, "flags":{"wb_protected":true}}
{"address":"0x10323fdf8", "type":"STRING", "shape_id":237, "slot_size":40, "class":"0x102fded98", "frozen":true, "embedded":true, "bytesize":1, "value":"a", "encoding":"US-ASCII", "coderange":"unknown", "references":["0x10323fd80"], "file":"<internal:marshal>", "line":35, "method":"load", "generation":6, "memsize":56, "flags":{"wb_protected":true}}
{"address":"0x10323fd80", "type":"ARRAY", "shape_id":0, "slot_size":40, "class":"0x102fd33f8", "length":0, "embedded":true, "file":"<internal:marshal>", "line":35, "method":"load", "generation":6, "memsize":40, "flags":{"wb_protected":true}}
----------------------------------------
{"address":"0x10323fd58", "type":"REGEXP", "shape_id":0, "slot_size":40, "class":"0x102fd2a98", "references":["0x10323fdf8"], "file":"<internal:marshal>", "line":35, "method":"load", "generation":6, "memsize":500, "flags":{"wb_protected":true}}
If you follow the references you see Regexp -> String -> Array(length: 1)
.
I think this may be fixable, but the fix may be dirty.
Updated by byroot (Jean Boussier) over 1 year ago
Ok, I have a patch for it, but it's ugly: https://github.com/ruby/ruby/pull/7323
Updated by byroot (Jean Boussier) over 1 year ago
- Status changed from Open to Closed
Applied in changeset git|d2520b7b76759118071a16e6bca22726a5de9fb4.
Marshal.load: restore instance variables on Regexp
[Bug #19439]
The instance variables were restore on the Regexp source,
not the regexp itself.
Unfortunately we have a bit of a chicken and egg problem.
The source holds the encoding, and the encoding need to be set on
the source to be able to instantiate the Regexp.
So the instance variables have to be read on the source
.
To correct this we transfert the instance variables after
instantiating the Regexp.
The only way to avoid this would be to read the instance variable
twice and rewind.
Updated by byroot (Jean Boussier) over 1 year ago
- Backport changed from 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN to 2.7: WONTFIX, 3.0: WONTFIX, 3.1: REQUIRED, 3.2: REQUIRED
Updated by andrykonchin (Andrew Konchin) over 1 year ago
Thank you!
Updated by naruse (Yui NARUSE) over 1 year ago
- Backport changed from 2.7: WONTFIX, 3.0: WONTFIX, 3.1: REQUIRED, 3.2: REQUIRED to 2.7: WONTFIX, 3.0: WONTFIX, 3.1: REQUIRED, 3.2: DONE
ruby_3_2 4e4a4e42b284d9309a7e51c97058093539e7a843 merged revision(s) d2520b7b76759118071a16e6bca22726a5de9fb4.
Updated by nagachika (Tomoyuki Chikanaga) over 1 year ago
When I backported d2520b7b76759118071a16e6bca22726a5de9fb4 into ruby_3_1 on my environment, it fails test with crash report.
[ 7489/21711] TestMarshal#test_regexp2<internal:marshal>:34: [BUG] unreachable
ruby 3.1.4p222 (2023-03-26 revision 19af12ff19) [x86_64-darwin21]
-- Crash Report log information --------------------------------------------
See Crash Report log file in one of the following locations:
* ~/Library/Logs/DiagnosticReports
* /Library/Logs/DiagnosticReports
for more details.
Don't forget to include the above Crash Report log file in bug reports.
C level backtrace
-- C level backtrace information -------------------------------------------
/Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_print_backtrace+0xf) [0x10948177b] ../ruby_3_1/vm_dump.c:759
/Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_vm_bugreport) ../ruby_3_1/vm_dump.c:1045
/Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_vm_bugreport) (null):0
/Users/nagachika/opt/ruby-3.1/src/build/ruby(bug_report_end+0x0) [0x10928a748] ../ruby_3_1/error.c:798
/Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_bug_without_die) ../ruby_3_1/error.c:798
/Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_bug+0x6f) [0x1094b6319] ../ruby_3_1/error.c:806
/Users/nagachika/opt/ruby-3.1/src/build/ruby(0x1094b87b2) [0x1094b87b2]
/Users/nagachika/opt/ruby-3.1/src/build/ruby(ROBJECT_IV_INDEX_TBL_inline+0x0) [0x10944935f] ../ruby_3_1/variable.c:1648
/Users/nagachika/opt/ruby-3.1/src/build/ruby(obj_ivar_each) ../ruby_3_1/variable.c:1657
/Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_ivar_foreach) ../ruby_3_1/variable.c:1796
/Users/nagachika/opt/ruby-3.1/src/build/ruby(r_entry0+0x0) [0x1092fd845] ../ruby_3_1/marshal.c:1943
/Users/nagachika/opt/ruby-3.1/src/build/ruby(r_object_for) ../ruby_3_1/marshal.c:1945
/Users/nagachika/opt/ruby-3.1/src/build/ruby(r_object0+0x1f) [0x1092fc32b] ../ruby_3_1/marshal.c:1729
/Users/nagachika/opt/ruby-3.1/src/build/ruby(r_object_for) ../ruby_3_1/marshal.c:1769
/Users/nagachika/opt/ruby-3.1/src/build/ruby(r_object0+0x1c) [0x1092f8417] ../ruby_3_1/marshal.c:1729
/Users/nagachika/opt/ruby-3.1/src/build/ruby(r_object) ../ruby_3_1/marshal.c:2197
/Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_marshal_load_with_proc) ../ruby_3_1/marshal.c:2272
/Users/nagachika/opt/ruby-3.1/src/build/ruby(invoke_bf+0x18) [0x109458a6f] ../ruby_3_1/vm_insnhelper.c:5909
/Users/nagachika/opt/ruby-3.1/src/build/ruby(vm_invoke_builtin_delegate) ../ruby_3_1/vm_insnhelper.c:5936
/Users/nagachika/opt/ruby-3.1/src/build/ruby(vm_exec_core) ../ruby_3_1/insns.def:1487
/Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_vm_exec+0xae5) [0x10946a8e5]
(snip)
I believe there may be additional bugs related to instance variable management that need to be backport.
However, this area of code has undergone significant changes with the introduction of Object Shapes in version 3.2/master. While I plan to continue investigating, I have decided to give up backporting this ticket for the time being.
Updated by usa (Usaku NAKAMURA) about 1 year ago
- Backport changed from 2.7: WONTFIX, 3.0: WONTFIX, 3.1: REQUIRED, 3.2: DONE to 2.7: WONTFIX, 3.0: WONTFIX, 3.1: DONE, 3.2: DONE