Bug #17984


[BUG] try to mark T_NONE object

Added by byroot (Jean Boussier) almost 2 years ago. Updated almost 2 years ago.

Target version:


About 1% of our CI jobs fail with this error.

If necessary I can try to find a repro, but these GC issues are hard to pin down because they require GC.stress which is very slow.

<OBJ_INFO:gc_mark_ptr@gc.c:6687> 0x00007f6717407ee8 [2 M   ] T_NONE
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.7.5/lib/bootsnap/compile_cache/yaml.rb:117: [BUG] try to mark T_NONE object
ruby 3.1.0dev (2021-06-12T19:33:15Z canary-fix 38bf83611e) [x86_64-linux]
-- Control frame information -----------------------------------------------
c:0119 p:---- s:0680 e:000679 CFUNC  :fetch
c:0118 p:0174 s:0672 e:000671 METHOD /tmp/bundle/ruby/3.1.0/gems/bootsnap-1.7.5/lib/bootsnap/compile_cache/yaml.rb:117
-- Ruby level backtrace information ----------------------------------------
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.7.5/lib/bootsnap/compile_cache/yaml.rb:117:in `load_file'
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.7.5/lib/bootsnap/compile_cache/yaml.rb:117:in `fetch'
-- C level backtrace information -------------------------------------------
/usr/local/bin/ruby(rb_print_backtrace+0x11) [0x5581a8716618] vm_dump.c:759
/usr/local/bin/ruby(rb_vm_bugreport) vm_dump.c:1041
/usr/local/bin/ruby(bug_report_end+0x0) [0x5581a8549aef] error.c:777
/usr/local/bin/ruby(rb_bug_without_die) error.c:777
/usr/local/bin/ruby(die+0x0) [0x5581a851857a] error.c:785
/usr/local/bin/ruby(rb_bug) error.c:787
/usr/local/bin/ruby(gc_mark_ptr+0x151) [0x5581a856f7b1] gc.c:6688
/usr/local/bin/ruby(each_insn_value+0x5) [0x5581a85a629d] iseq.c:327
/usr/local/bin/ruby(iseq_extract_values) iseq.c:178
/usr/local/bin/ruby(rb_iseq_each_value) iseq.c:246
/usr/local/bin/ruby(rb_iseq_mark) iseq.c:342
/usr/local/bin/ruby(gc_mark_imemo+0x60) [0x5581a8570720] gc.c:6802
/usr/local/bin/ruby(gc_mark_children) gc.c:6884
/usr/local/bin/ruby(gc_mark_stacked_objects+0x3d) [0x5581a8574a0d] gc.c:7068
/usr/local/bin/ruby(gc_mark_stacked_objects_incremental) gc.c:7102
/usr/local/bin/ruby(gc_marks_rest) gc.c:8084
/usr/local/bin/ruby(gc_rest+0x170) [0x5581a8574db0] gc.c:9003
/usr/local/bin/ruby(gc_rest+0x8) [0x5581a8572df8] gc.c:8875
/usr/local/bin/ruby(garbage_collect) gc.c:8864
/usr/local/bin/ruby(garbage_collect_with_gvl+0x83) [0x5581a8572f03] gc.c:9178
/usr/local/bin/ruby(objspace_malloc_fixup+0x17) [0x5581a8576341] gc.c:11282
/usr/local/bin/ruby(objspace_xmalloc0) gc.c:11353
/usr/local/bin/ruby(str_new0+0xda) [0x5581a868512a] string.c:797
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.7.5/lib/bootsnap/ [0x7f67ecc403e3] bootsnap.c:525
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.7.5/lib/bootsnap/ bootsnap.c:527
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.7.5/lib/bootsnap/ bootsnap.c:735
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.7.5/lib/bootsnap/ bootsnap.c:362
/usr/local/bin/ruby(vm_call_cfunc_with_frame+0x10e) [0x5581a86ec9de] vm_insnhelper.c:2943
/usr/local/bin/ruby(vm_sendish+0x30e) [0x5581a86f847e] vm_insnhelper.c:4521
/usr/local/bin/ruby(vm_exec_core+0xf7) [0x5581a87036a7] insns.def:773
/usr/local/bin/ruby(rb_vm_exec+0xbc) [0x5581a86f8c9c] vm.c:2169
/usr/local/bin/ruby(load_iseq_eval+0xa) [0x5581a85af74a] load.c:594
/usr/local/bin/ruby(require_internal) load.c:1065
/usr/local/bin/ruby(rb_require_string+0x24) [0x5581a85afa1e] load.c:1142
/usr/local/bin/ruby(rb_f_require) load.c:838


Updated by tenderlovemaking (Aaron Patterson) almost 2 years ago

  • Assignee set to tenderlovemaking (Aaron Patterson)

Can you send me a core file and I can take a look?

Updated by byroot (Jean Boussier) almost 2 years ago

I didn't have the core dump and this issue only happens very rarely. I just caught it again today and this time I have the core dump. I forwarded it to @tenderlovemaking (Aaron Patterson).

Updated by tenderlovemaking (Aaron Patterson) almost 2 years ago

I think I was finally able to track this down. The iseq marks references by walking the instructions looking for Ruby objects. The bug occurred here:

(lldb) f 6
frame #6: 0x000055886314df28 ruby`rb_iseq_mark at iseq.c:178
   175 	            {
   176 	              VALUE op = code[pos + op_no + 1];
   177 	              if (!SPECIAL_CONST_P(op)) {
-> 178 	                  VALUE newop = func(data, op);
   179 	                  if (newop != op) {
   180 	                      code[pos + op_no + 1] = newop;
   181 	                  }
(lldb) p code + pos
(VALUE *) $22 = 0x00007f0fd261e700

So the instruction living at 0x00007f0fd261e700 had a bad reference.

I was finally able to disassemble the iseq from the core file. Here is the instruction (including some surrounding):

0x7f0fd261e6e8 0093 getlocal_WC_0( 7 )
0x7f0fd261e6f8 0095 dup
0x7f0fd261e700 0096 opt_case_dispatch( CDHASH (VALUE)0x7f0fdabbc3a8, 24 )
0x7f0fd261e718 0099 putobject( (VALUE)0x7f0fea201e70 )
0x7f0fd261e728 0101 topn( 1 )

0x7f0fd261e700 corresponds to an opt_case_dispatch instruction, and it's trying to mark the CDHASH. Unfortunately the CDHASH reference has gone bad:

(lldb) rp 0x7f0fdabbc3a8
bits: [LM   ]
T_NONE: (RBasic) *$23 = (flags = 64, klass = 139706070975120)

It looks like b2fc592c3046e60fdfbb5692d52cc7cbf814b6d0 introduced a dup on the CDHASH when loading from IBF ( ). Nothing keeps that hash alive during the process of IBF loading, so if GC happens to run during an IBF load then this hash could be collected.

IBF loading keeps a temporary array of Ruby objects in order to maintain liveness during load. We actually only care about the copied hash (not the original), so I made a patch that replaces the original hash in our temporary object list.

The patch is here:

I've also attached it. Thanks @xrxr for helping me track this down!

Actions #4

Updated by tenderlovemaking (Aaron Patterson) almost 2 years ago

  • Status changed from Open to Closed

Applied in changeset git|2599d1a8dff29a2376f36c8cc301839b454fc064.

Store the dup'd CDHASH in the object list during IBF load

Since b2fc592c304 nothing was holding a reference to the dup'd CDHASH
during IBF loading. If a GC happened to run during IBF load then the
copied hash wouldn't have anything to keep it alive. We don't really
want to keep the originally loaded CDHASH hash, so this patch just
overwrites the original hash with the copied / modified hash.

[Bug #17984] [ruby-core:104259]


Also available in: Atom PDF