Bug #18501

[BUG] try to mark T_NONE object in RubyVM::InstructionSequence.load_from_binary

Added by byroot (Jean Boussier) about 2 years ago. Updated about 2 years ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-linux]
[ruby-core:107203]

Description

<OBJ_INFO:gc_mark_ptr@gc.c:6709> 0x00007fbf1fba1270 [2 M    ] T_NONE
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.10.1/lib/bootsnap/compile_cache/iseq.rb:49: [BUG] try to mark T_NONE object
ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-linux]
 
-- Control frame information -----------------------------------------------
c:0024 p:---- s:0126 e:000125 CFUNC  :load_from_binary
c:0023 p:0017 s:0121 e:000120 METHOD /tmp/bundle/ruby/3.1.0/gems/bootsnap-1.10.1/lib/bootsnap/compile_cache/iseq.rb:49 [FINISH]
c:0022 p:---- s:0114 e:000113 CFUNC  :fetch
c:0021 p:0061 s:0106 e:000105 METHOD /tmp/bundle/ruby/3.1.0/gems/bootsnap-1.10.1/lib/bootsnap/compile_cache/iseq.rb:60
c:0020 p:0053 s:0099 e:000098 METHOD /tmp/bundle/ruby/3.1.0/gems/bootsnap-1.10.1/lib/bootsnap/compile_cache/iseq.rb:85 [FINISH]
c:0019 p:---- s:0093 e:000092 CFUNC  :require
c:0018 p:0065 s:0088 e:000087 METHOD /tmp/bundle/ruby/3.1.0/gems/bootsnap-1.10.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:28
c:0017 p:0066 s:0077 e:000076 METHOD /tmp/bundle/ruby/3.1.0/gems/zeitwerk-2.5.3/lib/zeitwerk/kernel.rb:35
c:0016 p:0005 s:0069 e:000068 BLOCK  /app/test/load_selected_tests.rb:11 [FINISH]
c:0015 p:---- s:0065 e:000064 CFUNC  :each
c:0014 p:0006 s:0061 e:000060 METHOD /app/test/load_selected_tests.rb:10
c:0013 p:0025 s:0057 e:000056 TOP    /app/test/load_selected_tests.rb:28 [FINISH]
c:0012 p:---- s:0054 e:000053 CFUNC  :require
c:0011 p:0016 s:0049 e:000048 BLOCK  /tmp/bundle/ruby/3.1.0/gems/ci-queue-0.22.0/lib/minitest/queue/runner.rb:287 [FINISH]
c:0010 p:---- s:0045 e:000044 CFUNC  :each
c:0009 p:0008 s:0041 e:000040 METHOD /tmp/bundle/ruby/3.1.0/gems/ci-queue-0.22.0/lib/minitest/queue/runner.rb:286
c:0008 p:0221 s:0037 E:001fd0 METHOD /tmp/bundle/ruby/3.1.0/gems/ci-queue-0.22.0/lib/minitest/queue/runner.rb:78 [FINISH]
c:0007 p:---- s:0031 e:000030 CFUNC  :public_send
c:0006 p:0073 s:0026 e:000025 METHOD /tmp/bundle/ruby/3.1.0/gems/ci-queue-0.22.0/lib/minitest/queue/runner.rb:35
c:0005 p:0007 s:0021 e:000020 METHOD /tmp/bundle/ruby/3.1.0/gems/ci-queue-0.22.0/lib/minitest/queue/runner.rb:19
c:0004 p:0034 s:0016 e:000015 TOP    /tmp/bundle/ruby/3.1.0/gems/ci-queue-0.22.0/exe/minitest-queue:5 [FINISH]
c:0003 p:---- s:0013 e:000012 CFUNC  :load
c:0002 p:0124 s:0008 E:0023a0 EVAL   /tmp/bundle/ruby/3.1.0/bin/minitest-queue:25 [FINISH]
c:0001 p:0000 s:0003 E:002230 (none) [FINISH]
 
-- Ruby level backtrace information ----------------------------------------
/tmp/bundle/ruby/3.1.0/bin/minitest-queue:25:in `<main>'
/tmp/bundle/ruby/3.1.0/bin/minitest-queue:25:in `load'
/tmp/bundle/ruby/3.1.0/gems/ci-queue-0.22.0/exe/minitest-queue:5:in `<top (required)>'
/tmp/bundle/ruby/3.1.0/gems/ci-queue-0.22.0/lib/minitest/queue/runner.rb:19:in `invoke'
/tmp/bundle/ruby/3.1.0/gems/ci-queue-0.22.0/lib/minitest/queue/runner.rb:35:in `run!'
/tmp/bundle/ruby/3.1.0/gems/ci-queue-0.22.0/lib/minitest/queue/runner.rb:35:in `public_send'
/tmp/bundle/ruby/3.1.0/gems/ci-queue-0.22.0/lib/minitest/queue/runner.rb:78:in `run_command'
/tmp/bundle/ruby/3.1.0/gems/ci-queue-0.22.0/lib/minitest/queue/runner.rb:286:in `load_tests'
/tmp/bundle/ruby/3.1.0/gems/ci-queue-0.22.0/lib/minitest/queue/runner.rb:286:in `each'
/tmp/bundle/ruby/3.1.0/gems/ci-queue-0.22.0/lib/minitest/queue/runner.rb:287:in `block in load_tests'
/tmp/bundle/ruby/3.1.0/gems/ci-queue-0.22.0/lib/minitest/queue/runner.rb:287:in `require'
/app/test/load_selected_tests.rb:28:in `<top (required)>'
/app/test/load_selected_tests.rb:10:in `require_tests'
/app/test/load_selected_tests.rb:10:in `each'
/app/test/load_selected_tests.rb:11:in `block in require_tests'
/tmp/bundle/ruby/3.1.0/gems/zeitwerk-2.5.3/lib/zeitwerk/kernel.rb:35:in `require'
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.10.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:28:in `require'
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.10.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:28:in `require'
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.10.1/lib/bootsnap/compile_cache/iseq.rb:85:in `load_iseq'
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.10.1/lib/bootsnap/compile_cache/iseq.rb:60:in `fetch'
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.10.1/lib/bootsnap/compile_cache/iseq.rb:60:in `fetch'
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.10.1/lib/bootsnap/compile_cache/iseq.rb:49:in `storage_to_output'
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.10.1/lib/bootsnap/compile_cache/iseq.rb:49:in `load_from_binary'
 
-- C level backtrace information -------------------------------------------
/usr/local/ruby/bin/ruby(rb_print_backtrace+0x11) [0x555912686188] vm_dump.c:759
/usr/local/ruby/bin/ruby(rb_vm_bugreport) vm_dump.c:1045
/usr/local/ruby/bin/ruby(bug_report_end+0x0) [0x555912732edf] error.c:797
/usr/local/ruby/bin/ruby(rb_bug_without_die) error.c:797
/usr/local/ruby/bin/ruby(die+0x0) [0x5559124a492c] error.c:805
/usr/local/ruby/bin/ruby(rb_bug) error.c:807
/usr/local/ruby/bin/ruby(gc_mark_ptr+0x138) [0x5559124c6468] gc.c:6710
/usr/local/ruby/bin/ruby(gc_mark+0xb) [0x5559124c6b9b] gc.c:6743
/usr/local/ruby/bin/ruby(mark_keyvalue) gc.c:6303
/usr/local/ruby/bin/ruby(apply_functor+0x13) [0x5559125e7516] st.c:1570
/usr/local/ruby/bin/ruby(st_general_foreach) st.c:1480
/usr/local/ruby/bin/ruby(rb_st_foreach) st.c:1577
/usr/local/ruby/bin/ruby(mark_hash+0xf) [0x5559124c81d7] gc.c:6335
/usr/local/ruby/bin/ruby(gc_mark_children) gc.c:6947
/usr/local/ruby/bin/ruby(gc_marks_wb_unprotected_objects_plane+0x1e) [0x5559124c997e] gc.c:7864
/usr/local/ruby/bin/ruby(rgengc_rememberset_mark_plane) gc.c:7856
/usr/local/ruby/bin/ruby(rgengc_rememberset_mark) gc.c:8319
/usr/local/ruby/bin/ruby(gc_marks_start) gc.c:7844
/usr/local/ruby/bin/ruby(gc_marks) gc.c:8145
/usr/local/ruby/bin/ruby(gc_start) gc.c:8963
/usr/local/ruby/bin/ruby(rb_multi_ractor_p+0x0) [0x5559124ce372] gc.c:8849
/usr/local/ruby/bin/ruby(rb_vm_lock_leave) vm_sync.h:92
/usr/local/ruby/bin/ruby(garbage_collect) gc.c:8851
/usr/local/ruby/bin/ruby(garbage_collect_with_gvl) gc.c:9221
/usr/local/ruby/bin/ruby(objspace_malloc_increase_body) gc.c:11294
/usr/local/ruby/bin/ruby(objspace_malloc_increase_body) gc.c:11272
/usr/local/ruby/bin/ruby(objspace_malloc_fixup) gc.c:11372
/usr/local/ruby/bin/ruby(objspace_xmalloc0) gc.c:11443
/usr/local/ruby/bin/ruby(rb_st_init_table_with_size+0x79) [0x5559125e4f59] st.c:551
/usr/local/ruby/bin/ruby(rebuild_table+0x1e7) [0x5559125e51f7] st.c:727
/usr/local/ruby/bin/ruby(rebuild_table_if_necessary+0xc) [0x5559125e6f14] st.c:1071
/usr/local/ruby/bin/ruby(st_add_direct_with_hash) st.c:1131
/usr/local/ruby/bin/ruby(rb_st_update) st.c:1428
/usr/local/ruby/bin/ruby(register_fstring+0x34) [0x55591260af6e] string.c:450
/usr/local/ruby/bin/ruby(rb_enc_interned_str) string.c:11960
/usr/local/ruby/bin/ruby(ibf_load_object+0xca) [0x5559126cb7aa] compile.c:12567
/usr/local/ruby/bin/ruby(ibf_load_code+0x427) [0x5559126f03e7] compile.c:12541
/usr/local/ruby/bin/ruby(ibf_load_iseq_each+0xca1) [0x5559126ec99b] compile.c:11866
/usr/local/ruby/bin/ruby(rb_ibf_load_iseq_complete) compile.c:12748
/usr/local/ruby/bin/ruby(ibf_load_iseq+0xf2) [0x5559126eff82] compile.c:12803
/usr/local/ruby/bin/ruby(ibf_load_code+0x478) [0x5559126f0438] compile.c:11166
/usr/local/ruby/bin/ruby(ibf_load_iseq_each+0xca1) [0x5559126ec99b] compile.c:11866
/usr/local/ruby/bin/ruby(rb_ibf_load_iseq_complete) compile.c:12748
/usr/local/ruby/bin/ruby(ibf_load_iseq+0xf2) [0x5559126eff82] compile.c:12803
/usr/local/ruby/bin/ruby(rb_iseq_ibf_load+0xb3) [0x5559126f1393] compile.c:12909
/usr/local/ruby/bin/ruby(iseqw_s_load_from_binary+0x10) [0x555912506dc0] iseq.c:3600
/usr/local/ruby/bin/ruby(vm_cfp_consistent_p+0x0) [0x555912666909] vm_insnhelper.c:3037
/usr/local/ruby/bin/ruby(vm_call_cfunc_with_frame) vm_insnhelper.c:3039
/usr/local/ruby/bin/ruby(vm_sendish+0xe) [0x555912677a93] vm_insnhelper.c:4751
/usr/local/ruby/bin/ruby(vm_exec_core) insns.def:778
/usr/local/ruby/bin/ruby(rb_vm_exec+0xb6) [0x555912668f96] vm.c:2211
/usr/local/ruby/bin/ruby(vm_call0_body+0x4db) [0x555912670e7b] vm_eval.c:205
/usr/local/ruby/bin/ruby(rb_funcallv_scope+0x1aa) [0x555912674caa] vm_eval.c:86
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.10.1/lib/bootsnap/bootsnap.so(try_storage_to_output+0x6c) [0x7fc01c93fc4c] bootsnap.c:943
/usr/local/ruby/bin/ruby(rb_protect+0xfc) [0x5559124aaeec] eval.c:967
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.10.1/lib/bootsnap/bootsnap.so(fetch_cached_data+0x48) [0x7fc01c9407e0] bootsnap.c:955
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.10.1/lib/bootsnap/bootsnap.so(bs_fetch) bootsnap.c:737
/tmp/bundle/ruby/3.1.0/gems/bootsnap-1.10.1/lib/bootsnap/bootsnap.so(bs_rb_fetch) bootsnap.c:359
/usr/local/ruby/bin/ruby(vm_cfp_consistent_p+0x0) [0x555912666909] vm_insnhelper.c:3037
/usr/local/ruby/bin/ruby(vm_call_cfunc_with_frame) vm_insnhelper.c:3039
/usr/local/ruby/bin/ruby(vm_sendish+0xe) [0x555912677a93] vm_insnhelper.c:4751
/usr/local/ruby/bin/ruby(vm_exec_core) insns.def:778
/usr/local/ruby/bin/ruby(rb_vm_exec+0xb6) [0x555912668f96] vm.c:2211
/usr/local/ruby/bin/ruby(vm_call0_body+0x4db) [0x555912670e7b] vm_eval.c:205
/usr/local/ruby/bin/ruby(rb_vm_call0+0xc7) [0x5559126716d7] vm_eval.c:57
/usr/local/ruby/bin/ruby(rb_vm_call_kw+0x1e) [0x555912672a1c] vm_eval.c:302
/usr/local/ruby/bin/ruby(rb_check_funcall_default_kw) vm_eval.c:690
/usr/local/ruby/bin/ruby(RB_IMMEDIATE_P+0x0) [0x55591250d79a] iseq.c:956
/usr/local/ruby/bin/ruby(RB_SPECIAL_CONST_P) ./include/ruby/internal/special_consts.h:262
/usr/local/ruby/bin/ruby(rb_iseq_load_iseq) iseq.c:958
/usr/local/ruby/bin/ruby(load_iseq_eval+0xa) [0x5559125152c6] load.c:644
/usr/local/ruby/bin/ruby(require_internal) load.c:1132
/usr/local/ruby/bin/ruby(rb_require_string+0x2c) [0x5559125154b9] load.c:1223
/usr/local/ruby/bin/ruby(rb_f_require) load.c:904
/usr/local/ruby/bin/ruby(vm_cfp_consistent_p+0x0) [0x555912666909] vm_insnhelper.c:3037
/usr/local/ruby/bin/ruby(vm_call_cfunc_with_frame) vm_insnhelper.c:3039
/usr/local/ruby/bin/ruby(vm_call_method_each_type+0x79) [0x55591266a0d9] vm_insnhelper.c:3639
/usr/local/ruby/bin/ruby(vm_call_alias+0x87) [0x55591266b987] vm_insnhelper.c:3189
/usr/local/ruby/bin/ruby(vm_sendish+0xe) [0x555912677a93] vm_insnhelper.c:4751
/usr/local/ruby/bin/ruby(vm_exec_core) insns.def:778
/usr/local/ruby/bin/ruby(rb_vm_exec+0xb6) [0x555912668f96] vm.c:2211
/usr/local/ruby/bin/ruby(rb_yield+0x256) [0x55591266e346] vm.c:1316
/usr/local/ruby/bin/ruby(RB_FL_TEST_RAW+0x0) [0x55591269630c] array.c:2522
/usr/local/ruby/bin/ruby(RB_FL_ANY_RAW) ./include/ruby/internal/fl_type.h:558
/usr/local/ruby/bin/ruby(rb_array_len) ./include/ruby/internal/core/rarray.h:302
/usr/local/ruby/bin/ruby(rb_ary_each) array.c:2521
/usr/local/ruby/bin/ruby(vm_cfp_consistent_p+0x0) [0x555912666909] vm_insnhelper.c:3037
/usr/local/ruby/bin/ruby(vm_call_cfunc_with_frame) vm_insnhelper.c:3039
/usr/local/ruby/bin/ruby(vm_sendish+0xc) [0x555912677b9d] vm_insnhelper.c:4751
/usr/local/ruby/bin/ruby(vm_exec_core) insns.def:759
/usr/local/ruby/bin/ruby(rb_vm_exec+0xb6) [0x555912668f96] vm.c:2211
/usr/local/ruby/bin/ruby(load_iseq_eval+0xa) [0x5559125152fa] load.c:656
/usr/local/ruby/bin/ruby(require_internal) load.c:1132
/usr/local/ruby/bin/ruby(rb_require_string+0x2c) [0x5559125154b9] load.c:1223
/usr/local/ruby/bin/ruby(rb_f_require) load.c:904
/usr/local/ruby/bin/ruby(vm_cfp_consistent_p+0x0) [0x555912666909] vm_insnhelper.c:3037
/usr/local/ruby/bin/ruby(vm_call_cfunc_with_frame) vm_insnhelper.c:3039
/usr/local/ruby/bin/ruby(vm_call_method_each_type+0x79) [0x55591266a0d9] vm_insnhelper.c:3639
/usr/local/ruby/bin/ruby(vm_call_alias+0x87) [0x55591266b987] vm_insnhelper.c:3189
/usr/local/ruby/bin/ruby(vm_call_method_each_type+0x269) [0x55591266a2c9] vm_insnhelper.c:3675
/usr/local/ruby/bin/ruby(vm_call_method+0xb4) [0x55591266a9a4] vm_insnhelper.c:3750
/usr/local/ruby/bin/ruby(vm_sendish+0xe) [0x555912677a93] vm_insnhelper.c:4751
/usr/local/ruby/bin/ruby(vm_exec_core) insns.def:778
/usr/local/ruby/bin/ruby(rb_vm_exec+0xb6) [0x555912668f96] vm.c:2211
/usr/local/ruby/bin/ruby(rb_yield+0x256) [0x55591266e346] vm.c:1316
/usr/local/ruby/bin/ruby(RB_FL_TEST_RAW+0x0) [0x55591269630c] array.c:2522
/usr/local/ruby/bin/ruby(RB_FL_ANY_RAW) ./include/ruby/internal/fl_type.h:558
/usr/local/ruby/bin/ruby(rb_array_len) ./include/ruby/internal/core/rarray.h:302
/usr/local/ruby/bin/ruby(rb_ary_each) array.c:2521
/usr/local/ruby/bin/ruby(vm_cfp_consistent_p+0x0) [0x555912666909] vm_insnhelper.c:3037
/usr/local/ruby/bin/ruby(vm_call_cfunc_with_frame) vm_insnhelper.c:3039
/usr/local/ruby/bin/ruby(vm_sendish+0xc) [0x555912677b9d] vm_insnhelper.c:4751
/usr/local/ruby/bin/ruby(vm_exec_core) insns.def:759
/usr/local/ruby/bin/ruby(rb_vm_exec+0xb6) [0x555912668f96] vm.c:2211
/usr/local/ruby/bin/ruby(vm_call0_body+0x4db) [0x555912670e7b] vm_eval.c:205
/usr/local/ruby/bin/ruby(rb_call0+0x20d) [0x555912673bad] vm_eval.c:86
/usr/local/ruby/bin/ruby(send_internal+0xef) [0x5559126742af] vm_eval.c:1261
/usr/local/ruby/bin/ruby(vm_cfp_consistent_p+0x0) [0x555912666909] vm_insnhelper.c:3037
/usr/local/ruby/bin/ruby(vm_call_cfunc_with_frame) vm_insnhelper.c:3039
/usr/local/ruby/bin/ruby(vm_call_method_each_type+0x79) [0x55591266a0d9] vm_insnhelper.c:3639
/usr/local/ruby/bin/ruby(vm_call_method+0xb4) [0x55591266a9a4] vm_insnhelper.c:3750
/usr/local/ruby/bin/ruby(vm_sendish+0xe) [0x555912677a93] vm_insnhelper.c:4751
/usr/local/ruby/bin/ruby(vm_exec_core) insns.def:778
/usr/local/ruby/bin/ruby(rb_vm_exec+0xb6) [0x555912668f96] vm.c:2211
/usr/local/ruby/bin/ruby(raise_load_if_failed+0x0) [0x5559125140cc] load.c:656
/usr/local/ruby/bin/ruby(rb_load_internal) load.c:719
/usr/local/ruby/bin/ruby(rb_f_load+0xb1) [0x555912514271] load.c:792
/usr/local/ruby/bin/ruby(vm_cfp_consistent_p+0x0) [0x555912666909] vm_insnhelper.c:3037
/usr/local/ruby/bin/ruby(vm_call_cfunc_with_frame) vm_insnhelper.c:3039
/usr/local/ruby/bin/ruby(vm_call_method_each_type+0x79) [0x55591266a0d9] vm_insnhelper.c:3639
/usr/local/ruby/bin/ruby(vm_call_method+0xb4) [0x55591266a9a4] vm_insnhelper.c:3750
/usr/local/ruby/bin/ruby(vm_sendish+0xe) [0x555912677a93] vm_insnhelper.c:4751
/usr/local/ruby/bin/ruby(vm_exec_core) insns.def:778
/usr/local/ruby/bin/ruby(rb_vm_exec+0xb6) [0x555912668f96] vm.c:2211
/usr/local/ruby/bin/ruby(rb_ec_exec_node+0xbb) [0x5559124a4efb] eval.c:280
/usr/local/ruby/bin/ruby(ruby_run_node+0x4f) [0x5559124aa13f] eval.c:321
/usr/local/ruby/bin/ruby(main+0x5f) [0x5559124a4c7f] error.c:3180

From my limited understanding it seems to happen if GC triggers at a very specific point.


Files

0001-Guard-hash-k-v.patch (881 Bytes), tenderlovemaking (Aaron Patterson), 01/25/2022 11:55 PM

Updated by tenderlovemaking (Aaron Patterson) about 2 years ago

Interesting. Are you able to get a core file? I can poke around that code, but a core file would be very helpful.

Thanks!

Updated by byroot (Jean Boussier) about 2 years ago

I sent the core files to Aaron privately.

Updated by tenderlovemaking (Aaron Patterson) about 2 years ago

Hash is writing T_NONE references

I think it's possible that T_NONE objects are being written into a hash. Basically, we dup the string key, and if the hash needs to expand, it does so before actually inserting the key. The expansion of the hash causes a malloc, which can trigger GC, and the compiler has optimized the code such that the GC doesn't see a reference to the new key object on the stack or in a register.

The GC ends up collecting the string, and then the Hash writes a reference to the now-T_NONE slot into the underlying table.
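
The failure mode above can be illustrated with a toy collector. This is a minimal sketch under stated assumptions, not Ruby's gc.c: every name in it (toy_vm, toy_gc, toy_table_insert, aset_unguarded, aset_guarded) is hypothetical, the GC only sees objects listed in an explicit roots array (standing in for whatever the stack/register scan happens to find), and every table growth runs a full collection, roughly like GC.stress. The guarded variant pins the key for the duration of the insert, which is roughly the job RB_GC_GUARD does for a real C local.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not Ruby's GC. A collection keeps only objects reachable from
 * the explicit roots array or from the table; every other slot is reset to
 * TOY_T_NONE, like a freed heap slot. */
enum toy_type { TOY_T_NONE, TOY_T_STRING };

#define POOL_SIZE 16
#define MAX_ROOTS 8

struct toy_obj {
    enum toy_type type;
    bool marked;
};

struct toy_vm {
    struct toy_obj pool[POOL_SIZE];
    struct toy_obj *roots[MAX_ROOTS]; /* what the stack/register scan can see */
    size_t nroots;
    struct toy_obj *table[POOL_SIZE]; /* the hash's stored keys */
    size_t table_len;
    size_t table_capa;
};

static void toy_gc(struct toy_vm *vm) {
    for (size_t i = 0; i < POOL_SIZE; i++) vm->pool[i].marked = false;
    for (size_t i = 0; i < vm->nroots; i++) vm->roots[i]->marked = true;
    for (size_t i = 0; i < vm->table_len; i++) vm->table[i]->marked = true;
    for (size_t i = 0; i < POOL_SIZE; i++)
        if (!vm->pool[i].marked) vm->pool[i].type = TOY_T_NONE; /* sweep */
}

static struct toy_obj *toy_new_string(struct toy_vm *vm) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (vm->pool[i].type == TOY_T_NONE) {
            vm->pool[i].type = TOY_T_STRING;
            return &vm->pool[i];
        }
    }
    assert(!"toy heap exhausted");
    return NULL;
}

/* Growing the table stands in for the xmalloc during a table rebuild; here
 * every growth runs a full GC, roughly like GC.stress. */
static void toy_table_insert(struct toy_vm *vm, struct toy_obj *key) {
    assert(vm->table_len < POOL_SIZE);
    if (vm->table_len == vm->table_capa) {
        toy_gc(vm); /* runs before the key is stored in the table */
        vm->table_capa = vm->table_capa ? vm->table_capa * 2 : 1;
    }
    vm->table[vm->table_len++] = key;
}

/* Buggy shape: the dup'd key lives only in a local the GC cannot see. */
static struct toy_obj *aset_unguarded(struct toy_vm *vm) {
    struct toy_obj *key = toy_new_string(vm);
    toy_table_insert(vm, key);
    return key;
}

/* Guarded shape: the key is pinned as a root until it is in the table. */
static struct toy_obj *aset_guarded(struct toy_vm *vm) {
    struct toy_obj *key = toy_new_string(vm);
    assert(vm->nroots < MAX_ROOTS);
    vm->roots[vm->nroots++] = key;
    toy_table_insert(vm, key);
    vm->nroots--;
    return key;
}
```

In the unguarded run the table ends up pointing at a slot whose type has been reset to TOY_T_NONE, the toy equivalent of the "try to mark T_NONE object" crash; in the guarded run the key survives.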

Let's follow the order of events for code like this:

hash["foo"] = "bar"

1. Ruby -> rb_hash_aset is called

This function checks whether the key is a String, and if so it takes a special path: it calls RHASH_UPDATE_ITER with the callback hash_aset_str. RHASH_UPDATE_ITER is just a wrapper around tbl_update.

2. rb_hash_aset -> tbl_update.

This function just calls into rb_hash_stlike_update, passing a struct whose func member is our callback, hash_aset_str.

3. tbl_update -> rb_hash_stlike_update

This function just checks whether the hash is backed by an AR (array) table or an st table. In this case we have an AR table, so it calls ar_update, passing along arg (which carries the hash_aset_str function pointer) and tbl_update_modify as a callback.

4. rb_hash_stlike_update -> ar_update

ar_update does some checks, then eventually calls tbl_update_modify as a callback, passing arg.

5. ar_update -> tbl_update_modify

tbl_update_modify finally calls hash_aset_str as a callback.

6. tbl_update_modify -> hash_aset_str

hash_aset_str checks the key. The key doesn't exist in the hash yet (it's new), and it isn't frozen, so we allocate a new frozen string object and assign it to *key so that the caller can read it.

Now we return up the stack: hash_aset_str returns control to tbl_update_modify.

7. tbl_update_modify

tbl_update_modify executes a write barrier on the key and value.

This is odd because the key and value haven't been written yet; neither is reachable via the hash. Since we have allocated a new string, the string's liveness depends on it being found on the C stack or in a register. It's not referenced from the hash yet.

I think it's OK for the write barrier to happen here, but it is odd that we execute the barrier while the reference isn't "real" yet.

tbl_update_modify returns control to ar_update.

8. ar_update

ar_update finally adds the reference to the hash.

I think this is where the bug is. Adding a reference to the hash can cause the hash to expand, which calls xmalloc, which can in turn run GC. The compiler may have optimized this code in such a way that, at that point, the reference to the key is neither on the stack nor in a register.
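
The callback chain in steps 1 to 8 can be condensed into a small model. This is a hypothetical sketch, not Ruby's hash.c: the str, table, aset_str_cb and table_update names are made up, there is no AR/st distinction, and lookup is a linear strcmp scan. It only shows the shape of the flow: the update function looks the key up, lets the callback swap a new unfrozen key for a frozen copy through *key, and only then stores the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the aset call chain; not Ruby's hash.c. */
struct str {
    char buf[16];
    bool frozen;
};

struct entry {
    struct str *key;
    struct str *value;
};

struct table {
    struct entry entries[8];
    size_t len;
};

/* Callback in the style of hash_aset_str: may replace *key, sets *value. */
typedef void (*update_cb)(struct str **key, struct str **value, void *arg, bool existing);

static struct str str_pool[8];
static size_t str_pool_used;

static struct str *str_new_frozen_copy(const struct str *src) {
    assert(str_pool_used < sizeof str_pool / sizeof str_pool[0]);
    struct str *s = &str_pool[str_pool_used++];
    memcpy(s->buf, src->buf, sizeof s->buf);
    s->frozen = true;
    return s;
}

/* Step 6: a new, unfrozen key is swapped for a frozen copy via *key. */
static void aset_str_cb(struct str **key, struct str **value, void *arg, bool existing) {
    if (!existing && !(*key)->frozen)
        *key = str_new_frozen_copy(*key);
    *value = arg;
}

/* Steps 4, 5, 7 and 8 collapsed: look up, run the callback, then store. */
static void table_update(struct table *t, struct str *key, update_cb cb, void *arg) {
    for (size_t i = 0; i < t->len; i++) {
        if (strcmp(t->entries[i].key->buf, key->buf) == 0) {
            cb(&t->entries[i].key, &t->entries[i].value, arg, true);
            return;
        }
    }
    struct str *k = key;
    struct str *v = NULL;
    cb(&k, &v, arg, false);
    /* Before the fix, the write barrier fired at this point (step 7), while
     * the new key k was reachable only from this local. Storing the entry
     * below (step 8) is where the real table may grow and allocate. */
    assert(t->len < sizeof t->entries / sizeof t->entries[0]);
    t->entries[t->len].key = k;
    t->entries[t->len].value = v;
    t->len++;
}
```

Calling table_update twice with the same unfrozen "foo" key stores one entry whose key is a distinct frozen copy; the second call takes the existing-key path and only updates the value.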

Updated by byroot (Jean Boussier) about 2 years ago

Thanks @tenderlove, your explanation makes sense to me.

From your description, shouldn't we be able to reproduce this with GC.stress = true?

Updated by byroot (Jean Boussier) about 2 years ago

  • Backport changed from 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN to 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: REQUIRED

@tenderlovemaking (Aaron Patterson) implemented a patch for this: https://github.com/ruby/ruby/pull/5525. We deployed it on our infrastructure and so far the bug is gone. I'd like to let it run for a few more days to be fully certain, though.

@tenderlovemaking (Aaron Patterson), do you think this bug might have been present on any older version?

Updated by tenderlovemaking (Aaron Patterson) about 2 years ago

  • Status changed from Open to Closed

Applied in changeset git|2a76440fac62bb0f6e53ccada07caf4b47b78cf9.


[Bug #18501] Fire write barrier after hash has been written

Before this change the write barrier was executed before the key and
value were actually reachable via the Hash. This could cause
inconsistencies in object coloration which would lead to accidental
collection of dup'd keys.

Example:

  1. Object O is grey, Object P is white.
  2. Write barrier fires O -> P
  3. Write barrier does nothing (O is not black yet)
  4. Malloc happens, which starts GC
  5. GC colors O black
  6. P is written into O (now we have O -> P reference)
  7. P is now accidentally treated as garbage
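
The commit message's example can be replayed with a toy tri-color marker. This sketch assumes an insertion-style barrier that greys a white child only when its parent is already black, and does nothing for a grey parent because the marker will scan that parent later; that matches steps 2 and 3 above, but the code is an illustration rather than Ruby's gc.c, and obj, mark_step, write_barrier, p_survives and the run_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy tri-color model of the example above; not Ruby's gc.c. */
enum color { WHITE, GREY, BLACK };

struct obj {
    enum color color;
    struct obj *child; /* the single reference this object may hold */
};

/* Insertion-style barrier: a black parent gaining a white child greys the
 * child. A grey parent is left alone since it will be scanned later. */
static void write_barrier(struct obj *parent, struct obj *child) {
    if (parent->color == BLACK && child->color == WHITE)
        child->color = GREY;
}

/* Scanning a grey object greys the white child it references right now,
 * then blackens the object. Black objects are never rescanned. */
static void mark_step(struct obj *o) {
    if (o->color != GREY) return;
    if (o->child != NULL && o->child->color == WHITE) o->child->color = GREY;
    o->color = BLACK;
}

/* Finish marking for this two-object model; white objects would be swept. */
static bool p_survives(struct obj *o, struct obj *p) {
    mark_step(o);
    mark_step(p);
    return p->color != WHITE;
}

/* Old order: barrier, then the malloc-triggered GC step, then the write. */
static bool run_barrier_before_write(void) {
    struct obj o = {GREY, NULL}, p = {WHITE, NULL}; /* step 1 */
    write_barrier(&o, &p);     /* steps 2-3: O is grey, nothing happens */
    mark_step(&o);             /* steps 4-5: GC runs, O turns black */
    o.child = &p;              /* step 6: the reference now exists */
    return p_survives(&o, &p); /* step 7: P is still white */
}

/* Fixed order: store the reference (GC may run during the store), then
 * fire the barrier, which now sees a black parent and a white child. */
static bool run_write_then_barrier(void) {
    struct obj o = {GREY, NULL}, p = {WHITE, NULL};
    mark_step(&o);             /* GC runs while the table grows */
    o.child = &p;              /* the entry is stored */
    write_barrier(&o, &p);     /* O is black, P white: P becomes grey */
    return p_survives(&o, &p);
}
```

run_barrier_before_write() returns false (P is left white and would be swept), while run_write_then_barrier() returns true.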

Updated by byroot (Jean Boussier) about 2 years ago

  • Backport changed from 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: REQUIRED to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: DONTNEED, 3.1: REQUIRED

> I'd like to let it run for a few more days to be fully certain though.

We ran it long enough and saw no more crashes, so we're quite confident the patches work.

I'm marking 3.0 and older as DONTNEED because we never noticed this crash when we were running those versions.

Updated by naruse (Yui NARUSE) about 2 years ago

  • Backport changed from 2.6: DONTNEED, 2.7: DONTNEED, 3.0: DONTNEED, 3.1: REQUIRED to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONE

ruby_3_1 86c8e15170484fe23b311e567717053f147ffd9c merged revision(s) 2a76440fac62b.
