Project

General

Profile

Actions

Bug #18786

closed

Rescuing NotImplementedError from `GC.verify_compaction_references` can lead to infinite loop on platforms

Added by jprokop (Jarek Prokop) about 1 month ago. Updated about 1 month ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [powerpc64le-linux]
[ruby-core:108569]

Description

GC.verify_compaction_references throws an NotImplementedError on platforms that do not support it as is expected.
However, rescuing this exception after calling the method can lead to an infinite loop later.
The infinite loop was observed on ppc64le and sometimes aarch64 mainly during package builds on Fedora Rawhide, RHEL 8, RHEL 9 (all are Ruby 3.1), with test suites of the gems mysql2 [0] and pg [1].

Minimal reproducer:

count = 0

begin
  GC.verify_compaction_references(double_heap: true, toward: :empty)
rescue NotImplementedError
  count += 1
end

100_000.times do |i|
  puts "i: #{i}, count: #{count}"
end

When I run this script, value of count is 1 and the script gets stuck before value of i reaches 10k in an infinite loop. Script ends execution normally after commenting out the line GC.verify_compaction_references(double_heap: true, toward: :empty).

The infinite loop occurs in this while loop: https://github.com/ruby/ruby/blob/v3_1_2/gc.c#L2483

GDB backtrace after attaching to the process:

#0  heap_next_freepage (heap=<optimized out>, size_pool=<optimized out>, objspace=<optimized out>) at gc.c:2391
#1  ractor_cache_slots (size_pool_idx=<optimized out>, cr=<optimized out>, objspace=<optimized out>) at gc.c:2424
#2  newobj_slowpath (size_pool_idx=0, wb_protected=1, cr=0x100359c2180, objspace=0x100359c1390, flags=5, klass=0) at gc.c:2484
#3  newobj_slowpath_wb_protected.constprop.0 (klass=0, flags=5, objspace=0x100359c1390, cr=0x100359c2180, size_pool_idx=0) at gc.c:2504
#4  0x00007fff885e7750 in newobj_of0 (alloc_size=<optimized out>, cr=<optimized out>, wb_protected=1, flags=<optimized out>, klass=0)
    at gc.c:2545
#5  newobj_of (alloc_size=<optimized out>, wb_protected=1, v3=0, v2=0, v1=0, flags=<optimized out>, klass=0) at gc.c:2555
#6  rb_wb_protected_newobj_of (klass=0, flags=<optimized out>, size=<optimized out>) at gc.c:2577
#7  0x00007fff88750b14 in str_alloc (size=<optimized out>, klass=0) at string.c:845
#8  str_alloc_embed (capa=<error reading variable: value has been optimized out>, klass=0) at string.c:858
#9  str_new0 (klass=klass@entry=0, ptr=0x10035a2ba40 "i: 8298, count: 1", len=len@entry=17, termlen=termlen@entry=1) at string.c:888
#10 0x00007fff88752118 in str_new (len=17, ptr=<optimized out>, klass=0) at string.c:914
#11 str_new_frozen_buffer (klass=0, orig=1100411484720, copy_encoding=<optimized out>) at string.c:1439
#12 0x00007fff88616e28 in io_fwritev (argc=argc@entry=2, argv=argv@entry=0x7fffeac41e48, fptr=fptr@entry=0x10035aa8dc0) at io.c:1963
#13 0x00007fff886175a0 in io_writev (argc=<optimized out>, argv=0x7fffeac41e48, io=<optimized out>) at io.c:2015
#14 0x00007fff887c1438 in ractor_safe_call_cfunc_m1 (recv=<optimized out>, argc=<optimized out>, argv=<optimized out>, func=<optimized out>)
    at vm_insnhelper.c:2847
#15 0x00007fff887e01cc in vm_call0_cfunc_with_frame (argv=0x7fffeac41e48, calling=0x7fffeac41c58, ec=0x100359c1b30) at vm_eval.c:149
#16 vm_call0_cfunc (argv=0x7fffeac41e48, calling=0x7fffeac41c58, ec=0x100359c1b30) at vm_eval.c:163
#17 vm_call0_body (ec=ec@entry=0x100359c1b30, calling=calling@entry=0x7fffeac41d18, argv=argv@entry=0x7fffeac41e48) at vm_eval.c:209
#18 0x00007fff887e1da0 in vm_call0_cc (kw_splat=0, cc=0x10035ee3e38, argv=0x7fffeac41e48, argc=2, id=8961, recv=1100411616120,
    ec=0x100359c1b30) at vm_eval.c:86
#19 rb_funcallv_scope (scope=CALL_FCALL, argv=0x7fffeac41e48, argc=2, mid=8961, recv=1100411616120) at vm_eval.c:1050
#20 rb_funcallv (recv=1100411616120, mid=8961, argc=<optimized out>, argv=0x7fffeac41e48) at vm_eval.c:1065
#21 0x00007fff88815d74 in rb_io_writev.isra.0 (io=1100411616120, argc=<optimized out>, argv=0x7fffeac41e48) at io.c:2082
#22 0x00007fff88623ab8 in rb_io_puts (out=<optimized out>, argv=<optimized out>, argc=<optimized out>) at io.c:8234
#23 rb_io_puts (argc=<optimized out>, argv=<optimized out>, out=1100411616120) at io.c:8208
#24 0x00007fff887c1438 in ractor_safe_call_cfunc_m1 (recv=<optimized out>, argc=<optimized out>, argv=<optimized out>, func=<optimized out>)
    at vm_insnhelper.c:2847
#25 0x00007fff887e01cc in vm_call0_cfunc_with_frame (argv=0x7fff87ec0090, calling=0x7fffeac41f48, ec=0x100359c1b30) at vm_eval.c:149
#26 vm_call0_cfunc (argv=0x7fff87ec0090, calling=0x7fffeac41f48, ec=0x100359c1b30) at vm_eval.c:163
#27 vm_call0_body (ec=ec@entry=0x100359c1b30, calling=calling@entry=0x7fffeac42038, argv=argv@entry=0x7fff87ec0090) at vm_eval.c:209
#28 0x00007fff887dedb0 in vm_call0_cc (kw_splat=0, cc=0x10035ee3e60, argv=0x7fff87ec0090, argc=1, id=16417, recv=1100411616120,
    ec=0x100359c1b30) at vm_eval.c:86
#29 rb_call0 (ec=0x100359c1b30, recv=1100411616120, mid=16417, argc=<optimized out>, argv=0x7fff87ec0090, call_scope=<optimized out>,
    self=1100411831000) at vm_eval.c:550
#30 0x00007fff88623db0 in rb_f_puts (argc=<optimized out>, argv=0x7fff87ec0090, recv=<optimized out>)
    at ./include/ruby/internal/symbol.h:296
#31 0x00007fff887c1438 in ractor_safe_call_cfunc_m1 (recv=<optimized out>, argc=<optimized out>, argv=<optimized out>, func=<optimized out>)
    at vm_insnhelper.c:2847
#32 0x00007fff887c5348 in vm_call_cfunc_with_frame (ec=0x100359c1b30, reg_cfp=0x7fff87fbff10, calling=<optimized out>)
    at vm_insnhelper.c:3037
#33 0x00007fff887c6714 in vm_sendish (ec=0x100359c1b30, reg_cfp=0x7fff87fbff10, cd=0x10035cd3dc0, block_handler=<optimized out>,
    method_explorer=<optimized out>) at vm_callinfo.h:349
#34 0x00007fff887d1cb0 in vm_exec_core (ec=0x100359c1b30, initial=<optimized out>) at insns.def:778
#35 0x00007fff887f28d4 in rb_vm_exec (ec=0x100359c1b30, mjit_enable_p=<optimized out>) at vm.c:2228
#36 0x00007fff88697cd0 in invoke_block (captured=<optimized out>, captured=<optimized out>, opt_pc=<optimized out>, type=<optimized out>,
    cref=0x0, self=1100411831000, iseq=0x10035d00418, ec=0x100359c1b30) at vm_insnhelper.c:403
#37 invoke_iseq_block_from_c (me=0x0, is_lambda=<optimized out>, cref=0x0, passed_block_handler=0, kw_splat=0, argv=0x7fffeac42940, argc=1,
    self=1100411831000, captured=<optimized out>, ec=0x100359c1b30) at vm.c:1380
#38 invoke_block_from_c_bh (force_blockarg=<optimized out>, is_lambda=<optimized out>, cref=<optimized out>,
    passed_block_handler=<optimized out>, kw_splat=<optimized out>, argv=<optimized out>, argc=<optimized out>,
    block_handler=<optimized out>, ec=<optimized out>) at vm.c:1398
#39 vm_yield_with_cref (is_lambda=0, cref=0x0, kw_splat=0, argv=0x7fffeac42940, argc=1, ec=0x100359c1b30) at vm.c:1435
#40 vm_yield (kw_splat=0, argv=0x7fffeac42940, argc=1, ec=0x100359c1b30) at vm.c:1443
#41 rb_yield_0 (argv=0x7fffeac42940, argc=1) at vm_eval.c:1347
#42 rb_yield_1 (val=<optimized out>) at vm_eval.c:1353
#43 int_dotimes (num=2000001) at numeric.c:5701
#44 0x00007fff887c1484 in ractor_safe_call_cfunc_0 (recv=<optimized out>, argc=<optimized out>, argv=<optimized out>, func=<optimized out>)
    at vm_insnhelper.c:2854
#45 0x00007fff887c5348 in vm_call_cfunc_with_frame (ec=0x100359c1b30, reg_cfp=0x7fff87fbff90, calling=<optimized out>)
    at vm_insnhelper.c:3037
#46 0x00007fff887c6714 in vm_sendish (ec=0x100359c1b30, reg_cfp=0x7fff87fbff90, cd=0x10035d58820, block_handler=<optimized out>,
    method_explorer=<optimized out>) at vm_callinfo.h:349
#47 0x00007fff887d2a28 in vm_exec_core (ec=0x100359c1b30, initial=<optimized out>) at insns.def:759
#48 0x00007fff887f28d4 in rb_vm_exec (ec=0x100359c1b30, mjit_enable_p=<optimized out>) at vm.c:2228
#49 0x00007fff887f4224 in rb_iseq_eval_main (iseq=<optimized out>) at vm.c:2477
#50 0x00007fff885d3b78 in rb_ec_exec_node (ec=ec@entry=0x100359c1b30, n=n@entry=0x10035d00620) at eval.c:280
#51 0x00007fff885d3cdc in ruby_run_node (n=0x10035d00620) at eval.c:321
#52 0x0000000116370a88 in main (argc=<optimized out>, argv=<optimized out>) at ./main.c:47

[0] https://github.com/brianmario/mysql2/issues/1261
[1] Relevant issue: https://bugs.ruby-lang.org/issues/18560

Updated by jprokop (Jarek Prokop) about 1 month ago

I was able to investigate a bit more.
This problem is Ruby 3.1 specific.

I tested the following rubies built from git: 3.0, 3.2preview1, and master branch. Neither of which went into the infinite loop.

I found that if the gc_verify_compaction_references C method enters the if branch for the argument toward_empty [0] and executes the method gc_sort_heap_by_empty_slots, then and only then is the infinite loop possible. Commenting out that method call lets the reproducer finish and exit normally. No other change has any notable effect on the issue.

[0] https://github.com/ruby/ruby/blob/v3_1_2/gc.c#L10254

Updated by byroot (Jean Boussier) about 1 month ago

  • Status changed from Open to Closed

Fixed via [Bug #18779]

Actions

Also available in: Atom PDF