Project

General

Profile

Actions

Bug #18511

closed

ruby in infinite loop consuming 100% CPU

Added by shenningz (Henning Sackewitz) 4 months ago. Updated about 2 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.7.5p203 (2021-11-24 revision f69aeb8314) [x86_64-linux]
[ruby-core:107264]

Description

After running Fluentd (Td-agent) workload for several hours or days, Ruby hangs in an infinite loop.

gdb stack trace:

#0  lookup_method_table (id=3377, klass=<optimized out>) at vm_method.c:188
#1  search_method0 (skip_refined=<optimized out>, defined_class_ptr=<optimized out>, id=<optimized out>, klass=<optimized out>) at vm_method.c:747
#2  search_method (defined_class_ptr=<optimized out>, id=<optimized out>, klass=<optimized out>) at vm_method.c:823
#3  method_entry_get_without_cache (defined_class_ptr=<optimized out>, id=<optimized out>, klass=<optimized out>) at vm_method.c:776
#4  method_entry_get (klass=139675557693280, id=id@entry=3377, defined_class_ptr=defined_class_ptr@entry=0x7f08cbb76210) at vm_method.c:843
#5  0x00007f08d03a00f4 in rb_callable_method_entry (klass=<optimized out>, id=id@entry=3377) at vm_method.c:891
#6  0x00007f08d03b5942 in rb_search_method_entry (mid=3377, recv=<optimized out>) at vm_eval.c:654
#7  rb_call0 (self=<optimized out>, call_scope=CALL_FCALL, argv=0x7f08cbb76280, argc=1, mid=3377, recv=<optimized out>, ec=0x55cc930e85b0) at vm_eval.c:384
#8  rb_call (scope=CALL_FCALL, argv=0x7f08cbb76280, argc=1, mid=3377, recv=139675557693296) at vm_eval.c:718
#9  rb_funcallv (recv=139675557693296, mid=3377, argc=argc@entry=1, argv=argv@entry=0x7f08cbb76280) at vm_eval.c:958
#10 0x00007f08cc74ff3b in yajl_encode_part (wrapper=wrapper@entry=0x7f08c02ced20, obj=94337115946160, io=io@entry=8) at ../../../../ext/yajl/yajl_ext.c:181
#11 0x00007f08cc750258 in yajl_encode_part (wrapper=wrapper@entry=0x7f08c02ced20, obj=obj@entry=139675467402240, io=io@entry=8) at ../../../../ext/yajl/yajl_ext.c:197
#12 0x00007f08cc750338 in rb_yajl_encoder_encode (argc=argc@entry=1, argv=argv@entry=0x7f08cbb76398, self=<optimized out>) at ../../../../ext/yajl/yajl_ext.c:1115
#13 0x00007f08cc750f8a in rb_yajl_json_ext_hash_to_json (argc=0, argv=0x7f08cbb76fe8, self=<optimized out>) at ../../../../ext/yajl/yajl_ext.c:1179
#14 0x00007f08d039cef8 in vm_call_cfunc_with_frame (empty_kw_splat=<optimized out>, cd=0x55cc92d5f950, calling=<optimized out>, reg_cfp=0x7f08cbc76a40,
    ec=0x55cc930e85b0) at vm_insnhelper.c:2514
#15 vm_call_cfunc (ec=0x55cc930e85b0, reg_cfp=0x7f08cbc76a40, calling=<optimized out>, cd=0x55cc92d5f950) at vm_insnhelper.c:2539
#16 0x00007f08d0398076 in vm_sendish (ec=ec@entry=0x55cc930e85b0, reg_cfp=reg_cfp@entry=0x7f08cbc76a40, cd=0x55cc92d5f950, block_handler=block_handler@entry=0,
    method_explorer=method_explorer@entry=0x7f08d03a0350 <vm_search_method_wrap>) at vm_insnhelper.c:4023
#17 0x00007f08d03a8c0a in vm_exec_core (ec=<optimized out>, initial=<optimized out>) at insns.def:801
#18 0x00007f08d03af000 in rb_vm_exec (ec=0x55cc930e85b0, mjit_enable_p=1) at vm.c:1929
#19 0x00007f08d03bbfa1 in invoke_block (captured=<optimized out>, opt_pc=<optimized out>, type=<optimized out>, cref=<optimized out>, self=<optimized out>,
    iseq=<optimized out>, ec=<optimized out>) at vm.c:1044
#20 invoke_iseq_block_from_c (me=<optimized out>, is_lambda=<optimized out>, cref=<optimized out>, passed_block_handler=<optimized out>, kw_splat=<optimized out>,
    argv=<optimized out>, argc=<optimized out>, self=<optimized out>, captured=<optimized out>, ec=<optimized out>) at vm.c:1116
#21 invoke_block_from_c_bh (force_blockarg=<optimized out>, is_lambda=<optimized out>, cref=<optimized out>, passed_block_handler=<optimized out>,
    kw_splat=<optimized out>, argv=<optimized out>, argc=<optimized out>, block_handler=<optimized out>, ec=<optimized out>) at vm.c:1134
#22 vm_yield (kw_splat=<optimized out>, argv=<optimized out>, argc=<optimized out>, ec=<optimized out>) at vm.c:1179
#23 rb_yield_0 (argv=<optimized out>, argc=<optimized out>) at vm_eval.c:1227
#24 rb_yield_1 (val=<optimized out>) at vm_eval.c:1233
#25 rb_yield (val=<optimized out>) at vm_eval.c:1243
#26 0x00007f08cc6a398c in ?? () from /net/usr.work/d057770_ls3512/fb/ruby_issue/opt/td-agent/lib/ruby/gems/2.7.0/gems/msgpack-1.4.2/lib/msgpack/msgpack.so
#27 0x00007f08d0242154 in rb_vrescue2 (b_proc=0x7f08cc6a3960, data1=139675441636960, r_proc=0x7f08cc6a3490, data2=139675441636960, args=args@entry=0x7f08cbb76840)
    at eval.c:990
#28 0x00007f08d024236e in rb_rescue2 (b_proc=<optimized out>, data1=<optimized out>, r_proc=<optimized out>, data2=<optimized out>) at eval.c:967
#29 0x00007f08d039cef8 in vm_call_cfunc_with_frame (empty_kw_splat=<optimized out>, cd=0x55cc92499be0, calling=<optimized out>, reg_cfp=0x7f08cbc76b20,
    ec=0x55cc930e85b0) at vm_insnhelper.c:2514
#30 vm_call_cfunc (ec=0x55cc930e85b0, reg_cfp=0x7f08cbc76b20, calling=<optimized out>, cd=0x55cc92499be0) at vm_insnhelper.c:2539
#31 0x00007f08d0398076 in vm_sendish (ec=ec@entry=0x55cc930e85b0, reg_cfp=reg_cfp@entry=0x7f08cbc76b20, cd=cd@entry=0x55cc92499be0, block_handler=<optimized out>,
    method_explorer=method_explorer@entry=0x7f08d03a0350 <vm_search_method_wrap>) at vm_insnhelper.c:4023
#32 0x00007f08d03a8c81 in vm_exec_core (ec=<optimized out>, initial=<optimized out>) at insns.def:782
#33 0x00007f08d03af000 in rb_vm_exec (ec=0x55cc930e85b0, mjit_enable_p=1) at vm.c:1929
#34 0x00007f08d03b2986 in invoke_block (captured=0x55cc92db1260, opt_pc=<optimized out>, type=<optimized out>, cref=0x0, self=94337116697720, iseq=0x55cc925508c0,
    ec=0x55cc930e85b0) at vm.c:1044
--Type <RET> for more, q to quit, c to continue without paging--
#35 invoke_iseq_block_from_c (me=0x0, is_lambda=0, cref=0x0, passed_block_handler=0, kw_splat=<optimized out>, argv=<optimized out>, argc=<optimized out>,
    self=94337116697720, captured=0x55cc92db1260, ec=<optimized out>) at vm.c:1116
#36 invoke_block_from_c_proc (me=0x0, is_lambda=0, passed_block_handler=0, kw_splat=<optimized out>, argv=<optimized out>, argc=<optimized out>, self=94337116697720,
    proc=0x55cc92547478, ec=<optimized out>) at vm.c:1216
#37 vm_invoke_proc (passed_block_handler=0, kw_splat=<optimized out>, argv=<optimized out>, argc=<optimized out>, self=94337116697720, proc=0x55cc92547478,
    ec=<optimized out>) at vm.c:1238
#38 rb_vm_invoke_proc (ec=<optimized out>, proc=proc@entry=0x55cc92db1260, argc=<optimized out>, argv=<optimized out>, kw_splat=<optimized out>,
    passed_block_handler=passed_block_handler@entry=0) at vm.c:1259
#39 0x00007f08d0374613 in thread_do_start (th=0x55cc930e83e0) at thread.c:692
#40 0x00007f08d0376840 in thread_start_func_2 (th=<optimized out>, stack_start=<optimized out>) at thread.c:740
#41 0x00007f08d0376c7d in thread_start_func_1 (th_ptr=<optimized out>) at thread_pthread.c:969
#42 0x00007f08cff5d609 in ?? ()
#43 0x0000000000000000 in ?? ()

It looks like Ruby is looping here: https://github.com/ruby/ruby/blob/v2_7_5/vm_method.c#L745

A core file (and matching Ruby binaries with debug symbols) can be provided (size 55MB)

Updated by jeremyevans0 (Jeremy Evans) 4 months ago

  • Status changed from Open to Feedback

This happens inside the yajl-ruby C extension. Can you reproduce this without yajl-ruby?

Updated by shenningz (Henning Sackewitz) 4 months ago

Right now we cannot reproduce this systematically, it just occurs with our Fluentd workload sooner or later.
Fluentd Td-agent is packaged this way, we cannot just remove parts from it.

Is there any way to replace the yajl-ruby C extension with another json extension ? We already had issues with yajl-ruby before:
https://github.com/fluent/fluentd/issues/3547
https://github.com/brianmario/yajl-ruby/issues/205

Updated by alanwu (Alan Wu) 4 months ago

@shenningz (Henning Sackewitz)

Is there any way to replace the yajl-ruby C extension with another json extension?

Depending on your needs, maybe the pure Ruby json_pure gem is good enough. If you have performance requirements, the oj gem seems fairly popular and I've heard success stories. It's supposedly faster than yajl-ruby at least for some workloads. Oj uses a C extension, though.

Updated by shenningz (Henning Sackewitz) 4 months ago

Depending on your needs, maybe the pure Ruby json_pure gem is good enough. If you have performance requirements, the oj gem seems fairly popular and I've heard success stories. It's supposedly faster than yajl-ruby at least for some workloads. Oj uses a C extension, though.

Thank you for the pointers. However, since we are just users and not maintainers of the Fluentd packages, we are not in the position of exchanging components.

To get back to the original issue: could someone look into the core dump ? Is there any possibility to upload a 55MB file ?

Updated by alanwu (Alan Wu) 4 months ago

could someone look into the core dump ? Is there any possibility to upload a 55MB file ?

I can't guarantee that anyone is willing to spend time on investigating the core dump, but if you are willing to make publicly available with no strings attached the dump and the relevant binaries (the ruby executable and share libraries), it'll enable such investigations. You can push the dumps to GitHub and post a link here, for example.

If you are looking to make private arrangement with someone so they would keep the information they see in the dump secret, I think that is going to be challenging. Speaking for myself, I wouldn't be willing to engage in such arrangement, especially when unpaid.

Actions #6

Updated by jeremyevans0 (Jeremy Evans) about 2 months ago

  • Status changed from Feedback to Closed
Actions

Also available in: Atom PDF