Bug #21571

closed

Ruby forked process sporadically hanging on exit

Added by dmorner (Daniel Orner) 1 day ago. Updated 1 day ago.

Status: Rejected
Assignee: -
Target version: -
ruby -v: ruby 3.4.5 (2025-07-16 revision 20cda200d3) +YJIT +PRISM [x86_64-linux]
[ruby-core:123224]

Description

This is my first bug report, so please let me know if there's anything I can do to improve it.

We have a production-grade Rails app that's been running for many years. We recently moved to EKS and upgraded it to the latest Ruby and Rails. We have a number of delayed_job processes that fork on every job that comes in so that the OS can reclaim the memory used in executing it (we implemented this a long time ago because Ruby never gives up any memory that it takes, and some jobs use way more memory than others).

In the last couple of weeks, we've noticed a rare occurrence where the delayed job hangs when exiting. The code looks like this:

    Process.fork do
      ActiveRecord::Base.establish_connection
      execute_job
    end
    Process.wait

When this bug occurs, the forked child process never exits. It stays stuck forever, doing nothing.
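For readers who want to experiment with the pattern, here is a minimal, self-contained sketch of fork-per-job with a bounded wait, so that a stuck child is detected rather than blocking the parent indefinitely. The helper name `run_in_child` and its `timeout` and `poll_interval` parameters are hypothetical and not part of our app or of delayed_job; the database reconnect and job execution are represented by the block.

```ruby
# Hypothetical helper (not from the original app): runs the block in a forked
# child and waits up to `timeout` seconds for the child to exit.
# Returns the child's Process::Status, or nil if the child had to be killed.
def run_in_child(timeout: 5.0, poll_interval: 0.05)
  pid = Process.fork do
    yield
  end

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    # With WNOHANG, wait2 returns nil right away while the child is still running.
    result = Process.wait2(pid, Process::WNOHANG)
    return result[1] if result
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep poll_interval
  end

  # The child did not exit in time: kill it and reap it to avoid a zombie.
  Process.kill(:KILL, pid)
  Process.wait(pid)
  nil
end
```

A caller could use the nil return value to log the hang or retry the job. Whether killing a child stuck in its exit path is acceptable depends on the application, so treat this as a detection aid rather than a fix.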

Obviously I don't have a way to reproduce this because it happens maybe once every few thousand jobs, and it happens across all job types.

If I run gdb on the child process, I always see something that looks like this (note: I am a total gdb newbie):

    #0  __futex_abstimed_wait_common
        (futex_word=futex_word@entry=0x7fb6af41400c, expected=expected@entry=3, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>, cancel=cancel@entry=false) at ./nptl/futex-internal.c:103
    #1  0x00007fb6d5677f68 in __GI___futex_abstimed_wait64
        (futex_word=futex_word@entry=0x7fb6af41400c, expected=expected@entry=3, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:128
    #2  0x00007fb6d568138c in __pthread_rwlock_wrlock_full64 (abstime=0x0, clockid=0, rwlock=0x7fb6af414000) at ./nptl/pthread_rwlock_common.c:730
    #3  ___pthread_rwlock_wrlock (rwlock=0x7fb6af414000) at ./nptl/pthread_rwlock_wrlock.c:26
    #4  0x00007fb6aee22989 in CRYPTO_THREAD_write_lock () at /lib/x86_64-linux-gnu/libcrypto.so.3
    #5  0x00007fb6aee15c6a in ?? () at /lib/x86_64-linux-gnu/libcrypto.so.3
    #6  0x00007fb6aee15fa9 in OPENSSL_thread_stop () at /lib/x86_64-linux-gnu/libcrypto.so.3
    #7  0x00007fb6aee153b5 in OPENSSL_cleanup () at /lib/x86_64-linux-gnu/libcrypto.so.3
    #8  0x00007fb6d563055d in __run_exit_handlers
        (status=0, listp=0x7fb6d57c5820 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true)
        at ./stdlib/exit.c:116
    #9  0x00007fb6d563069a in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:146
    #10 0x00007fb6d5ad3a80 in ruby_stop (ex=<optimized out>) at eval.c:290
    #11 0x00007fb6d5bc47b4 in rb_f_fork (obj=<optimized out>) at process.c:4388
    #12 rb_f_fork (obj=<optimized out>) at process.c:4378
    #13 0x00007fb6d5cad5cc in vm_call_cfunc_with_frame_
        (stack_bottom=<optimized out>, argv=<optimized out>, argc=0, calling=<optimized out>, reg_cfp=0x7fb6d4f68280, ec=0x7fb6d4e4d550)
        at /usr/src/ruby/vm_insnhelper.c:3794
    #14 vm_call_cfunc_with_frame (ec=0x7fb6d4e4d550, reg_cfp=0x7fb6d4f68280, calling=<optimized out>) at /usr/src/ruby/vm_insnhelper.c:3840
    #15 0x00007fb6d5cb3fef in vm_sendish
        (ec=0x7fb6d4e4d550, reg_cfp=0x7fb6d4f68280, cd=0x7fb69fb17650, block_handler=<optimized out>, method_explorer=mexp_search_method)
        at /usr/src/ruby/vm_callinfo.h:415
    #16 0x00007fb6d5cc1e59 in vm_exec_core (ec=0x7fb6af41400c, ec@entry=0x7fb6d4e4d550) at /usr/src/ruby/insns.def:851
    #17 0x00007fb6d5cc7ba9 in rb_vm_exec (ec=0x7fb6d4e4d550) at vm.c:2595
    #18 0x00007fb6b13e73b9 in ?? ()
    #19 0x00007fb6d4f68328 in ?? ()
...etc, I can paste more if needed

I can't get "call rb_backtrace()" to work in gdb; it never prints anything.

This seems to indicate that the child is blocked waiting for a write lock (a pthread rwlock taken via CRYPTO_THREAD_write_lock) while OpenSSL is cleaning up in the process exit handlers. The crazy thing is that most of the processes I inspect have only one thread.
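Since the hang sits in the exit-handler path (frames #7 to #10), it may help to see how a normal exit differs from Process.exit!, which Ruby documents as exiting immediately without calling exit handlers. The sketch below is only an illustration of that difference using a Ruby-level at_exit hook; the helper name `child_output` is hypothetical, and nothing in this report confirms that skipping the handlers is a safe or correct workaround for the OpenSSL hang.

```ruby
# Hypothetical helper: forks a child that registers an at_exit hook, does some
# "work", and then exits either normally or via Process.exit!.
# Returns everything the child wrote to the pipe.
def child_output(skip_exit_handlers:)
  reader, writer = IO.pipe
  pid = Process.fork do
    reader.close
    at_exit { writer.write("at_exit ran") }
    writer.write("job done;")
    # exit! terminates immediately: no at_exit hooks and no exit handlers run.
    Process.exit!(0) if skip_exit_handlers
    # Otherwise the block ends and the child exits normally, running its hooks.
  end
  writer.close
  Process.wait(pid)
  reader.read
ensure
  reader&.close
end

puts child_output(skip_exit_handlers: false)  # includes the at_exit output
puts child_output(skip_exit_handlers: true)   # only the job output
```

Note the trade-off: with exit!, anything the application relies on at exit (at_exit hooks, flushing of buffered output) is skipped as well, so this would need careful evaluation before being used in a job runner.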

Any help would be greatly appreciated!
