Bug #21959
Closed
rb_internal_thread_event_hooks_rw_lock is not reinitialized after fork causing deadlocks
Description
Ruby's GVL Instrumentation API uses a read-write lock (rb_internal_thread_event_hooks_rw_lock) to protect the list of thread event hooks:
- Read lock — acquired on every GVL transition to iterate and call hook callbacks (rb_thread_execute_hooks)
- Write lock — acquired when adding/removing hooks (rb_internal_thread_add_event_hook, rb_internal_thread_remove_event_hook)
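The locking pattern described above can be sketched with plain pthreads. This is only an illustration, not Ruby's actual implementation: the names `hooks_lock`, `add_hook`, `remove_hook`, `execute_hooks` and `count_event` are hypothetical stand-ins, and the hook list is assumed to be a simple singly linked list.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical hook type; Ruby's real callback signature differs. */
typedef void (*hook_fn)(int event, void *data);

struct hook {
    hook_fn fn;
    void *data;
    struct hook *next;
};

static pthread_rwlock_t hooks_lock = PTHREAD_RWLOCK_INITIALIZER;
static struct hook *hooks = NULL;

/* Sample callback: counts how many times it was invoked. */
void count_event(int event, void *data)
{
    (void)event;
    (*(int *)data)++;
}

/* Adding a hook mutates the list, so it takes the write lock. */
struct hook *add_hook(hook_fn fn, void *data)
{
    struct hook *h = malloc(sizeof *h);
    if (h == NULL) return NULL;
    h->fn = fn;
    h->data = data;
    pthread_rwlock_wrlock(&hooks_lock);
    h->next = hooks;
    hooks = h;
    pthread_rwlock_unlock(&hooks_lock);
    return h;
}

/* Removing a hook also mutates the list: write lock again. */
int remove_hook(struct hook *target)
{
    int removed = 0;
    pthread_rwlock_wrlock(&hooks_lock);
    for (struct hook **p = &hooks; *p != NULL; p = &(*p)->next) {
        if (*p == target) {
            *p = target->next;
            removed = 1;
            break;
        }
    }
    pthread_rwlock_unlock(&hooks_lock);
    if (removed) free(target);
    return removed;
}

/* Iterating only reads the list, so many threads may hold the
 * read lock at once. Returns the number of hooks called. */
int execute_hooks(int event)
{
    int called = 0;
    pthread_rwlock_rdlock(&hooks_lock);
    for (struct hook *h = hooks; h != NULL; h = h->next) {
        h->fn(event, h->data);
        called++;
    }
    pthread_rwlock_unlock(&hooks_lock);
    return called;
}
```

Because `execute_hooks` runs on every simulated transition, some thread is likely to be inside the read-locked section at any given moment, which is what makes the fork timing described in this report easy to hit.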
After fork(), Ruby reinitializes several internal locks (e.g. vm->ractor.sched.lock, timer_th.waiting_lock), but not rb_internal_thread_event_hooks_rw_lock. Reinitialization of this lock was not added when the GVL Instrumentation API was introduced.
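For illustration, one generic way to reset a lock in the child is a `pthread_atfork` child handler. This is a sketch under assumptions, not how Ruby does it: Ruby performs its reinitialization in its own after-fork code path, and `event_hooks_rw_lock`, `reinit_lock_in_child`, `install_fork_handler` and `fork_and_count_child_reinits` are hypothetical names. Calling `pthread_rwlock_init` on an already initialized lock is formally undefined in POSIX, although it is a common post-fork idiom in practice.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for rb_internal_thread_event_hooks_rw_lock. */
static pthread_rwlock_t event_hooks_rw_lock = PTHREAD_RWLOCK_INITIALIZER;
static int child_reinit_count = 0;

/* Runs in the child right after fork(): discard whatever lock
 * state was copied from the parent and start from a clean lock. */
static void reinit_lock_in_child(void)
{
    pthread_rwlock_init(&event_hooks_rw_lock, NULL);
    child_reinit_count++;
}

/* Register the child handler; returns 0 on success. */
int install_fork_handler(void)
{
    return pthread_atfork(NULL, NULL, reinit_lock_in_child);
}

/* Fork once and report how many times the child handler ran in
 * the child (passed back through the exit status); -1 on error. */
int fork_and_count_child_reinits(void)
{
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(child_reinit_count);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

After `install_fork_handler()` succeeds, each `fork()` should leave the child with a fresh, unlocked rwlock regardless of which parent threads held it at fork time.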
The full reproducer is available here: https://github.com/anmarchenko/ruby-locks-fork-bug
Deadlock sequence
- Parent process has thread event hooks registered (e.g. by a profiler like dd-trace-rb)
- Multiple threads run concurrently, causing GVL transitions — each transition acquires a read lock on the rwlock
- fork() happens while a thread holds the read lock
- In the child, only the forking thread survives — the thread that held the lock is gone, but the lock state is copied as-is
- Child tries to add or remove a hook → needs write lock → blocks forever on a lock that will never be released
- Deadlock
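The sequence above can be reproduced in miniature without Ruby. The following is a simplified sketch, assuming Linux and plain pthreads; it is not the linked reproducer, and `hooks_rw_lock`, `reader_thread` and `child_gets_write_lock` are hypothetical names. To avoid hanging, the child uses `pthread_rwlock_timedwrlock` with a one-second deadline instead of blocking forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static pthread_rwlock_t hooks_rw_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_cond = PTHREAD_COND_INITIALIZER;
static int reader_holds_lock = 0;
static int reader_may_exit = 0;

/* Simulates a thread that is inside a hook callback (read lock
 * held) at the moment the process forks. */
static void *reader_thread(void *arg)
{
    (void)arg;
    pthread_rwlock_rdlock(&hooks_rw_lock);
    pthread_mutex_lock(&state_lock);
    reader_holds_lock = 1;
    pthread_cond_broadcast(&state_cond);
    while (!reader_may_exit)
        pthread_cond_wait(&state_cond, &state_lock);
    pthread_mutex_unlock(&state_lock);
    pthread_rwlock_unlock(&hooks_rw_lock);
    return NULL;
}

/* Forks while the reader holds the read lock, then lets the child
 * try to take the write lock. Returns 1 if the child got the lock,
 * 0 if it timed out (the deadlock case), -1 on error. */
int child_gets_write_lock(int reinit_in_child)
{
    pthread_t reader;
    int result = -1;

    reader_holds_lock = 0;
    reader_may_exit = 0;
    if (pthread_create(&reader, NULL, reader_thread, NULL) != 0)
        return -1;

    /* Wait until the reader thread owns the read lock. */
    pthread_mutex_lock(&state_lock);
    while (!reader_holds_lock)
        pthread_cond_wait(&state_cond, &state_lock);
    pthread_mutex_unlock(&state_lock);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the reader thread does not exist here, but the
         * copied lock still records its read lock. */
        if (reinit_in_child)
            pthread_rwlock_init(&hooks_rw_lock, NULL);
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += 1;
        int rc = pthread_rwlock_timedwrlock(&hooks_rw_lock, &deadline);
        _exit(rc == 0 ? 0 : (rc == ETIMEDOUT ? 1 : 2));
    }
    if (pid > 0) {
        int status = 0;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
            if (WEXITSTATUS(status) == 0) result = 1;
            else if (WEXITSTATUS(status) == 1) result = 0;
        }
    }

    /* Let the parent's reader thread finish and release the lock. */
    pthread_mutex_lock(&state_lock);
    reader_may_exit = 1;
    pthread_cond_broadcast(&state_cond);
    pthread_mutex_unlock(&state_lock);
    pthread_join(reader, NULL);
    return result;
}
```

Calling `child_gets_write_lock(0)` models the reported behavior: the child inherits a lock that still counts a reader which no longer exists. Calling `child_gets_write_lock(1)` models the fix direction, where the child resets the lock before using it.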
Impact
This affects any Ruby C extension using the GVL Instrumentation API in combination with fork-based servers (Resque, Unicorn, Passenger, etc.). The original report comes from dd-trace-rb's profiler deadlocking Resque workers on Alpine Linux (musl libc): https://github.com/DataDog/dd-trace-rb/issues/4967