Bug #21959

closed

rb_internal_thread_event_hooks_rw_lock is not reinitialized after fork causing deadlocks

Added by anmarchenko_datadog (Andrey Marchenko) 7 days ago. Updated 3 days ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:125078]

Description

Ruby's GVL Instrumentation API uses a read-write lock (rb_internal_thread_event_hooks_rw_lock) to protect the list of thread event hooks:

  • Read lock — acquired on every GVL transition to iterate and call hook callbacks (rb_thread_execute_hooks)
  • Write lock — acquired when adding/removing hooks (rb_internal_thread_add_event_hook, rb_internal_thread_remove_event_hook)

After fork(), Ruby reinitializes several internal locks (e.g. vm->ractor.sched.lock, timer_th.waiting_lock), but not rb_internal_thread_event_hooks_rw_lock. The reinitialization for this lock was not added when the GVL Instrumentation API was introduced.

The full reproducer is available here: https://github.com/anmarchenko/ruby-locks-fork-bug

Deadlock sequence

  1. Parent process has thread event hooks registered (e.g. by a profiler like dd-trace-rb)
  2. Multiple threads run concurrently, causing GVL transitions — each transition acquires a read lock on the rwlock
  3. fork() happens while a thread holds the read lock
  4. In the child, only the forking thread survives — the thread that held the lock is gone, but the lock state is copied as-is
  5. Child tries to add or remove a hook → needs write lock → blocks forever on a lock that will never be released
  6. Deadlock

Impact

This affects any Ruby C extension using the GVL Instrumentation API in combination with fork-based servers (Resque, Unicorn, Passenger, etc.). The original report comes from dd-trace-rb's profiler deadlocking Resque workers on Alpine Linux (musl libc): https://github.com/DataDog/dd-trace-rb/issues/4967
