Bug #21790 (closed): `Socket.getaddrinfo` hangs after `fork()` on macOS 26.1 (Tahoe) for IPv4-only hosts

Added by adamoffat (Adam Moffat) 7 months ago. Updated 18 days ago.

Status: Third Party's Issue
Assignee: -
Target version: -
[ruby-core:124288]

Description

Ruby's Socket.getaddrinfo hangs indefinitely in forked child processes on macOS 26.1 (Tahoe) when resolving IPv4-only hostnames. This is a regression that does not occur on macOS 15.x (Sequoia) or earlier.

Ruby version:
ruby 3.3.8 (2025-04-09 revision b200bad6cd) [arm64-darwin24]
Also confirmed this affects Ruby 3.2.6 and 3.4.1.

Reproducible script:

require "socket"
require "timeout"

puts "Ruby #{RUBY_VERSION} on #{RUBY_PLATFORM}"
Socket.getaddrinfo("api.segment.io", 443, nil, :STREAM)
puts "Parent: DNS completed"

pid = fork do
  puts "Child: Attempting DNS resolution..."
  begin
    Timeout.timeout(90) do
      Socket.getaddrinfo("api.segment.io", 443, nil, :STREAM)
    end
    puts "Child: SUCCESS"
    exit 0
  rescue Timeout::Error
    puts "Child: FAILED - hung for 90 seconds"
    exit 1
  end
end

Process.wait(pid)

Note: Remove the Timeout.timeout(90) wrapper to observe the indefinite hang. The timeout is included only so that the script can exit during testing.

Result of reproduce process:

Ruby 3.3.8 on arm64-darwin24
Parent: DNS completed
Child: Attempting DNS resolution...
Child: FAILED - hung for 90 seconds

The child process hangs with one thread consuming 100% CPU.

Expected result: The child process should complete DNS resolution successfully, as it does on macOS 15.x and earlier.

Analysis:
Stack trace shows:
Main thread: Blocked in wait_getaddrinfo_pthread_cond_wait
DNS thread: Spinning in _gai_nat64_second_pass → nw_path_access_agent_cache → _os_log_preferences_refresh → SIGSEGV

The crash occurs in macOS's NAT64 synthesis code path. Ruby's signal handler catches the SIGSEGV but cannot recover, causing the DNS thread to spin.

Key observations:

  • Only affects IPv4-only hosts. Hosts with IPv6 (like google.com) work correctly.
  • Using AF_INET instead of AF_UNSPEC works. Socket.getaddrinfo("api.segment.io", 443, Socket::AF_INET, :STREAM) succeeds.
  • Python is not affected. Python calls getaddrinfo() synchronously without a background thread.
  • Parent must do DNS before fork. If the parent has not called getaddrinfo(), the child works correctly.
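The second observation suggests a narrow application-level mitigation: pass an explicit IPv4 family hint so the lookup is never an AF_UNSPEC query. A minimal sketch follows. `ipv4_addresses` is a hypothetical helper name, not part of Ruby's API, and it assumes the caller does not need IPv6 results for the host in question.

```ruby
require "socket"

# Hypothetical helper: resolve with an explicit AF_INET hint instead of the
# default AF_UNSPEC (the case reported to hang after fork).
# Returns the unique numeric IPv4 addresses as strings.
def ipv4_addresses(host, port)
  Socket.getaddrinfo(host, port, Socket::AF_INET, :STREAM)
        .map { |entry| entry[3] } # index 3 holds the numeric address
        .uniq
end
```

In the reproduction script, the child would call `ipv4_addresses("api.segment.io", 443)` in place of the unhinted `Socket.getaddrinfo` call. This only helps code paths you control; libraries that call the resolver themselves are not covered.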

Workaround:

  • Use resolv-replace to bypass the native DNS resolver: require "resolv-replace"

Impact:
This breaks all Ruby applications using pre-forking worker models (Resque, Unicorn, Puma, Sidekiq, Passenger) on macOS Tahoe.

Apple Bug Report:
Filed with Apple as Feedback Assistant #FB21364061


Files

stack_trace.txt (66.6 KB): Stack Trace for Bug. adamoffat (Adam Moffat), 12/17/2025 05:56 PM
ruby_dns_fork_bug.rb (1.02 KB): Reproduction Script. adamoffat (Adam Moffat), 12/17/2025 06:02 PM
ruby_3.2.6_crash_output.txt (1.79 KB): Ruby 3.2.6 stacktrace. adamoffat (Adam Moffat), 12/18/2025 03:44 PM
python_dns_fork_test.py (1.8 KB): Python reproduction script. adamoffat (Adam Moffat), 12/18/2025 06:26 PM
python_dns_fork_test.py (1.97 KB): Python Reproduction Script. adamoffat (Adam Moffat), 12/18/2025 06:39 PM
python_crash_output.txt (1.28 KB): Python Crash output. adamoffat (Adam Moffat), 12/18/2025 06:39 PM

Related issues 3 (0 open, 3 closed)

Related to Ruby - Bug #15490: socket.rb - recurring segmentation faults (Third Party's Issue)
Related to Ruby - Bug #15794: Can not start Puma with Rails after bundle install (Third Party's Issue)
Has duplicate Ruby - Bug #21969: fork() + Socket.getaddrinfo() triggers SIGSEGV/SIGABRT via libsystem_trace.dylib on macOS 26 (darwin25) x86_64 and ARM64 (Third Party's Issue)

Updated by adamoffat (Adam Moffat) 7 months ago #1 [ruby-core:124296]

To confirm: macOS Sequoia also did not have this issue.

Updated by adamoffat (Adam Moffat) 7 months ago #2 [ruby-core:124297]

I saw that this was added in 3.4.0: https://github.com/ruby/ruby/pull/10864

Seen here: (https://github.com/ruby/ruby/releases/tag/v3_4_0_preview2)

But I also tested this using 3.4.1 and it was still an issue.

Updated by mame (Yusuke Endoh) 7 months ago #3 [ruby-core:124299]

Thank you for the report.

Since I don't have access to Tahoe, I cannot test this in my own environment. However, I have a few questions to clarify the situation.

The change to perform DNS lookups in a dedicated background thread was introduced in Ruby 3.3.0. You mentioned that this affects Ruby 3.2.6 as well. Are you certain it reproduces on 3.2.6?

If it fails on 3.2.6, the cause might be unrelated to the background thread, as its behavior should be similar to Python's. Would it be possible to provide a stack trace from the 3.2.6 crash?

Though it's just a guess, this might be a bug with getaddrinfo on Tahoe itself, but I could be wrong.

Updated by adamoffat (Adam Moffat) 7 months ago · Edited #4 [ruby-core:124303]

Ah yes, sorry I should have clarified this in my post. I tested this in 3.2.6 but it manifests differently in that version.

When I ran the same reproduction script with Ruby 3.2.6, rather than hanging indefinitely, it crashed immediately with a segmentation fault when the child process attempts DNS resolution.

The crash occurs at the getaddrinfo call in the forked child. The backtrace shows the fault originating in macOS system libraries, specifically in libsystem_trace.dylib at _os_log_preferences_refresh.

This confirms Ruby 3.2.6 is also affected by the same underlying issue - it just manifests as an immediate crash rather than a hang.

I've attached the full crash output for reference.

Updated by mame (Yusuke Endoh) 7 months ago #5 [ruby-core:124306]

Thank you. This looks like the same issue reported multiple times in the past, but we were previously stuck without a way to investigate.

https://bugs.ruby-lang.org/issues/15490
https://bugs.ruby-lang.org/issues/15794
https://github.com/redis/redis-rb/issues/859
https://github.com/hanami/hanami/issues/993

It is greatly appreciated that the reproduction conditions are now much clearer.

This issue does not affect Python even in a forked child process, right? If Python avoids this error, checking how it calls getaddrinfo might give us a hint for a fix or workaround.

It is difficult for me to debug this without a reproducing environment. Are there any committers or contributors who can reproduce the issue and investigate?

Updated by mame (Yusuke Endoh) 7 months ago #6

  • Related to Bug #15490: socket.rb - recurring segmentation faults added
  • Related to Bug #15794: Can not start Puma with Rails after bundle install added

Updated by adamoffat (Adam Moffat) 7 months ago #8 [ruby-core:124309]

Ah my earlier Python script had a bug.

My initial Python test incorrectly reported success. The script used os.WEXITSTATUS() to check the child's exit status, but this function only works for processes that exit normally. When a process is killed by a signal (SIGSEGV), it returns 0, giving a false positive.
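The same distinction exists in Ruby, where `Process::Status` separates a normal exit from death by signal. The sketch below is an illustration only, not the attached Python script: it uses SIGKILL as a stand-in for the SIGSEGV seen in the crash, and `describe_exit` is a hypothetical helper name.

```ruby
# Fork a child that dies from a signal, then inspect how it ended.
child_pid = fork do
  Process.kill("KILL", Process.pid) # stand-in for a crash such as SIGSEGV
  sleep                             # safety net; the signal ends the child first
end

_, child_status = Process.wait2(child_pid)

# Hypothetical helper: report signal deaths separately from normal exits.
def describe_exit(status)
  if status.signaled?
    "killed by signal #{status.termsig}"
  elsif status.exited?
    "exited with status #{status.exitstatus}"
  else
    "stopped or unknown"
  end
end

puts describe_exit(child_status)
```

Checking `signaled?` before reading `exitstatus` avoids the false positive described above, because `exitstatus` is `nil` for a child that was killed by a signal.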

After fixing the script to check os.WIFSIGNALED(), I was able to confirm the child is killed by signal 11 (SIGSEGV). The crash logs show the same stack trace as Ruby: _gai_nat64_second_pass → nw_path_access_agent_cache → _os_log_preferences_refresh.

This is an OS-level bug in macOS Tahoe, not language-specific. My apologies.

Updated by mame (Yusuke Endoh) 6 months ago #9 [ruby-core:124514]

  • Status changed from Open to Third Party's Issue

Thank you for your confirmation. This is most likely a macOS bug, so I'll close this as a third-party issue.
It would be best for macOS to fix the issue, but if someone finds a workaround, I'd consider importing it on the Ruby side.

Updated by mame (Yusuke Endoh) 3 months ago #10

  • Has duplicate Bug #21969: fork() + Socket.getaddrinfo() triggers SIGSEGV/SIGABRT via libsystem_trace.dylib on macOS 26 (darwin25) x86_64 and ARM64 added

Updated by adamoffat (Adam Moffat) 18 days ago #11 [ruby-core:125789]

Update: Apple has formally declined to fix this (still present in macOS 27 beta).

I'm the reporter of the underlying Apple Feedback (FB21364061). Posting a status update since this issue is the canonical landing spot for anyone hitting it.

Apple's position (via DTS): I raised this on the Apple Developer Forums and got a definitive answer from Apple DTS (https://developer.apple.com/forums/thread/834537). Summary: calling getaddrinfo in a child after fork without exec is officially unsupported, so they consider this a compatibility issue rather than a bug they'll fix, and they confirmed it is still not fixed in the macOS 27 beta. So this is permanent on macOS 26+; it will not be resolved at the OS level.

Confirmed root cause (and it is not Ruby-specific): The trigger is os_log state that is initialized in the parent and inherited invalid across fork. The parent initializes it either explicitly, or implicitly via its own first getaddrinfo of an IPv4-only host. In the child, an AF_UNSPEC lookup of an IPv4-only host enters the NAT64 synthesis path and dereferences that stale state:

_os_log_preferences_refresh        (libsystem_trace.dylib)   <- faults, EXC_BAD_ACCESS
os_log_type_enabled                (libsystem_trace.dylib)
nw_path_access_agent_cache         (Network)
nw_path_evaluator_evaluate / nw_path_snapshot_path
nw_nat64_v4_address_requires_synthesis
_gai_nat64_second_pass             (libsystem_info.dylib)
si_addrinfo -> getaddrinfo

Minimal reproduction in plain C (no Ruby), which crashes with the identical stack, proving the bug is in the OS and affects any runtime that initializes os_log before forking:

#include <netdb.h>
#include <os/log.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    // Initialize os_log state in the parent before forking.
    os_log_t log = os_log_create("com.example.repro", "repro");
    os_log(log, "init");
    struct addrinfo hints = { .ai_family = AF_UNSPEC, .ai_socktype = SOCK_STREAM }, *res;
    getaddrinfo("api.stripe.com", "443", &hints, &res);   // parent, IPv4-only host
    if (fork() == 0) {
        getaddrinfo("api.stripe.com", "443", &hints, &res); // child: crashes
    }
    return 0;
}

Python is affected identically (crashes with signal 11). Verified across macOS 26.1 through 26.5.1 and 27 beta; not reproducible on macOS 15.x.

Boundaries: only AF_UNSPEC lookups of IPv4-only hostnames are affected. IPv6-capable hosts, an AF_INET hint, numeric literals, and localhost are all immune.

Workarounds (for anyone landing here):

  • require "resolv-replace" before any DNS. Bypasses the system resolver entirely with the pure-Ruby Resolv. Caveat: Resolv only reads /etc/resolv.conf, so it ignores macOS scoped resolvers, which can break VPN split-DNS / internal hostnames.
  • Set OS_ACTIVITY_MODE=disable in the worker's environment. This suppresses the crash (the os_log_type_enabled check short-circuits before the faulting preferences refresh) while keeping the native resolver, so scoped/VPN DNS still works. Lighter than monkeypatching sockets. Tradeoff: disables unified-logging output for that process tree, which is negligible for typical worker processes.
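For the second workaround, one way to scope the variable to workers is to pass it at launch rather than exporting it globally. The following is a minimal sketch under stated assumptions: `spawn_worker` and `WORKER_ENV` are hypothetical names, and the demo worker is a throwaway Ruby one-liner that only checks what it inherited. Whether setting the variable inside an already-running process has the same effect is not covered in this thread, so the sketch sets it before the worker starts.

```ruby
require "rbconfig"

# Environment overrides applied to the worker only; the parent's ENV is untouched.
WORKER_ENV = { "OS_ACTIVITY_MODE" => "disable" }.freeze

# Hypothetical launcher: start a worker command with the override in place.
def spawn_worker(*command, **options)
  Process.spawn(WORKER_ENV, *command, **options)
end

# Demo worker: exits 0 only if it sees the override in its environment.
worker_pid = spawn_worker(
  RbConfig.ruby, "-e", 'exit(ENV["OS_ACTIVITY_MODE"] == "disable" ? 0 : 1)'
)
_, worker_status = Process.wait2(worker_pid)
puts "worker saw override: #{worker_status.success?}"
```

Processes forked or spawned by that worker inherit the variable, which matches the "process tree" scope mentioned in the tradeoff above.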

One Ruby-side question worth considering, independent of the OS bug: in Ruby 3.3 and later, where lookups run in a dedicated background thread, the faulting DNS helper thread does not crash the process cleanly; it spins at 100% CPU and the main thread blocks in wait_getaddrinfo forever, so the process becomes an unkillable hang. (Ruby 2.6 aborts instead.) Even though the underlying getaddrinfo-after-fork is unsupported, would it be worth making this fail fast (raise or exit) rather than spin indefinitely? The longer-term clean fix is presumably a fork-safe userspace resolver (#19430).
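Until something like that exists in Ruby itself, a supervising parent can impose its own deadline on a forked child. The sketch below is an application-level illustration, not a proposed Ruby change: `wait_or_kill` is a hypothetical helper, and it assumes SIGKILL still terminates a child stuck in this state, which this thread does not confirm.

```ruby
# Hypothetical supervisor helper: wait for a child for up to `deadline`
# seconds, then kill it. Returns the child's Process::Status, or nil if the
# child had to be killed.
def wait_or_kill(pid, deadline:, poll: 0.05)
  give_up_at = Process.clock_gettime(Process::CLOCK_MONOTONIC) + deadline
  loop do
    # WNOHANG makes wait2 return nil while the child is still running.
    _, status = Process.wait2(pid, Process::WNOHANG)
    return status if status

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= give_up_at
      Process.kill("KILL", pid)
      Process.wait(pid) # reap the killed child
      return nil
    end
    sleep poll
  end
end
```

A pre-forking server could call `wait_or_kill(pid, deadline: 30)` on a worker that has stopped responding and treat a `nil` result as a failed worker, so the hang surfaces as an error instead of a silent 100% CPU spin.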
