Feature #20590: Ensure `fork` isn't called when `raddrinfo`'s background thread is in `getaddrinfo` - Ruby - Ruby Issue Tracking System

Actions

Copy link

Feature #20590

closed

Ensure `fork` isn't called when `raddrinfo`'s background thread is in `getaddrinfo`

Feature #20590: Ensure `fork` isn't called when `raddrinfo`'s background thread is in `getaddrinfo`

Added by byroot (Jean Boussier) over 1 year ago. Updated over 1 year ago.

Status:

Closed

Assignee:

Target version:

[ruby-core:118368]

Description

NB: Opening this as a feature because I don't have any clear bug report or repro script, but in a way this is a bug.

Context¶

For better or for worse, fork(2) remains the primary provider of parallelism in Ruby programs, and will likely continue to be for the foreseeable future.

Even though it's frowned upon in many circles, and a lot of literature will simply state that only async-signal safe APIs are safe to use after fork(),
in practice, most APIs work well as long as you are careful about not to fork while another thread is holding a pthread mutex, and the general advice is simply to not start any thread before calling fork(2).

This became much harder (if not impossible) to ensure in Ruby 3.3 following [Feature #19965]. Every call to Addrinfo.getaddrinfo now starts a native thread that will call getaddrinfo(3).
And unless I'm not reading the code correctly, that thread may even be left running if the call is interrupted. This is particularly problematic because, at least in the glibc implementation, getaddrinfo(3)
do acquire a mutex. So if a fork happens while this mutex is held, the resulting child will be corrupted, and any call to getaddrinfo(3) in the child will deadlock.
This is a fairly well-known fork-safety problem (just one example).

I don't have a reproducer to demonstrate this bug, but I heard several reports of deadlocked processes after fork issues happening to people upgrading to Ruby 3.3 that seem related or could be explained by this issue.

Proposal¶

I think we could reduce the impact of this problem by locking around fork(2) and getaddrinfo(3) with a read-write lock.

Process.fork would acquire the lock in write mode, and getaddrinfo would acquire it in read mode.

The obvious downside of course is that an interrupted addrinfo call may take a very long time to timeout and release the lock,
delaying the Process.fork call for a while, and that's far from ideal, but I don't have any better idea.

I implemented a proof of concept at: https://github.com/ruby/ruby/pull/10864

cc @mame (Yusuke Endoh) @ko1 (Koichi Sasada)

Related issues 1 (0 open — 1 closed)

Updated by byroot (Jean Boussier) over 1 year ago Actions
Copy link
#1

Description updated (diff)

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 1 year ago Actions
Copy link
#2 [ruby-core:118369]

Big +1 to this idea - I've run into issues with forking while calling getaddrinfo(3) before. A particularly common instance of this in my experience is a web server process where, on boot:

The main thread forks a bunch of worker processes, like Unicorn,
A background thread is running a monitoring agent and sending traces/metrics to some service, and looking that up with getaddrinfo(3).

Updated by mame (Yusuke Endoh) over 1 year ago Actions
Copy link
#3 [ruby-core:118370]

Your patch does not expose any new C API, right? I think that's good. Thank you for your work.

(What a troublesome API getaddrinfo is.)

Updated by byroot (Jean Boussier) over 1 year ago Actions
Copy link
#4 [ruby-core:118371]

Your patch does not expose any new C API, right? I think that's good.

I believe it's possible to do so without exposing any new public API, as internal extensions like socket can link to private functions. It seems the way I do it in the proof of concept doesn't fully pass CI, but I'm sure it can be done.

Updated by byroot (Jean Boussier) over 1 year ago Actions
Copy link
#5

Status changed from Open to Closed

Applied in changeset git|63cbe3f6ac9feb44a2e43b1f853e2ca7e049316c.

Proof of Concept: Allow to prevent fork from happening in known fork unsafe API

[Feature #20590]

For better of for worse, fork(2) remain the primary provider of
parallelism in Ruby programs. Even though it's frowned uppon in
many circles, and a lot of literature will simply state that only
async-signal safe APIs are safe to use after fork(), in practice
most APIs work well as long as you are careful about not forking
while another thread is holding a pthread mutex.

One of the APIs that is known cause fork safety issues is getaddrinfo.
If you fork while another thread is inside getaddrinfo, a mutex
may be left locked in the child, with no way to unlock it.

I think we could reduce the impact of these problem by preventing
in for the most notorious and common cases, by locking around
fork(2) and known unsafe APIs with a read-write lock.

Updated by hsbt (Hiroshi SHIBATA) over 1 year ago Actions
Copy link
#6

Related to Bug #20734: Test failure at FreeBSD 14.1 added

Actions

Copy link

Also available in: PDF Atom

Project

General

Profile

Ruby

Custom queries

Feature #20590

Ensure `fork` isn't called when `raddrinfo`'s background thread is in `getaddrinfo`

Context¶

Proposal¶

Updated by byroot (Jean Boussier) over 1 year ago Actions
Copy link
#1

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 1 year ago Actions
Copy link
#2 [ruby-core:118369]

Updated by mame (Yusuke Endoh) over 1 year ago Actions
Copy link
#3 [ruby-core:118370]

Updated by byroot (Jean Boussier) over 1 year ago Actions
Copy link
#4 [ruby-core:118371]

Updated by byroot (Jean Boussier) over 1 year ago Actions
Copy link
#5

Updated by hsbt (Hiroshi SHIBATA) over 1 year ago Actions
Copy link
#6

Project

General

Profile

Ruby

Custom queries

Feature #20590

Ensure `fork` isn't called when `raddrinfo`'s background thread is in `getaddrinfo`

Context¶

Proposal¶

Updated by byroot (Jean Boussier) over 1 year ago ActionsCopy link #1

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 1 year ago ActionsCopy link #2 [ruby-core:118369]

Updated by mame (Yusuke Endoh) over 1 year ago ActionsCopy link #3 [ruby-core:118370]

Updated by byroot (Jean Boussier) over 1 year ago ActionsCopy link #4 [ruby-core:118371]

Updated by byroot (Jean Boussier) over 1 year ago ActionsCopy link #5

Updated by hsbt (Hiroshi SHIBATA) over 1 year ago ActionsCopy link #6

Updated by byroot (Jean Boussier) over 1 year ago Actions
Copy link
#1

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 1 year ago Actions
Copy link
#2 [ruby-core:118369]

Updated by mame (Yusuke Endoh) over 1 year ago Actions
Copy link
#3 [ruby-core:118370]

Updated by byroot (Jean Boussier) over 1 year ago Actions
Copy link
#4 [ruby-core:118371]

Updated by byroot (Jean Boussier) over 1 year ago Actions
Copy link
#5

Updated by hsbt (Hiroshi SHIBATA) over 1 year ago Actions
Copy link
#6