Project

General

Profile

Actions

Feature #19443

closed

Cache `Process.pid`

Added by byroot (Jean Boussier) about 1 year ago. Updated about 1 year ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:112457]

Description

It's not uncommon for database client and similar network libraries to protect themselves from Process.fork by regularly checking Process.pid

Until recently most libc would cache getpid() so this was a cheap check to make.

However as of glibc version 2.25 the PID cache is removed and calls to getpid() always invoke the actual system call which significantly degrades the performance of existing applications.

The reason glibc removed the cache is that some libraries were bypassing fork(2) by issuing system calls themselves, causing stale cache issues.

That isn't a concern for Ruby as bypassing MRI's primitive for forking would render the VM unusable, so we can safely cache the PID.

An example of the issue: https://github.com/rails/rails/issues/47418

Patch: https://github.com/ruby/ruby/pull/7326

Actions #1

Updated by byroot (Jean Boussier) about 1 year ago

  • Description updated (diff)

Updated by ko1 (Koichi Sasada) about 1 year ago

However as of glibc version 2.25 the PID cache is removed and calls to getpid() always invoke the actual system call which significantly degrades the performance of existing applications.

Could you show some benchmark results with/without your patch?
As I understand getpid() system call is well tuned so I surprised that there is an impact on the app.

Updated by byroot (Jean Boussier) about 1 year ago

require 'benchmark/ips'

Benchmark.ips do |x|
  x.report("Process.pid") { Process.pid }
end

On macOS where getpid() is still cached:

ruby 3.2.0 (2022-12-25 revision a528908271) [arm64-darwin22]
Warming up --------------------------------------
         Process.pid     1.879M i/100ms
Calculating -------------------------------------
         Process.pid     18.682M (± 2.1%) i/s -     93.968M in   5.032405s

On the same machine, but using the docker ruby:3.2 image (glibc based)

ruby 3.2.0 (2022-12-25 revision a528908271) [aarch64-linux]
Warming up --------------------------------------
         Process.pid   356.920k i/100ms
Calculating -------------------------------------
         Process.pid      3.539M (± 1.3%) i/s -     17.846M in   5.042975s

My branch on macOS:

ruby 3.3.0dev (2023-02-16T18:42:31Z cache-process-pid 0cd4797132) [arm64-darwin22]
Warming up --------------------------------------
         Process.pid     1.804M i/100ms
Calculating -------------------------------------
         Process.pid     18.465M (± 1.3%) i/s -     93.812M in   5.081288s

I'll try to build that branch in a docker container to benchmark it on glibc, but given the implementation I expect the same performance.

Updated by byroot (Jean Boussier) about 1 year ago

Here, from my branch built in a Ubuntu jammy (22.04) based image:

ruby 3.3.0dev (2023-02-16T18:42:31Z cache-process-pid 0cd4797132) [aarch64-linux]
Warming up --------------------------------------
         Process.pid     1.848M i/100ms
Calculating -------------------------------------
         Process.pid     18.561M (± 1.5%) i/s -     94.245M in   5.078802s

So it's a bit over 5x faster.

Updated by akr (Akira Tanaka) about 1 year ago

I think detecting fork using PID is not a good idea.
PID can conflict because PID is recycled.

We can define Process.fork_level as follows.

% ruby -e '
class << Process
  attr_accessor :fork_level
end
Process.fork_level = 0
module ForkLevel
  def _fork
    pid = super
    Process.fork_level += 1 if pid == 0
    pid
  end
end
class << Process; self end.prepend ForkLevel
puts "parent_fork_level: #{Process.fork_level}"
Process.wait(fork { puts "child_fork_level: #{Process.fork_level}" })
'
parent_fork_level: 0
child_fork_level: 1

fork can be detected by comparing the result of Process.fork_level.

This doesn't use PID (and getpid system call).
So, it has no overhead by getpid and no problem with PID recycling.

Updated by byroot (Jean Boussier) about 1 year ago

PID can conflict because PID is recycled.

I don't think it's a big concern for this use case, even with PID recycling, the PID of the child can't possibly be the same than the parent.
So unless you fork several time without ever triggering the check, you can't possibly be by this.

We can define Process.fork_level as follows.

Yes, on Ruby 3.1+ we can decorate Process._fork for that purpose. I already submitted PRs to major libraries to do that when possible, however:

  • It doesn't work for Process.daemonize (not a big deal, but still)
  • There is a long tail of existing code doing this, and fixing it all may take a very long time.

Also, regardless of what Process.pid is used for, if we can make it 5x faster with extremely little code, and as far as I can tell no downsides, why shouldn't we?

Updated by byroot (Jean Boussier) about 1 year ago

I deployed a ruby shim of this cache to half of our servers: https://github.com/Shopify/pid_cache

Average latency: -2ms
Median latency: -2ms
p75 latency: -2ms
p99 latency: -10ms
p99.9: -30ms

Updated by ko1 (Koichi Sasada) about 1 year ago

Thank you.
How to read comment #7 results?

Updated by byroot (Jean Boussier) about 1 year ago

How to read comment #7 results?

It's a flat reduction on our latency (server response time) metrics.

On average, with the pid_cache shim, our server response time is 2 milliseconds faster.

Also to note, we're still seeing quite a lot of getpid() syscalls coming from dependencies using $$, and from some C extensions. So hopefully https://github.com/ruby/ruby/pull/7326 would be even more effective.

Actions #10

Updated by Anonymous about 1 year ago

"ko1 (Koichi Sasada) via ruby-core" wrote:

As I understand getpid() system call is well tuned so I surprised that there is an impact on the app.

It's not whether or not a system call is expensive or not,
it's the fact a system call needs to be made at all.

With modern CPU vulnerability mitigations, all system calls got
more expensive. Perhaps Linux vDSO mechanism can be extended
to support getpid as it does gettimeofday/clock_gettime

Anyways, caching getpid() is much appreciated.


Updated by byroot (Jean Boussier) about 1 year ago

@dalehamel (Dale Hamel) noticed via tracing that we're also calling getpid() quite a lot in the thread scheduler.

I think in that case we can simply use GET_VM()->fork_gen, so I prepared a second patch for that https://github.com/ruby/ruby/pull/7434

Updated by ko1 (Koichi Sasada) about 1 year ago

byroot (Jean Boussier) wrote in #note-9:

How to read comment #7 results?

It's a flat reduction on our latency (server response time) metrics.

On average, with the pid_cache shim, our server response time is 2 milliseconds faster.

Thank you.
BTW it is easy to understand how 2ms has impact or not by showing the measured values of before/after.
Anyway, I agree that is valuable improvements.

Anonymous wrote in #note-10:

With modern CPU vulnerability mitigations, all system calls got
more expensive. Perhaps Linux vDSO mechanism can be extended
to support getpid as it does gettimeofday/clock_gettime

I see, especially on virtualization techniques.

Updated by byroot (Jean Boussier) about 1 year ago

it is easy to understand how 2ms has impact or not by showing the measured values of before/after.

Yes, unfortunately, being a public company, we have all these rules about material informations and such, so I wasn't sure what I could share exactly... But I realize it makes it harder to understand, sorry :/

Updated by byroot (Jean Boussier) about 1 year ago

Relaying here what Javier Honduvilla Coto said on one of the PRs:

wondering if it would be possible/make sense to override libc's getpid with a custom implementation that does the caching in there. That way not only Process.getpid would use the faster method but also any other part of the runtime such as what @dalehamel (Dale Hamel) mentioned above or any other getpid calls from native libraries?

I don't know if it's a good idea or not, as far as I know it would be a first for ruby to override a symbol defined by libc, it doesn't do so with malloc and free for instance.

Updated by matz (Yukihiro Matsumoto) about 1 year ago

Caching process id sounds OK for me. Go ahead.

Matz.

Actions #16

Updated by byroot (Jean Boussier) about 1 year ago

  • Status changed from Open to Closed

Applied in changeset git|1db8951d3a8be6a756c9d3d3b87231997b301985.


Cache Process.pid

[Feature #19443]

It's not uncommon for database client and similar network libraries
to protect themselves from Process.fork by regularly checking Process.pid

Until recently most libc would cache getpid() so this was a cheap
check to make.

However as of glibc version 2.25 the PID cache is removed and calls to
getpid() always invoke the actual system call which significantly degrades
the performance of existing applications.

The reason glibc removed the cache is that some libraries were bypassing
fork(2) by issuing system calls themselves, causing stale cache issues.

That isn't a concern for Ruby as bypassing MRI's primitive for forking
would render the VM unusable, so we can safely cache the PID.

Updated by byroot (Jean Boussier) about 1 year ago

Thank you Matz! I merged the caching of Process.pid and $$.

The thread scheduler still call getpid() a lot, and I'll try to eliminate that in a follow-up (even though according to @ko1 (Koichi Sasada) most of that code will be replaced before 3.3).

Actions

Also available in: Atom PDF

Like1
Like0Like0Like0Like0Like0Like1Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0