Feature #19443: Cache `Process.pid` - Ruby - Ruby Issue Tracking System

Cache `Process.pid`

Added by byroot (Jean Boussier) over 2 years ago. Updated over 2 years ago.

Status:

Closed

[ruby-core:112457]

Description

It's not uncommon for database clients and similar network libraries to protect themselves from Process.fork by regularly checking Process.pid.

Until recently, most libc implementations cached getpid(), so this was a cheap check to make.

However, as of glibc version 2.25, the PID cache has been removed and calls to getpid() always invoke the actual system call, which significantly degrades the performance of existing applications.

The reason glibc removed the cache is that some libraries were bypassing fork(2) by issuing system calls themselves, causing stale cache issues.

That isn't a concern for Ruby as bypassing MRI's primitive for forking would render the VM unusable, so we can safely cache the PID.

An example of the issue: https://github.com/rails/rails/issues/47418

Patch: https://github.com/ruby/ruby/pull/7326
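
For readers unfamiliar with the pattern, the fork protection described above usually looks something like the following minimal sketch. The class name `PidCheckedClient` and the `open_socket` helper are hypothetical stand-ins, not taken from any particular library; the only real API involved is `Process.pid`.

```ruby
# Hypothetical client illustrating the "check Process.pid on every use" pattern.
class PidCheckedClient
  def initialize
    @owner_pid = Process.pid
    @socket = open_socket
  end

  # Returns the connection, reopening it if the current process
  # is not the one that created it (i.e. we are in a forked child).
  def socket
    current_pid = Process.pid # one getpid() per call when the PID is not cached
    if @owner_pid != current_pid
      @socket = open_socket
      @owner_pid = current_pid
    end
    @socket
  end

  private

  # Stand-in for opening a real network connection (assumption for illustration).
  def open_socket
    Object.new
  end
end
```

Because `socket` runs on every request or query, any per-call cost in `Process.pid` is multiplied across the whole application.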

#2 [ruby-core:112460]

Updated by ko1 (Koichi Sasada) over 2 years ago

However as of glibc version 2.25 the PID cache is removed and calls to getpid() always invoke the actual system call which significantly degrades the performance of existing applications.

Could you show some benchmark results with/without your patch?
As I understand it, the getpid() system call is well tuned, so I am surprised that there is an impact on the app.

#3 [ruby-core:112463]

Updated by byroot (Jean Boussier) over 2 years ago

require 'benchmark/ips'

Benchmark.ips do |x|
  x.report("Process.pid") { Process.pid }
end

On macOS where getpid() is still cached:

ruby 3.2.0 (2022-12-25 revision a528908271) [arm64-darwin22]
Warming up --------------------------------------
         Process.pid     1.879M i/100ms
Calculating -------------------------------------
         Process.pid     18.682M (± 2.1%) i/s -     93.968M in   5.032405s

On the same machine, but using the Docker ruby:3.2 image (glibc based):

ruby 3.2.0 (2022-12-25 revision a528908271) [aarch64-linux]
Warming up --------------------------------------
         Process.pid   356.920k i/100ms
Calculating -------------------------------------
         Process.pid      3.539M (± 1.3%) i/s -     17.846M in   5.042975s

My branch on macOS:

ruby 3.3.0dev (2023-02-16T18:42:31Z cache-process-pid 0cd4797132) [arm64-darwin22]
Warming up --------------------------------------
         Process.pid     1.804M i/100ms
Calculating -------------------------------------
         Process.pid     18.465M (± 1.3%) i/s -     93.812M in   5.081288s

I'll try to build that branch in a docker container to benchmark it on glibc, but given the implementation I expect the same performance.

#4 [ruby-core:112464]

Updated by byroot (Jean Boussier) over 2 years ago

Here, from my branch built in an Ubuntu jammy (22.04) based image:

ruby 3.3.0dev (2023-02-16T18:42:31Z cache-process-pid 0cd4797132) [aarch64-linux]
Warming up --------------------------------------
         Process.pid     1.848M i/100ms
Calculating -------------------------------------
         Process.pid     18.561M (± 1.5%) i/s -     94.245M in   5.078802s

So it's a bit over 5x faster.
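
The "a bit over 5x" figure can be checked from the two aarch64-linux runs above. The short calculation below is only a worked example on the reported iterations per second; the variable names are illustrative.

```ruby
# Iterations per second reported above on aarch64-linux (glibc).
uncached_ips = 3.539e6   # ruby 3.2.0, getpid() syscall on every call
cached_ips   = 18.561e6  # cache-process-pid branch

speedup = cached_ips / uncached_ips
uncached_ns_per_call = 1e9 / uncached_ips
cached_ns_per_call   = 1e9 / cached_ips

puts "speedup: #{speedup.round(2)}x"                          # speedup: 5.24x
puts "uncached: #{uncached_ns_per_call.round(1)} ns per call" # uncached: 282.6 ns per call
puts "cached: #{cached_ns_per_call.round(1)} ns per call"     # cached: 53.9 ns per call
```

These per-call numbers include the benchmark loop and method dispatch overhead, so they are an upper bound on the cost of `Process.pid` itself.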

#5 [ruby-core:112471]

Updated by akr (Akira Tanaka) over 2 years ago

I think detecting fork using PID is not a good idea.
PID can conflict because PID is recycled.

We can define Process.fork_level as follows.

% ruby -e '
class << Process
  attr_accessor :fork_level
end
Process.fork_level = 0
module ForkLevel
  def _fork
    pid = super
    Process.fork_level += 1 if pid == 0
    pid
  end
end
class << Process; self end.prepend ForkLevel
puts "parent_fork_level: #{Process.fork_level}"
Process.wait(fork { puts "child_fork_level: #{Process.fork_level}" })
'
parent_fork_level: 0
child_fork_level: 1

fork can be detected by comparing the result of Process.fork_level.

This doesn't use the PID (or the getpid system call).
So it has no getpid overhead and no problem with PID recycling.
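
As a rough illustration of how a library might consume such a counter, here is a sketch that repeats the `Process.fork_level` definition from the one-liner above and adds a hypothetical `ForkAwareClient`. The class and its methods are assumptions made for this example only, and `Process._fork` requires Ruby 3.1 or later.

```ruby
# Same fork counter as in the one-liner above (Process.fork_level is not a
# built-in; it is defined here by hooking Process._fork).
class << Process
  attr_accessor :fork_level
end
Process.fork_level = 0

module ForkLevel
  def _fork
    pid = super
    Process.fork_level += 1 if pid == 0 # only the child sees pid == 0
    pid
  end
end
Process.singleton_class.prepend(ForkLevel)

# Hypothetical client that detects a fork by comparing fork levels
# instead of PIDs.
class ForkAwareClient
  def initialize
    connect
  end

  def forked?
    @fork_level != Process.fork_level
  end

  def connection
    connect if forked?
    @connection
  end

  private

  def connect
    @fork_level = Process.fork_level
    @connection = Object.new # stand-in for a real connection
  end
end
```

The check in `forked?` is a plain integer comparison in Ruby, so it never reaches the kernel.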

#6 [ruby-core:112473]

Updated by byroot (Jean Boussier) over 2 years ago

PID can conflict because PID is recycled.

I don't think it's a big concern for this use case. Even with PID recycling, the PID of the child can't possibly be the same as the parent's.
So unless you fork several times without ever triggering the check, you can't possibly be affected by this.

We can define Process.fork_level as follows.

Yes, on Ruby 3.1+ we can decorate Process._fork for that purpose. I already submitted PRs to major libraries to do that when possible, however:

- It doesn't work for Process.daemon (not a big deal, but still).
- There is a long tail of existing code doing this, and fixing it all may take a very long time.

Also, regardless of what Process.pid is used for, if we can make it 5x faster with extremely little code, and as far as I can tell no downsides, why shouldn't we?

#7 [ruby-core:112520]

Updated by byroot (Jean Boussier) over 2 years ago

I deployed a ruby shim of this cache to half of our servers: https://github.com/Shopify/pid_cache

Average latency: -2ms
Median latency: -2ms
p75 latency: -2ms
p99 latency: -10ms
p99.9: -30ms
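
To give an idea of what a Ruby-level cache of this kind can look like, here is a minimal sketch. It is not the pid_cache source; the module name `CachedPid` is made up for this example, and it assumes Ruby 3.1+ so that `Process._fork` can be hooked.

```ruby
# Sketch of a Ruby-level PID cache (illustrative only, not the pid_cache gem).
module CachedPid
  # Memoize the PID; `super` reaches the original Process.pid.
  def pid
    @cached_pid ||= super
  end

  # Kernel#fork and Process.fork go through Process._fork on Ruby 3.1+,
  # so the cache can be dropped in the child right after forking.
  def _fork
    child_pid = super
    @cached_pid = nil if child_pid == 0
    child_pid
  end

  # Process.daemon does not go through _fork, so invalidate here as well.
  def daemon(*)
    result = super
    @cached_pid = nil
    result
  end
end

Process.singleton_class.prepend(CachedPid)
```

A shim like this only affects callers of `Process.pid`. Reads of `$$` and getpid() calls made from C extensions bypass it, which is one reason a cache inside the VM can be more effective.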

#8 [ruby-core:112529]

Updated by ko1 (Koichi Sasada) over 2 years ago

Thank you.
How should the results in comment #7 be read?

#9 [ruby-core:112531]

Updated by byroot (Jean Boussier) over 2 years ago

How should the results in comment #7 be read?

It's a flat reduction on our latency (server response time) metrics.

On average, with the pid_cache shim, our server response time is 2 milliseconds faster.

Also to note, we're still seeing quite a lot of getpid() syscalls coming from dependencies using $$, and from some C extensions. So hopefully https://github.com/ruby/ruby/pull/7326 would be even more effective.

#10

Updated by Anonymous over 2 years ago

"ko1 (Koichi Sasada) via ruby-core" ruby-core@ml.ruby-lang.org wrote:

As I understand it, the getpid() system call is well tuned, so I am surprised that there is an impact on the app.

It's not a question of whether a system call is expensive;
it's the fact that a system call needs to be made at all.

With modern CPU vulnerability mitigations, all system calls got
more expensive. Perhaps the Linux vDSO mechanism can be extended
to support getpid as it does gettimeofday/clock_gettime.

Anyways, caching getpid() is much appreciated.

#11 [ruby-core:112673]

Updated by byroot (Jean Boussier) over 2 years ago

@dalehamel noticed via tracing that we're also calling getpid() quite a lot in the thread scheduler.

I think in that case we can simply use GET_VM()->fork_gen, so I prepared a second patch for that https://github.com/ruby/ruby/pull/7434

#12 [ruby-core:112674]

Updated by ko1 (Koichi Sasada) over 2 years ago

byroot (Jean Boussier) wrote in #note-9:

How should the results in comment #7 be read?

It's a flat reduction on our latency (server response time) metrics.

On average, with the pid_cache shim, our server response time is 2 milliseconds faster.

Thank you.
BTW, it would be easier to understand whether 2ms is significant if you showed the measured values before and after.
Anyway, I agree that this is a valuable improvement.

Anonymous wrote in #note-10:

With modern CPU vulnerability mitigations, all system calls got
more expensive. Perhaps the Linux vDSO mechanism can be extended
to support getpid as it does gettimeofday/clock_gettime.

I see, especially with virtualization.

#13 [ruby-core:112680]

Updated by byroot (Jean Boussier) over 2 years ago

it would be easier to understand whether 2ms is significant if you showed the measured values before and after.

Yes, unfortunately, being a public company, we have all these rules about material information and such, so I wasn't sure what exactly I could share... But I realize it makes it harder to understand, sorry :/

#14 [ruby-core:112743]

Updated by byroot (Jean Boussier) over 2 years ago

Relaying here what Javier Honduvilla Coto said on one of the PRs:

wondering if it would be possible/make sense to override libc's getpid with a custom implementation that does the caching in there. That way not only Process.getpid would use the faster method but also any other part of the runtime such as what @dalehamel mentioned above or any other getpid calls from native libraries?

I don't know whether it's a good idea. As far as I know, it would be a first for Ruby to override a symbol defined by libc; it doesn't do so with malloc and free, for instance.

#15 [ruby-core:112945]

Updated by matz (Yukihiro Matsumoto) over 2 years ago

Caching the process ID sounds OK to me. Go ahead.

Matz.

#16

Updated by byroot (Jean Boussier) over 2 years ago

Status changed from Open to Closed

Applied in changeset git|1db8951d3a8be6a756c9d3d3b87231997b301985.

Cache Process.pid

[Feature #19443]

It's not uncommon for database client and similar network libraries
to protect themselves from Process.fork by regularly checking Process.pid

Until recently most libc would cache getpid() so this was a cheap
check to make.

However as of glibc version 2.25 the PID cache is removed and calls to
getpid() always invoke the actual system call which significantly degrades
the performance of existing applications.

The reason glibc removed the cache is that some libraries were bypassing
fork(2) by issuing system calls themselves, causing stale cache issues.

That isn't a concern for Ruby as bypassing MRI's primitive for forking
would render the VM unusable, so we can safely cache the PID.

#17 [ruby-core:112949]

Updated by byroot (Jean Boussier) over 2 years ago

Thank you Matz! I merged the caching of Process.pid and $$.

The thread scheduler still calls getpid() a lot, and I'll try to eliminate that in a follow-up (even though according to @ko1 (Koichi Sasada) most of that code will be replaced before 3.3).
