Project

General

Profile

Actions

Bug #21560

open

RUBY_MN_THREADS=1 causes large performance regression in Puma 7

Added by schneems (Richard Schneeman) 5 days ago. Updated 4 days ago.

Status:
Open
Assignee:
Target version:
-
[ruby-core:123172]

Description

Expected

Running a webserver with RUBY_MN_THREADS=1 will increase performance or keep it the same.

Actual

  • Before: 22919.85 Requests/sec
  • After: 2287.13 Requests/sec

Reproduction

Tracking this at https://github.com/puma/puma/issues/3720. I would like to get a smaller reproduction, but until then here's how we're able to induce the failure:

Prepare puma:

$ git clone https://github.com/puma/puma
$ cd puma
$ git checkout v7.0.0.pre1
$ rake compile

Boot the server to deliver a "hello world" response:

$ bundle exec ruby -Ilib bin/puma -w 1 -t1 --preload test/rackup/hello.ru

Exercise the server with wrk (brew install wrk):

$ wrk -H 'Host: tfb-server' -H 'Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 15 -c 16 --timeout 8 -t 12 http://localhost:9292

Results

Before:

$ bundle exec ruby -Ilib bin/puma -w 1 -t1 --preload test/rackup/hello.ru
$ wrk -H 'Host: tfb-server' -H 'Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 15 -c 16 --timeout 8 -t 12 http://localhost:9292
Running 15s test @ http://localhost:9292
  12 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   786.74us    1.45ms  45.57ms   90.62%
    Req/Sec     1.92k   188.55     4.20k    83.22%
  Latency Distribution
     50%  386.00us
     75%  523.00us
     90%    2.11ms
     99%    4.63ms
  344638 requests in 15.04s, 25.23MB read
Requests/sec:  22919.85
Transfer/sec:      1.68MB

After:

$ env RUBY_MN_THREADS=1 bundle exec ruby -Ilib bin/puma -w 1 -t1 --preload test/rackup/hello.ru
$ wrk -H 'Host: tfb-server' -H 'Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 15 -c 16 --timeout 8 -t 12 http://localhost:9292
Running 15s test @ http://localhost:9292
  12 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    28.89ms   40.82ms 129.37ms   78.94%
    Req/Sec   207.34     64.09   252.00     70.70%
  Latency Distribution
     50%  212.00us
     75%   56.77ms
     90%   99.48ms
     99%  127.01ms
  34365 requests in 15.03s, 2.52MB read
Requests/sec:   2287.13
Transfer/sec:    171.45KB

(Notice the dramatic drop in requests per second).

Updated by dentarg (Patrik Ragnarsson) 5 days ago

Here's how I did my tests:

$ git clone https://github.com/puma/puma
$ cd puma
$ git checkout v7.0.0
$ rake compile
$ echo 'silence_single_worker_warning; app { [200, {}, ["OK"]] }' | RUBY_MN_THREADS=1 ruby -Ilib bin/puma --config /dev/stdin --worker 1 --threads 1:1 --port 7777

Run https://github.com/rakyll/hey, it takes ~55s

$ time hey -disable-keepalive -n 5000 -c 100 http://127.0.0.1:7777
$ time hey -disable-keepalive -n 5000 -c 100 http://127.0.0.1:7777

Summary:
  Total:  54.5585 secs
  Slowest:  1.5201 secs
  Fastest:  0.0139 secs
  Average:  1.0798 secs
  Requests/sec: 91.6448

  Total data: 10000 bytes
  Size/request: 2 bytes

Response time histogram:
  0.014 [1] |
  0.165 [18]  |
  0.315 [14]  |
  0.466 [14]  |
  0.616 [14]  |
  0.767 [25]  |
  0.918 [34]  |
  1.068 [75]  |■
  1.219 [4764]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  1.369 [27]  |
  1.520 [14]  |


Latency distribution:
  10% in 1.0862 secs
  25% in 1.0885 secs
  50% in 1.0907 secs
  75% in 1.0931 secs
  90% in 1.1007 secs
  95% in 1.1087 secs
  99% in 1.1736 secs

Details (average, fastest, slowest):
  DNS+dialup: 0.0005 secs, 0.0139 secs, 1.5201 secs
  DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:  0.0000 secs, 0.0000 secs, 0.0037 secs
  resp wait:  1.0791 secs, 0.0088 secs, 1.5193 secs
  resp read:  0.0001 secs, 0.0000 secs, 0.0036 secs

Status code distribution:
  [200] 5000 responses



hey -disable-keepalive -n 5000 -c 100 http://127.0.0.1:7777  1.96s user 2.30s system 7% cpu 54.608 total

If the sleep at https://github.com/puma/puma/blob/89a448e51b9ba22d46a7a6108fb4787623290dc2/lib/puma/server.rb#L391 is removed, the hey run is more or less instant:

$ time hey -disable-keepalive -n 5000 -c 100 http://127.0.0.1:7777
$ time hey -disable-keepalive -n 5000 -c 100 http://127.0.0.1:7777

Summary:
  Total:  0.4382 secs
  Slowest:  0.0622 secs
  Fastest:  0.0028 secs
  Average:  0.0087 secs
  Requests/sec: 11409.6531

  Total data: 10000 bytes
  Size/request: 2 bytes

Response time histogram:
  0.003 [1] |
  0.009 [4651]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.015 [248] |■■
  0.021 [0] |
  0.027 [0] |
  0.032 [0] |
  0.038 [0] |
  0.044 [0] |
  0.050 [0] |
  0.056 [25]  |
  0.062 [75]  |■


Latency distribution:
  10% in 0.0072 secs
  25% in 0.0074 secs
  50% in 0.0076 secs
  75% in 0.0079 secs
  90% in 0.0082 secs
  95% in 0.0094 secs
  99% in 0.0575 secs

Details (average, fastest, slowest):
  DNS+dialup: 0.0003 secs, 0.0028 secs, 0.0622 secs
  DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:  0.0000 secs, 0.0000 secs, 0.0008 secs
  resp wait:  0.0083 secs, 0.0026 secs, 0.0612 secs
  resp read:  0.0000 secs, 0.0000 secs, 0.0010 secs

Status code distribution:
  [200] 5000 responses



hey -disable-keepalive -n 5000 -c 100 http://127.0.0.1:7777  0.23s user 0.27s system 111% cpu 0.451 total

Updated by luke-gru (Luke Gruber) 4 days ago

  • Assignee set to ractor

Thanks for the benchmarks, they're very helpful. We (Shopify) have been working on Ractor issues and performance a lot for 3.5.0, but we haven't had time to dig into the scheduler much yet. I'll try to take a look soon.

Actions

Also available in: Atom PDF

Like1
Like0Like0