Bug #21560
open
RUBY_MN_THREADS=1 causes large performance regression in Puma 7
Description
Expected¶
Running a webserver with RUBY_MN_THREADS=1
will increase performance or keep it the same.
Actual¶
- Before: 22919.85 Requests/sec
- After: 2287.13 Requests/sec
Reproduction¶
Tracking this at https://github.com/puma/puma/issues/3720. I would like to get a smaller reproduction, but until then here's how we're able to induce the failure:
Prepare puma:
$ git clone https://github.com/puma/puma
$ cd puma
$ git checkout v7.0.0.pre1
$ rake compile
Boot the server to deliver a "hello world" response:
$ bundle exec ruby -Ilib bin/puma -w 1 -t1 --preload test/rackup/hello.ru
Exercise the server with wrk
(brew install wrk):
$ wrk -H 'Host: tfb-server' -H 'Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 15 -c 16 --timeout 8 -t 12 http://localhost:9292
Results¶
Before:
$ bundle exec ruby -Ilib bin/puma -w 1 -t1 --preload test/rackup/hello.ru
$ wrk -H 'Host: tfb-server' -H 'Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 15 -c 16 --timeout 8 -t 12 http://localhost:9292
Running 15s test @ http://localhost:9292
12 threads and 16 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 786.74us 1.45ms 45.57ms 90.62%
Req/Sec 1.92k 188.55 4.20k 83.22%
Latency Distribution
50% 386.00us
75% 523.00us
90% 2.11ms
99% 4.63ms
344638 requests in 15.04s, 25.23MB read
Requests/sec: 22919.85
Transfer/sec: 1.68MB
After:
$ env RUBY_MN_THREADS=1 bundle exec ruby -Ilib bin/puma -w 1 -t1 --preload test/rackup/hello.ru
$ wrk -H 'Host: tfb-server' -H 'Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 15 -c 16 --timeout 8 -t 12 http://localhost:9292
Running 15s test @ http://localhost:9292
12 threads and 16 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 28.89ms 40.82ms 129.37ms 78.94%
Req/Sec 207.34 64.09 252.00 70.70%
Latency Distribution
50% 212.00us
75% 56.77ms
90% 99.48ms
99% 127.01ms
34365 requests in 15.03s, 2.52MB read
Requests/sec: 2287.13
Transfer/sec: 171.45KB
(Notice the dramatic drop in requests per second).
Updated by dentarg (Patrik Ragnarsson) 5 days ago
Here's how I did my tests:
$ git clone https://github.com/puma/puma
$ cd puma
$ git checkout v7.0.0
$ rake compile
$ echo 'silence_single_worker_warning; app { [200, {}, ["OK"]] }' | RUBY_MN_THREADS=1 ruby -Ilib bin/puma --config /dev/stdin --worker 1 --threads 1:1 --port 7777
Run https://github.com/rakyll/hey, it takes ~55s
$ time hey -disable-keepalive -n 5000 -c 100 http://127.0.0.1:7777
$ time hey -disable-keepalive -n 5000 -c 100 http://127.0.0.1:7777
Summary:
Total: 54.5585 secs
Slowest: 1.5201 secs
Fastest: 0.0139 secs
Average: 1.0798 secs
Requests/sec: 91.6448
Total data: 10000 bytes
Size/request: 2 bytes
Response time histogram:
0.014 [1] |
0.165 [18] |
0.315 [14] |
0.466 [14] |
0.616 [14] |
0.767 [25] |
0.918 [34] |
1.068 [75] |■
1.219 [4764] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
1.369 [27] |
1.520 [14] |
Latency distribution:
10% in 1.0862 secs
25% in 1.0885 secs
50% in 1.0907 secs
75% in 1.0931 secs
90% in 1.1007 secs
95% in 1.1087 secs
99% in 1.1736 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0005 secs, 0.0139 secs, 1.5201 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0000 secs, 0.0000 secs, 0.0037 secs
resp wait: 1.0791 secs, 0.0088 secs, 1.5193 secs
resp read: 0.0001 secs, 0.0000 secs, 0.0036 secs
Status code distribution:
[200] 5000 responses
hey -disable-keepalive -n 5000 -c 100 http://127.0.0.1:7777 1.96s user 2.30s system 7% cpu 54.608 total
If the sleep
at https://github.com/puma/puma/blob/89a448e51b9ba22d46a7a6108fb4787623290dc2/lib/puma/server.rb#L391 is removed, the hey
run is more or less instant:
$ time hey -disable-keepalive -n 5000 -c 100 http://127.0.0.1:7777
$ time hey -disable-keepalive -n 5000 -c 100 http://127.0.0.1:7777
Summary:
Total: 0.4382 secs
Slowest: 0.0622 secs
Fastest: 0.0028 secs
Average: 0.0087 secs
Requests/sec: 11409.6531
Total data: 10000 bytes
Size/request: 2 bytes
Response time histogram:
0.003 [1] |
0.009 [4651] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.015 [248] |■■
0.021 [0] |
0.027 [0] |
0.032 [0] |
0.038 [0] |
0.044 [0] |
0.050 [0] |
0.056 [25] |
0.062 [75] |■
Latency distribution:
10% in 0.0072 secs
25% in 0.0074 secs
50% in 0.0076 secs
75% in 0.0079 secs
90% in 0.0082 secs
95% in 0.0094 secs
99% in 0.0575 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0003 secs, 0.0028 secs, 0.0622 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0000 secs, 0.0000 secs, 0.0008 secs
resp wait: 0.0083 secs, 0.0026 secs, 0.0612 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0010 secs
Status code distribution:
[200] 5000 responses
hey -disable-keepalive -n 5000 -c 100 http://127.0.0.1:7777 0.23s user 0.27s system 111% cpu 0.451 total
Updated by luke-gru (Luke Gruber) 4 days ago
- Assignee set to ractor
Thanks for the benchmarks, they're very helpful. We (Shopify) have been working on Ractor issues and performance a lot for 3.5.0, but we haven't had time to dig into the scheduler much yet. I'll try to take a look soon.