Bug #21560: RUBY_MN_THREADS=1 causes large performance regression in Puma 7

Added by schneems (Richard Schneeman) 2 months ago. Updated 8 days ago.

Status: Closed
Assignee: ractor
Target version: -
[ruby-core:123172]

Description

Expected

Running a web server with RUBY_MN_THREADS=1 should improve performance, or at worst leave it unchanged.

Actual

  • Before: 22919.85 Requests/sec
  • After: 2287.13 Requests/sec

Reproduction

I'm tracking this at https://github.com/puma/puma/issues/3720. I would like to get a smaller reproduction, but until then, here is how we can induce the failure:

Prepare puma:

$ git clone https://github.com/puma/puma
$ cd puma
$ git checkout v7.0.0.pre1
$ rake compile

Boot the server to deliver a "hello world" response:

$ bundle exec ruby -Ilib bin/puma -w 1 -t1 --preload test/rackup/hello.ru
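
For reference, test/rackup/hello.ru is a minimal "hello world" Rack app; a paraphrased equivalent (not the verbatim file) is:

# hello.ru: paraphrase of Puma's test/rackup/hello.ru, not the verbatim file
run ->(env) { [200, { "content-type" => "text/plain" }, ["Hello World"]] }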

Exercise the server with wrk (brew install wrk):

$ wrk -H 'Host: tfb-server' -H 'Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 15 -c 16 --timeout 8 -t 12 http://localhost:9292

Results

Before:

$ bundle exec ruby -Ilib bin/puma -w 1 -t1 --preload test/rackup/hello.ru
$ wrk -H 'Host: tfb-server' -H 'Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 15 -c 16 --timeout 8 -t 12 http://localhost:9292
Running 15s test @ http://localhost:9292
  12 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   786.74us    1.45ms  45.57ms   90.62%
    Req/Sec     1.92k   188.55     4.20k    83.22%
  Latency Distribution
     50%  386.00us
     75%  523.00us
     90%    2.11ms
     99%    4.63ms
  344638 requests in 15.04s, 25.23MB read
Requests/sec:  22919.85
Transfer/sec:      1.68MB

After:

$ env RUBY_MN_THREADS=1 bundle exec ruby -Ilib bin/puma -w 1 -t1 --preload test/rackup/hello.ru
$ wrk -H 'Host: tfb-server' -H 'Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 15 -c 16 --timeout 8 -t 12 http://localhost:9292
Running 15s test @ http://localhost:9292
  12 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    28.89ms   40.82ms 129.37ms   78.94%
    Req/Sec   207.34     64.09   252.00     70.70%
  Latency Distribution
     50%  212.00us
     75%   56.77ms
     90%   99.48ms
     99%  127.01ms
  34365 requests in 15.03s, 2.52MB read
Requests/sec:   2287.13
Transfer/sec:    171.45KB

(Note the roughly 10x drop in requests per second.)

Updated by dentarg (Patrik Ragnarsson) 2 months ago · #1 [ruby-core:123174]

Here's how I did my tests:

$ git clone https://github.com/puma/puma
$ cd puma
$ git checkout v7.0.0
$ rake compile
$ echo 'silence_single_worker_warning; app { [200, {}, ["OK"]] }' | RUBY_MN_THREADS=1 ruby -Ilib bin/puma --config /dev/stdin --worker 1 --threads 1:1 --port 7777
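
The configuration piped via /dev/stdin is equivalent to this config file, written out for clarity:

# puma_config.rb: the same Puma config DSL as the inline version above
silence_single_worker_warning
app { [200, {}, ["OK"]] }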

Run https://github.com/rakyll/hey; the run takes ~55s:

$ time hey -disable-keepalive -n 5000 -c 100 http://127.0.0.1:7777

Summary:
  Total:  54.5585 secs
  Slowest:  1.5201 secs
  Fastest:  0.0139 secs
  Average:  1.0798 secs
  Requests/sec: 91.6448

  Total data: 10000 bytes
  Size/request: 2 bytes

Response time histogram:
  0.014 [1] |
  0.165 [18]  |
  0.315 [14]  |
  0.466 [14]  |
  0.616 [14]  |
  0.767 [25]  |
  0.918 [34]  |
  1.068 [75]  |■
  1.219 [4764]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  1.369 [27]  |
  1.520 [14]  |


Latency distribution:
  10% in 1.0862 secs
  25% in 1.0885 secs
  50% in 1.0907 secs
  75% in 1.0931 secs
  90% in 1.1007 secs
  95% in 1.1087 secs
  99% in 1.1736 secs

Details (average, fastest, slowest):
  DNS+dialup: 0.0005 secs, 0.0139 secs, 1.5201 secs
  DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:  0.0000 secs, 0.0000 secs, 0.0037 secs
  resp wait:  1.0791 secs, 0.0088 secs, 1.5193 secs
  resp read:  0.0001 secs, 0.0000 secs, 0.0036 secs

Status code distribution:
  [200] 5000 responses



hey -disable-keepalive -n 5000 -c 100 http://127.0.0.1:7777  1.96s user 2.30s system 7% cpu 54.608 total

If the sleep at https://github.com/puma/puma/blob/89a448e51b9ba22d46a7a6108fb4787623290dc2/lib/puma/server.rb#L391 is removed, the hey run is more or less instant (a sketch of why short sleeps hurt so much follows these results):

$ time hey -disable-keepalive -n 5000 -c 100 http://127.0.0.1:7777

Summary:
  Total:  0.4382 secs
  Slowest:  0.0622 secs
  Fastest:  0.0028 secs
  Average:  0.0087 secs
  Requests/sec: 11409.6531

  Total data: 10000 bytes
  Size/request: 2 bytes

Response time histogram:
  0.003 [1] |
  0.009 [4651]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.015 [248] |■■
  0.021 [0] |
  0.027 [0] |
  0.032 [0] |
  0.038 [0] |
  0.044 [0] |
  0.050 [0] |
  0.056 [25]  |
  0.062 [75]  |■


Latency distribution:
  10% in 0.0072 secs
  25% in 0.0074 secs
  50% in 0.0076 secs
  75% in 0.0079 secs
  90% in 0.0082 secs
  95% in 0.0094 secs
  99% in 0.0575 secs

Details (average, fastest, slowest):
  DNS+dialup: 0.0003 secs, 0.0028 secs, 0.0622 secs
  DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:  0.0000 secs, 0.0000 secs, 0.0008 secs
  resp wait:  0.0083 secs, 0.0026 secs, 0.0612 secs
  resp read:  0.0000 secs, 0.0000 secs, 0.0010 secs

Status code distribution:
  [200] 5000 responses



hey -disable-keepalive -n 5000 -c 100 http://127.0.0.1:7777  0.23s user 0.27s system 111% cpu 0.451 total
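
The mechanism behind that sleep is easy to picture with a generic sketch (illustrative only, not Puma's actual code): any loop that paces itself with a millisecond-scale sleep inherits the ~10ms floor under RUBY_MN_THREADS=1, capping it at roughly 100 iterations per second per thread, which lines up with the ~91 requests/sec measured above.

# Illustrative only, not Puma's code: count how many 1ms-sleep iterations
# fit into one second. With RUBY_MN_THREADS=1 each sleep clamps to ~10ms,
# so expect roughly 100 iterations instead of several hundred.
iterations = 0
Thread.new do
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) - start < 1.0
    sleep 0.001
    iterations += 1
  end
end.join
puts "#{iterations} iterations in one second"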

Updated by luke-gru (Luke Gruber) 2 months ago · #2 [ruby-core:123179]

  • Assignee set to ractor

Thanks for the benchmarks; they're very helpful. We (Shopify) have been working on Ractor issues and performance a lot for 3.5.0, but we haven't had time to dig into the scheduler much yet. I'll try to take a look soon.

Updated by jhawthorn (John Hawthorn) 2 months ago · #3 [ruby-core:123193]

I suspect this might be the issue: under M:N threads, sleeps shorter than 10ms appear to be rounded up to roughly 10ms.

Thread.new do
  5.times do
    before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleep 0.0001
    after = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    puts(after - before)
  end
end.join
$ RUBY_MN_THREADS=1 ruby test.rb
0.010995999909937382
0.011000000406056643
0.011068999767303467
0.0102849998511374
0.011036999989300966

$ ruby test.rb
0.0001360001042485237
0.00013200007379055023
0.00013400008901953697
0.00013200007379055023
0.00013299984857439995
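
An extended version of this probe (an illustrative variation, not part of the original comment) makes the floor visible across durations:

# Probe several sleep durations from a spawned thread. Under
# RUBY_MN_THREADS=1, everything below ~10ms is expected to clamp to
# ~10ms, while longer sleeps behave normally.
[0.0001, 0.001, 0.005, 0.02].each do |duration|
  Thread.new do
    before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleep duration
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - before
    printf("requested %.4fs, slept %.4fs\n", duration, elapsed)
  end.join
end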

Updated by byroot (Jean Boussier) 2 months ago · #4 [ruby-core:123204]

It seems not limited to sleep; all timed scheduling appears to be affected:

th = Thread.new { sleep 10 }

Thread.new do
  5.times do
    before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    th.join 0.0001
    after = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    puts(after - before)
  end
end.join

Behaves the same.
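
Other timed waits can be probed the same way; for example, a timed condition-variable wait (an illustrative variation, assuming the same clamping applies):

require "monitor"

lock = Monitor.new
cond = lock.new_cond

Thread.new do
  5.times do
    before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    lock.synchronize { cond.wait(0.0001) } # timed wait; nothing ever signals
    after = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    puts(after - before)
  end
end.join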

Updated by andremuta (Andre Muta) 16 days ago · Edited · #5 [ruby-core:123542]

I ran a git bisect, building and testing each revision with these examples (raising if the average exceeded 2ms; a sketch of such a test appears below). This is the commit that started the issue:

[be1bbd5b7d40ad863ab35097765d3754726bbd54] M:N thread scheduler for Ractors

That is the commit that introduced the M:N scheduler itself.

Edit: original feature with notes about performance: #19842
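
A bisect test in that spirit (a reconstruction, not the exact script used) could look like the following; git bisect run would invoke it with the freshly built ruby and RUBY_MN_THREADS=1 at each step:

# bisect_test.rb: fail the revision if short sleeps average above 2ms.
samples = []
Thread.new do
  50.times do
    before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleep 0.0001
    samples << Process.clock_gettime(Process::CLOCK_MONOTONIC) - before
  end
end.join

average = samples.sum / samples.size
raise "regression: average sleep #{average}s exceeds 2ms" if average > 0.002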

Updated by luke-gru (Luke Gruber) 12 days ago · Edited · #7 [ruby-core:123602]

With the commit from @andremuta:

Without RUBY_MN_THREADS:

Running 15s test @ http://localhost:9292
  12 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.00ms    1.52ms  11.44ms   82.67%
    Req/Sec     2.62k   102.88     2.78k    69.39%
  Latency Distribution
     50%  192.00us
     75%    1.20ms
     90%    3.79ms
     99%    5.46ms
  469973 requests in 15.03s, 34.40MB read
Requests/sec:  31274.13
Transfer/sec:      2.29MB

With RUBY_MN_THREADS:

Running 15s test @ http://localhost:9292
  12 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   297.29us  146.13us   5.58ms   91.42%
    Req/Sec     3.40k   107.72     3.61k    86.87%
  Latency Distribution
     50%  279.00us
     75%  325.00us
     90%  380.00us
     99%  540.00us
  612683 requests in 15.10s, 44.85MB read
Requests/sec:  40575.10
Transfer/sec:      2.97MB

Updated by byroot (Jean Boussier) 12 days ago · #8

  • Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN to 3.2: DONTNEED, 3.3: DONTNEED, 3.4: REQUIRED

Updated by MSP-Greg (Greg L) 9 days ago · #10 [ruby-core:123627]

This is also an issue with 3.3.10 and 3.4.7. Should it be backported? The original commit landed on 12-Oct-2023.

Updated by byroot (Jean Boussier) 9 days ago · #11 [ruby-core:123634]

  • Backport changed from 3.2: DONTNEED, 3.3: DONTNEED, 3.4: REQUIRED to 3.2: DONTNEED, 3.3: REQUIRED, 3.4: REQUIRED

I forgot it was introduced in 3.3. It is eligible for backport, but ultimately that decision is up to each branch's maintainers.

Updated by byroot (Jean Boussier) 9 days ago · #12 [ruby-core:123638]

  • Backport changed from 3.2: DONTNEED, 3.3: REQUIRED, 3.4: REQUIRED to 3.2: DONTNEED, 3.3: REQUIRED, 3.4: DONE

The 3.4 backport was merged, but the 3.3 backport doesn't apply cleanly.

Updated by nagachika (Tomoyuki Chikanaga) 8 days ago · #13 [ruby-core:123643]

  • Backport changed from 3.2: DONTNEED, 3.3: REQUIRED, 3.4: DONE to 3.2: DONTNEED, 3.3: DONE, 3.4: DONE