# Ruby master - Feature #15997: Improve performance of fiber creation by using pool allocation strategy

https://bugs.ruby-lang.org/issues/15997

---

**ioquatix (Samuel Williams)** wrote on 2019-07-12 01:45 UTC:
<ul></ul><p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/17">@ko1 (Koichi Sasada)</a> asked:</p>
<blockquote>
<p>(1) stack size assumption</p>
</blockquote>
<p>The fiber pool stack size is (guard page + vm_stack_size + fiber_machine_stack_size).</p>
<blockquote>
<p>(2) maximum allocatable size</p>
</blockquote>

On 64-bit platforms it's effectively the same, although in some situations it can be better due to the reduced number of `mmap` calls required.

On 32-bit platforms it's slightly worse, because I didn't bother implementing a fallback on `mmap` failure. In the current implementation, the worst-case difference is 128 fiber stacks. That said, if you are allocating fibers up to the limit of the 32-bit address space you will quickly run into other issues, so I don't consider this a bug; it's just the natural limit of a 32-bit address space.

> (3) GC.enable/disable usage (edited)

- `vm2_fiber_allocate` runs with `GC.disable` to give a fair comparison of allocation overheads.
- `vm2_fiber_count` runs with normal GC, but due to using `alloca` on the fiber pool stack, GC pressure/count is significantly reduced. It is not meant to represent the expected improvement in real-world code, but it shows that the fiber pool code in isolation avoids GC overheads.
- `vm2_fiber_reuse` runs with `GC.disable` and deterministically calls `GC.start` after allocating 1024 fibers, to test the performance of fiber reuse.
- `vm2_fiber_switch` is an existing benchmark and is not affected by the fiber pool implementation.

---

**ko1 (Koichi Sasada)** wrote on 2019-07-12 01:54 UTC:

ioquatix (Samuel Williams) wrote:

> @ko1 (Koichi Sasada) asked:
>
> > (1) stack size assumption
>
> The fiber pool stack size is (guard page + vm_stack_size + fiber_machine_stack_size).

Which size (xx KB, etc.)?

> > (2) maximum allocatable size
>
> On 64-bit platforms it's effectively the same, although in some situations it can be better due to the reduced number of `mmap` calls required.
>
> On 32-bit platforms it's slightly worse, because I didn't bother implementing a fallback on `mmap` failure. In the current implementation, the worst-case difference is 128 fiber stacks. That said, if you are allocating fibers up to the limit of the 32-bit address space you will quickly run into other issues, so I don't consider this a bug; it's just the natural limit of a 32-bit address space.

I know you have measurements. Please share them with us.

> > (3) GC.enable/disable usage (edited)
>
> - `vm2_fiber_count` runs with normal GC, but due to using `alloca` on the fiber pool stack, GC pressure/count is significantly reduced. It is not meant to represent the expected improvement in real-world code, but it shows that the fiber pool code in isolation avoids GC overheads.

In general, we should tell the GC about this memory usage with `rb_gc_adjust_memory_usage()`. I don't think it is needed in this case.
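
For reference, `rb_gc_adjust_memory_usage()` takes a signed delta in bytes; a minimal sketch of the intended usage pattern (the wrapper names here are hypothetical):

```c
#include "ruby.h"

/* Hypothetical wrappers: report off-GC-heap memory to the GC.
 * rb_gc_adjust_memory_usage() takes a signed delta in bytes. */
static void
example_stack_allocated(size_t size)
{
    rb_gc_adjust_memory_usage((ssize_t)size);   /* memory acquired */
}

static void
example_stack_freed(size_t size)
{
    rb_gc_adjust_memory_usage(-(ssize_t)size);  /* memory released */
}
```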

---

**ioquatix (Samuel Williams)** wrote on 2019-07-12 02:06 UTC:

*Description updated.*

---

**ioquatix (Samuel Williams)** wrote on 2019-07-12 02:09 UTC:

I've removed `2e7b2a0db6 Make default fiber stack size same as thread stack size.` because I think it should be a separate PR.

---

**ioquatix (Samuel Williams)** wrote on 2019-07-12 02:10 UTC:

Here is a short script you can use to compare fiber allocation performance:

```ruby
GC.disable

puts RUBY_VERSION
system("git log -1 --oneline")

start_time = Time.now
ary = [] # retain every fiber so none can be collected

(1..).each{|i|
  if (i % 1000).zero?
    puts "#{Time.now - start_time} -> #{i} fibers [GC.count=#{GC.count}]"
    system("ps --pid #{$$} -o pid,vsz,rsz") # Linux procps: virtual/resident size in KB
  end

  ary << f = Fiber.new{Fiber.yield}
  f.resume
}
```

---

**ioquatix (Samuel Williams)** wrote on 2019-07-12 02:15 UTC:

To run the benchmarks:

```
make benchmark ITEM=vm2_fiber
```

---

**ioquatix (Samuel Williams)** wrote on 2019-07-12 02:18 UTC:

*Description updated.*

---

**ioquatix (Samuel Williams)** wrote on 2019-07-12 04:41 UTC:

*Description updated.*

---

**ioquatix (Samuel Williams)** wrote on 2019-07-12 04:41 UTC:
<ul></ul><p><a class="user active user-mention" href="https://bugs.ruby-lang.org/users/13">@matz (Yukihiro Matsumoto)</a> do you mind giving your feedback/opinion if possible?</p> Ruby master - Feature #15997: Improve performance of fiber creation by using pool allocation strategy.https://bugs.ruby-lang.org/issues/15997?journal_id=793392019-07-12T04:55:33Zioquatix (Samuel Williams)samuel@oriontransfer.net

> Which size (xx KB, etc.)?

```c
#define RUBY_VM_FIBER_VM_STACK_SIZE          ( 16 * 1024 * sizeof(VALUE)) /*  64 KB or 128 KB */
#define RUBY_VM_FIBER_VM_STACK_SIZE_MIN      (  2 * 1024 * sizeof(VALUE)) /*   8 KB or  16 KB */
#define RUBY_VM_FIBER_MACHINE_STACK_SIZE     ( 64 * 1024 * sizeof(VALUE)) /* 256 KB or 512 KB */
#if defined(__powerpc64__)
#define RUBY_VM_FIBER_MACHINE_STACK_SIZE_MIN ( 32 * 1024 * sizeof(VALUE)) /* 128 KB or 256 KB */
#else
#define RUBY_VM_FIBER_MACHINE_STACK_SIZE_MIN ( 16 * 1024 * sizeof(VALUE)) /*  64 KB or 128 KB */
#endif
```

Assuming a 4 KB page size on a 64-bit platform, each fiber needs a 4 KB guard page + 512 KB machine stack + 128 KB VM stack, about 644 KB of address space in total.
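
As a sanity check, a stand-alone sketch of that arithmetic (illustrative code, not CRuby's); the total matches the 659456-byte stride visible in the allocation failure quoted further down:

```c
#include <stdio.h>
#include <stddef.h>

int main(void)
{
    /* Illustrative constants for a 64-bit build: sizeof(VALUE) == 8,
     * page size 4096 bytes. */
    size_t value_size = 8;
    size_t guard_page = 4096;
    size_t vm_stack = 16 * 1024 * value_size;      /* 128 KB */
    size_t machine_stack = 64 * 1024 * value_size; /* 512 KB */

    size_t stride = guard_page + vm_stack + machine_stack;
    printf("%zu bytes (%zu KB) per fiber\n", stride, stride / 1024);
    /* => 659456 bytes (644 KB) */
    return 0;
}
```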

However, this is only part of the picture. Using the given `count.rb`, we can divide VSZ/RSZ by the number of fibers to give the "actual" usage:

```
2.7.0-fiber-pool
2.28s to allocate 200000 fibers
  PID        VSZ      RSZ
16105  129500736  2606936
-> 13.03 KB of physical memory per fiber, 647.5 KB of address space per fiber
```

```
2.7.0-master
82.67s to allocate 200000 fibers
  PID        VSZ      RSZ
22398  128513292  2396224
-> 11.98 KB of physical memory per fiber, 642.5 KB of address space per fiber
```

There is no significant difference, and it also looks like there might be room for improvement.

---

**ioquatix (Samuel Williams)** wrote on 2019-07-12 06:17 UTC:

> I know you have measurements. Please share them with us.

I added `show_limit` to the bootstrap tests so we can see this for all platforms. However, all platforms I tested could allocate 10,000 fibers easily, e.g. all builds on Travis, AppVeyor, etc. When we explored increasing the fiber stack size (to the same as the thread stack size), we did create some problems for 32-bit platforms.

On Linux, we can artificially limit the memory (e.g. 4GB) to see how the behaviour changes:

```
2.7.0-fiber-pool
$ bash -c "ulimit -v 4000000; ./ruby --disable-gems ./count.rb"
... snip ...
0.059s to create 5113 fibers [GC.count=0]
./count.rb:16:in `resume': can't alloc machine stack to fiber (1024 x 659456 bytes): Cannot allocate memory (FiberError)
```

```
2.6.3
$ bash -c "ulimit -v 4000000; ./ruby --disable-gems ./count.rb"
... snip ...
0.119s to create 6118 fibers [GC.count=0]
./count.rb:16:in `resume': can't alloc machine stack to fiber: Cannot allocate memory (FiberError)
```

The main concern I had for the 32-bit implementation is the fiber pool consuming all address space; 32-bit address space is very limited. There is a simple fix if this turns out to be a major blocking point: we can revert to individual fiber allocation and deallocation. It's straightforward to implement, since all fibers now go through just two functions, `fiber_pool_stack_acquire` and `fiber_pool_stack_release`; we could replace these with direct `mmap` and `munmap`. I didn't bother, because I don't know whether it's a problem in reality or just in theory.
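
For illustration, the direct (non-pooled) path would be roughly the following; a sketch assuming one guard page below a downward-growing stack, not the actual patch:

```c
#include <stdint.h>
#include <sys/mman.h>

/* Sketch: allocate one fiber stack with a guard page at the low end. */
static void *
stack_allocate(size_t size, size_t page_size)
{
    void *base = mmap(NULL, size + page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;

    /* Make the lowest page inaccessible so overflow faults immediately. */
    mprotect(base, page_size, PROT_NONE);

    return (uint8_t *)base + page_size; /* usable stack memory */
}

static void
stack_deallocate(void *stack, size_t size, size_t page_size)
{
    munmap((uint8_t *)stack - page_size, size + page_size);
}
```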

Regarding upper limits, I tested a more extreme case. I could allocate 4 million fibers in about 2 minutes on my server (same specs as listed in the summary), and it used 2.4TB of address space and 50GB of actual memory. This is with GC disabled, so it's not exactly a realistic test, but it does show some kind of upper limit.

---

**ioquatix (Samuel Williams)** wrote on 2019-07-12 06:18 UTC:

> In general, we should tell the GC about this memory usage with `rb_gc_adjust_memory_usage()`. I don't think it is needed in this case.

Maybe I don't follow you, but I believe fiber memory usage is reported correctly by `fiber_memsize` and `cont_memsize`?

---

**ioquatix (Samuel Williams)** wrote on 2019-07-12 07:55 UTC:

I did some more research about 32-bit applications.

On Windows (32-bit), the process is limited to 2GB of memory, but the address space should be 4GB. This is apparently the same for 32-bit Linux; maybe that includes arm32? There are some exceptions (PAE), but I don't know a lot about it.

If we assume we can create at most 6000 fibers on a 32-bit platform (it's probably less in practice), and we use a pool allocator with 8 stacks per allocation, it only takes 750 long-lived fibers (6000 / 8, one per allocation) to deadlock the pool. What I mean is: if 1 stack out of every 8-stack allocation is still in use, we can't free any address space, even if we implemented freeing.

Therefore, the best approach for 32-bit architectures is probably to avoid pooled allocations. We can use the existing code, but restrict the pool to 1 stack per allocation. This way, we can always free the address space when the stack is released.
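
To make the pinning concrete, here is a hypothetical worst case (assuming, purely for illustration, 8 stacks per allocation and that fibers fill allocations in creation order):

```ruby
# Hypothetical worst case for an 8-stacks-per-allocation pool:
# consecutive fibers are assumed to share one mmap'd allocation.
STACKS_PER_ALLOCATION = 8

fibers = 6000.times.map do
  f = Fiber.new { Fiber.yield }
  f.resume # park the fiber on its stack
  f
end

# Keep only one fiber per allocation; the other 7 stacks become vacant,
# but no allocation ever reaches used == 0, so no address space can be freed.
pinned = fibers.each_slice(STACKS_PER_ALLOCATION).map(&:first)
fibers = nil
GC.start
```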

I'd be happy to receive more feedback about this proposed approach, but as it seems like the right way forward, I'll probably just implement it.

---

**ioquatix (Samuel Williams)** wrote on 2019-07-12 13:16 UTC:

Okay, so I implemented fiber pool changes which make it more suitable for 32-bit platforms. It required additional bookkeeping. Essentially, the allocation list and the free list became doubly linked, which allows us to remove allocations and vacant stacks as required. It's more bookkeeping, but the performance overhead is negligible.

Now, if a fiber pool allocation becomes empty, we can remove it entirely, which frees its address space too. So, on a 32-bit platform, if we cap the fiber pool to a maximum of 4-8 stacks per allocation, maybe it's acceptable and limits fragmentation/over-use of address space.

We can also now experiment with the following situations:

- (1) When a fiber pool allocation becomes unused, `munmap` it; this reduces physical memory usage and address space.
- (2) When a fiber pool stack is released, `madvise(free)` it; this reduces physical memory/swap usage only.
- (3) When a fiber pool stack is released, do nothing; it remains in the cache. If there is memory pressure, it can get swapped to disk.
- (4) When a fiber pool stack is released, do nothing; on major GC, do one of the above.

The code for the above decision is in `fiber_pool_stack_release`:
<pre><code class="c syntaxhl" data-language="c"> <span class="k">if</span> <span class="p">(</span><span class="n">stack</span><span class="p">.</span><span class="n">allocation</span><span class="o">-></span><span class="n">used</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">fiber_pool_allocation_free</span><span class="p">(</span><span class="n">stack</span><span class="p">.</span><span class="n">allocation</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">fiber_pool_stack_free</span><span class="p">(</span><span class="o">&</span><span class="n">vacancy</span><span class="o">-></span><span class="n">stack</span><span class="p">);</span>
</code></pre>

Here are the performance differences on macOS, comparing with Ruby 2.6.2:

(1) `munmap`

```
vm2_fiber_allocate
    built-ruby:    130066.1 i/s
  compare-ruby:     85436.6 i/s - 1.52x slower

vm2_fiber_count
    built-ruby:     88426.2 i/s
  compare-ruby:      3811.9 i/s - 23.20x slower

vm2_fiber_reuse
    built-ruby:       109.9 i/s
  compare-ruby:        61.5 i/s - 1.79x slower

vm2_fiber_switch
  compare-ruby:   9437510.2 i/s
    built-ruby:   8893636.1 i/s - 1.06x slower
```

(2) `madvise(free)`

```
vm2_fiber_allocate
    built-ruby:    129641.0 i/s
  compare-ruby:    101306.1 i/s - 1.28x slower

vm2_fiber_count
    built-ruby:     87447.4 i/s
  compare-ruby:      3945.7 i/s - 22.16x slower

vm2_fiber_reuse
    built-ruby:       110.6 i/s
  compare-ruby:        61.7 i/s - 1.79x slower

vm2_fiber_switch
  compare-ruby:   9397149.4 i/s
    built-ruby:   9095279.0 i/s - 1.03x slower
```

(3) nothing

```
vm2_fiber_allocate
    built-ruby:    129309.2 i/s
  compare-ruby:    103792.5 i/s - 1.25x slower

vm2_fiber_count
    built-ruby:     90957.3 i/s
  compare-ruby:      4013.8 i/s - 22.66x slower

vm2_fiber_reuse
    built-ruby:       644.5 i/s
  compare-ruby:        61.0 i/s - 10.56x slower (N.B. on a Linux server, it's about 7x)

vm2_fiber_switch
    built-ruby:   9196315.6 i/s
  compare-ruby:   8661514.5 i/s - 1.06x slower
```

As you can see, trying to free address space or reduce memory/swap usage has significant overhead in the `vm2_fiber_reuse` case, which is one of the most important for long-running servers.

(1) & (2) look similar in terms of performance, with `munmap` perhaps being slightly better because it's done only when the fiber pool allocation is completely empty, versus after every stack release.

(1) `munmap` releases address space back to the system, which is ideal for a 32-bit address space.

(2) `madvise(free)` should be much faster than `munmap`, but the difference doesn't seem significant. It leaves the address space intact, but tells the system that the stack memory region is no longer needed, which avoids swapping it to disk when there is memory pressure.

(3) leaves the address space in place. If the system experiences memory pressure, stack areas are swapped to disk, even if unused. Because of this, if the user allocates 1 million fibers, a large amount of address space and swap space may be consumed. However, I would like to believe this isn't such a big problem.

While I think the answer for 32-bit systems is clearly (1), the best option for 64-bit is not obvious. (2) is pessimistic, while (3) is optimistic and may over-commit memory.

There is one solution to this, however: we could utilise `GC.compact` or a similar mechanism. That way, we could use (3) by default, but apply (1) or (2) as appropriate when `GC.compact` is invoked. There are other options here too: e.g. major GC, some kind of temporal GC (release the fiber pool if it was not used for some time), `madvise(free)` only if more than 50% of stacks are freed, etc. However, I like simple, deterministic options, so I personally lean towards `GC.compact`, or `Fiber::Pool.shared.compact`, or some other similar method.

---

**ioquatix (Samuel Williams)** wrote on 2019-07-16 02:34 UTC:

I've updated the fiber pool implementation; some functionality is now controlled by `#define FIBER_POOL_ALLOCATION_FREE`.

The normal (CPU-efficient, memory-expensive) implementation creates and reuses `fiber_pool_allocation` indefinitely, and never returns the resources to the system, so address space is not released.

If you define `FIBER_POOL_ALLOCATION_FREE`, then during `fiber_pool_stack_release`:

1. We use `madvise(free)` to clear the dirty bit on stack memory (to avoid swapping it to disk under memory pressure), and
2. We use `munmap` on the `fiber_pool_allocation` when its usage drops to 0 (extra bookkeeping required), which means we:
   - remove it from the allocation list (doubly linked list required), and
   - remove all of its `fiber_pool_vacancy` entries from the vacancy list (doubly linked list required).

The consequence of `#define FIBER_POOL_ALLOCATION_FREE` is that `fiber_pool_stack_acquire` and `fiber_pool_stack_release` become more CPU-expensive, but address space is released back to the system when possible, and dirty pages are cleared so that swap space is not consumed. More specifically:

1. `fiber_pool_stack_release` will always call `madvise(free)` and occasionally `munmap`.
2. `fiber_pool_stack_acquire` is more likely to call `mmap` and `mprotect` when more stacks are required.

We could merge this PR and then decide whether we want to be conservative or not, or maybe decide on a per-architecture basis (e.g. 32-bit could be conservative, while on 64-bit address space is less of a concern).

---

**ioquatix (Samuel Williams)** wrote on 2019-07-16 02:44 UTC:

On Linux, comparing fiber-pool with master:

```
% make benchmark COMPARE_RUBY="../../ruby/build/ruby --disable-gems" ITEM=vm2_fiber RUBY_SHARED_FIBER_POOL_FREE_STACKS=0
Calculating -------------------------------------
                        master  fiber-pool
vm2_fiber_allocate    122.128k    163.030k i/s - 100.000k times in 0.818812s 0.613385s
vm2_fiber_count         2.717k     78.701k i/s - 100.000k times in 36.809948s 1.270639s
vm2_fiber_reuse        155.573     935.127 i/s -  200.000 times in 1.285570s 0.213875s
vm2_fiber_switch       12.842M     12.730M i/s -  20.000M times in 1.557340s 1.571121s

Comparison:
vm2_fiber_allocate
    built-ruby:    163029.6 i/s
  compare-ruby:    122128.2 i/s - 1.33x slower
vm2_fiber_count
    built-ruby:     78700.5 i/s
  compare-ruby:      2716.7 i/s - 28.97x slower
vm2_fiber_reuse
    built-ruby:       935.1 i/s
  compare-ruby:       155.6 i/s - 6.01x slower
vm2_fiber_switch
  compare-ruby:  12842411.5 i/s
    built-ruby:  12729761.1 i/s - 1.01x slower

% make benchmark COMPARE_RUBY="../../ruby/build/ruby --disable-gems" ITEM=vm2_fiber RUBY_SHARED_FIBER_POOL_FREE_STACKS=1
Calculating -------------------------------------
                        master  fiber-pool
vm2_fiber_allocate    122.656k    165.218k i/s - 100.000k times in 0.815289s 0.605260s
vm2_fiber_count         2.682k     77.541k i/s - 100.000k times in 37.288038s 1.289637s
vm2_fiber_reuse        160.836     449.224 i/s -  200.000 times in 1.243500s 0.445212s
vm2_fiber_switch       13.159M     13.132M i/s -  20.000M times in 1.519828s 1.522983s

Comparison:
vm2_fiber_allocate
    built-ruby:    165218.2 i/s
  compare-ruby:    122655.9 i/s - 1.35x slower
vm2_fiber_count
    built-ruby:     77541.2 i/s
  compare-ruby:      2681.8 i/s - 28.91x slower
vm2_fiber_reuse
    built-ruby:       449.2 i/s
  compare-ruby:       160.8 i/s - 2.79x slower
vm2_fiber_switch
  compare-ruby:  13159383.0 i/s
    built-ruby:  13132119.3 i/s - 1.00x slower
```

---

**ioquatix (Samuel Williams)** wrote on 2019-07-16 04:48 UTC:

On Darwin, comparing fiber-pool with master:

```
> make benchmark COMPARE_RUBY="../../ruby/build/ruby --disable-gems" ITEM=vm2_fiber RUBY_SHARED_FIBER_POOL_FREE_STACKS=0
Calculating -------------------------------------
                        master  fiber-pool
vm2_fiber_allocate     99.329k    124.488k i/s - 100.000k times in 1.006759s 0.803293s
vm2_fiber_count         3.621k     82.447k i/s - 100.000k times in 27.620062s 1.212895s
vm2_fiber_reuse        55.039      615.402 i/s -  200.000 times in 3.633812s 0.324991s
vm2_fiber_switch        8.803M      8.591M i/s -  20.000M times in 2.272063s 2.328041s

Comparison:
vm2_fiber_allocate
    built-ruby:    124487.6 i/s
  compare-ruby:     99328.6 i/s - 1.25x slower
vm2_fiber_count
    built-ruby:     82447.4 i/s
  compare-ruby:      3620.6 i/s - 22.77x slower
vm2_fiber_reuse
    built-ruby:       615.4 i/s
  compare-ruby:        55.0 i/s - 11.18x slower
vm2_fiber_switch
  compare-ruby:   8802572.8 i/s
    built-ruby:   8590914.0 i/s - 1.02x slower

> make benchmark COMPARE_RUBY="../../ruby/build/ruby --disable-gems" ITEM=vm2_fiber RUBY_SHARED_FIBER_POOL_FREE_STACKS=1
Calculating -------------------------------------
                        master  fiber-pool
vm2_fiber_allocate     96.834k    121.823k i/s - 100.000k times in 1.032698s 0.820865s
vm2_fiber_count         3.027k     80.419k i/s - 100.000k times in 33.035732s 1.243489s
vm2_fiber_reuse        56.275      449.230 i/s -  200.000 times in 3.553979s 0.445206s
vm2_fiber_switch        8.640M      8.255M i/s -  20.000M times in 2.314890s 2.422917s

Comparison:
vm2_fiber_allocate
    built-ruby:    121822.7 i/s
  compare-ruby:     96833.7 i/s - 1.26x slower
vm2_fiber_count
    built-ruby:     80418.9 i/s
  compare-ruby:      3027.0 i/s - 26.57x slower
vm2_fiber_reuse
    built-ruby:       449.2 i/s
  compare-ruby:        56.3 i/s - 7.98x slower
vm2_fiber_switch
  compare-ruby:   8639719.4 i/s
    built-ruby:   8254513.1 i/s - 1.05x slower
```

---

**ioquatix (Samuel Williams)** wrote on 2019-07-16 08:31 UTC:

*File Screen Shot 2019-07-16 at 8.30.59 PM.png added.*

Attached graph/table.

---

**ioquatix (Samuel Williams)** wrote on 2019-07-17 01:16 UTC:

Here is some testing using falcon and `ab`. `ab` is an HTTP/1.0 client test; because of that, each connection/request makes a new fiber, so it will show whether there are improvements/regressions in performance.
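
The reports below correspond to an invocation along these lines (reconstructed from the reported parameters: 100,000 requests at concurrency 256 against the `/small` endpoint):

```
ab -n 100000 -c 256 http://localhost:9292/small
```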

```
Server Software:        2.7.0-fiber-pool FREE_STACKS=0
Server Hostname:        localhost
Server Port:            9292
Document Path:          /small
Document Length:        1200 bytes
Concurrency Level:      256
Time taken for tests:   14.174 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      126000000 bytes
HTML transferred:       120000000 bytes
Requests per second:    7055.11 [#/sec] (mean)
Time per request:       36.286 [ms] (mean)
Time per request:       0.142 [ms] (mean, across all concurrent requests)
Transfer rate:          8681.10 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   17  122.8      2    3038
Processing:     4   19    5.7     18     231
Waiting:        0    8    6.6      7     225
Total:         10   36  123.1     19    3056

Percentage of the requests served within a certain time (ms)
  50%     19
  66%     21
  75%     23
  80%     24
  90%     27
  95%     28
  98%     31
  99%   1022
 100%   3056 (longest request)

Server Software:        2.7.0-fiber-pool FREE_STACKS=1
Server Hostname:        localhost
Server Port:            9292
Document Path:          /small
Document Length:        1200 bytes
Concurrency Level:      256
Time taken for tests:   14.676 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      126000000 bytes
HTML transferred:       120000000 bytes
Requests per second:    6813.71 [#/sec] (mean)
Time per request:       37.571 [ms] (mean)
Time per request:       0.147 [ms] (mean, across all concurrent requests)
Transfer rate:          8384.06 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   17  124.6      1    1030
Processing:     4   20    9.3     18     416
Waiting:        0    8   10.0      7     412
Total:          7   37  126.9     20    1437

Percentage of the requests served within a certain time (ms)
  50%     20
  66%     22
  75%     23
  80%     24
  90%     27
  95%     29
  98%     35
  99%   1027
 100%   1437 (longest request)

Server Software:        2.7.0-master
Server Hostname:        localhost
Server Port:            9293
Document Path:          /small
Document Length:        1200 bytes
Concurrency Level:      256
Time taken for tests:   16.170 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      126000000 bytes
HTML transferred:       120000000 bytes
Requests per second:    6184.15 [#/sec] (mean)
Time per request:       41.396 [ms] (mean)
Time per request:       0.162 [ms] (mean, across all concurrent requests)
Transfer rate:          7609.41 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   19  133.4      1    3223
Processing:     4   22    7.4     21     432
Waiting:        0    9    8.3      8     422
Total:          5   41  134.3     22    3246

Percentage of the requests served within a certain time (ms)
  50%     22
  66%     23
  75%     25
  80%     27
  90%     31
  95%     33
  98%     39
  99%   1029
 100%   3246 (longest request)

Server Software:        2.6.3
Server Hostname:        localhost
Server Port:            9294
Document Path:          /small
Document Length:        1200 bytes
Concurrency Level:      256
Time taken for tests:   15.600 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      126000000 bytes
HTML transferred:       120000000 bytes
Requests per second:    6410.16 [#/sec] (mean)
Time per request:       39.937 [ms] (mean)
Time per request:       0.156 [ms] (mean, across all concurrent requests)
Transfer rate:          7887.51 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   18  130.2      1    3132
Processing:     4   21    8.4     20     432
Waiting:        0    9    9.2      8     428
Total:          9   39  131.6     21    3143

Percentage of the requests served within a certain time (ms)
  50%     21
  66%     22
  75%     23
  80%     25
  90%     31
  95%     33
  98%     34
  99%   1029
 100%   3143 (longest request)
```

---

**ioquatix (Samuel Williams)** wrote on 2019-07-17 01:57 UTC:

There is some kind of performance regression between 2.6.3 and 2.7.0-master.

So, I'm trying 2.7.0-preview1 to see if it's better or worse:

```
Server Software:
Server Hostname:        localhost
Server Port:            9294
Document Path:          /small
Document Length:        1200 bytes
Concurrency Level:      256
Time taken for tests:   17.464 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      126000000 bytes
HTML transferred:       120000000 bytes
Requests per second:    5726.11 [#/sec] (mean)
Time per request:       44.708 [ms] (mean)
Time per request:       0.175 [ms] (mean, across all concurrent requests)
Transfer rate:          7045.80 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   20  137.8      1    1029
Processing:     4   24    7.8     21     428
Waiting:        0   10    8.5      9     420
Total:          4   44  138.5     23    1452

Percentage of the requests served within a certain time (ms)
  50%     23
  66%     24
  75%     28
  80%     30
  90%     34
  95%     36
  98%     45
  99%   1032
 100%   1452 (longest request)
```

2.7.0-preview1 is much worse, relatively speaking.

---

**ioquatix (Samuel Williams)** wrote on 2019-07-27 05:57 UTC:

*Status changed from Open to Closed. Target version set to 2.7.*

It was merged.

---

**methodmissing (Lourens Naudé)** wrote on 2019-07-28 02:34 UTC:

*File Screenshot from 2019-07-28 03-31-05.png added.*

ioquatix (Samuel Williams) wrote:

> It was merged.

Hi Samuel, excellent work. I'm just wondering about the 20MB pool size noticeable across a few different application types with the fiber pool changes. Also, the Ruby teardown sequence ideally needs to clean up too.