Project

General

Profile

Actions

Feature #15997

closed

Improve performance of fiber creation by using pool allocation strategy.

Added by ioquatix (Samuel Williams) over 4 years ago. Updated over 4 years ago.

Status:
Closed
Target version:
[ruby-core:93695]

Description

https://github.com/ruby/ruby/pull/2224

This PR improves the performance of fiber allocation and reuse by implementing a better stack cache.

The fiber pool manages a singly linked list of fiber pool allocations. The fiber pool allocation contains 1 or more stack (typically more, e.g. 512). It uses N^2 allocation strategy, starting at 8 initial stacks, next is 8, 16, 32, etc.

//
// base = +-------------------------------+-----------------------+  +
//        |VM Stack       |VM Stack       |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Machine Stack  |Machine Stack  |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               | .  .  .  .            |  |  size
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Guard Page     |Guard Page     |                       |  |
//        +-------------------------------+-----------------------+  v
//
//        +------------------------------------------------------->
//
//                                  count
//

The performance improvement depends on usage:

Calculating -------------------------------------
                     compare-ruby  built-ruby 
  vm2_fiber_allocate     132.900k    180.852k i/s -    100.000k times in 0.752447s 0.552939s
     vm2_fiber_count       5.317k    110.724k i/s -    100.000k times in 18.806479s 0.903145s
     vm2_fiber_reuse      160.128     347.663 i/s -     200.000 times in 1.249003s 0.575269s
    vm2_fiber_switch      13.429M     13.490M i/s -     20.000M times in 1.489303s 1.482549s

Comparison:
               vm2_fiber_allocate
          built-ruby:    180851.6 i/s 
        compare-ruby:    132899.7 i/s - 1.36x  slower

                  vm2_fiber_count
          built-ruby:    110724.3 i/s 
        compare-ruby:      5317.3 i/s - 20.82x  slower

                  vm2_fiber_reuse
          built-ruby:       347.7 i/s 
        compare-ruby:       160.1 i/s - 2.17x  slower

                 vm2_fiber_switch
          built-ruby:  13490282.4 i/s 
        compare-ruby:  13429100.0 i/s - 1.00x  slower

This test is run on Linux server with 64GB memory and 4-core Xeon (Intel Xeon CPU E3-1240 v6 @ 3.70GHz). "compare-ruby" is master, and "built-ruby" is master+fiber-pool.

Additionally, we conservatively use madvise(free) to avoid swap space usage for unused fiber stacks. However, if you remove this requirement, we can get 6x - 10x performance improvement in vm2_fiber_reuse benchmark. There are some options to deal with this (e.g. moving it to GC.compact) but as this is still a net win, I'd like to merge this PR as is.


Files

Screen Shot 2019-07-16 at 8.30.59 PM.png (138 KB) Screen Shot 2019-07-16 at 8.30.59 PM.png ioquatix (Samuel Williams), 07/16/2019 08:31 AM
Screenshot from 2019-07-28 03-31-05.png (226 KB) Screenshot from 2019-07-28 03-31-05.png methodmissing (Lourens Naudé), 07/28/2019 02:34 AM
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0