Bug #17263

closed

Fiber context switch degrades with number of fibers, limit on number of fibers

Added by ciconia (Sharon Rosner) over 3 years ago. Updated 7 months ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:100401]

Description

I'm working on developing Polyphony, a Ruby gem for writing
highly-concurrent Ruby programs with fibers. In the course of my work I have
come up against two problems using Ruby fibers:

  1. Fiber context switching performance seems to degrade as the number of
    fibers increases. This happens both with Fiber#transfer and with
    Fiber#resume/Fiber.yield.
  2. The number of concurrent fibers that can exist at any time seems to be
    limited. Once a certain number is reached (on my system this seems to be
    31744 fibers), calling Fiber#transfer raises a FiberError with the
    message "can't set a guard page: Cannot allocate memory". This is not due
    to RAM being saturated. With 10000 fibers, my test program hovers at
    around 150MB RSS (on Ruby 2.7.1).
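
One thing that may be worth checking for the second problem: on Linux, mprotect(2) can fail with ENOMEM when a process would exceed its maximum number of memory mappings, and that limit is exposed as vm.max_map_count. Whether this is what actually produces the guard page error here is an assumption, not something established in this report. The sketch below only prints the limit and the current number of mappings; the helper names max_map_count and current_map_count are hypothetical, and on systems without /proc they return nil.

```ruby
# Hypothetical helper: read the per-process mapping limit (Linux only).
# Returns nil where the file does not exist, e.g. on macOS.
def max_map_count(path = '/proc/sys/vm/max_map_count')
  File.readable?(path) ? File.read(path).to_i : nil
end

# Hypothetical helper: count the mappings currently held by this process.
# Each line of /proc/self/maps describes one mapping.
def current_map_count(path = '/proc/self/maps')
  File.readable?(path) ? File.foreach(path).count : nil
end

limit = max_map_count
used = current_map_count

if limit && used
  puts "mappings in use: #{used} of #{limit}"
else
  puts 'mapping information is not available on this platform'
end
```

Calling current_map_count before and after creating a batch of fibers would show how many mappings each fiber costs, which could then be compared against the point where the FiberError appears.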

Here's a program for testing the performance of Fiber#transfer:

# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

def run(num_fibers)
  count = 0

  GC.start
  GC.disable

  first = nil
  last = nil
  supervisor = Fiber.current
  num_fibers.times do
    fiber = Fiber.new do
      loop do
        count += 1
        if count == 1_000_000
          supervisor.transfer
        else
          Fiber.current.next.transfer
        end
      end
    end
    first ||= fiber
    last.next = fiber if last
    last = fiber
  end

  last.next = first
  
  t0 = Time.now
  first.transfer
  elapsed = Time.now - t0

  rss = `ps -o rss= -p #{Process.pid}`.to_i

  puts "fibers: #{num_fibers} rss: #{rss} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)

With Ruby 2.6.5 I'm getting:

fibers: 100 rss: 23212 count: 1000000 rate: 3357675.1688139187
fibers: 1000 rss: 31292 count: 1000000 rate: 2455537.056439736
fibers: 10000 rss: 127388 count: 1000000 rate: 954251.1674325482
Stopped at 22718 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>

With Ruby 2.7.1 I'm getting:

fibers: 100 rss: 23324 count: 1000000 rate: 3443916.967616508
fibers: 1000 rss: 34676 count: 1000000 rate: 2333315.3862491543
fibers: 10000 rss: 151364 count: 1000000 rate: 916772.1008060966
Stopped at 31744 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>

With ruby-head I get an almost identical result to that of 2.7.1.

As you can see, the performance degradation is similar in all three versions
of Ruby, going from ~3.4M context switches per second for 100 fibers to less
than 1M context switches per second for 10000 fibers. Running with 100000 fibers
fails to complete.
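
To make the degradation easier to read, the rates can be converted into time per context switch. This is only arithmetic on the Ruby 2.7.1 figures reported above; the variable names are my own.

```ruby
# Rates copied from the Ruby 2.7.1 Fiber#transfer run above
# (context switches per second, keyed by number of fibers).
rates = {
  100 => 3_443_916.967616508,
  1_000 => 2_333_315.3862491543,
  10_000 => 916_772.1008060966
}

# Nanoseconds spent per switch for each fiber count.
ns_per_switch = rates.transform_values { |rate| 1e9 / rate }

# Cost relative to the 100-fiber run.
baseline = ns_per_switch[100]

ns_per_switch.each do |num_fibers, ns|
  puts format('%6d fibers: %7.1f ns per switch (%.2fx the 100-fiber cost)',
              num_fibers, ns, ns / baseline)
end
```

If switching had a constant cost, the last column would stay close to 1.00 for every row.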

Here's a program for testing the performance of Fiber#resume/Fiber.yield:

# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

# This program shows how the performance of Fiber#resume/Fiber.yield degrades
# as the fiber count increases

def run(num_fibers)
  count = 0

  GC.start
  GC.disable

  fibers = []
  num_fibers.times do
    fibers << Fiber.new { loop { Fiber.yield } }
  end

  t0 = Time.now

  while count < 1000000
    fibers.each do |f|
      count += 1
      f.resume
    end
  end

  elapsed = Time.now - t0

  puts "fibers: #{num_fibers} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)

With Ruby 2.7.1 I'm getting the following output:

fibers: 100 count: 1000000 rate: 3048230.049946255
fibers: 1000 count: 1000000 rate: 2362235.6455160403
fibers: 10000 count: 1000000 rate: 950251.7621725246
Stopped at 21745 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>

As I understand it, switching between fibers should, at least in theory, have
a constant cost in terms of CPU cycles, irrespective of the number of fibers
currently existing in memory. I am completely ignorant of the implementation
details of Ruby fibers, so at least for now I don't have any idea where this
problem is coming from.
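
One way to narrow this down might be to separate the number of fibers that exist from the number that are actually switched between. The sketch below is an untested idea rather than a result: it keeps the total number of fibers fixed and varies only the size of the active set. The method name switch_rate and its parameters are hypothetical, and the sizes are kept small so that it finishes quickly.

```ruby
# Hypothetical experiment: allocate total_fibers fibers, but only switch
# between the first active_fibers of them. Returns switches per second.
def switch_rate(total_fibers, active_fibers, switches)
  fibers = Array.new(total_fibers) { Fiber.new { loop { Fiber.yield } } }

  # Resume every fiber once so that each one has started running
  # and is parked at Fiber.yield.
  fibers.each(&:resume)

  active = fibers.first(active_fibers)
  count = 0

  # A monotonic clock is not affected by wall-clock adjustments.
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  while count < switches
    active.each do |f|
      f.resume
      count += 1
    end
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0

  count / elapsed
end

[10, 100, 1_000].each do |active|
  rate = switch_rate(1_000, active, 100_000)
  puts "total: 1000 active: #{active} rate: #{rate.round}"
end
```

If the rate depends mostly on the active count rather than the total, that would suggest the cost comes from the working set touched during switching rather than from the mere existence of many fibers.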


Files

clipboard-202308251514-grqb1.png (81.3 KB) ioquatix (Samuel Williams), 08/25/2023 03:15 AM
clipboard-202308251514-r7g4l.png (81 KB) ioquatix (Samuel Williams), 08/25/2023 03:15 AM
clipboard-202308251538-kmofk.png (13.8 KB) ioquatix (Samuel Williams), 08/25/2023 03:38 AM
flamegraph_make_many_fibers.png (471 KB) kjtsanaktsidis (KJ Tsanaktsidis), 09/18/2023 08:21 AM
cache_misses_vs_time.png (42.5 KB) kjtsanaktsidis (KJ Tsanaktsidis), 09/18/2023 08:21 AM