Bug #17263

Updated by Eregon (Benoit Daloze) over 3 years ago

I'm working on developing [Polyphony](https://github.com/digital-fabric/polyphony), a Ruby gem for writing 
 highly-concurrent Ruby programs with fibers. In the course of my work I have 
 come up against two problems using Ruby fibers: 

 1. Fiber context switching performance seems to degrade as the number of fibers 
    is increased. This is both with `Fiber#transfer` and 
    `Fiber#resume/Fiber.yield`. 
 2. The number of concurrent fibers that can exist at any time seems to be 
    limited. Once a certain number is reached (on my system this seems to be 
    31744 fibers), calling `Fiber#transfer` will raise a `FiberError` with the 
    message `can't set a guard page: Cannot allocate memory`. This is not due to 
    RAM being saturated. With 10000 fibers, my test program hovers at around 150MB 
    RSS (on Ruby 2.7.1). 
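
 Since RAM is not saturated, another resource may be the limit. The following is a minimal, Linux-only sketch for checking one candidate. It rests on an assumption that this report does not confirm: that the guard page failure is related to the kernel's per-process limit on memory mappings (`vm.max_map_count`), which `mprotect` can hit with `ENOMEM`. The helper names `max_map_count` and `current_map_count` are hypothetical and are not part of Ruby or Polyphony.

 ```ruby
 # frozen_string_literal: true

 # Linux-only sketch: compare the per-process memory-mapping limit with the
 # number of mappings this process currently holds.
 # ASSUMPTION: the guard page error is related to this limit; unverified here.
 MAX_MAP_COUNT_PATH = '/proc/sys/vm/max_map_count'
 SELF_MAPS_PATH = '/proc/self/maps'

 # Returns the kernel's per-process mapping limit, or nil if unavailable.
 def max_map_count
   Integer(File.read(MAX_MAP_COUNT_PATH).strip)
 rescue SystemCallError, ArgumentError
   nil
 end

 # Returns the number of mappings currently held by this process, or nil.
 def current_map_count
   File.foreach(SELF_MAPS_PATH).count
 rescue SystemCallError
   nil
 end

 limit = max_map_count
 used = current_map_count
 if limit && used
   puts "mappings: #{used} of #{limit}"
 else
   puts 'mapping information not available on this platform'
 end
 ```

 Calling `current_map_count` before and after creating and starting a batch of fibers would show how many mappings each fiber costs, if any. If the count approaches the limit at the point where `FiberError` is raised, that would support the assumption; otherwise the cause lies elsewhere.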

 Here's a program for testing the performance of `Fiber#transfer`: 

 ```ruby 
 # frozen_string_literal: true 

 require 'fiber' 

 class Fiber 
   attr_accessor :next 
 end 

 def run(num_fibers) 
   count = 0 

   GC.start 
   GC.disable 

   first = nil 
   last = nil 
   supervisor = Fiber.current 
   num_fibers.times do 
     fiber = Fiber.new do 
       loop do 
         count += 1 
         if count == 1_000_000 
           supervisor.transfer 
         else 
           Fiber.current.next.transfer 
         end 
       end 
     end 
     first ||= fiber 
     last.next = fiber if last 
     last = fiber 
   end 

   last.next = first 
  
   t0 = Time.now 
   first.transfer 
   elapsed = Time.now - t0 

   rss = `ps -o rss= -p #{Process.pid}`.to_i 

   puts "fibers: #{num_fibers} rss: #{rss} count: #{count} rate: #{count / elapsed}" 
 rescue Exception => e 
   puts "Stopped at #{count} fibers" 
   p e 
 end 

 run(100) 
 run(1000) 
 run(10000) 
 run(100000) 
 ``` 

 With Ruby 2.6.5 I'm getting: 

 ``` 
 fibers: 100 rss: 23212 count: 1000000 rate: 3357675.1688139187 
 fibers: 1000 rss: 31292 count: 1000000 rate: 2455537.056439736 
 fibers: 10000 rss: 127388 count: 1000000 rate: 954251.1674325482 
 Stopped at 22718 fibers 
 #<FiberError: can't set a guard page: Cannot allocate memory> 
 ``` 

 With Ruby 2.7.1 I'm getting: 

 ``` 
 fibers: 100 rss: 23324 count: 1000000 rate: 3443916.967616508 
 fibers: 1000 rss: 34676 count: 1000000 rate: 2333315.3862491543 
 fibers: 10000 rss: 151364 count: 1000000 rate: 916772.1008060966 
 Stopped at 31744 fibers 
 #<FiberError: can't set a guard page: Cannot allocate memory> 
 ``` 

 With ruby-head I get an almost identical result to that of 2.7.1. 

 As you can see, the performance degradation is similar in all three versions 
 of Ruby, going from ~3.4M context switches per second for 100 fibers to less 
 than 1M context switches per second for 10000 fibers. Running with 100000 fibers 
 fails to complete. 
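
 To make the degradation easier to read, the rates can be converted into time per context switch. This small sketch only does arithmetic on the Ruby 2.7.1 figures reported above; the constant `RATES` and the helper `ns_per_switch` are names introduced here for illustration.

 ```ruby
 # frozen_string_literal: true

 # Rates (context switches per second) copied from the Ruby 2.7.1
 # Fiber#transfer run above, keyed by fiber count.
 RATES = {
   100 => 3_443_916.967616508,
   1_000 => 2_333_315.3862491543,
   10_000 => 916_772.1008060966
 }.freeze

 # Converts a rate in switches per second to nanoseconds per switch.
 def ns_per_switch(rate)
   1e9 / rate
 end

 RATES.each do |fibers, rate|
   puts format('%6d fibers: %7.1f ns per switch', fibers, ns_per_switch(rate))
 end
 ```

 By this measure a switch takes roughly 290 ns with 100 fibers and roughly 1,090 ns with 10000 fibers, so the per-switch cost grows by a factor of about 3.8 while the fiber count grows by a factor of 100.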

 Here's a program for testing the performance of `Fiber#resume/Fiber.yield`: 

 ```ruby 
 # frozen_string_literal: true 

 require 'fiber' 

 class Fiber 
   attr_accessor :next 
 end 

 # This program shows how the performance of Fiber#resume/Fiber.yield degrades 
 # as the fiber count increases 

 def run(num_fibers) 
   count = 0 

   GC.start 
   GC.disable 

   fibers = [] 
   num_fibers.times do 
     fibers << Fiber.new { loop { Fiber.yield } } 
   end 

   t0 = Time.now 

   while count < 1000000 
     fibers.each do |f| 
       count += 1 
       f.resume 
     end 
   end 

   elapsed = Time.now - t0 

   puts "fibers: #{num_fibers} count: #{count} rate: #{count / elapsed}" 
 rescue Exception => e 
   puts "Stopped at #{count} fibers" 
   p e 
 end 

 run(100) 
 run(1000) 
 run(10000) 
 run(100000) 
 ``` 

 With Ruby 2.7.1 I'm getting the following output: 

 ``` 
 fibers: 100 count: 1000000 rate: 3048230.049946255 
 fibers: 1000 count: 1000000 rate: 2362235.6455160403 
 fibers: 10000 count: 1000000 rate: 950251.7621725246 
 Stopped at 21745 fibers 
 #<FiberError: can't set a guard page: Cannot allocate memory> 
 ``` 

 As I understand it, switching between fibers should, at least in theory, have 
 a constant cost in terms of CPU cycles, irrespective of the number of fibers 
 currently existing in memory. I am not familiar with the implementation 
 details of Ruby fibers, so at least for now I don't have any idea where this 
 problem is coming from.
