Feature #21843

Updated by luke-gru (Luke Gruber) 3 months ago

Hi, 

 There's a problem with Ruby's scheduler that I want to solve. 

 # The Problem 

 Since Ruby schedules threads round-robin inside a Ractor, threads that perform I/O after running for only a short time are punished. Once the I/O finishes, they are put at the back of the ready queue and can wait up to `N * 100ms`, where `N` is the number of threads ahead of them (100ms is the quantum for threads with normal `Thread#priority`). Most Ruby threads probably don't use up their full timeslice, but they are still punished by CPU hogs. 
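To make the worst case concrete, here is a back-of-the-envelope sketch of that wake-to-run delay (illustrative arithmetic only; the 100ms quantum is the default mentioned above, and the helper name is made up):

```ruby
# Worst-case delay for an I/O thread that wakes up behind N CPU-bound
# threads, each of which holds its full 100ms quantum.
QUANTUM_MS = 100

def worst_case_wait_ms(threads_ahead)
  threads_ahead * QUANTUM_MS
end

[1, 4, 8].each do |n|
  puts "#{n} threads ahead -> up to #{worst_case_wait_ms(n)} ms before running"
end
```

So even a modest pool of 8 CPU-bound threads can delay an I/O thread's next slice by close to a second.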

 I would like to run threads that give up their timeslice early more fairly. My goals for such a scheduler are the following: 

 # The Goal 

 * Threads that give up their timeslice early are rewarded with a priority boost. 
 * A simple approach (no vruntime like Linux's CFS, not too many run queues, etc.) with an easy-to-understand design and implementation. 
 * No starvation: there can be no worst-case scenario where a thread is stuck in a queue indefinitely. 
 * Achieve performance gains for programs whose threads perform mixed workloads of I/O and CPU. 
 * In worst-case scenarios (100% CPU threads with background I/O threads), don't slow down the throughput of the CPU-intensive threads by a considerable amount. 

 # The Design 

 ### Queues and Dequeue 

 The design is simple and uses two queues, `high` and `low`. The scheduler tracks the order in which each thread is enqueued, and this order is global across both queues. If the thread at the head of the `high` queue was enqueued before the thread at the head of the `low` queue, the `high` thread is dequeued. If the `low` thread was enqueued first, this is considered a "priority decision": half the time the thread from `high` is dequeued, otherwise the one from `low`. 
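A minimal Ruby sketch of this dequeue rule, under my reading of the text above (the class and field names are made up, and a simple alternating flag stands in for "half the time"; the real draft is C inside the VM):

```ruby
# Two run queues with a single global enqueue order. On dequeue, the
# thread that was enqueued first wins, except that when the `low` head
# is older than the `high` head ("priority decision"), `high` still
# wins every other time.
class TwoQueueScheduler
  def initialize
    @high = []    # entries are [enqueue_order, thread]
    @low  = []
    @seq  = 0     # global enqueue counter shared by both queues
    @flip = false # alternates on each priority decision
  end

  def push(queue, thread)
    (queue == :high ? @high : @low) << [@seq, thread]
    @seq += 1
  end

  def pop
    # If either queue is empty there is nothing to decide.
    return (@high.shift || @low.shift)&.last if @high.empty? || @low.empty?

    if @high.first[0] < @low.first[0]
      @high.shift.last  # the high thread was enqueued first: no contest
    else
      # Priority decision: low was enqueued first, but the high-queue
      # thread still wins half the time.
      @flip = !@flip
      (@flip ? @high : @low).shift.last
    end
  end
end
```

With this shape, an older `low` thread can lose at most every other priority decision, which is where the no-starvation property would come from.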

 ### Enqueue 

 Which queue is a thread placed on when it is ready to run? 

 * When there are no threads in any queue, we always pick `high`. 
 * If we just performed I/O and used less than 1/10 of our timeslice, we are enqueued in `high`. We can do this at most 3 times in a row, and only if `Thread#priority >= 0`. 
 * If we were in the `low` queue, then on every 4th enqueue we check the run time accumulated over the last 4 timeslices. If that total is less than 1/2 of our quantum, we are enqueued in `high`. This can only happen if `Thread#priority >= 0`. 
 * Otherwise, we are enqueued in `low`. 
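The enqueue rules above can be sketched as a single decision function. This is a hypothetical illustration of my reading of the rules, not the draft implementation: the struct fields and bookkeeping are made up, and the thresholds (1/10 of the quantum, 3 boosts in a row, a 4-enqueue window, 1/2 of the quantum) come straight from the text:

```ruby
QUANTUM_MS = 100.0

# Per-thread state the scheduler would need to track (names invented
# for this sketch; the window/enqueue counters are maintained elsewhere).
ThreadState = Struct.new(
  :priority,         # Thread#priority
  :last_slice_ms,    # time actually run during the slice just finished
  :did_io,           # did the thread give up its slice for I/O?
  :io_boosts_in_row, # consecutive I/O boosts already granted
  :was_low,          # was the thread enqueued from the low queue?
  :enqueues,         # enqueue counter for the 4-enqueue window
  :window_ms         # run time accumulated over the last 4 slices
)

def pick_queue(t)
  if t.priority >= 0
    # Rule 1: short slice that ended in I/O, at most 3 boosts in a row.
    if t.did_io && t.last_slice_ms < QUANTUM_MS / 10 && t.io_boosts_in_row < 3
      t.io_boosts_in_row += 1
      return :high
    end
    # Rule 2: a low-queue thread that, checked every 4th enqueue, used
    # less than half a quantum across its last 4 slices.
    if t.was_low && t.enqueues % 4 == 0 && t.window_ms < QUANTUM_MS / 2
      return :high
    end
  end
  t.io_boosts_in_row = 0 # falling into low ends any boost streak
  :low
end
```

Usage-wise, a thread that keeps finishing I/O within a few milliseconds gets `:high` three times and then drops to `:low` once, which is the conservative cap the text describes.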

 This design is very conservative, and attempts not to punish CPU-intensive threads. 

 # Testing + Benchmarking 

 I would like to add a suite of tests to `ruby-bench` that would test the scheduler in different scenarios, including scenarios in which it could potentially perform worse than round-robin. This set of benchmarks would be useful going forward when making any changes to the scheduler. I already have several scripts that I've used for testing my draft PR. Benchmark results to come :) 
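As one example of the kind of scenario such a suite might cover (a sketch I put together for illustration, not one of the actual scripts or part of `ruby-bench`): CPU-hog threads running alongside a latency-sensitive thread whose extra wake-up delay after a short sleep is measured.

```ruby
# Mixed workload: 4 CPU-bound threads plus one thread doing short
# sleeps (standing in for I/O). We measure how much longer than
# requested each sleep took, i.e. time spent waiting for a timeslice.
cpu_hogs = 4.times.map do
  Thread.new { 1_000_000.times { |i| i * i } }
end

latencies = []
io_thread = Thread.new do
  10.times do
    before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleep 0.001 # stands in for a short I/O wait
    after = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    latencies << (after - before - 0.001) * 1000.0 # extra delay in ms
  end
end

(cpu_hogs + [io_thread]).each(&:join)
puts "max extra wake-up delay: %.2f ms" % latencies.max
```

Under round-robin the maximum extra delay should grow with the number of CPU hogs; under the proposed scheduler the I/O-ish thread should keep landing in `high` and stay close to its requested sleep time.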

 # Feature Flags and API 

 I don't know if a feature like this would need to be behind a flag, marked as experimental, etc. Maybe it could even be enabled by `Thread.scheduler = :priority`. 

 # Alternatives 

 I'm open to any ideas and different approaches. Prioritizing fairness in timeslice usage is one approach, but we could also simply enqueue threads with `Thread#priority > 0` into the `high` queue more often. Right now their timeslice is increased, but they get no special treatment in the queues. 

 I have a draft branch here: https://github.com/ruby/ruby/compare/master...luke-gruber:ruby:priority_scheduler_simple 

 Let me know your thoughts. 

 Thanks! 
