Project

General

Profile

Actions

Feature #21720

open

Add a native Binary Heap / Priority Queue to Ruby's Standard Library (heapify, heappush, heappop)

Feature #21720: Add a native Binary Heap / Priority Queue to Ruby's Standard Library (heapify, heappush, heappop)

Added by rahulk501 (Rahul Kumar) 6 months ago. Updated 5 days ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:123958]

Description

Title: Add a native Binary Heap / Priority Queue to Ruby's Standard Library (heapify, heappush, heappop)

Problem

Ruby currently lacks a native binary heap / priority queue data structure in the standard library.
Users frequently reimplement heaps manually, or install third-party gems, simply to access common operations such as:

  • heapify
  • heappush
  • heappop
  • efficient priority queues

Meanwhile, languages like Python provide a standard heapq module which is widely used in algorithms, AI, scheduling, job queues, pathfinding, graph traversal, and competitive programming.

Because Ruby does not include a heap implementation, developers often write ad-hoc, buggy, or inconsistent versions of a heap. This leads to fragmentation and duplication.

Proposal

Introduce a lightweight Heap or BinaryHeap class to Ruby’s standard library with the following minimal API:

heap = Heap.new(array = [])
heap.push(value)
heap.pop       # returns smallest (min-heap)
heap.peek
heap.size

Or a module-based functional API similar to Python’s heapq:

Heap.heapify(array)
Heap.heappush(array, value)
Heap.heappop(array)

Both designs are simple and allow backward-compatible extension.

Why include it in the standard library?

  1. Heaps are a fundamental data structure—used in sorting, algorithms, scheduling, and timers.
  2. Many languages include them natively (Python, C++ via STL priority_queue, Java, Go, Rust BinaryHeap).
  3. Adding it reduces unnecessary reimplementations in user codebases.
  4. A small heap implementation adds little maintenance burden.
  5. Existing Ruby solutions (gems like algorithms, priority_queue, custom code) are not universally available and often exceed what users need.

Reference Implementation (pure Ruby, minimal example)

module Heap
  def self.heapify(arr)
    (arr.length / 2 - 1).downto(0) { |i| sift_down(arr, i) }
    arr
  end

  def self.heappush(arr, value)
    arr << value
    sift_up(arr, arr.length - 1)
  end

  def self.heappop(arr)
    return nil if arr.empty?
    arr[0], arr[-1] = arr[-1], arr[0]
    min = arr.pop
    sift_down(arr, 0)
    min
  end

  def self.sift_up(arr, i)
    while i > 0
      p = (i - 1) / 2
      break if arr[p] <= arr[i]
      arr[p], arr[i] = arr[i], arr[p]
      i = p
    end
  end

  def self.sift_down(arr, i)
    n = arr.length
    loop do
      left = 2 * i + 1
      right = 2 * i + 2
      smallest = i

      smallest = left if left < n && arr[left] < arr[smallest]
      smallest = right if right < n && arr[right] < arr[smallest]

      break if smallest == i
      arr[i], arr[smallest] = arr[smallest], arr[i]
      i = smallest
    end
  end

  private_class_method :sift_up, :sift_down
end

Optional C implementation

If accepted, I am willing to provide a minimal C version for performance parity with other stdlib containers.

Compatibility

No breaking changes expected.
Naming avoids conflict with existing libraries.

Conclusion

Adding a tiny, well-defined heap structure would strengthen Ruby’s built-in algorithmic tools, reduce duplicated code across the ecosystem, and align Ruby with other mainstream languages.

Updated by herwin (Herwin W) 6 months ago Actions #1 [ruby-core:124009]

Heap.heapify(array)
Heap.heappush(array, value)
Heap.heappop(array)

This looks like a very fragile API to me:

Heap.heapify(array)
array << 'something random' # The heap is not encapsulated, so we can perform any Array operation
Heap.heappush(array, 123)   # And this probably breaks, since array is no longer a heap

Updated by rahulk501 (Rahul Kumar) 6 months ago Actions #2 [ruby-core:124055]

herwin (Herwin W) wrote in #note-1:

Heap.heapify(array)
Heap.heappush(array, value)
Heap.heappop(array)

This looks like a very fragile API to me:

Heap.heapify(array)
array << 'something random' # The heap is not encapsulated, so we can perform any Array operation
Heap.heappush(array, 123)   # And this probably breaks, since array is no longer a heap

Update (response to feedback)

Thank you for pointing out the fragility of the module-style API that operates directly on a plain Array.
I agree that exposing the raw array makes it easy to break the heap by performing unrelated operations on it.

To avoid this issue, I'm happy to move the proposal toward an encapsulated class-based design, e.g.:

heap = Heap.new([3, 1, 4])
heap.push(2)
heap.pop

This avoids accidental misuse and keeps the internal structure consistent.

To refine the proposal correctly, I would appreciate guidance on two specific points:

1. Would a dedicated class name like Heap or BinaryHeap be preferred for the standard library?
2. Should the initial implementation support only a min-heap, or should support for max-heaps be included as well?

I’m happy to adjust the proposal based on whichever direction aligns best with Ruby’s design principles.

Updated by baweaver (Brandon Weaver) 5 days ago Actions #3 [ruby-core:125616]

I'm in favor of adding a Heap or PriorityQueue abstraction, but we should frame the argument in terms of what that abstraction affords us and whether or not it would have substantial use.

In terms of efficiency it does certainly afford a lot of gains on problems around the next smallest/largest/prioritized item without requiring constant re-sorting.

For example, scheduled or retryable work:

retries = PriorityQueue.new { |job| job.retry_at }

retries.push(job)

while (job = retries.peek) && job.retry_at <= Time.now
  retries.pop.perform
end

Timeouts and deadlines:

deadlines = PriorityQueue.new { |request| request.deadline }

deadlines.push(request)

while (request = deadlines.peek) && request.deadline <= now
  deadlines.pop.cancel
end

Bounded top-k aggregation:

top = MinHeap.new

scores.each do |score|
  top.push(score)
  top.pop if top.size > 100
end

Shortest-path or best-first traversal:

frontier = PriorityQueue.new { |entry| entry.cost }

frontier.push([0, start])

until frontier.empty?
  cost, node = frontier.pop
  # visit neighbors...
end

These are all possible today, but the usual Ruby alternatives are not ideal:

items.sort_by(&:priority).first

or:

items.sort_by!(&:priority)
items.shift

...or hand-written heap implementations inside application code.

Of the current approaches that are usable from core Ruby they tend to obscure intent, repeatedly sort, or push projects towards their own custom implementations.

Enumerable is a gold-standard for general-purpose collections, but I believe there are collections and traversals that exist beyond it that we need to consider as first-class options in Ruby. Heaps are one such collection, and represent a category of functions related to ordered retrieval which is a real problem.

There is also precedent for specialized collection types such as Set and Thread::Queue that optimize for more efficient operations.

Regarding implementation, I would push for us to consider a more object oriented approach aligned with how Set operates today:

heap = MinHeap[5, 1, 8, 3]

heap.push(2) # or aliased `heap << item`
heap.peek # => 1
heap.pop  # => 1

For arbitrary priorities:

queue = PriorityQueue.new { |job| job.priority }

queue.push(job)
queue.pop

I would prefer the initial API to stay small:

push
pop
peek
size
empty?
self.[]

...with efficient construction from an existing enumerable:

heap = MinHeap[*items]

...or another constructor if splatting large collections is undesirable.

A few design questions seem important:

  • Should this be Heap, MinHeap/MaxHeap, PriorityQueue, or some combination? (I'd argue all)
  • Should arbitrary ordering use a block, <=>, or both? (I'd argue both)
  • Should each expose heap-internal order, priority order, or be omitted initially? (I'd argue hide internal order, favor priority)
  • Should this live in core or stdlib? (I'd argue it should behave like Set)
  • Should it be mutable only, or should there eventually be immutable/persistent variants? (I'd argue short term mutable, long term think this through more as a Ractor primitive)

My preference would be a small mutable collection first, not thread-safe by default, similar in spirit to Array, Hash, and Set. Thread-safe coordination can remain the role of Queue and related concurrency primitives.

To wrap this up I think that there is a lot of value that a native Heap implementation affords us for optimization, especially as we're starting to venture into more concurrency and priority primitives.

Likewise to the OP I'd be up to draft implementations if that helps consideration here.

Actions

Also available in: PDF Atom