Project

General

Profile

Actions

Feature #18262

open

Enumerator::Lazy#partition

Added by zverok (Victor Shepelev) 3 months ago. Updated 2 months ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:105750]

Description

(Part of my set of proposals about making .lazy more useful/popular.)

Currently:

file = File.open('very-large-file.txt')
lines_with_errors, lines_without_errors = file.lazy.partition { _1.start_with?('E:') }
lines_with_errors.class
# => Array, all file is read by this moment

This might be not very practical performance-wise and memory-wise.

I am thinking that maybe returning a pair of lazy enumerators might be a good addition to Enumerator::Lazy

Naive prototype:

class Enumerator::Lazy
  def partition(&block)
    buffer1 = []
    buffer2 = []
    source = self

    [
      Enumerator.new { |y|
        loop do
          if buffer1.empty?
            begin
              item = source.next
              if block.call(item)
                y.yield(item)
              else
                buffer2.push(item)
              end
            rescue StopIteration
              break
            end
          else
            y.yield buffer1.shift
          end
        end
      }.lazy,
      Enumerator.new { |y|
        loop do
          if buffer2.empty?
            begin
              item = source.next
              if !block.call(item)
                y.yield(item)
              else
                buffer1.push(item)
              end
            rescue StopIteration
              break
            end
          else
            y.yield buffer2.shift
          end
        end
      }.lazy
    ]
  end
end

Testing it:

Enumerator.produce(1) { |i| puts "processing #{i}"; i + 1 }.lazy
  .take(30)
  .partition(&:odd?)
  .then { |odd, even|
    p odd.first(3), even.first(3)
  }
# Prints:
# processing 1
# processing 2
# processing 3
# processing 4
# processing 5
# [1, 3, 5]
# [2, 4, 6]

As you might notice by the "processing" log, it only fetched the amount of entries that was required by produced enumerators.

The drawback would be—as my prototype implementation shows—the need of internal "buffering" (I don't think it is possible to implement lazy partition without it), but it still might be worth a shot?

Actions

Also available in: Atom PDF