Feature #6261

Enumerable#emap and Enumerable#egrep

Added by yimutang (Joey Zhou) over 7 years ago. Updated over 7 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Target version: -
[ruby-core:44147]

Description

I was inspired by Ruby 1.9.x's Enumerable#chunk and #slice_before, which both take a block and return an enumerator. I would like to introduce two new methods into the core Enumerable module, which can be implemented in Ruby like this:

module Enumerable

  def emap # return an enumerator
    raise ArgumentError, 'no block given' unless block_given?

    Enumerator.new do |yielder|
      self.each do |elem|
        mapped = yield elem
        yielder << mapped
      end
    end
  end

  def egrep # return an enumerator
    raise ArgumentError, 'no block given' unless block_given?

    Enumerator.new do |yielder|
      self.each do |elem|
        allowed = yield elem
        yielder << elem if allowed
      end
    end
  end

end

#emap + #to_a is just like #map / #collect, and #egrep + #to_a is just like #select. Why do I think it's necessary to introduce these methods? Because #collect and #select are sometimes not efficient. Here's a contrived example:
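A minimal sketch of that equivalence, repeating the proposed definitions in compact form so the snippet runs on its own. The sample data and the variable names are only illustrative:

```ruby
# Compact restatement of the proposed methods (emap/egrep are not in core).
module Enumerable
  def emap
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new do |yielder|
      each { |elem| yielder << yield(elem) }
    end
  end

  def egrep
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new do |yielder|
      each { |elem| yielder << elem if yield(elem) }
    end
  end
end

numbers = [1, 2, 3, 4, 5, 6]

# Forcing the enumerator with to_a gives the same result as map / select.
squares_via_emap = numbers.emap { |n| n * n }.to_a
evens_via_egrep  = numbers.egrep { |n| n.even? }.to_a

# The block does not run until the enumerator is consumed.
calls = 0
pending = numbers.emap { |n| calls += 1; n * 10 }
calls_before = calls          # the block has not been called yet
first_two = pending.first(2)  # pulls only two elements through the block
calls_after = calls
```

The counter shows the difference from #map: building the enumerator does no work, and #first(2) runs the block only for the elements it actually needs.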

lines = File.foreach('a_very_large_file')
            .egrep {|line| line.length < 10 }
            .emap {|line| line.chomp!; line }
            .each_slice(3)
            .emap {|lines| lines.join(';').downcase }
            .take_while {|line| line.length > 20 }

The above code means: take each line from 'a_very_large_file', keep only the lines shorter than 10 characters, chomp each remaining line, join and downcase them in groups of 3, and finally stop at the first joined line whose length is not greater than 20.

If you replace #egrep with #select and #emap with #collect, you must iterate over every line of 'a_very_large_file' and create a temporary array, three times! That is not efficient in this situation, because the #take_while means 'I do not want to check all lines'.
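A rough way to see the difference is to count how many elements are read from the source. In this sketch a counting enumerator is a hypothetical stand-in for the large file, and the deferred stages are written out with Enumerator.new, which is what the proposed #egrep and #emap do internally:

```ruby
# Hypothetical stand-in for the large file: an enumerator that counts reads.
reads = 0
source = Enumerator.new do |y|
  1.upto(1_000) { |i| reads += 1; y << i }
end

# Eager: select walks the whole source and builds an array before
# take_while ever sees a value.
eager = source.select { |n| n.odd? }
              .map { |n| n * 2 }
              .take_while { |n| n < 10 }
eager_reads = reads

# Deferred: each stage wraps the previous one, so take_while can stop
# the whole chain as soon as its condition fails.
reads = 0
odd      = Enumerator.new { |y| source.each { |n| y << n if n.odd? } }
doubled  = Enumerator.new { |y| odd.each { |n| y << n * 2 } }
deferred = doubled.take_while { |n| n < 10 }
deferred_reads = reads
```

Both versions return the same array, but the eager one reads all 1,000 elements while the deferred one stops after the first few.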

If you want to avoid #select and #collect, you can write an explicit loop instead:

File.foreach('a_very_large_file') do |line|
  # blah blah to achieve the same goal
end

I'm afraid it's hard to make the code clear at a glance.
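For comparison, here is one possible hand-written version of that loop. It is only a sketch: the `input` array is a hypothetical in-memory stand-in for the file, and the names `group`, `stopped` and `emit` are made up for illustration.

```ruby
# Hypothetical stand-in for File.foreach('a_very_large_file').
input = ["Alphabet\n", "this line is far too long to keep\n",
         "Bravo123\n", "Charlie\n", "x\n", "y\n", "z\n", "Delta\n"]

result  = []
group   = []
stopped = false

# Join the current group, then either keep it or signal that we are done.
emit = lambda do
  joined = group.join(';').downcase     # the second emap step
  group = []
  if joined.length > 20                 # the take_while condition
    result << joined
  else
    stopped = true
  end
end

input.each do |line|
  next unless line.length < 10          # the egrep step
  group << line.chomp                   # the first emap step
  next if group.size < 3                # the each_slice(3) step
  emit.call
  break if stopped
end

# each_slice also yields a final partial group, so handle the leftover.
emit.call unless stopped || group.empty?
```

The bookkeeping for the partial group and the early exit is exactly the kind of detail that the chained version hides.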

So you can see that #egrep and #emap are very useful.

Another example: I want to write a class, FreqDist, that records the frequency distribution of a population of samples.

class FreqDist

  def initialize(samples)
    @sample_dict = Hash.new(0)
    samples.each {|sample| @sample_dict[sample] += 1 }
  end

end

I want to use FreqDist to store the frequency distribution of a list of words, but there is a case problem: 'When' and 'when' should not be regarded as two different samples. I can do it like this:

fd = FreqDist.new(words.emap {|w| w.downcase })

This passes an enumerator instead of an array as the argument, so the words are iterated once and no temporary array is created.
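A runnable sketch of this idea follows. The `count` reader is a hypothetical addition so the result can be inspected, and the enumerator is built by hand with Enumerator.new, which is what `words.emap {|w| w.downcase }` would return under the proposal:

```ruby
class FreqDist
  def initialize(samples)
    @sample_dict = Hash.new(0)
    samples.each { |sample| @sample_dict[sample] += 1 }
  end

  # Hypothetical reader, not part of the original class.
  def count(sample)
    @sample_dict[sample]
  end
end

words = %w[When the cat sleeps when The dog barks]

# Downcases each word on demand; no intermediate array is built.
downcased = Enumerator.new { |y| words.each { |w| y << w.downcase } }

fd = FreqDist.new(downcased)
when_count = fd.count('when')
the_count  = fd.count('the')
```

FreqDist only needs its argument to respond to #each, so an array and an enumerator are interchangeable here.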

Well, in my opinion, such #emap and #egrep are very powerful. Although I can implement them in Ruby and put them in a custom gem, I think it's better to introduce them into the core Enumerable module.

Please consider the suggestion. Thank you!

History

Updated by Eregon (Benoit Daloze) over 7 years ago

Hello,

This should already be possible with the recent Enumerator::Lazy (in trunk): just add .lazy after the File.foreach and use the usual select, map, and so on:

lines = File.foreach('a_very_large_file').lazy
            .select {|line| line.length < 10 }
            .map {|line| line.chomp!; line }
            .each_slice(3)
            .map {|lines| lines.join(';').downcase }
            .take_while {|line| line.length > 20 }

The same goes for the second example: words.lazy.map(&:downcase).

Be aware that it's not always faster (although it likely takes less memory); this is a trade-off.
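A small sketch of how such a lazy chain behaves, assuming Ruby 2.0 or later. The counting enumerator is a hypothetical stand-in for the file. Note that a lazy chain returns an Enumerator::Lazy, so something like #to_a, #force or #first is needed to get actual values out:

```ruby
# Hypothetical stand-in for the large file: an enumerator that counts reads.
reads = 0
source = Enumerator.new do |y|
  1.upto(1_000) { |i| reads += 1; y << i }
end

pipeline = source.lazy
                 .select { |n| n.odd? }
                 .map { |n| n * 2 }
                 .each_slice(3)
                 .map { |group| group.join(';') }
                 .take_while { |s| s.length < 8 }

reads_before = reads      # building the chain reads nothing
result = pipeline.to_a    # to_a (or force) drives the pipeline
reads_after = reads       # only a handful of the 1,000 elements were read
```

The memory saving comes from never materializing the intermediate arrays. The cost is extra per-element overhead, which is why a lazy chain can be slower than the eager one when the whole input is consumed anyway.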

Updated by matz (Yukihiro Matsumoto) over 7 years ago

  • Status changed from Open to Rejected

Use Enumerable#lazy.

Matz.
