Feature #6261
closed
Enumerable#emap and Enumerable#egrep
Description
I was inspired by Ruby 1.9.x's Enumerable#chunk and #slice_before, which both take a block and return an enumerator. I would like to introduce two new methods into the core Enumerable module, which can be implemented in Ruby like this:
module Enumerable
  def emap # return an enumerator
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new do |yielder|
      self.each do |elem|
        mapped = yield elem
        yielder << mapped
      end
    end
  end

  def egrep # return an enumerator
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new do |yielder|
      self.each do |elem|
        allowed = yield elem
        yielder << elem if allowed
      end
    end
  end
end
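As a quick sanity check of the definitions above, here is a minimal sketch that restates them compactly and tries them on a small array and on an endless range. The variable names (`numbers`, `first_three`, and so on) are only illustrative and not part of the proposal.

```ruby
module Enumerable
  # Compact restatement of the proposed #emap: yields mapped elements on demand.
  def emap
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new do |yielder|
      each { |elem| yielder << yield(elem) }
    end
  end

  # Compact restatement of the proposed #egrep: yields only accepted elements.
  def egrep
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new do |yielder|
      each { |elem| yielder << elem if yield(elem) }
    end
  end
end

numbers = [1, 2, 3, 4, 5, 6]

# Forcing the enumerator with #to_a gives the same result as #map / #select.
same_as_map    = numbers.emap { |n| n * n }.to_a == numbers.map { |n| n * n }
same_as_select = numbers.egrep { |n| n.even? }.to_a == numbers.select { |n| n.even? }

# Because nothing is computed until it is asked for, the chain also works
# on an endless source: only as many elements as #first needs are visited.
first_three = (1..Float::INFINITY).egrep { |n| n.even? }
                                  .emap { |n| n * n }
                                  .first(3)
# first_three => [4, 16, 36]
```

The eager #select would never return on the endless range, which is the difference the proposal is about.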
#emap + #to_a is just like #map / #collect, and #egrep + #to_a is just like #select. Why do I think it is necessary to introduce these methods? Because #collect and #select are sometimes inefficient. Here's a weird example:
lines = File.foreach('a_very_large_file')
  .egrep {|line| line.length < 10 }
  .emap {|line| line.chomp!; line }
  .each_slice(3)
  .emap {|lines| lines.join(';').downcase }
  .take_while {|line| line.length > 20 }
The above code means: take each line from 'a_very_large_file', keep only the lines whose length is less than 10, chomp each kept line, group them three at a time and join each group, and finally stop as soon as a joined line is not longer than 20 characters.
If you replace #egrep with #select and #emap with #collect, every line of 'a_very_large_file' is read and a temporary array is built, three times! That is wasteful in this situation, because the #take_while means 'I do not want to check all lines'.
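To make the cost difference concrete, the following sketch counts how many elements are actually read from a source in both styles. It uses a small in-memory enumerator instead of a file; the names `reads` and `source` and the numeric blocks are assumptions made up for this illustration, and #emap / #egrep are restated compactly so the snippet runs on its own.

```ruby
module Enumerable
  def emap
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new { |yielder| each { |elem| yielder << yield(elem) } }
  end

  def egrep
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new { |yielder| each { |elem| yielder << elem if yield(elem) } }
  end
end

# Hypothetical stand-in for a large file: counts every element handed out.
reads = 0
source = Enumerator.new do |yielder|
  1.upto(1_000) do |n|
    reads += 1
    yielder << n
  end
end

# Eager version: #select walks the whole source before #take_while runs.
eager = source.select { |n| n.odd? }
              .map { |n| n * 10 }
              .take_while { |n| n < 50 }
eager_reads = reads      # => 1000

# Enumerator version: elements flow through one at a time, and
# #take_while stops the whole chain at the first failing element.
reads = 0
chained = source.egrep { |n| n.odd? }
                .emap { |n| n * 10 }
                .take_while { |n| n < 50 }
chained_reads = reads    # => 5

# Both give [10, 30]; only the amount of work differs.
```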
If you want to avoid #select and #collect altogether, you can write a single loop like this:
File.foreach('a_very_large_file') do |line|
  # blah blah to achieve the same goal
end
but I'm afraid it is hard to keep such code clear at a glance.
So you can see that #egrep and #emap are very useful.
As another example, I want to write a class FreqDist, which records the frequency distribution of a population of samples.
class FreqDist
  def initialize(samples)
    @sample_dict = Hash.new(0)
    samples.each {|sample| @sample_dict[sample] += 1 }
  end
end
I want to use FreqDist to store the frequency distribution of a list of words, but there is a case problem: 'When' and 'when' should not be regarded as two different samples. I can handle it like this:
fd = FreqDist.new(words.emap {|w| w.downcase })
This passes an enumerator instead of an array as the argument, so the words are iterated only once and no temporary array is created.
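A runnable sketch of this FreqDist usage might look as follows. The `count` reader and the sample `words` array are hypothetical additions for the sake of the example (the class above exposes no way to read the counts), and #emap is restated compactly so the snippet stands alone.

```ruby
module Enumerable
  def emap
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new { |yielder| each { |elem| yielder << yield(elem) } }
  end
end

class FreqDist
  def initialize(samples)
    @sample_dict = Hash.new(0)
    samples.each { |sample| @sample_dict[sample] += 1 }
  end

  # Hypothetical reader, added only so the result can be inspected.
  def count(sample)
    @sample_dict[sample]
  end
end

# Made-up input data for illustration.
words = %w[When when WHEN then]

# The enumerator downcases each word as FreqDist#initialize pulls it,
# so no intermediate array of downcased words is ever built.
fd = FreqDist.new(words.emap { |w| w.downcase })

fd.count('when')  # => 3
fd.count('then')  # => 1
```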
In my opinion, #emap and #egrep are very powerful. Although I could implement them in Ruby and put them in a custom gem, I think it is better to introduce them into the core Enumerable module.
Please consider the suggestion. Thank you!