Feature #11090
Enumerable#each_uniq and #each_uniq_by
Description
Currently, if you want to iterate over only the first occurrence of each element, you either have to call uniq, which creates a big intermediate array, or keep track of the already-seen elements yourself.
If you have an Enumerable of indefinite size (for example, something built with cycle, or something you can't rewind), calling Array#uniq might not be what you want.
The idea is to add each_uniq, which iterates only over the elements that have not already been yielded (it keeps track of them for you).
A second method, each_uniq_by, works similarly to chunk: it takes a block and uses a generated Enumerator.
IceDragon200 made the following gist/sample in Ruby; it might be rewritten in C later to make it faster. https://gist.github.com/IceDragon200/5b1c205b4b38665c308e For easier viewing, I also added it as an attachment.
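The gist itself isn't reproduced here. The following is a minimal sketch of how the two proposed methods could behave. It assumes a Set-based record of the elements already yielded. The method bodies are hypothetical and are not IceDragon200's code.

```ruby
require "set"

# Hypothetical sketch of the proposal; not the code from the gist.
module Enumerable
  # Yields each element only the first time it is seen.
  # Without a block, returns an Enumerator so it composes with first, take, etc.
  def each_uniq
    return enum_for(:each_uniq) unless block_given?

    seen = Set.new
    each do |value|
      # Set#add? returns nil when value is already in the set.
      yield value if seen.add?(value)
    end
    self
  end

  # Like chunk, takes a block and returns a generated Enumerator: an element
  # is yielded only if the key computed by the block has not been seen yet.
  def each_uniq_by(&key_for)
    raise ArgumentError, "each_uniq_by requires a block" unless key_for

    source = self
    Enumerator.new do |yielder|
      seen = Set.new # fresh state for every enumeration
      source.each do |value|
        yielder << value if seen.add?(key_for.call(value))
      end
    end
  end
end

[3, 1, 3, 2, 1].each_uniq { |x| print x, " " } # prints "3 1 2 "
puts
p [1, 2, 2, 3].cycle.each_uniq.first(3) # => [1, 2, 3]
p %w[apple avocado banana blueberry cherry].each_uniq_by { |w| w[0] }.to_a
# => ["apple", "banana", "cherry"]
```

On an infinite source such as cycle, the block form of each_uniq keeps iterating after every distinct element has been seen. It should therefore be paired with a stopping call such as first.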
Updated by prijutme4ty (Ilya Vorontsov) over 9 years ago
- Assignee set to nobu (Nobuyoshi Nakada)
Why introduce one more method if we can just implement #uniq (with or without block, sticking to Array#uniq semantics) for Enumerable and Enumerator::Lazy? With Enumerator::Lazy we do not need to create an array.
module Enumerable
  # Eager uniq: collects the first element for each distinct key into an array.
  def uniq
    result = []
    uniq_map = {} # keys already seen
    if block_given?
      each do |value|
        key = yield value # deduplicate by the block's result
        next if uniq_map.has_key?(key)
        uniq_map[key] = true
        result << value
      end
    else
      each do |value|
        next if uniq_map.has_key?(value)
        uniq_map[value] = true
        result << value
      end
    end
    result
  end
end

class Enumerator::Lazy
  # Lazy uniq: passes each new element downstream without building an array.
  def uniq
    uniq_map = {} # keys already seen
    if block_given?
      Enumerator::Lazy.new(self) do |yielder, value|
        key = yield value # deduplicate by the block's result
        next if uniq_map.has_key?(key)
        uniq_map[key] = true
        yielder << value
      end
    else
      Enumerator::Lazy.new(self) do |yielder, value|
        next if uniq_map.has_key?(value)
        uniq_map[value] = true
        yielder << value
      end
    end
  end
end
olympics = {1896 => 'Athens', 1900 => 'Paris', 1904 => 'St. Louis', 1906 => 'Athens', 1908 => 'Rome'}
each_city_first_time = olympics.uniq { |year, city| city }
# => [[1896, "Athens"], [1900, "Paris"], [1904, "St. Louis"], [1908, "Rome"]]
(1..Float::INFINITY).lazy.uniq { |x| (x**2) % 10 }.first(6)
# => [1, 2, 3, 4, 5, 10]
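The Enumerator::Lazy#uniq sketch above has one subtlety. uniq_map is created once, when uniq is called, and the Lazy.new block closes over it. As a result, enumerating the same lazy enumerator a second time starts with a hash that already holds every key seen before. With the squares example, a second first(6) call on the same object skips every element and never returns.

The following minimal sketch allocates the hash once per enumeration instead. It assumes it is acceptable to build the result from a plain Enumerator and convert it with lazy. The helper name lazy_uniq is hypothetical and was chosen so it does not clash with any uniq method.

```ruby
# Hypothetical helper: a lazy uniq whose "seen" hash is created per enumeration.
def lazy_uniq(source, &key_for)
  Enumerator.new do |yielder|
    seen = {} # allocated each time the enumerator is iterated
    source.each do |value|
      key = key_for ? key_for.call(value) : value
      next if seen.key?(key)

      seen[key] = true
      yielder << value
    end
  end.lazy
end

squares_mod_10 = lazy_uniq(1..Float::INFINITY) { |x| (x**2) % 10 }
p squares_mod_10.first(6) # => [1, 2, 3, 4, 5, 10]
p squares_mod_10.first(6) # => [1, 2, 3, 4, 5, 10] again, since no state is shared
```

Moving the state inside the Enumerator.new block means each call to each (and so each first, to_a, etc.) gets its own empty hash.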
While I propose a different solution to the problem, I totally agree that we need a way to work with the unique elements of a collection without creating an intermediate array. In heavy data processing this is a very common problem.
Updated by matz (Yukihiro Matsumoto) over 9 years ago
As Ilya proposed, Enumerable#uniq and Enumerator::Lazy#uniq are reasonable.
Matz.
Updated by nobu (Nobuyoshi Nakada) over 9 years ago
- Status changed from Open to Closed
Applied in changeset r55709.
enum.c: Enumerable#uniq
- enum.c (enum_uniq): new method Enumerable#uniq.
[Feature #11090]
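Enumerable#uniq and Enumerator::Lazy#uniq became built-in methods in Ruby 2.4. As a result, the examples from the discussion should run without any monkey patch. Here is a short check, assuming Ruby 2.4 or later:

```ruby
# Requires Ruby 2.4+, where Enumerable#uniq and Enumerator::Lazy#uniq are built in.
olympics = { 1896 => "Athens", 1900 => "Paris", 1904 => "St. Louis", 1906 => "Athens", 1908 => "Rome" }

# Enumerable#uniq on a Hash: elements are [key, value] pairs, deduplicated by the block's result.
first_cities = olympics.uniq { |_year, city| city }
p first_cities
# => [[1896, "Athens"], [1900, "Paris"], [1904, "St. Louis"], [1908, "Rome"]]

# Enumerator::Lazy#uniq on an infinite source: no intermediate array is built.
first_six = (1..Float::INFINITY).lazy.uniq { |x| (x**2) % 10 }.first(6)
p first_six # => [1, 2, 3, 4, 5, 10]

# Sources built with cycle work too, as long as a stopping call like first is used.
from_cycle = [1, 2, 2, 3].cycle.lazy.uniq.first(3)
p from_cycle # => [1, 2, 3]
```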
Updated by shyouhei (Shyouhei Urabe) over 6 years ago
- Related to Feature #1153: Enumerable#uniq added