Feature #11090
Enumerable#each_uniq and #each_uniq_by (closed)
Description
Currently, if you want to iterate over only the unique elements, you either need to call uniq and create a big array, or you need to keep track of the elements you have already seen yourself.
If you have an Enumerable of indefinite size (maybe something built with cycle, or something you can't rewind), calling Array#uniq might not be what you want.
The idea is to add each_uniq, which iterates only through the elements that have not already been yielded (it keeps track for you).
A second method, each_uniq_by, works similarly to chunk: it takes a block and uses a generated Enumerator.
IceDragon200 made the following gist/sample in Ruby; it might be rewritten in C later to make it faster/better. https://gist.github.com/IceDragon200/5b1c205b4b38665c308e For easier viewing, I also added it as an attachment.
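For readers without access to the gist, the idea described above could be sketched roughly as follows. This is not the code from the gist; the method bodies, the use of Set, and the choice to have each_uniq_by return an Enumerator are assumptions made for illustration.

```ruby
# Hypothetical sketch of the proposed methods; NOT the code from the gist.
require 'set'

module Enumerable
  # Yields each element the first time it is seen and skips later duplicates.
  # Without a block, returns an Enumerator so the call can be chained.
  def each_uniq
    return enum_for(:each_uniq) unless block_given?

    seen = Set.new
    each do |value|
      # Set#add? returns nil when the value is already present.
      yield value if seen.add?(value)
    end
    self
  end

  # Like each_uniq, but uniqueness is decided by the block's return value.
  # Returns an Enumerator over the first element seen for each key
  # (assumed behaviour, modelled loosely on how chunk returns an Enumerator).
  def each_uniq_by
    raise ArgumentError, 'no block given' unless block_given?

    seen = Set.new
    Enumerator.new do |yielder|
      each do |value|
        yielder << value if seen.add?(yield(value))
      end
    end
  end
end

# Works on an endless source as long as only a bounded number of
# unique elements is requested.
first_three = [1, 2, 1, 3].cycle.each_uniq.first(3)

# One word per starting letter.
by_letter = %w[apple avocado banana blueberry cherry].each_uniq_by { |w| w[0] }.to_a
```

Note that asking an endless source for more unique elements than it contains would never return, because the sketch has no way to know that the source is exhausted of new values.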
Files
Updated by prijutme4ty (Ilya Vorontsov) over 8 years ago
- Assignee set to nobu (Nobuyoshi Nakada)
Why introduce one more method if we can just implement #uniq (with or without a block, sticking to Array#uniq semantics) for Enumerable and Enumerator::Lazy? With Enumerator::Lazy we do not need to create an array.
module Enumerable
  def uniq
    result = []
    uniq_map = {}
    if block_given?
      each do |value|
        key = yield value
        next if uniq_map.has_key?(key)
        uniq_map[key] = true
        result << value
      end
    else
      each do |value|
        next if uniq_map.has_key?(value)
        uniq_map[value] = true
        result << value
      end
    end
    result
  end
end

class Enumerator::Lazy
  def uniq
    uniq_map = {}
    if block_given?
      Enumerator::Lazy.new(self) do |yielder, value|
        key = yield value
        next if uniq_map.has_key?(key)
        uniq_map[key] = true
        yielder << value
      end
    else
      Enumerator::Lazy.new(self) do |yielder, value|
        next if uniq_map.has_key?(value)
        uniq_map[value] = true
        yielder << value
      end
    end
  end
end

olympics = {1896 => 'Athens', 1900 => 'Paris', 1904 => 'Chicago', 1906 => 'Athens', 1908 => 'Rome'}
each_city_first_time = olympics.uniq{|k,v| v }
# => [[1896, "Athens"], [1900, "Paris"], [1904, "Chicago"], [1908, "Rome"]]

(1..Float::INFINITY).lazy.uniq{|x| (x**2) % 10 }.first(6)
# => [1, 2, 3, 4, 5, 10]

While I propose another solution to the problem, I totally agree that we need a way to work with the unique elements of a collection without creating an intermediate array. In heavy data processing this is a very common problem.
Updated by matz (Yukihiro Matsumoto) about 8 years ago
As Ilya proposed, Enumerable#uniq and Enumerator::Lazy#uniq are reasonable.
Matz.
Updated by nobu (Nobuyoshi Nakada) about 8 years ago
- Status changed from Open to Closed
Applied in changeset r55709.
enum.c: Enumerable#uniq
- enum.c (enum_uniq): new method Enumerable#uniq.
[Feature #11090]
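
As a quick illustration of the resulting behaviour, here is a minimal usage sketch. It assumes a Ruby version that ships this change (2.4 or later, to the best of my knowledge) and that Enumerator::Lazy#uniq is available there as well; the sample data is made up.

```ruby
# Assumes Ruby 2.4 or later, where Enumerable#uniq and
# Enumerator::Lazy#uniq are built in. Sample data is illustrative only.
games = { 1896 => 'Athens', 1900 => 'Paris', 1906 => 'Athens' }

# Enumerable#uniq on a Hash: the block receives each pair and the
# result is an Array holding the first pair seen for each block value.
first_visits = games.uniq { |_year, city| city }
p first_visits  # [[1896, "Athens"], [1900, "Paris"]]

# Enumerator::Lazy#uniq stays lazy, so it can be applied to an
# infinite source as long as only finitely many results are requested.
residues = (1..Float::INFINITY).lazy.map { |x| x % 3 }.uniq.first(3)
p residues  # [1, 2, 0]
```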
Updated by shyouhei (Shyouhei Urabe) over 5 years ago
- Related to Feature #1153: Enumerable#uniq added