Feature #11090

Enumerable#each_uniq and #each_uniq_by

Added by Hanmac (Hans Mackowiak) over 2 years ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Target version: -
[ruby-core:68969]

Description

Currently, if you want to iterate over only the first few unique elements, you either need to call uniq, which creates a big array, or keep track of the elements yourself.
If you have an Enumerable of indefinite size (maybe something built with cycle, or something you can't rewind), calling the Array#uniq method might not be what you want.

The idea is to add each_uniq, which iterates only over the elements that have not already been yielded (it does the bookkeeping for you).
A second method, each_uniq_by, works similarly to chunk: it takes a block and returns a generated Enumerator.

IceDragon200 wrote the following gist/sample in Ruby; it might be rewritten in C later to make it faster/better. https://gist.github.com/IceDragon200/5b1c205b4b38665c308e For easier viewing I also added it as an attachment.

each_uniq.rb (830 Bytes) Hanmac (Hans Mackowiak), 04/23/2015 07:37 AM
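
The attached file is not reproduced here. As a rough illustration of the proposal described above, the two methods might look something like the following. This is a minimal sketch, not the gist's code: the module name EachUniq is hypothetical, and it assumes that each_uniq returns an Enumerator when called without a block and that each_uniq_by always returns an Enumerator, as chunk does.

```ruby
require 'set'

# Hypothetical helper module for illustration; not part of Ruby core
# and not the code from the gist.
module EachUniq
  # Yields each element the first time it is seen.
  # Returns an Enumerator when called without a block (assumption).
  def each_uniq
    return enum_for(:each_uniq) unless block_given?

    seen = Set.new
    each do |element|
      # Set#add? returns nil when the element is already in the set.
      yield element if seen.add?(element)
    end
    self
  end

  # Like chunk, takes a key block and returns an Enumerator. The Enumerator
  # yields only elements whose key has not been seen before.
  def each_uniq_by(&key_of)
    raise ArgumentError, 'no block given' unless key_of

    Enumerator.new do |yielder|
      seen = Set.new
      each do |element|
        yielder << element if seen.add?(key_of.call(element))
      end
    end
  end
end

# Opt in on specific classes instead of patching Enumerable globally.
Array.include(EachUniq)
Enumerator.include(EachUniq)

# An endless source: only the first three unique elements are consumed.
first_three = [1, 2, 3].cycle.each_uniq.first(3)

# Unique by first letter.
by_initial = %w[apple avocado banana blueberry cherry].each_uniq_by { |w| w[0] }.to_a
```

Because the set of seen keys grows with every new element, memory use is proportional to the number of distinct keys, not to the length of the source.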

Associated revisions

Revision 55709
Added by nobu (Nobuyoshi Nakada) about 1 year ago

enum.c: Enumerable#uniq

  • enum.c (enum_uniq): new method Enumerable#uniq. [Feature #11090]

Revision 55710
Added by nobu (Nobuyoshi Nakada) about 1 year ago

enumerator.c: Enumerator::Lazy#uniq

  • enumerator.c (lazy_uniq): new method Enumerator::Lazy#uniq. [Feature #11090]

Revision 55714
Added by nobu (Nobuyoshi Nakada) about 1 year ago

NEWS: Enumerable#uniq [ci skip]

  • NEWS: mention Enumerable#uniq and Enumerator::Lazy#uniq. [Feature #11090]

Revision 55715
Added by nobu (Nobuyoshi Nakada) about 1 year ago

enum.c: [DOC] Enumerable#uniq [ci skip]

  • enum.c (enum_uniq): add rdoc, reference to Array#uniq. [Feature #11090]

History

#2 [ruby-core:75475] Updated by prijutme4ty (Ilya Vorontsov) over 1 year ago

  • Assignee set to nobu (Nobuyoshi Nakada)

Why introduce one more method if we can just implement #uniq (with or without block, sticking to Array#uniq semantics) for Enumerable and Enumerator::Lazy? With Enumerator::Lazy we do not need to create an array.

module Enumerable
  def uniq
    result = []
    uniq_map = {}
    if block_given?
      each do |value|
        key = yield value
        next if uniq_map.has_key?(key)
        uniq_map[key] = true
        result << value
      end
    else
      each do |value|
        next if uniq_map.has_key?(value)
        uniq_map[value] = true
        result << value
      end
    end
    result
  end
end

class Enumerator::Lazy
  def uniq
    uniq_map = {}
    if block_given?
      Enumerator::Lazy.new(self) do |yielder, value|
        key = yield value
        next if uniq_map.has_key?(key)
        uniq_map[key] = true
        yielder << value
      end
    else
      Enumerator::Lazy.new(self) do |yielder, value|
        next if uniq_map.has_key?(value)
        uniq_map[value] = true
        yielder << value
      end
    end
  end
end

olympics = {1896 => 'Athens', 1900 => 'Paris', 1904 => 'Chicago', 1906 => 'Athens', 1908 => 'Rome'}
each_city_first_time = olympics.uniq{|k,v| v }
# => [[1896, "Athens"], [1900, "Paris"], [1904, "Chicago"], [1908, "Rome"]]

(1..Float::INFINITY).lazy.uniq{|x| (x**2) % 10 }.first(6)
# => [1, 2, 3, 4, 5, 10]

While I propose a different solution to the problem, I totally agree that we need a way to work with the unique elements of a collection without creating an intermediate array. In heavy data processing this is a very common problem.

#3 [ruby-core:76410] Updated by matz (Yukihiro Matsumoto) about 1 year ago

As Ilya proposed, Enumerable#uniq and Enumerator::Lazy#uniq are reasonable.

Matz.

#4 Updated by nobu (Nobuyoshi Nakada) about 1 year ago

  • Status changed from Open to Closed

Applied in changeset r55709.


enum.c: Enumerable#uniq

  • enum.c (enum_uniq): new method Enumerable#uniq. [Feature #11090]
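
With these changesets applied (they shipped in Ruby 2.4), the use case from the description can be covered without a new method name. A short usage sketch, assuming Ruby 2.4 or later; the sample data is made up for illustration:

```ruby
# Requires Ruby 2.4+, where Enumerable#uniq and Enumerator::Lazy#uniq exist.

# Eager: Enumerable#uniq on a Range returns an Array.
# Keys are 1, 2, 0, then repeats, so only the first three numbers survive.
remainders = (1..10).uniq { |n| n % 3 }

# Lazy: elements are pulled from the endless source only until
# three unique values have been found.
first_seen = %w[a b a c b].cycle.lazy.uniq.first(3)

p remainders  # prints [1, 2, 3]
p first_seen  # prints ["a", "b", "c"]
```

Note that asking a lazy uniq over an endless source for more unique elements than the source contains never returns, because the enumerator keeps searching for a value that does not exist.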
