Project

General

Profile

Actions

Feature #5478

open

Add syntax to import Set into core

Added by rkh (Konstantin Haase) about 13 years ago. Updated about 4 years ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:40311]

Description

Set is a central data structure. However, a lot of Ruby developers use arrays for situations where it would be more reasonable to use a set. One reason for that is that it is way easier to use Array than Set at the moment. Another one is that developers are simply not aware that it exists.

I propose to move Set from stdlib to core and possibly add a literal syntax or a method on array for creating a Set.

First class syntax suggestions:

<1, 2, 3>  # might be tricky to parse
#[1, 2, 3] # would collide with comments
$[1, 2, 3]
${1, 2, 3}

Method suggestions:

~[1, 2, 3]
+[1, 2, 3]

Whitespace separated String Sets could look like this:

%w<foo bar blah> # creates an array at the moment 
#w[foo bar blah] # would collide with comments
$w[foo bar blah] # would collide with sending :[] to $w
$w{foo bar blah}

~%w[foo bar blah] # not really shorter than using an array with strings
+%w[foo bar balh] # not really shorter than using an array with strings

Maybe it's ok to not have a whitespace separated syntax. I'm just brainstorming here.

An issue with the method approach is that it would create an Array first to which the message would be sent.

I favor the <1, 2, 3> syntax, possibly without the ability to create a whitespace separated version.

I'd be willing to work on a patch not only for MRI but also for JRuby and Rubinius if you would consider this to be useful although I would need help with the parser.

Updated by kernigh (George Koehler) about 13 years ago

Ruby stdlib already has Set::[], a method that creates a literal set.

irb(main):001:0> require 'set'
=> true
irb(main):002:0> Set[1, 2, 3]
=> #<Set: {1, 2, 3}>

The proposed syntax <1, 2, 3> might be impossible to parse, because it conflicts with < and > operators. Consider this example:

def c(*args); args.length; end

p c <1, 2, 3>
4

Current Ruby (with no set literal syntax) parses this example like so:

p(c() < 1, 2, (3 >
               4))

If <1, 2, 3> became a set literal, then this example would become ambiguous. No parser would know whether this code has a set literal, or calls < and > operators.

Updated by alexeymuranov (Alexey Muranov) about 13 years ago

What about simply { :a, :b, :c } ? It should be obvious to Ruby that this is not a hash.

In fact, how about specifying that internally Set is a form of Hash, and to have a lossless conversion Set->Hash->Set? I do not know the current implementation of Set, but the most natural in my opinion would be the following:
{ :a => nil, :b => nil, :c => nil } # is the set of { :a, :b, :c }.

How about making Ruby interpret { :a, :b, :c => 1 } as an alternative form of { :a => nil, :b => nil, :c => 1}, and view Set as a special Hash, where all values are nil? That is, make ruby understand that { :a, :b, :c => 1 } is a hash, but { :a, :b, :c } is a Set (which inherits from Hash and internally is { :a => nil, :b => nil, :c => nil }).


Update. I've got another idea: maybe the best would be to make Set and Hash inherit from some kind of a BasicHash class? Then Hash would extend BasicHash by, for example #default= and #default_proc= methods, and Set would simply undefine the #[] and #[]= methods, rename #has_key? to #include?, etc...


Update 2011-11-09. A related proposal: how about having a series of BasicClasses like: BasicArray, BasicHash, BasicRange for functionality which is essential to the classes and is unlikely to be questioned or changed in future versions of Ruby. Then to have: class Array < BasicArray ..., class Hash < BasicHash ..., class Set < BasicHash ..., class DirectedRange < BasicRange ... (for ranges 1..3 and 3..1, see discussion in #5534), class OrderedHash < BasicOrderedHash ... (or class OrderedHash < BasicArray ..., see #5123), etc.

Updated by rkh (Konstantin Haase) about 13 years ago

I don't think it's impossible to parse. Ruby also manages to parse p /a/m as regexp rather than p / a / m.

Konstantin

Updated by alexeymuranov (Alexey Muranov) almost 13 years ago

I forgot to say that i really like this proposal. Please, let us have Set in core with {1, "two", 3} literal syntax, mathematicians will be crazy.

Alexey.

Updated by Anonymous almost 13 years ago

+1

Michel Demazure

Updated by headius (Charles Nutter) almost 13 years ago

I must point out that Ruby 1.8.x allowed {:a, :b, :c, :d} as literal Hash syntax. 1.9 removed that syntax. If it is added back again, it could be confusing...and code which used that syntax and failed fast on 1.9 would now create a Set and fail at some arbitrary time in the future.

All that said, I do like this syntax as a way of representing a literal Set.

Updated by josb (Jos Backus) almost 13 years ago

How would one represent an empty Set?
'{}' can't be used as it would be ambiguous.

Updated by jballanc (Joshua Ballanco) almost 13 years ago

Perhaps in the vein of OCaml, we could use {[]} for set literals?

Updated by alexeymuranov (Alexey Muranov) almost 13 years ago

Jos Backus wrote:

How would one represent an empty Set?
'{}' can't be used as it would be ambiguous.

Good point. Need to think about this. I would be in favor of using {} for the empty set and Hash::new, or Hash[], or {}.to_hash for the empty hash. To keep partial compatibility, Set can convert itself to Hash if a #[] or #[]= method is called on it. Actually, can it? Can an object replace itself with an object of another class in Ruby? Also, {} can be an object of some common ancestor class of Hash and Set, if this can help.

I think it should be possible to use a single class as both Set and Hash, but having set methods and hash methods both defined on the same object somehow seems strange to me.

Updated by alexeymuranov (Alexey Muranov) almost 13 years ago

I've let my imagination go free and got another crazy(/stupid) idea: abandon Set, add methods to Hash, and use it as Set.

The main drawback seemed at first to be the name, but the technical term for Ruby's hash is "associative array", so there shall be no abuse of terminology if Hash class is used to represent both sets and associative arrays.

In fact, the #include? instance method already exists in Hash and for "set-like" hashes it works as expected:

{ 1 => nil, 2 => nil, 3 => nil }.include?(2) # => true

To make Hash implement both sets and associative arrays, it can be made to implement first of all sets, while implementing associative arrays can be a byproduct: store associative arrays as sets in which every element is, for example, an instance of Arrow, where

class Arrow
  attr_reader :key
  attr_accessor :value

  def initialize(k, v=nil)
     @key = k
     self.value = v
  end

  def hash
    key.hash
  end
end

This will make the literal syntax consistent:

{} == Hash.new                                                                                 # => true
{1, 2} == { 1 => nil, 2 => nil }                                                         # => *false*
{ 1 => nil, 2 => nil }  == { Arrow.new(1), Arrow.new(2) }                 # => true
{ 1 => "a", 2 => "b" } == { Arrow.new(1, "a"), Arrow.new(2, "b") }   # => true
{ 1, 2 => "b" } == { 1, Arrow.new(2, "b") }                                       # => true
Arrow.new(2, "b") == (2 => "b")                                                          # => true

(Note that this is different from Ruby 1.8 interpretation of {1, 2}.)

This does not make impossible to use the current syntax foo x, y, :option1 => true, :option2 => 0, but there may be an issue: shall foo x, y, (:option1 => true), :option2 => 0 be equivalent to foo x, y, Arrow.new(:option1, true), { :option2 => 0 } or to foo x, y, { :option1 => true, :option2 => 0 }?

Updated by trans (Thomas Sawyer) almost 13 years ago

Do you think perhaps you've allowed a certain notion to lead you to left field? You're talking about shoehorning Set into Hash just so it can have a literal notation akin to the one Mathematicians use. Sorry, but {} is already taken in Ruby. Mathematicians can get over it or use another language. And honestly, what's wrong with:

Set[1,2,3]

Updated by alexeymuranov (Alexey Muranov) almost 13 years ago

Thomas Sawyer wrote:

Do you think perhaps you've allowed a certain notion to lead you to left field? You're talking about shoehorning Set into Hash just so it can have a literal notation akin to the one Mathematicians use. Sorry, but {} is already taken in Ruby. Mathematicians can get over it or use another language. And honestly, what's wrong with:

Set[1,2,3]

But what is wrong with having one class instead of two, if internally they behave mostly the same, and one literal notation instead of two? Anyway, this was just an idea.

Updated by jballanc (Joshua Ballanco) almost 13 years ago

On Sun, Dec 4, 2011 at 1:05 PM, Eero Saynatkari wrote:

On 2011-12-04, at 16:15:00, Alexey Muranov wrote:

But what is wrong with having one class instead of two, if internally
they behave mostly the same, and one literal notation instead of two?
Anyway, this was just an idea.

Hash tables aren't Sets, nor the other way around. I'd like to avoid
messing up data structure definitions any more than they already are.

The only compelling reason for bringing Set into core would be a literal
syntax – and I agree, as a data structure it would be a useful addition and
encourage its usage – but I don't really have anything to contribute to the
particular bikeshed of syntax thereof.

Actually, the bulk of Set's functionality is already built on top of Hash.
Personally, since the ability to create Hashes from comma-delimited
key,value lists has been removed in 1.9, I think reintroducing it to create
Set literals is not the worst idea in the world.

Additionally, implementing Set functionality directly on top of Hash would
remove the unfortunate method call overhead that currently exists in the
Set library (note that the implementation of Set#include? is just a call
through to Hash#include?):

Benchmark.bmbm do |x|
?> h = {}
s = Set.new
10000.times { |i| h[i] = nil; s << i }
x.report("Hash#include?") { 1000000.times { h.include? rand(20000) } }
x.report("Set#include?") { 1000000.times { s.include? rand(20000) } }
end
Rehearsal -------------------------------------------------
Hash#include? 0.370000 0.000000 0.370000 ( 0.370516)
Set#include? 0.440000 0.000000 0.440000 ( 0.436806)
---------------------------------------- total: 0.810000sec

                 user     system      total        real

Hash#include? 0.370000 0.000000 0.370000 ( 0.361795)
Set#include? 0.430000 0.000000 0.430000 ( 0.426560)

Updated by aprescott (Adam Prescott) almost 13 years ago

On Sun, Dec 4, 2011 at 22:34, Joshua Ballanco wrote:

Actually, the bulk of Set's functionality is already built on top of Hash.
Personally, since the ability to create Hashes from comma-delimited
key,value lists has been removed in 1.9, I think reintroducing it to create
Set literals is not the worst idea in the world.

Additionally, implementing Set functionality directly on top of Hash would
remove the unfortunate method call overhead that currently exists in the
Set library (note that the implementation of Set#include? is just a call
through to Hash#include?):

This really does seem to just be an argument via implementation details. I
don't think that's the best approach. The point of implementation details
is that they're just that.

Updated by alexeymuranov (Alexey Muranov) almost 13 years ago

From the point of view of behavior/interface, both sets and associative arrays are just unordered collections, associative arrays being unordered collections of (key,value) pairs, where each key is unique.

(This is why i thought that it might be possible to combine them in one class Hash as they both need hash -table- function.)

Updated by alexeymuranov (Alexey Muranov) almost 13 years ago

If you had enough of my comments on this thread, please tell me stop, i will listen.

Here is my another argument in favor of pure Set as a base class or replacement for Hash: on top of Set (or Hash modified in a way to be able to hold pure sets), one can build not only associative arrays, but also a class (call it Relation) that can naturally hold tables from relational databases. A relation, as well as an associative array, is just a form of a set.

Updated by judofyr (Magnus Holm) almost 13 years ago

On Wed, Dec 7, 2011 at 10:01, Alexey Muranov
wrote:

Issue #5478 has been updated by Alexey Muranov.

If you had enough of my comments on this thread, please tell me stop, i will listen.

Here is my another argument in favor of pure Set as a base class or replacement for Hash: on top of Set (or Hash modified in a way to be able to hold pure sets), one can build not only associative arrays, but also a class (call it Relation) that can naturally hold tables from relational databases.  A relation, as well as an associative array, is just a form of a set.

From a theoretical point of view, a relation has a heading + a set
that holds tuples (arrays):

class Relation
  attr_reader :header, :values

  def inittialize(header, values)
    @header = header
    @values = values
  end
end

Relation.new(%w[id name], Set[[1, 'Magnus'], [2, 'Alexy']])

Of course, there are many other ways to represent it (depending on
your requirements). Sometimes you want to use a set of hashes. I don't
quite see how the merging of Set and Hash makes it easier to represent
relations though. It's still a two-dimensional data-structure, so you
either need a Set that contains a Hash or a tuple (Set, Array/Hash).

Updated by alexeymuranov (Alexey Muranov) almost 13 years ago

Magnus Holm wrote:

From a theoretical point of view, a relation has a heading + a set
that holds tuples (arrays):

class Relation
  attr_reader :header, :values

  def inittialize(header, values)
    @header = header
    @values = values
  end
end

Relation.new(%w[id name], Set[[1, 'Magnus'], [2, 'Alexy']])

Of course, there are many other ways to represent it (depending on
your requirements). Sometimes you want to use a set of hashes. I don't
quite see how the merging of Set and Hash makes it easier to represent
relations though. It's still a two-dimensional data-structure, so you
either need a Set that contains a Hash or a tuple (Set, Array/Hash).

I agree, i was simply pointing out that set is, in my opinion, a more basic data structure than associative array, and both associative arrays and relations can be either built on top of it, or simply represented by sets. In particular, it seems wrong to me that Set uses Hash in implementation, and not the other way around.

As Hash class already exists and is the native class for associative arrays, i proposed a way to transform it into set so that to still be able to use it as an associative array and so that most of the current syntax works. Also this would give a common WYSIWYG :) syntax for both sets and associative arrays: { 1, { 2 => 3 }, 4, {5} }.

I was thinking like this: how to simplify Hash to use it as a base class for both Set and Hash? And i ended up with a data structure that was just a set. Then i thought: how now to build Hash back on top of it? And i decided that there were almost nothing to build, sets can be used as associative arrays, just store (key,value) pairs in them and add methods for fetching and inserting by key. Of course, there need to be a way to set default_value for associative arrays, etc.

I agree with your examples of how to represent Relation, they both use Set as top-level structure. As you pointed out, you can just use a set of associative arrays, in my proposed literal syntax:

{ { "id" => "1", "name" => "Magnus" },
  { "id" => "2", "name" => "Alexey", "country" => "France" } }

This might not be the most efficient way, and you would use instead:

Relation.new(%w[id name country], { [1, "Magnus", nil], [2, "Alexey", "France"] })

Update: As identical symbols do not take up extra space, i think the first implementation of Relation is better:

{ { :id => "1", :name => "Magnus" },
  { :id => "2", :name => "Alexey", :country => "France" } }

It can have a separate attribute for meta-data, like column types.

Updated by rue (Eero Saynatkari) almost 13 years ago

On 2011-12-09, at 12:05:41, Alexey Muranov wrote:

I agree, i was simply pointing out that set is, in my opinion, a more basic data structure than associative array, and both associative arrays and relations can be either built on top of it, or simply represented by sets. In particular, it seems wrong to me that Set uses Hash in implementation, and not the other way around.

A hash table isn't an associative array (also known as a map). The latter can be implemented using the former, as can a set. Perhaps it'd have been better to abstract Hash more from its implementation, but it wasn't.

I don't see any practical benefit in unifying the two, given the assumption that Hash is to remain the default – and fast – data structure to use. Have you run any numbers to see how much of a hit you're going to take from deconstructing Hash?

--
Magic is indistinguishable from insufficiently advanced technology.

Updated by alexeymuranov (Alexey Muranov) over 12 years ago

I think that unifying Hash with Set could also allow to use the splat star operator more intelligently. For example, currently the following does not work:

h = { 1 => 2, 3 => 4}
k = { 0 => 0, *h } # => SyntaxError: unexpected tSTAR, expecting '}'

I would like to see here k = { 0 => 0, 1 => 2, 3 => 4 }.

Such behavior could be nice for merging several hashes:

def f(h); ...; end

h1 = { :a => 1 }; h2 = { :b => 2 }

f({ *h1, *h2, :name => 'foo' })

Instead of:

h = h1.merge(h2)
h[:name] = 'foo'
f(h)

Update: I also think unifying Hash with Set, and maybe with other thing, agrees with Ruby philosophy: if everything is an Object, why not to make every finite collection to be a Set? I know there could be some issues with making Array a Set (probably it would first need to become a Hash), but i do not think there are big issues with making Hash a Set.

There is also the Square-Rectangle inheritance problem with this approach, i do not really know how to resolve it. However, in Ruby already Class is a Module, but a class cannot be included into another class. So Ruby does not look very strict here.

Updated by nobu (Nobuyoshi Nakada) over 12 years ago

Seems off-topic.

h1 = {a: 1}; h2 = {b: 2}
def foo(name: 'unknown', **rest) name; end
foo **h1, **h2, name: "foo" #=> "foo"

Updated by alexeymuranov (Alexey Muranov) over 12 years ago

nobu (Nobuyoshi Nakada) wrote:

Seems off-topic.

h1 = {a: 1}; h2 = {b: 2}
def foo(name: 'unknown', **rest) name; end
foo **h1, **h2, name: "foo" #=> "foo"

Ok nobu, thanks for the idea. Then back to the topic. If combining Hash and Set is somehow unfeasible, the question stays about the literal notation for Set, and if using {1, 2, 3}, how to distinguish the empty set from the empty hash.

I can propose {-} for the empty set.

I think it would be even better to use {} for the empty set and {=>} or {:} for the empty hash, but this would be incompatible with the current use.

Updated by fxn (Xavier Noria) over 12 years ago

I don't know which is the general position of core with respect to non-ASCII in source code, but the natural literal to represent an empty set is ∅.

Updated by trans (Thomas Sawyer) over 12 years ago

I don't see why Hash and Set need to be merged, however. It certainly could be useful for Ruby supported Set out-of-the-box. As it is I think it is not uncommon to just use an Array and pretend as if it is a Set. But Set can be it's own thing.

Updated by alexeymuranov (Alexey Muranov) over 12 years ago

trans (Thomas Sawyer) wrote:

  {/}

Close enough?

Thomas, unfortunately here it looks like two set notations overlap (empty set is either {} or ∅). Then maybe (/) ? :D

Updated by Anonymous about 12 years ago

I am for having Set in the core, with Set[ a, b, c ] and Set.new as basic constructors. I am against mingling it with existing collection classes. I think that the most frequent usec for Set would be order-independent comparison (instad of eg. "array1.sort == array2.sort"). I find { a, b, c } syntax tempting, otherwise I agree on this issue with headius (response #6).

Updated by nathan.f77 (Nathan Broadbent) about 12 years ago

I really like ~[1, 2, 3] as a shortcut for Set.new([1, 2, 3]):

class Array
  def ~@
    Set.new self
  end
end

My other preferences would be:

  1. <1, 2, 3>

  2. {1, 2, 3}, and {/} for Set.new.

Updated by alexeymuranov (Alexey Muranov) over 10 years ago

It has not been mentioned in this thread yet that in Python it is done like this:

empty_set = set()               # => set([])
some_set  = {1, 1, 2, 2, 3, 4}  # => set([1, 2, 3, 4])

Python has no literal for the empty set, but otherwise uses the same syntax for set and dictionary literals.

Edited 2016-02-06

Updated by alexeymuranov (Alexey Muranov) almost 9 years ago

Another syntax idea:

{|1,2,3|}

Updated by duerst (Martin Dürst) almost 9 years ago

It may be a good idea if somebody could provide some actual data/examples for the claim "However, a lot of Ruby developers use arrays for situations where it would be more reasonable to use a set.". E.g. how frequent are sets when compared to real Arrays or Hashes,...

Actions #32

Updated by sawa (Tsuyoshi Sawada) about 4 years ago

  • Subject changed from import Set into core, add syntax to Add syntax to import Set into core
  • Description updated (diff)
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0