Project

General

Profile

Bug #10984

Hash#contain? to check whether hash contains other hash

Added by olivierlacan (Olivier Lacan) about 2 years ago. Updated over 1 year ago.

Status:
Closed
Priority:
Normal
Target version:
-
ruby -v:
Backport:
[ruby-core:<unknown>]

Description

Comparing hashes seems like a common practice but there currently isn't a method to ask a
hash instance whether it includes another hash instance.

The most intuitive method to reach for would be Hash#include? but it is in fact an alias to Hash#has_key?

What I'm looking for can be achieved with:

class Hash
  def contain?(other)
    self.merge(other) == self
  end
end

Here's a simple demo of #contain? in use:

{ a: true, b: false }.contain?({ a: true})
# => true

{ a: true, b: false }.contain?({ b: false})
# => true

{ a: true, b: false }.contain?({ a: false})
# => false

{ a: true, b: false }.contain?({ c: true})
# => false

One important note is that this method is not checking for nested hash matches.
This may need to be addressed when the parameters include a nested hash perhaps.

Thanks to Terence Lee's help, nobu created a patch for this feature last year.
I've only modified the name of the method from his original patch and attached it to this issue.

Hash#contain_.patch View (2.22 KB) olivierlacan (Olivier Lacan), 03/19/2015 01:54 PM

Associated revisions

Revision 52518
Added by nobu (Nobuyoshi Nakada) over 1 year ago

hash.c: compare methods

  • hash.c (rb_hash_{le,lt,ge,gt}): new methods, Hash#<=, Hash#<, Hash#>=, Hash#>, to test if all elements of a hash are also included in another hash, and vice versa. [Feature #10984]

Revision 52518
Added by nobu (Nobuyoshi Nakada) over 1 year ago

hash.c: compare methods

  • hash.c (rb_hash_{le,lt,ge,gt}): new methods, Hash#<=, Hash#<, Hash#>=, Hash#>, to test if all elements of a hash are also included in another hash, and vice versa. [Feature #10984]

Revision 52518
Added by nobu (Nobuyoshi Nakada) over 1 year ago

hash.c: compare methods

  • hash.c (rb_hash_{le,lt,ge,gt}): new methods, Hash#<=, Hash#<, Hash#>=, Hash#>, to test if all elements of a hash are also included in another hash, and vice versa. [Feature #10984]

Revision 52570
Added by nobu (Nobuyoshi Nakada) over 1 year ago

hash.c: compare methods [ci skip]

  • hash.c (rb_hash_{le,lt,ge,gt}): [DOC] for [Feature #10984]

Revision 52570
Added by nobu (Nobuyoshi Nakada) over 1 year ago

hash.c: compare methods [ci skip]

  • hash.c (rb_hash_{le,lt,ge,gt}): [DOC] for [Feature #10984]

Revision 52570
Added by nobu (Nobuyoshi Nakada) over 1 year ago

hash.c: compare methods [ci skip]

  • hash.c (rb_hash_{le,lt,ge,gt}): [DOC] for [Feature #10984]

History

#1 Updated by sferik (Erik Michaels-Ober) about 2 years ago

I agree that #include? is a more consistent and appropriate name for this method. I would prefer to see that change introduced in Ruby 3.0 than settle for this suboptimal change in Ruby 2.

#2 Updated by olivierlacan (Olivier Lacan) about 2 years ago

Erik Michaels-Ober wrote:

I agree that #include? is a more consistent and appropriate name for this method. I would prefer to see that change introduced in Ruby 3.0 than settle for this suboptimal change in Ruby 2.

From the many conversations I've had with folks after publishing the original proposal for this it seemed that my expectations for Hash#include?'s behavior didn't match many (if not most) people's expectations so it seemed unlikely that such a breaking change would ever be favored.

I don't think a brand new method would be such a bad idea in 2.3 because it would allow for a smoother upgrade path. People could start using Hash#contain? when 2.3 ships and then the internal implementation of Hash#include? could be swapped in 3.0 if that is deemed a good idea later on.

#3 Updated by shevegen (Robert A. Heiler) about 2 years ago

I have no particular pro or contra opinion. I merely wish to think that #include? is a better name because Array also has include? and
String also has include?. While a Hash#include? may not be exactly the same as the other two #includes?, I think the name would be
ok. #contain? feels a bit weird, I mean you could use Array#contain? too (on the other hand, you could use an alias just as .map
vs. .collect; I myself use solely .map, other people can use .collect, things are fine) ok so bottom line from me, I think
Hash#include? is a perfectly suitable name for it too as Erik wrote. Perhaps for Ruby 3.0

#4 Updated by olivierlacan (Olivier Lacan) almost 2 years ago

I didn't notice the notes from DevelopersMeeting20150514Japan before so I'll link to my original musings about Hash#include? which later evolved into this feature/patch: Proposal for a better Ruby Hash#include?

I'm aware this post needs revision and more concrete examples and use cases. I'll try to bring that together soon.

#5 [ruby-core:71321] Updated by olivierlacan (Olivier Lacan) over 1 year ago

  • Assignee set to akr (Akira Tanaka)

Responding to feedback from Akira Tanaka and Nobuyoshi Nakada at DevelopersMeeting20150514Japan

akr: “contain” is too general. “subhash”?

You mean something like this?

{ a: 1, b: 2 }.subhash?({ b: 2 })

Semantically, this feels strange to me. It doesn't seem obvious at all which hash we're checking for a subhash on and I would expect a lot of confusion with a method name like this. Compare to:

{ a: 1, b: 2 }.contains?({ b: 2 })

I believe contains is semantically far more self-evident.

It also seems odd to introduce a sub<class>? method name for this since I'm not aware of any similar method names for classes that would have similar behavior.


n0kada: “contain?” seems similiar to “include?”

It is. Sadly, I've been told repeatedly that it's a bad idea to try to change the behavior of include?. I would prefer replacing the existing include? but I will settle for contains? for now because the meaning of "contain" focuses on what's inside the object under observation and is far more commonly used than "comprise":

contain |kənˈtān|
verb [ with obj. ]

1. have or hold (someone or something) within: coffee cans that once contained a full pound of coffee.
  - be made up of (a number of things); consist of: borscht can contain mainly beets or a number of vegetables.
  - (of a number) be divisible by (a factor) without a remainder.

akr: do we really use? we need concrete examples.

Yes, RSpec has an ad-hoc implementation of this feature in its include matcher: https://github.com/rspec/rspec-expectations/blob/bb731e29f7800f5cef736cf8850293276a0d3f90/lib/rspec/matchers/built_in/include.rb#L94-L97

RSpec has been downloaded 29 Million times on RubyGems. I think this is a legitimate use case. This would simplify not only RSpec's internal code for Hash matchers, but any existing application who depends on this code, for a relatively minimal impact on the core Hash codebase (see provided patch).

I expanded on my original proposal (since then changed from Hash#include? to Hash#contains?) here: http://olivierlacan.com/posts/proposal-for-a-better-ruby-hash-include/

#6 [ruby-core:71328] Updated by funny_falcon (Yura Sokolov) over 1 year ago

What if

 {b: 1} === {a: 2, b: 1}

then

 h = {a: 2, b: 1}
 case h
 when {b: 1}
 puts "got it"
 end

😁😃😄😈

#7 [ruby-core:71407] Updated by matz (Yukihiro Matsumoto) over 1 year ago

Hash#contain? has slight ambiguity problem. I'd vote for adding >=, along with <=.

Matz.

#9 [ruby-core:71419] Updated by olivierlacan (Olivier Lacan) over 1 year ago

Yukihiro Matsumoto wrote:

Hash#contain? has slight ambiguity problem. I'd vote for adding >=, along with <=.

Matz.

Thanks for considering this feature, Matz. :-)

If I understand correctly, the following examples would be correct?

{ a: 1, b: 2 } >=  { b: 2 }
=> true
{ b: 2 } <=  { a: 1, b: 2 }
=> true

# also
{ b: 2 } >=  { b: 2 }
=> true
{ a: 1 } >=  { b: 2 }
=> false

{ b: 2 } <=  { b: 2 }
=> true
{ b: 2 } <=  { a: 1 }
=> false

I think the versatility of this alone would make it even more useful than the one direction method I suggested.

#10 [ruby-core:71426] Updated by nobu (Nobuyoshi Nakada) over 1 year ago

If we'll introduce Hash#<= and Hash#>=, then Hash#< and Hash#> too?

Hash will include Comparable with Hash#<=>?

#11 [ruby-core:71427] Updated by akr (Akira Tanaka) over 1 year ago

Nobuyoshi Nakada wrote:

If we'll introduce Hash#<= and Hash#>=, then Hash#< and Hash#> too?

Maybe.

Usefulness of Hash#< and Hash#> is not discussed well, though.

Hash will include Comparable with Hash#<=>?

No. It is clearly stated by matz.

% ruby -e '
class Hash
  def <=(other)
    self.merge(other) == other
  end
  def >=(other)
    self.merge(other) == self
  end
  def <(other)
    self <= other && self != other
  end
  def >(other)
    self >= other && self != other
  end
end
hs = [{a:1, b:2}, {a:1, b:2, c:3}]
ops = %w[<= >= < >]
ops.each {|op|
  hs.each {|h1|
    hs.each {|h2|
      puts "#{h1} #{op} #{h2} = #{h1.send(op, h2)}"
    }
  }
}
'
{:a=>1, :b=>2} <= {:a=>1, :b=>2} = true
{:a=>1, :b=>2} <= {:a=>1, :b=>2, :c=>3} = true
{:a=>1, :b=>2, :c=>3} <= {:a=>1, :b=>2} = false
{:a=>1, :b=>2, :c=>3} <= {:a=>1, :b=>2, :c=>3} = true
{:a=>1, :b=>2} >= {:a=>1, :b=>2} = true
{:a=>1, :b=>2} >= {:a=>1, :b=>2, :c=>3} = false
{:a=>1, :b=>2, :c=>3} >= {:a=>1, :b=>2} = true
{:a=>1, :b=>2, :c=>3} >= {:a=>1, :b=>2, :c=>3} = true
{:a=>1, :b=>2} < {:a=>1, :b=>2} = false
{:a=>1, :b=>2} < {:a=>1, :b=>2, :c=>3} = true
{:a=>1, :b=>2, :c=>3} < {:a=>1, :b=>2} = false
{:a=>1, :b=>2, :c=>3} < {:a=>1, :b=>2, :c=>3} = false
{:a=>1, :b=>2} > {:a=>1, :b=>2} = false
{:a=>1, :b=>2} > {:a=>1, :b=>2, :c=>3} = false
{:a=>1, :b=>2, :c=>3} > {:a=>1, :b=>2} = true
{:a=>1, :b=>2, :c=>3} > {:a=>1, :b=>2, :c=>3} = false

#12 Updated by nobu (Nobuyoshi Nakada) over 1 year ago

  • Status changed from Open to Closed

Applied in changeset r52518.


hash.c: compare methods

  • hash.c (rb_hash_{le,lt,ge,gt}): new methods, Hash#<=, Hash#<, Hash#>=, Hash#>, to test if all elements of a hash are also included in another hash, and vice versa. [Feature #10984]

#13 [ruby-core:71440] Updated by akr (Akira Tanaka) over 1 year ago

Akira Tanaka wrote:

% ruby -e '
class Hash
  def <=(other)
    self.merge(other) == other
  end
  def >=(other)
    self.merge(other) == self
  end
  def <(other)
    self <= other && self != other
  end
  def >(other)
    self >= other && self != other
  end
end
hs = [{a:1, b:2}, {a:1, b:2, c:3}]
ops = %w[<= >= < >]
ops.each {|op|
  hs.each {|h1|
    hs.each {|h2|
      puts "#{h1} #{op} #{h2} = #{h1.send(op, h2)}"
    }
  }
}
'
{:a=>1, :b=>2} <= {:a=>1, :b=>2} = true
{:a=>1, :b=>2} <= {:a=>1, :b=>2, :c=>3} = true
{:a=>1, :b=>2, :c=>3} <= {:a=>1, :b=>2} = false
{:a=>1, :b=>2, :c=>3} <= {:a=>1, :b=>2, :c=>3} = true
{:a=>1, :b=>2} >= {:a=>1, :b=>2} = true
{:a=>1, :b=>2} >= {:a=>1, :b=>2, :c=>3} = false
{:a=>1, :b=>2, :c=>3} >= {:a=>1, :b=>2} = true
{:a=>1, :b=>2, :c=>3} >= {:a=>1, :b=>2, :c=>3} = true
{:a=>1, :b=>2} < {:a=>1, :b=>2} = false
{:a=>1, :b=>2} < {:a=>1, :b=>2, :c=>3} = true
{:a=>1, :b=>2, :c=>3} < {:a=>1, :b=>2} = false
{:a=>1, :b=>2, :c=>3} < {:a=>1, :b=>2, :c=>3} = false
{:a=>1, :b=>2} > {:a=>1, :b=>2} = false
{:a=>1, :b=>2} > {:a=>1, :b=>2, :c=>3} = false
{:a=>1, :b=>2, :c=>3} > {:a=>1, :b=>2} = true
{:a=>1, :b=>2, :c=>3} > {:a=>1, :b=>2, :c=>3} = false

For the record, this sample implementation was wrong.

It should consider that two hashs may have different values for same key.

% ruby -e '
class Hash
  def <=(other)
    self.merge(other) {|k,v1,v2| v1 } == other
  end
  def >=(other)
    self.merge(other) {|k,v1,v2| v2 } == self
  end
  def <(other)
    self <= other && self != other
  end
  def >(other)
    self >= other && self != other
  end
end
hs = [{a:1}, {a:2}]
ops = %w[<= >= < >]
ops.each {|op|
  hs.each {|h1|
    hs.each {|h2|
      puts "#{h1} #{op} #{h2} = #{h1.send(op, h2)}"
    }
  }
}
'
{:a=>1} <= {:a=>1} = true
{:a=>1} <= {:a=>2} = false
{:a=>2} <= {:a=>1} = false
{:a=>2} <= {:a=>2} = true
{:a=>1} >= {:a=>1} = true
{:a=>1} >= {:a=>2} = false
{:a=>2} >= {:a=>1} = false
{:a=>2} >= {:a=>2} = true
{:a=>1} < {:a=>1} = false
{:a=>1} < {:a=>2} = false
{:a=>2} < {:a=>1} = false
{:a=>2} < {:a=>2} = false
{:a=>1} > {:a=>1} = false
{:a=>1} > {:a=>2} = false
{:a=>2} > {:a=>1} = false
{:a=>2} > {:a=>2} = false

Note that nobu's implementation already committed has no problem.

#14 [ruby-core:71529] Updated by prijutme4ty (Ilya Vorontsov) over 1 year ago

Hello everyone.
I urge to remove Hash comparison methods and to stick to methods like #contain. Or at least to return nil instead of false for comparison of non-comparable hashes. Underlying reasons are strictly mathematical but have far-reaching consequences.
Usually we deal with linearly ordered sets or totally ordered (like usual numbers or string are) i.e. such sets that either a <= b or b <= a for every two elements a and b of a set.
Comparison can be generalized for posets or partially ordered sets. They don't require that any two elements are comparable. Set of hashes is a typical example of a partially ordered set (see "Partial ordered set" or "Hasse diagram" in wikipedia).
One must not implement a <= b for unrelated elements because if such comparison returns any certain result either true or false - then its negation would be counterintuitive. I'm not a proponent of current ruby approach of Class#<=> because ordinary intuition based on everyday use of totally ordered sets suggest that this code would be correct which is definitely false:

if String <= Fixnum
  puts 'String is a Fixnum subclass'
else
  puts 'Fixnum is a String subclass'
end

But at least String <=> Fixnum is neither true or false but nil which allow us to distinguish such situations. nil result is properly handled by Comparable methods like #sort. Thus [String, Fixnum].sort will raise.
So why one can sort this array and which result does one expect?:

[{}, {a:1,b:2}, {c:3}, {a:1}, {b:2}].sort

That's why, I insist, comparison of non-comparable hashes at least must return nil. As a more strict approach one can raise exception when try to compare hashes but it makes the main use-case impractical. But I can't see why one want to deal with such a controversial methods when #contain and #included_by will be enough for this not-so-often task.

As an example of why implementing #<=> for posets is not a good idea, lets consider this typical hand-written qsort implementation.

def qsort(arr)
  return arr  if arr.size <= 1
  pivot = arr[arr.length / 2]
  left = arr.select{|el| el < pivot }
  right = arr.select{|el| el > pivot }
  central = arr.select{|el| el == pivot }
  qsort(left) + central + qsort(right)
end

Okay. Now lets run and see how this "obvious" algorithm loses values.

qsort( [{}, {a:1,b:2}, {c:3}, {a:1}, {b:2}] )
# => [{}, {:c=>3}] 

Surely, sorting is already implemented, but this problem persist in every place where one suggest that a < b, a == b and a > b are the only possible alternatives - thus in almost every if-else pair.

I ask a community think one more time about consequences of such a decision.

Ilya

#15 [ruby-core:71534] Updated by olivierlacan (Olivier Lacan) over 1 year ago

Ilya Vorontsov wrote:

So why one can sort this array and which result does one expect?:

[{}, {a:1,b:2}, {c:3}, {a:1}, {b:2}].sort

This is the actual result with 2.3.0-preview1:

RUBY_VERSION
=> "2.3.0"
 [{}, {a:1,b:2}, {c:3}, {a:1}, {b:2}].sort
ArgumentError: comparison of Hash with Hash failed

Maybe I'm reading you too quickly or I'm too tired but did you note that <=> is not implemented on Hash? Matz made that clear and Akira Tanaka clarified it in this response.

#16 [ruby-core:71536] Updated by prijutme4ty (Ilya Vorontsov) over 1 year ago

I've missed absence of <=> first. Yes, you are right. And it can slightly reduce damage.

But anyway it doesn't resolve issues with misinterpretation of comparison negation. My qsort function is just the one problem which lie on surface. There are lots if (a < b) ...; else ... constructions in lots of codebases.
I'm sure that such behaviour will lead to tons of subtle bugs.

Also It's highly probable that some libraries will be subjected to malicious inputs which will cause infinite recursion bombs or other threats. Here I've slightly changed qsort code using #reject instead of #select. Not a very natural code but it's still correct:

def qsort(arr)
  return arr  if arr.size <= 1
  pivot = arr[arr.length / 2]
  left = arr.reject{|el| el >= pivot }
  right = arr.reject{|el| el <= pivot }
  central = arr.select{|el| el == pivot }
  qsort(left) + central + qsort(right)
end

This code will not reduce size as in my previous example but contrariwise will expand array size and break asymptotical estimations of CPU time.

qsort( [{}, {a:1,b:2}, {c:3}, {a:1}, {b:2}] )
# => [{}, {:b=>2}, {:a=>1}, {:b=>2}, {:a=>1, :b=>2}, {:c=>3}, {:b=>2}, {:a=>1}, {:b=>2}, {:a=>1, :b=>2}]

Look at expansion rate.

qsort( 10.times.map{|i| {i => i} } ).size # => 1023
qsort( 20.times.map{|i| {i => i} } ).size # => 1048575
qsort( 25.times.map{|i| {i => i} } ).size # => 33554431

The last array of 25 single-element hashes is sorted in a minute or so. It can easily hang server.

Ok, I specially wrote code subjected to this attack. But in a large codebase of ruby community there would be places which have similar problems (which are not problems for totally ordered sets like numbers). It can be hard to find and exploit it, but if you did - you have broken the server. Just imagine that some code which earlies just raised an exception due to bad input now ruins your application.

#17 [ruby-core:71569] Updated by prijutme4ty (Ilya Vorontsov) over 1 year ago

  • Tracker changed from Feature to Bug

Akira Tanaka wrote:

% ruby -e '
class Hash
  def <=(other)
    self.merge(other) == other
  end
  def >=(other)
    self.merge(other) == self
  end
  def <(other)
    self <= other && self != other
  end
  def >(other)
    self >= other && self != other
  end
end

This implies that

{a: 1, b: 2} <= {a: 1, b: 3}
{a: 1, b: 3} <= {a: 1, b: 2}

Is that expected? This breaks the main use-case in testing method results.

#18 [ruby-core:71580] Updated by nobu (Nobuyoshi Nakada) over 1 year ago

It's an easy code to show the concept.
Both return false.

#19 [ruby-core:71609] Updated by cvss (Cyril Vechera) over 1 year ago

I second Ilya's opinion regarding partially ordered sets. But propose to implement the comparision similiar to classes - Hash and Class both satisfy reflexive, antisymmetric, and transitive relations. So like the Class, Hash can implement <=>, returning nil on noncomparable argument.

Take a look on classes:

class A; end
class B; end
class BB < B; end

How they are comparing:

> A < B
 => nil
> B < BB
 => false
> BB < B
 => true
> A <=> B
 => nil
> B <=> BB
 => 1

So the similiar approach could be implemented for hashes, with inversed signigication for the < and >, to realize this behavior:

> {a: 1} < {b: 2}
 => nil
> {a: 1} < {a: 1, b: 2}
 => true
> {a: 1; b: 2} < {a: 1}
 => false
> {a: 1} <=> {b: 2}
 => nil
> {a: 1} <=> {a: 1, b: 2}
 => -1

Regarding qsort - partially ordered sets imply topological sorting, not kind of sorting algorithms for strictly ordered sets. So qsort should not work on array of hashes.

Also available in: Atom PDF