Misc #20509
Document importance of #to_ary and #to_hash for Array#== and Hash#==
Description
Both Array#== and Hash#== provide special behaviour in case the other argument is not an Array/Hash but defines the special #to_ary/#to_hash method. Those methods are never called; they are only checked for existence. If they exist, other#== is called to allow the other argument to decide whether the two objects are equal.
I think this is worth mentioning in the documentation for Array#== and Hash#==.
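The following Ruby sketch makes the described behaviour observable. It uses a hypothetical `Probe` class (not part of Ruby) that records every call to its #to_ary, #to_hash and #== methods. Its #== accepts every comparison so that the delegation is easy to see. The expected output assumes the C implementation described in this ticket, where #to_ary/#to_hash are only checked with a respond-to test and then `other == self` is called.

```ruby
# Hypothetical class that records which of its methods Array#== and Hash#== invoke.
class Probe
  attr_reader :calls

  def initialize
    @calls = []
  end

  def to_ary
    @calls << :to_ary # would show up in @calls if Array#== called it
    []
  end

  def to_hash
    @calls << :to_hash # would show up in @calls if Hash#== called it
    {}
  end

  # Accepts every comparison; records the class of the argument it was given.
  def ==(other)
    @calls << [:==, other.class]
    true
  end
end

probe = Probe.new
array_result = [] == probe # Array#== only checks that #to_ary exists, then returns probe == []
hash_result = {} == probe  # Hash#== only checks that #to_hash exists, then returns probe == {}

p array_result # => true
p hash_result  # => true
p probe.calls  # => [[:==, Array], [:==, Hash]]
```

Neither conversion method appears in `probe.calls`. The results come entirely from `Probe#==`.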
[Background: In my PDF library HexaPDF I have defined two classes, PDFArray and Dictionary, which act like Array and Hash but provide special PDF-specific behaviour. For PDFArray I defined the #to_ary method, but for Dictionary only the #to_h method. I came across a bug where comparing Arrays with PDFArrays works as it should, but comparing Hashes with Dictionaries doesn't, due to the absence of #to_hash. (It seems I removed Dictionary#to_hash in 2017 because of problems with automatic destructuring when passing a Dictionary as an argument; from what I see that should no longer be a problem, so I will just add it back.)]
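The following Ruby sketch models the asymmetry described in the background note. The classes `HashLikeBase`, `OnlyToH` and `WithToHash` are hypothetical stand-ins and are not HexaPDF's actual Dictionary. The point it illustrates is that Hash#== only looks for the implicit conversion #to_hash. The explicit #to_h is ignored, even when the object's own #== would happily compare equal.

```ruby
# Hypothetical hash-like classes; NOT HexaPDF's Dictionary, just a minimal model.
class HashLikeBase
  def initialize(data)
    @data = data
  end

  # Compare by content with anything that converts implicitly to a Hash.
  def ==(other)
    other.respond_to?(:to_hash) && @data == other.to_hash
  end

  # Explicit conversion only; Hash#== never looks at #to_h.
  def to_h
    @data.dup
  end
end

# Defines only #to_h, like a Dictionary without #to_hash.
class OnlyToH < HashLikeBase
end

# Also defines the implicit conversion #to_hash.
class WithToHash < HashLikeBase
  def to_hash
    @data.dup
  end
end

page = { "Type" => "Page" }
only_to_h = OnlyToH.new(page)
with_to_hash = WithToHash.new(page)

p(only_to_h == page)    # => true: OnlyToH#== compares the contents
p(page == only_to_h)    # => false: no #to_hash, so Hash#== returns false at once
p(page == with_to_hash) # => true: Hash#== delegates to WithToHash#==
```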
Updated by matz (Yukihiro Matsumoto) 10 months ago
It is intentional behavior. Usually, having to_ary means the object must behave as an array. If the object a is an array and the object b responds to to_ary, I expect the same result from b == a and a == b.to_ary. We use the former to reduce unnecessary object allocation. The same applies to to_hash.
Maybe we should document this expectation clearly in the reference.
Matz.
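The expectation stated above can be written as a small predicate. In the Ruby sketch below, `to_ary_consistent?` and `ArrayWrapper` are hypothetical names used only for illustration. `ArrayWrapper` satisfies the expectation by delegating its #== to the Array it wraps. An object that defines #to_ary but keeps the default identity-based Object#== does not satisfy it.

```ruby
# Hypothetical helper: for an Array `a` and an object `b` responding to #to_ary,
# checks that `a == b` (which Array#== answers via `b == a`) agrees with
# `a == b.to_ary`.
def to_ary_consistent?(a, b)
  (a == b) == (a == b.to_ary)
end

# Defines #to_ary but inherits Object#== (identity), so the expectation fails.
bare = Object.new
def bare.to_ary
  [1, 2, 3]
end

# Hypothetical wrapper that also defines #==, delegating to the wrapped Array.
class ArrayWrapper
  def initialize(items)
    @items = items
  end

  def to_ary
    @items.dup
  end

  # Same answer as comparing the wrapped Array directly.
  def ==(other)
    @items == other
  end
end

wrapped = ArrayWrapper.new([1, 2, 3])

p to_ary_consistent?([1, 2, 3], bare)    # => false
p to_ary_consistent?([1, 2, 3], wrapped) # => true
p to_ary_consistent?([4, 5], wrapped)    # => true
```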
Updated by Dan0042 (Daniel DeLorme) 10 months ago
> Maybe we should document this expectation clearly in the reference.
Definitely, because this is all new and surprising to me even with 20+ years of Ruby experience.
    o = Object.new
    def o.to_ary
      [1, 2, 3]
    end
    [1, 2, 3] == o         # => false
    [1, 2, 3] == o.to_ary  # => true
    # I expected these two expressions to be equivalent
So when we define #to_ary, we should also define #==. This is the first time I've heard about this.
It's also rather inconvenient, and IMHO not worth the benefit of "reduce unnecessary object allocation", but that's another story.
Updated by mame (Yusuke Endoh) 10 months ago
You may already know this, but for the record, I'll add some background.
The rationale for to_ary is a compromise between duck typing and efficiency.
Following the principles of duck typing, a method that accepts an Array should access its argument via methods such as #size, #[], #each, etc. However, this is too inefficient for builtin methods that are implemented in C. So C methods access their arguments in a way that depends on the implementation details of Array, such as RARRAY_LEN. As a result, a C method cannot accept a non-Array object that behaves as an Array.
So the to_ary protocol was introduced: an object that behaves as an Array implements #to_ary, and C methods that accept an Array attempt to convert the argument using #to_ary if it is not an Array. This allows both duck typing and efficiency.
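The following Ruby sketch shows the conversion side of the protocol. Unlike Array#==, these builtins actually call #to_ary on a non-Array argument. `Pair` is a hypothetical class that implements nothing except #to_ary. Array#+, Array.try_convert and multiple assignment are standard Ruby and all accept it.

```ruby
# Hypothetical array-like object that only implements the implicit conversion.
class Pair
  def to_ary
    [:left, :right]
  end
end

pair = Pair.new

# Array#+ needs a real Array, so it converts its argument with #to_ary.
p([:start] + pair)         # => [:start, :left, :right]

# Array.try_convert returns the result of #to_ary (or nil if it is missing).
p(Array.try_convert(pair)) # => [:left, :right]

# Multiple assignment destructures a non-Array right-hand side via #to_ary.
first, second = pair
p([first, second])         # => [:left, :right]
```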
This means that an object that implements #to_ary is expected to have its other methods behave compatibly with Array, to a reasonable extent.
I have heard this several times from matz, but am not sure if it is documented. At least it wasn't in doc/implicit_conversion.rdoc. I think this should be added.
In the case of this ticket, the to_ary protocol is applied in a different way. Array#==(other) delegates to other == self, because the fact that other implements #to_ary implies that other's == must behave like Array#==. This is done because calling to_ary is likely to allocate a new array, whereas other's == may be implemented to compare arrays without allocating one.
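To illustrate the allocation argument, the following Ruby sketch uses a hypothetical `IntSeq` class, an array-like view of the integers first..last that never stores its elements. Its #to_ary must build a fresh Array on every call. Its #== instead walks the other Array by index, so when Array#== delegates to it, no intermediate Array is created. This is one possible design, assumed for illustration only.

```ruby
# Hypothetical array-like view of first..last; elements are never stored.
class IntSeq
  def initialize(first, last)
    @first = first
    @last = last
  end

  def size
    [@last - @first + 1, 0].max
  end

  # Implicit conversion: allocates a new Array on every call.
  def to_ary
    (@first..@last).to_a
  end

  # Array-compatible equality that compares element by element instead of
  # building the Array that #to_ary would return.
  def ==(other)
    # Another non-Array array-like (e.g. a second IntSeq): fall back to converting it.
    return self == other.to_ary if !other.is_a?(Array) && other.respond_to?(:to_ary)
    return false unless other.is_a?(Array) && other.size == size

    i = 0
    while i < size
      return false unless other[i] == @first + i
      i += 1
    end
    true
  end
end

seq = IntSeq.new(1, 3)
p([1, 2, 3] == seq) # => true, answered by IntSeq#== without calling seq.to_ary
p([1, 2] == seq)    # => false
p([1, 2, 3] == seq.to_ary) # => true, the same answer via an allocated Array
```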