Misc #20509
Open
Document importance of #to_ary and #to_hash for Array#== and Hash#==
Description
Both Array#== and Hash#== provide special behaviour in case the other argument is not an Array/Hash but defines the special #to_ary/#to_hash methods. Those methods are never called; they are only checked for existence. If they exist, other#== is called to allow the other argument to decide whether the two objects are equal.
I think this is worth mentioning in the documentation for Array#== and Hash#==.
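The behaviour described above can be illustrated with a minimal sketch. The classes ArrayLike and AlwaysEqual are hypothetical names invented for this example. ArrayLike#to_ary raises on purpose, to show that Array#== only checks that the method exists and then hands the comparison to the other object's #==.

```ruby
# Hypothetical class: defines #to_ary (which must never run here) and #==.
class ArrayLike
  def initialize(*items)
    @items = items
  end

  # Array#== only checks that this method exists; raising shows it is not invoked.
  def to_ary
    raise "to_ary was called"
  end

  # Array#== delegates to this method, passing the array as `other`.
  def ==(other)
    @items == other
  end
end

# Hypothetical class: defines #== but not #to_ary, so Array#== never consults it.
class AlwaysEqual
  def ==(_other)
    true
  end
end

with_to_ary    = ([1, 2, 3] == ArrayLike.new(1, 2, 3)) # delegated to ArrayLike#==
without_to_ary = ([1, 2, 3] == AlwaysEqual.new)        # no #to_ary, so false

puts with_to_ary    # true
puts without_to_ary # false
```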
[Background: In my PDF library HexaPDF I have defined two classes, PDFArray and Dictionary, which act like Array and Hash but provide special PDF-specific behaviour. For PDFArray I defined the #to_ary method, but for Dictionary only the #to_h method. I have come across a bug where comparing Arrays with PDFArrays works as it should, but comparing Hashes with Dictionaries doesn't, due to the absence of #to_hash (it seems I removed Dictionary#to_hash in 2017 due to problems with automatic destructuring when passing a Dictionary as an argument; from what I see that should no longer be a problem, so I will just add it back).]
Updated by matz (Yukihiro Matsumoto) 5 months ago
It is intentional behavior. Usually, having to_ary means the object must behave as an array. If the object a is an array and the object b responds to to_ary, I expect the same result from b == a and a == b.to_ary. We use the former to reduce unnecessary object allocation.
Same for to_hash respectively.
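The Hash side of this expectation might look like the following sketch. DictLike is a hypothetical class made up for illustration (it is not HexaPDF's Dictionary). It defines both #to_hash and #==, so that b == a, a == b and a == b.to_hash all agree.

```ruby
# Hypothetical hash-like wrapper; the name and layout are assumptions.
class DictLike
  def initialize(data)
    @data = data
  end

  # Signals "I behave like a Hash"; Hash#== only checks that it exists.
  def to_hash
    @data.dup
  end

  # Compares against the wrapped hash without building a copy.
  def ==(other)
    @data == other
  end
end

a = { name: "x" }
b = DictLike.new({ name: "x" })

direct    = (b == a)         # DictLike#== called directly
delegated = (a == b)         # Hash#== sees #to_hash and calls b == a
converted = (a == b.to_hash) # plain Hash comparison after explicit conversion

puts [direct, delegated, converted].inspect
```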
Maybe we should document this expectation clearly in the reference.
Matz.
Updated by Dan0042 (Daniel DeLorme) 5 months ago · Edited
Maybe we should document this expectation clearly in the reference.
Definitely, because this is all new and surprising to me even with 20+ years of ruby experience.
o = Object.new
def o.to_ary
  [1,2,3]
end
[1,2,3] == o         #=> false
[1,2,3] == o.to_ary  #=> true
# I expected these two expressions to be equivalent
So when we define #to_ary we should also define #==. It's the first time I've heard about this.
It's also rather inconvenient, and imho not worth the benefit of "reduce unnecessary object allocation", but that's another story.
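As a sketch of what "also define #==" could mean for the snippet above: the singleton #== below is one possible definition chosen for illustration, not a prescribed one. It simply converts and compares, so it still allocates an array.

```ruby
o = Object.new

def o.to_ary
  [1, 2, 3]
end

# Without a custom #==, Array#== delegates to Object#==, which compares identity.
before = ([1, 2, 3] == o)

# One possible #== (an assumption for this example): convert, then compare.
def o.==(other)
  to_ary == other
end

after = ([1, 2, 3] == o)

puts before # false
puts after  # true
```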
Updated by mame (Yusuke Endoh) 5 months ago
As you may know, but for the record, let me add some background.
The rationale for to_ary is a compromise between duck typing and efficiency.
Following the principles of duck typing, a method that accepts an Array should access its argument via methods such as #size, #[], #each, etc. However, this is too inefficient for builtin methods that are implemented in C. So, C methods access their arguments in a way that depends on the implementation details of Array, such as RARRAY_LEN. As a result, such a C method cannot accept a non-Array object that behaves like an Array.
So, the to_ary protocol was introduced: an object that behaves as an Array implements #to_ary; C methods that accept an Array will attempt to convert their argument using #to_ary if it is not an Array. This allows both duck typing and efficiency.
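The conversion step in this protocol can be seen from Ruby code. In the sketch below, Pair is a hypothetical class invented for the example. Array#+ and multiple assignment both accept it because it implements #to_ary, while a plain Object is rejected with a TypeError.

```ruby
# Hypothetical array-like class; only #to_ary is needed for implicit conversion.
class Pair
  def initialize(left, right)
    @left = left
    @right = right
  end

  def to_ary
    [@left, @right]
  end
end

pair = Pair.new(:a, :b)

joined = [0] + pair   # Array#+ converts its argument via #to_ary
first, second = pair  # multiple assignment also uses #to_ary

# An object without #to_ary cannot be implicitly converted.
rejected =
  begin
    [0] + Object.new
    false
  rescue TypeError
    true
  end

puts joined.inspect # [0, :a, :b]
puts first.inspect  # :a
puts second.inspect # :b
puts rejected       # true
```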
This means that an object that implements #to_ary is expected to have its other methods also behave in an Array-compatible way, to a reasonable extent.
I have heard this several times from matz, but am not sure if it is documented. At least it wasn't in doc/implicit_conversion.rdoc. I think this should be added.
In the case of this ticket, the to_ary protocol is applied in a different way. Array#==(other) delegates to other == self, because the fact that other implements #to_ary implies that other's == must behave as Array#==. Delegating is preferred over calling to_ary because to_ary is likely to allocate a new array, whereas other's == may be implemented to compare against an array without allocating one.
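To make the allocation argument concrete, here is a minimal sketch with a hypothetical Repeated class (a value repeated a fixed number of times). Its #to_ary has to build an array, while its #== walks the given array directly. The allocations counter exists only to make the difference observable in this example.

```ruby
# Hypothetical class representing `count` copies of `value`.
class Repeated
  attr_reader :allocations

  def initialize(value, count)
    @value = value
    @count = count
    @allocations = 0
  end

  # Converting has to build a new array each time.
  def to_ary
    @allocations += 1
    Array.new(@count, @value)
  end

  # Comparing needs no intermediate array: check the size, then each element.
  def ==(other)
    other.is_a?(Array) &&
      other.size == @count &&
      other.all? { |item| item == @value }
  end
end

sevens = Repeated.new(7, 3)

equal_via_delegation = ([7, 7, 7] == sevens) # Array#== calls sevens == [7, 7, 7]
allocations_after_delegation = sevens.allocations

equal_via_conversion = ([7, 7, 7] == sevens.to_ary)
allocations_after_conversion = sevens.allocations

puts equal_via_delegation         # true
puts allocations_after_delegation # 0
puts equal_via_conversion         # true
puts allocations_after_conversion # 1
```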