Misc #20509

Document importance of #to_ary and #to_hash for Array#== and Hash#==

Added by gettalong (Thomas Leitner) 12 months ago. Updated 10 months ago.

Status: Open
Assignee: -
[ruby-core:118012]

Description

Both Array#== and Hash#== provide special behaviour when the other argument is not an Array/Hash but defines the special #to_ary/#to_hash method. Those methods are never called; they are only checked for existence. If they exist, other#== is called, which lets the other argument decide whether the two objects are equal.
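To make the described behaviour concrete, here is a minimal sketch. The class name ArrayLike and its call counters are hypothetical and exist only to show which methods Array#== actually invokes; this is an illustration, not code from HexaPDF or from Ruby itself.

```ruby
# Hypothetical array-like wrapper that counts how often its methods are called.
class ArrayLike
  attr_reader :to_ary_calls, :eq_calls

  def initialize(items)
    @items = items
    @to_ary_calls = 0
    @eq_calls = 0
  end

  # Array#== only checks that this method exists; it does not call it.
  def to_ary
    @to_ary_calls += 1
    @items.dup
  end

  # Array#== delegates the decision to this method.
  def ==(other)
    @eq_calls += 1
    @items == other
  end
end

wrapper = ArrayLike.new([1, 2, 3])
result = ([1, 2, 3] == wrapper)          # => true
plain_result = ([1, 2, 3] == Object.new) # => false, Object has no #to_ary

# wrapper.to_ary_calls is still 0, wrapper.eq_calls is 1
```

The counters show that the comparison went through ArrayLike#== exactly once and that #to_ary was never invoked.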

I think this is worth mentioning in the documentation for Array#== and Hash#==.

[Background: In my PDF library HexaPDF I have defined two classes, PDFArray and Dictionary, which act like Array and Hash but provide special PDF-specific behaviour. For PDFArray I defined the #to_ary method, but for Dictionary only the #to_h method. I came across a bug where comparing Arrays with PDFArrays works as it should but comparing Hashes with Dictionaries doesn't, due to the absence of #to_hash. (It seems I removed Dictionary#to_hash in 2017 due to problems with automatic destructuring when passing a Dictionary as an argument; from what I see that should no longer be a problem, so I will just add it back.)]
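The difference between #to_h and #to_hash in this situation can be sketched as follows. The classes DictWithoutToHash and DictWithToHash are hypothetical stand-ins and are not the actual HexaPDF Dictionary implementation.

```ruby
# Hypothetical hash-like class that only offers the explicit conversion #to_h.
class DictWithoutToHash
  def initialize(data)
    @data = data
  end

  def to_h
    @data.dup
  end

  def ==(other)
    @data == other
  end
end

# Same class, but additionally offering the implicit conversion #to_hash.
class DictWithToHash < DictWithoutToHash
  def to_hash
    @data.dup
  end
end

hash = { a: 1 }

# Hash#== finds no #to_hash, so it returns false without consulting #==.
without_result = (hash == DictWithoutToHash.new({ a: 1 })) # => false

# Hash#== sees #to_hash and delegates to DictWithToHash#==.
with_result = (hash == DictWithToHash.new({ a: 1 }))       # => true

# The reverse direction works in both cases, because it calls our #== directly.
reverse_result = (DictWithoutToHash.new({ a: 1 }) == hash) # => true
```

Note the asymmetry in the first case: the dictionary considers itself equal to the Hash, but the Hash does not consider itself equal to the dictionary.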

Updated by matz (Yukihiro Matsumoto) 10 months ago

It is intentional behavior. Usually, having to_ary means the object must behave as an array. If the object a is an array and the object b responds to to_ary, I expect the same result from b == a and a == b.to_ary. We use the former to reduce unnecessary object allocation.

The same applies to to_hash.

Maybe we should document this expectation clearly in the reference.

Matz.

Updated by Dan0042 (Daniel DeLorme) 10 months ago · Edited

> Maybe we should document this expectation clearly in the reference.

Definitely, because this is all new and surprising to me even with 20+ years of ruby experience.

o = Object.new
def o.to_ary
  [1, 2, 3]
end
[1, 2, 3] == o        # => false
[1, 2, 3] == o.to_ary # => true
# I expected these two expressions to be equivalent

So when we define #to_ary we should also define #==.
This is the first time I have heard about this.
It's also rather inconvenient, and IMHO not worth the benefit of "reduce unnecessary object allocation", but that's another story.
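For illustration, one way to make the two expressions agree is to give the object a #== as well. This is only a sketch of the expectation discussed here; the #== below simply converts and compares, which is one possible implementation among many.

```ruby
o = Object.new

def o.to_ary
  [1, 2, 3]
end

# Added so that Array#== has something meaningful to delegate to.
def o.==(other)
  to_ary == other
end

direct    = ([1, 2, 3] == o)        # => true, via o.==([1, 2, 3])
converted = ([1, 2, 3] == o.to_ary) # => true
reversed  = (o == [1, 2, 3])        # => true
```

With #== defined, `[1, 2, 3] == o` and `[1, 2, 3] == o.to_ary` give the same result, matching the expectation that `b == a` and `a == b.to_ary` agree.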

Updated by mame (Yusuke Endoh) 10 months ago

You may already know this, but for the record, let me add some background.

The rationale for to_ary is a compromise between duck typing and efficiency.
Following the principles of duck typing, a method that accepts an Array should access its argument via methods such as #size, #[], #each, etc. However, this is too inefficient for builtin methods that are implemented in C. So, C methods access their arguments in a way that depends on the implementation details of Array, such as RARRAY_LEN. As a result, a C method cannot accept a non-Array object that merely behaves like an Array.

So, the to_ary protocol was introduced: an object that behaves as an Array implements #to_ary, and C methods that accept an Array attempt to convert their argument using #to_ary if it is not an Array. This allows both duck typing and efficiency.
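As a small sketch of this protocol from the Ruby side: Array#+ and multiple assignment both accept an object that implements #to_ary. The class Triple is a hypothetical example.

```ruby
# Hypothetical class that declares itself array-like by implementing #to_ary.
class Triple
  def initialize(a, b, c)
    @values = [a, b, c]
  end

  def to_ary
    @values.dup
  end
end

t = Triple.new(1, 2, 3)

# Array#+ implicitly converts its argument with #to_ary.
sum = [0] + t               # => [0, 1, 2, 3]

# Multiple assignment also uses #to_ary on the right-hand side.
first, second, third = t    # first = 1, second = 2, third = 3

# An object without #to_ary is rejected with a TypeError.
error = begin
  [0] + Object.new
  nil
rescue TypeError => e
  e
end
```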

This means that an object that implements #to_ary is expected to have its other methods behave in an Array-compatible way as well, to a reasonable extent.

I have heard this several times from matz, but am not sure if it is documented. At least it wasn't in doc/implicit_conversion.rdoc. I think this should be added.

In the case of this ticket, the to_ary protocol is applied in a different way. Array#==(other) delegates to other == self, because the fact that other implements #to_ary implies that other's == must behave like Array#==. The reason for delegating instead of converting is that calling to_ary is likely to generate a new array, whereas other's == may be implemented to compare with an array without generating one.
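To illustrate the allocation argument, here is a sketch of an array-like object whose == compares element by element without building an intermediate array. The class Countdown is hypothetical, and restricting the comparison to Array and Countdown operands is a simplifying assumption.

```ruby
# Hypothetical array-like sequence n, n-1, ..., 1 that stores no elements.
class Countdown
  def initialize(from)
    @from = from
  end

  def size
    @from
  end

  def [](index)
    @from - index
  end

  # Implicit conversion: this is the call that allocates a new Array.
  def to_ary
    Array.new(size) { |i| self[i] }
  end

  # Compares element by element without calling #to_ary.
  def ==(other)
    return false unless other.is_a?(Array) || other.is_a?(Countdown)
    return false unless other.size == size

    size.times.all? { |i| other[i] == self[i] }
  end
end

countdown = Countdown.new(3)

equal_result     = ([3, 2, 1] == countdown) # => true
different_result = ([3, 2, 2] == countdown) # => false
shorter_result   = ([3, 2] == countdown)    # => false
```

Because Array#== delegates to Countdown#==, none of these comparisons calls #to_ary, so no temporary array is created for them.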
