Project

General

Profile

Actions

Bug #6087

closed

How should inherited methods deal with return values of their own subclass?

Added by marcandre (Marc-Andre Lafortune) about 12 years ago. Updated over 3 years ago.

Status:
Closed
Target version:
ruby -v:
trunk
Backport:
[ruby-core:42932]
Tags:

Description

Just noticed that we still don't have a consistent way to handle return values:

class A < Array
end
a = A.new
a.flatten.class # => A
a.rotate.class  # => Array
(a * 2).class   # => A
(a + a).class   # => Array

Some methods are even inconsistent depending on their arguments:

a.slice!(0, 1).class # => A
a.slice!(0..0).class # => A
a.slice!(0, 0).class # => Array
a.slice!(1, 0).class # => Array
a.slice!(1..0).class # => Array

Finally, there is currently no constructor nor hook called when making these new copies, so they are never properly constructed.

Imagine this simplified class that relies on @foo holding a hash:

class A < Array
  def initialize(*args)
    super
    @foo = {}
  end

  def initialize_copy(orig)
    super
    @foo = @foo.dup
  end
end
a = A.new.flatten
a.class # => A
a.instance_variable_get(:@foo) # => nil, should never happen

I feel this violates object orientation.

One solution is to always return the base class (Array/String/...).

Another solution is to return the current subclass. To be object oriented, I feel we must do an actual dup of the object, including copying the instance variables, if any, and calling initialize_copy. Exceptions to this would be (1) explicit documentation, e.g. Array#to_a, or (2) methods inherited from a module (like Enumerable methods for Array).

I'll be glad to fix these once there is a decision made on which way to go.


Related issues 1 (0 open1 closed)

Related to Ruby master - Bug #10845: Subclassing StringClosedmatz (Yukihiro Matsumoto)Actions

Updated by trans (Thomas Sawyer) about 12 years ago

I would think these methods should be using self.class.new for constructors thus returning the subclass. Although, that might not always possible.

Updated by marcandre (Marc-Andre Lafortune) about 12 years ago

  • Assignee set to matz (Yukihiro Matsumoto)

Hi,

Thomas Sawyer wrote:

I would think these methods should be using self.class.new for constructors thus returning the subclass. Although, that might not always possible.

This has two problems:

  1. It imposes an API on the constructor of subclasses (i.e. that they accept one parameter which would be an instance of the base class)
  2. The builtin classes constructors doesn't even respect that, i.e.
    Hash.new({1 => 2}).has_key?(1) # => false

--
Marc-André

Updated by marcandre (Marc-Andre Lafortune) about 12 years ago

I apparently forgot to mention that I prefer the second approach, i.e. the equivalent of calling dup on the receiver.

I believe Aaron Patterson seconds this in [ruby-core:43030]

If this approach is accepted, the last remaining question is what of cases of instances of Array/String/... in which instance variables where set using instance_variable_set. Should the instance variables copied over?

b = []
b.instance_variable_set(:@foo, 42)
b.flatten.instance_variable_get(:@foo) # => nil or 42?

I think that to be consistent, they should be copied (again, assuming we decide to return an instance of subclasses). In the discussion of #4136, Charles Nutter thinks it could hinder performance to do so, but I feel that cases where such objects happen to have instance variables set should be extremely rare, so I don't think it would have much effect in practice.

--
Marc-André

Updated by tenderlovemaking (Aaron Patterson) about 12 years ago

Yes, I do second this.

Updated by trans (Thomas Sawyer) about 12 years ago

This has two problems:

  1. It imposes an API on the constructor of subclasses (i.e. that they accept one parameter which would be an instance of the base class)
  2. The builtin classes constructors doesn't even respect that, i.e.
    Hash.new({1 => 2}).has_key?(1) # => false

You took me a bit too literally. I only meant it should be equivalent too calling self.class.new. In other words, it should return an instance of the subclass, not the base class. I did not mean to imply the necessary use of the constructor in this way --which (perhaps unfortunately) is not possible in some notable cases, as you point out.

Actions #6

Updated by shyouhei (Shyouhei Urabe) about 12 years ago

  • Status changed from Open to Assigned

Updated by headius (Charles Nutter) almost 12 years ago

I never noticed this before, so I'm jumping in a couple months late.

Duping the original object or copying its instance vars is wrong. Instance variables are state of an individual object, and should not be carried on to a new object as in these messages. There's no precedent for doing that other than dup'ing, which is explicitly for making a copy of the target object.

flatten et al are not returning "copies"...they're returning new instances with a different arrangement of the same elements. Therefore, those new objects should not automatically inherit instance variables from their parents.

It would be a good idea to design a formal way by which subclasses that want to propagate instance vars to new instances can do so. It just shouldn't be the default.

For the pattern that keeps coming up, where A < Array...you're doing it wrong anyway. Favor composition over inheritance :)

Updated by matz (Yukihiro Matsumoto) almost 12 years ago

  • Target version changed from 2.0.0 to 3.0

Array methods should return consistent values.
But we keep the behavior for now to maintain compatibility.
We will fix this (to consistently return Arrays) in 3.0.

Matz.

Actions #9

Updated by mame (Yusuke Endoh) over 4 years ago

Actions #10

Updated by nobu (Nobuyoshi Nakada) about 4 years ago

  • Description updated (diff)

Updated by mame (Yusuke Endoh) about 4 years ago

Recently this ticket was discussed at dev-meeting, and matz changed his mind. I remember that matz said:

  • A method that seems to return a new array that is directly related to the receiver, should return an instance of the receiver's class.
  • A method that seems to return a new array that is not directly related to the receiver, should return an Array.

So, we need to decide the behavior for each method.

Updated by matz (Yukihiro Matsumoto) about 4 years ago

I used to think methods should honor subclasses, but I changed my mind that the behavior made things too complex.
So if possible I want to make every method return Array instead of instance of a subclass. I just worry about the size of the incompatibility.

Matz.

Updated by matz (Yukihiro Matsumoto) about 4 years ago

Should we do an experiment in 3.0?

Matz

Updated by Eregon (Benoit Daloze) about 4 years ago

Much like all Enumerable methods return Array and (of course) do not copy instance variables, I think Array methods should do the same.

This seems particularly important since Array overrides a few methods from Enumerable for optimization but that should be entirely transparent.
For example, returning a subclass in e.g. Array#map would make it inconsistent with Enumerable#map.

So I'm in favor of no subclass handling here.
We're creating a new instance, and copying the entire state from the receiver doesn't seem reasonable to me.

If people want to keep receiver state like class and @ivars, they can always use mutating methods + #dup if needed.

Updated by ko1 (Koichi Sasada) almost 4 years ago

Eregon (Benoit Daloze) wrote in #note-14:

Much like all Enumerable methods return Array and (of course) do not copy instance variables, I think Array methods should do the same.

+1

Updated by Dan0042 (Daniel DeLorme) almost 4 years ago

  • A method that seems to return a new array that is directly related to the receiver, should return an instance of the receiver's class.
  • A method that seems to return a new array that is not directly related to the receiver, should return an Array.

So this is the old thinking?

I used to think methods should honor subclasses, but I changed my mind that the behavior made things too complex.

And this is the new thinking? In that case +1

If a subclass needs a method to return an instance of the subclass, it can easily and safely opt-in to this behavior (similar to Hash)

class A < Array
  def select(...)
    A.new(super) #or e.g. dup.replace(super) depending on specifics of the subclass
  end
end

On the other hand returning a subclass by default opens the door to all kinds of complexity and bugs depending on how the subclass is implemented. In particular if it has any state/ivars. ary.select is not the same as ary.dup.select! in that case.

Is there somewhere a complete list of methods that currently return a subclass?
For Array I think there's only this: drop, drop_while, take, take_while, flatten, uniq, slice

Updated by jeremyevans0 (Jeremy Evans) over 3 years ago

matz (Yukihiro Matsumoto) wrote in #note-13:

Should we do an experiment in 3.0?

I've added a pull request that modifies the Array methods to return Array instances instead of subclass instances: https://github.com/ruby/ruby/pull/3690

Actions #18

Updated by jeremyevans (Jeremy Evans) over 3 years ago

  • Status changed from Assigned to Closed

Applied in changeset git|2a294d499bf03211d02695f613f784a05943ea35.


Make Array methods return Array instances instead of subclass instances

This changes the following methods to return Array instances instead
of subclass instances:

  • Array#drop
  • Array#drop_while
  • Array#flatten
  • Array#slice!
  • Array#slice/#[]
  • Array#take
  • Array#take_while
  • Array#uniq
  • Array#*

Fixes [Bug #6087]

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0