Project

General

Profile

Actions

Bug #19278

closed

Constructing subclasses of Data with positional arguments

Added by tenderlovemaking (Aaron Patterson) about 2 years ago. Updated about 2 years ago.

Status:
Rejected
Assignee:
-
Target version:
-
ruby -v:
ruby 3.2.0 (2022-12-25 revision a528908271) [arm64-darwin22]
[ruby-core:111484]

Description

I'd expect both of the following subclasses to work, but the subclass that uses positional parameters raises an exception:

Foo = Data.define

class Bar < Foo
  def initialize foo:
    p foo
  end
end

class Baz < Foo
  def initialize foo
    p foo
  end
end

Bar.new foo: 1 # Prints 1
Baz.new 1    # Raises ArgumentError

I'd expect the subclass that uses positional arguments to work.


Related issues 1 (0 open1 closed)

Related to Ruby master - Bug #19301: Fix Data class to report keyrest instead of rest parametersRejectedActions

Updated by zverok (Victor Shepelev) about 2 years ago

  • Status changed from Open to Feedback

It is a deliberate and documented design decision: new handles unifying positional and keyword args, and initialize always receives keyword args.

Note that Measure#initialize always receives keyword arguments, and that mandatory arguments are checked in initialize, not in new. This can be important for redefining initialize in order to convert arguments or provide defaults

Check it:

class Point < Data.define(:x, :y)
  def initialize(*a, **kwa)
    puts "args: #{a}, kwargs: #{kwa}"
    super
  end
end

Point.new(1, 2)
# args: [], kwargs: {:x=>1, :y=>2}
Point.new(x: 1, y: 2)
# args: [], kwargs: {:x=>1, :y=>2}

This decision was made for usability: this way, it is easy for class user to redefine #initialize with small additions (like converting argument types or providing defaults):

class Point < Data.define(:x, :y, :z)
  def initialize(x:, y:, z: 0) = super
end

p Point.new(1, 2)
#=> #<data Point x=1, y=2, z=0>
p Point.new(x: 1, y: 2)
#=> #<data Point x=1, y=2, z=0>
p Point.new(1, 2, 3)
#=> #<data Point x=1, y=2, z=3>
p Point.new(x: 1, y: 2, z: 3)
#=> <data Point x=1, y=2, z=3>

Now, imagine amount of code one would need to write in #initialize if the difference of keyword vs positional arguments initialization would be handled there.

Updated by tenderlovemaking (Aaron Patterson) about 2 years ago

I guess the fact that new doesn't behave the same way as new on every other class kind of ruins "usability" for me. As an end user, it's not clear why this works:

class Foo
  def initialize x:, y:
  end
end

class Bar < Foo
  def initialize(x, y)
    super(x:, y:)
  end
end

Bar.new(1, 2)

But this doesn't:

Foo = Data.define(:x, :y)

class Bar < Foo
  def initialize(x, y)
    super(x:, y:)
  end
end

Bar.new(1, 2)

And even more confusingly:

Foo = Data.define(:x, :y)

class Bar < Foo
  def initialize(x, y)
    super(x:, y:)
  end
end

Foo.new(1, 2) # this works
Bar.new(1, 2) # this doesn't

I should be able to call a constructor on the subclass in the same way that I could call the constructor on the super class.

Updated by Eregon (Benoit Daloze) about 2 years ago

It seems hard to solve to me, without losing the significant advantages that @zverok (Victor Shepelev) mentions.

Also inheriting from a Data class seems kind of an anti/rare pattern.
One can of course:

Foo = Data.define(:x, :y)
  def custom
  end
  ...
end

Updated by zverok (Victor Shepelev) about 2 years ago

@tenderlove I assure you that I spent a lot of time thinking about possible design decisions, and I believe that while somewhat unlike other classes, the current choice is the best in usability. It is documented (the documentation might emphasize the design choice more, with that I might agree).

The constraints I was working is was:

  • flexible creation (e.g. Point.new(1, 2) and Point.new(x: 1, y: 2) both works)
  • as clear error indication with wrong arguments as possible without creating a whole framework for "explanatory errors" (think Rust)
  • ease of updating initialize protocol for the end user (providing defaults and conversions).

If you have better idea of how to achieve it, I will gladly consider it.

That being said, I believe that "it was surprising on the first pass" does not necessarily equal "ruins usability", and the gains are more than losses.

Also inheriting from a Data class seems kind of an anti/rare pattern.

I don't think it is 100% true (especially if we'll consider Data.define(...) do as an inheritance). At least, even in pre-Data times, I have a fair share of usages doing nice things like

class Point < Struct.new(:x, :y)
  def distance_to(other)
    # ...
  end
end

I think "immutable value objects" doesn't imply "no additional convenience querying methods".

And, as I said, the new/initialize pair was specifically designed for usability on inheritance (or do-block-inheritance).

When we discussed the initial protocol, Matz was concerned about the ability to provide defaults for some fields (like some additional API maybe, like Data.define(:x, :y, z: 0) (which looks nice, but has some grave problems), and I believe that this example provides a clear and usable compromise:

class Point < Data.define(:x, :y, :z)
  def initialize(x:, y:, z: 0) = super
end

That is intended.

Data is a new thing. It has its quirks, but we put a lot into designing it as a small, easy-to-learn, and convenient API.

Updated by sam.saffron (Sam Saffron) about 2 years ago

Just going to be a bit more explicit here about the problem for future travelers (I know I am repeating but this is the most succinct way imo to show the issue)...

Foo = Data.define :x

class Bar < Foo
  def initialize(*args, **kwargs)
    p args
    p kwargs
  end
 end
 
Bar.new(1)
# []
# {:x=>1}  

I get that usability arguably (I would argue that now is surprising and unprecedented so it is a usability bomb, not bain) suffers for inheritance, but we would gain tons of clarity this way.

I am with tenderlove here, needing to reason about new -> initialize auto coercion is somewhat scary...

I know this is a bucket of compromises given where we are at, but couldn't this just work correctly without coercion?

class Foo
  attr_reader :x

  def initialize(*args, **kwargs)
    raise ArgumentError if args.length > 0 && kwargs.length > 0 
    raise ArgumentError if args.length > 1 || kwargs.length > 1
    raise ArgumentError if kwargs.length == 1 && !kwargs.key?(:x)

    if args.length > 0 
      @x = args[0]
    else
      @x = kwargs[:x]
    end
  end
end

Updated by zverok (Victor Shepelev) about 2 years ago

@sam.saffron your hand-maid Foo is analogous to Foo = Data.define(:x), right?

Now, can you please show what if somebody wants to inherit your Foo to provide:

  • argument conversion for x
  • or, default value for x

I believe, the best you can get with would be something like this, for data conversion:

UNDEFINED = Object.new.freeze

class MyFoo < Foo
  def initialize(pos_x = UNDEFINED, x: UNDEFINED)
    # 1. check that one, and exactly of them is `UNDEFINED`
    # 2. do the conversion with the right one, we'd like to preserve the protocol, right?..
    if pos_x == UNDEFINED
      super(x: x.to_i)
    else
      super(pos_x.to_i)
    end
  end
end

Providing the default value, how would you address this?.. Please share your thoughts!

With the current protocol, it is just like that:

# Conversion
Foo = Data.define(:x) do
  def initialize(x:) = super(x: x.to_i)
end

# Default value:
Foo = Data.define(:x) do
  def initialize(x: 0) = super
end

# Both with now work with Foo.new(1) and Foo.new(x: 1), without breaking the contract that Data promises

Yes, the rule "Data#initialize accepts only keyword arguments" is unlike the barest "simple class" implementation, but:

  1. For a good reason
  2. It is a rule consisting of exactly one phrase; even if one considers it "a dumb quirk," it is not something that is impossible to explain or remember
  3. But it is not "a dumb quirk"

Updated by tenderlovemaking (Aaron Patterson) about 2 years ago

Eregon (Benoit Daloze) wrote in #note-3:

It seems hard to solve to me, without losing the significant advantages that @zverok (Victor Shepelev) mentions.

Also inheriting from a Data class seems kind of an anti/rare pattern.
One can of course:

Foo = Data.define(:x, :y)
  def custom
  end
  ...
end

😅 I do this all the time with structs. I assumed Data.define would basically just be a read-only Struct.new and that's how I got here.

zverok (Victor Shepelev) wrote in #note-6:

Yes, the rule "Data#initialize accepts only keyword arguments" is unlike the barest "simple class" implementation

IMO this would be a fine rule if new also only accepted keyword arguments. The automatic coercion from positional to keyword is what got me in to this pickle. This coercion makes subclasses unable to use positional arguments with initialize (a feature which, as I pointed out, regular classes support).

I guess it's not really a big deal if I just make all of my subclasses use keyword arguments, and then only ever instantiate them using keyword parameters. But is that the "best practice"? If so, why implement the coercion feature?

Updated by zverok (Victor Shepelev) about 2 years ago

I do this all the time with structs. I assumed Data.define would basically just be a read-only Struct.new and that's how I got here.

(Wrote a long theoretical answer, deleted it!)

Wait. I now reread your initial example more attentively, and I see the source of the confusion now: that we are adjusting the initialize to accept the arguments the initial class knew nothing about (not redefining it to accept the same args, but differently), and the behavior is indeed hard to grasp immediately.

I am not sure something can be done about it in a consistent manner, though.

That's a source of the conveniences that are provided in other cases (I think for "simple data classes" ability to be initialized by both keyword and positional argument is crucial for their adoption; and I still believe that the convention of "initialize is easy to redefine, but it should always accept keywords" is a good enough compromise).

It seems that "completely redefine what data is accepted to constructor" would be a natural limitation of Data, which seems more or less suitable (inheriting to provide some convenience methods is one thing, building an inheritance chain is another).

The only way I see so far to make your Baz work is this:

Foo = Data.define

class Bar < Foo
  def self.new(foo) = super(foo:) # convert positional to keyword

  def initialize(foo:) = p foo # ...then they are safely passed to initialize
end

Bar.new(1)
#=> 1

BTW, note that before 3.2, for structs it was either keyword_init: false, or keyword_init: true, not both.

That's why you were able to easily redefine initialize: you knew it receives either keywords or positionals, not "both and need to choose". After 3.2, it is different, and I think that people would stumble on that many times (the design decision was made differently, too, unlike Data):

S = Struct.new(:x) do
  def initialize(*a, **kwa)
    puts "a=#{a}, kwa=#{kwa}"
  end
end
S.new(1)
# a=[1], kwa={}
S.new(x: 1)
# a=[], kwa={:x=>1}

I think it will cause a fair share of confusion in completely different ways than the design choice for Data did :)

Updated by Eregon (Benoit Daloze) about 2 years ago

To clarify:

Also inheriting from a Data class seems kind of an anti/rare pattern.

I meant one should use Struct.new(...) { ... } / Data.define(...) { ... } and not class < Struct.new(...) / class < Data.define(...) which is wasteful because it creates an extra class and makes the hierarchy needlessly one level deeper (and also causes an unnamed supercass).

Updated by zverok (Victor Shepelev) about 2 years ago

@Eregon (Benoit Daloze) Ah, yeah, that's true.

Though I saw codebases that did that for aesthetical purposes (all "classes with methods" look the same), or to not confuse some code analysis tools (I haven't checked lately, but I believe YARD at some point was able to properly parse class Foo < Struct.new(...) and produce docs for it, but not Foo = Struct.new(...) do)

Updated by tenderlovemaking (Aaron Patterson) about 2 years ago

zverok (Victor Shepelev) wrote in #note-8:

BTW, note that before 3.2, for structs it was either keyword_init: false, or keyword_init: true, not both.

That's why you were able to easily redefine initialize: you knew it receives either keywords or positionals, not "both and need to choose". After 3.2, it is different, and I think that people would stumble on that many times (the design decision was made differently, too, unlike Data):

S = Struct.new(:x) do
  def initialize(*a, **kwa)
    puts "a=#{a}, kwa=#{kwa}"
  end
end
S.new(1)
# a=[1], kwa={}
S.new(x: 1)
# a=[], kwa={:x=>1}

I don't really understand this example. The initialize works the same way whether you use a Struct or not:

S = Struct.new(:x) do
  def initialize(*a, **kwa)
    puts "a=#{a}, kwa=#{kwa}"
  end
end
S.new(1)
# a=[1], kwa={}
S.new(x: 1)
# a=[], kwa={:x=>1}

class K
  def initialize(*a, **kwa)
    puts "a=#{a}, kwa=#{kwa}"
  end
end
K.new(1)
# a=[1], kwa={}
K.new(x: 1)
# a=[], kwa={:x=>1}

I think it will cause a fair share of confusion in completely different ways than the design choice for Data did :)

Data is confusing for me because it doesn't share the same behavior as a regular class (where Struct does).

Eregon (Benoit Daloze) wrote in #note-9:

To clarify:

Also inheriting from a Data class seems kind of an anti/rare pattern.

I meant one should use Struct.new(...) { ... } / Data.define(...) { ... } and not class < Struct.new(...) / class < Data.define(...) which is wasteful because it creates an extra class and makes the hierarchy needlessly one level deeper (and also causes an unnamed supercass).

Sure, but making a class that eventually inherits from a class defined via Struct.new is something I normally do. I wanted to do the same thing with Data, but found I could not because it defines new differently.

Updated by zverok (Victor Shepelev) about 2 years ago

I don't really understand this example. The initialize works the same way whether you use a Struct or not

It is not.

  1. Both 3.2+ Struct and Data make a promise that no other class makes and is non-trivial to implement: if you have Struct/Data-derived class C with member x, then both C.new(x: 1) and C.new(1) work.
  2. These promises are implemented differently (because different people worked on them :shrug:): In Struct, the #initialize handles "unify positional and keywords", in Data, .new handles.
  3. This leads to different cost/benefit outcome: in Struct, it is "like in other classes" but redefining #initialize is non-trivial, in Data, it is "one interesting quirk", but redefining #initialize is trivial.

Can you please provide a small realistic example of how would you like to define your Struct or Data-inherited class' #initialize to not break this contract and achieve the goal you were trying to achieve? We can proceed from there to maybe come to some compromise or better understand each other's points.

Updated by tenderlovemaking (Aaron Patterson) about 2 years ago

  • Status changed from Feedback to Rejected

zverok (Victor Shepelev) wrote in #note-12:

I don't really understand this example. The initialize works the same way whether you use a Struct or not

It is not.

  1. Both 3.2+ Struct and Data make a promise that no other class makes and is non-trivial to implement: if you have Struct/Data-derived class C with member x, then both C.new(x: 1) and C.new(1) work.

Your example didn't demonstrate this difference.

  1. These promises are implemented differently (because different people worked on them :shrug:): In Struct, the #initialize handles "unify positional and keywords", in Data, .new handles.
  2. This leads to different cost/benefit outcome: in Struct, it is "like in other classes" but redefining #initialize is non-trivial, in Data, it is "one interesting quirk", but redefining #initialize is trivial.

I really don't understand point 3. As I showed in the original post as well as as other examples, I cannot "trivially" define an initialize as I would with normal Ruby classes.

Can you please provide a small realistic example of how would you like to define your Struct or Data-inherited class' #initialize to not break this contract and achieve the goal you were trying to achieve? We can proceed from there to maybe come to some compromise or better understand each other's points.

To be honest, I don't really care anymore. The problems with new on Data seem quite clear to me, but perhaps I am wrong. Struct seems to behave in the way I would expect, so I will just use that (or write regular POROs).

Thanks for the feedback.

Updated by zverok (Victor Shepelev) about 2 years ago

Your example didn't demonstrate this difference.

I really don't understand point 3.

To be honest, I don't really care anymore.

It is a pity, after the thread this long.

If you would've cared, you'd maybe spent a few minutes drafting an initialize that would work the way you want and maintained the contract, and my points would probably become more obvious.

(FTR, the starting example of the ticket wasn't really illustrative because the point of creating Data with no members and then adding instance variables in initialize is unclear. Data is a value object that is good for value object goals, not a generic "just base class for anything".)

But it is what it is, I think.

Updated by tenderlovemaking (Aaron Patterson) about 2 years ago

zverok (Victor Shepelev) wrote in #note-14:

Your example didn't demonstrate this difference.

I really don't understand point 3.

To be honest, I don't really care anymore.

It is a pity, after the thread this long.

If you would've cared, you'd maybe spent a few minutes drafting an initialize that would work the way you want and maintained the contract, and my points would probably become more obvious.

Sorry, I think you misunderstood. I understand your points, I am just out of steam and don't care to continue the thread anymore.

Thanks.

Actions #16

Updated by zverok (Victor Shepelev) about 2 years ago

  • Related to Bug #19301: Fix Data class to report keyrest instead of rest parameters added
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0