Project

General

Profile

Actions

Feature #19000

open

Data: Add "Copy with changes method" [Follow-on to #16122 Data: simple immutable value object]

Added by RubyBugs (A Nonymous) 3 months ago. Updated 3 days ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:109850]

Description

As requested: extracted a follow-up to #16122 Data: simple immutable value object from this comment

Proposal: Add a "Copy with changes" method to Data

Assume the proposed Data.define exists.
Seeing examples from the [Values gem]:

require "values"

# A new class
Point = Value.new(:x, :y)

# An immutable instance
Origin = Point.with(x: 0, y: 0)

# Q: How do we make copies that change 1 or more values?
right        = Origin.with(x: 1.0)
up           = Origin.with(y: 1.0)
up_and_right = right.with(y: up.y)

# In loops
movements = [
  [ :x, +0.5 ],
  [ :x, +0.5 ],
  [ :y, -1.0 ],
  [ :x, +0.5 ],
]

# position = Point(x: 1.5, y: -1.0)
position = movements.inject(Origin) do |p, (field, delta)|
  p.with(field => p.send(field) + delta)
end

Proposed detail: Call this method: #with

Money = Data.define(:amount, :currency)

account = Money.new(amount: 100, currency: 'USD')

transactions = [+10, -5, +15]

account = transactions.inject(account) { |a, t| a.with(amount: a.amount + t) }
#=> Money(amount: 120, currency: "USD")

Why add this "Copy with changes" method to the Data simple immutable value class?

Called on an instance, it returns a new instance with only the provided parameters changed.

This API affordance is now widely adopted across many languages for its usefulness. Why is it so useful? Because copying immutable value object instances, with 1 or more discrete changes to specific fields, is the proper and ubiquitous pattern that takes the place of mutation when working with immutable value objects.

Other languages

C# Records: “immutable record structs — Non-destructive mutation” — is called with { ... }
https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/record#nondestructive-mutation

Scala Case Classes — is called #copy
https://docs.scala-lang.org/tour/case-classes.html

Java 14+ Records — Brian Goetz at Oracle is working on adding a with copy constructor inspired by C# above as we speak, likely to be called #with
https://mail.openjdk.org/pipermail/amber-spec-experts/2022-June/003461.html

Rust “Struct Update Syntax” via .. syntax in constructor
https://doc.rust-lang.org/book/ch05-01-defining-structs.html#creating-instances-from-other-instances-with-struct-update-syntax

Alternatives

Without a copy-with-changes method, one must construct entirely new instances using the constructor. This can either be (a) fully spelled out as boilerplate code, or (b) use a symmetrical #to_h to feed the keyword-args constructor.

(a) Boilerplate using constructor

Point = Data.define(:x, :y, :z)
Origin = Point.new(x: 0.0, y: 0.0, z: 0.0)

change = { z: -1.5 }

# Have to use full constructor -- does this even work?
point = Point.new(x: Origin.x, y: Origin.y, **change)

(b) Using a separately proposed #to_h method and constructor symmetry

Point = Data.define(:x, :y, :z)
Origin = Point.new(x: 0.0, y: 0.0, z: 0.0)

change = { z: -1.5 }

# Have to use full constructor -- does this even work?
point = Point.new(**(Origin.to_h.merge(change)))

Notice that the above are not ergonomic -- leading so many of our peer language communities to adopt the #with method to copy an instance with discrete changes.

Updated by bkuhlmann (Brooke Kuhlmann) 3 months ago

💡 In case it's of interest, I've partially solved this problem in the Refinements gem several years back by adding the #merge and #merge! methods to Struct as a refinement. Here's the source code. I'm mostly posting this here in case it helps provide more example solutions to the problem mentioned in Alternative B above:

point = Point.new(**(Origin.to_h.merge(change)))

I've found that using keyword arguments, regardless of whether the struct was constructed using positionals or keywords, has been very effective in practice.

Updated by bdewater (Bart de Water) about 2 months ago

Example from Sorbet which uses with as well:

class Point < T::Struct
  const :x, Numeric
  const :y, Numeric
end

# An immutable instance
Origin = Point.new(x: 0, y: 0)

right = Origin.with(x: 1.0)
up = Origin.with(y: 1.0)
up_and_right = right.with(y: up.y)

Updated by RubyBugs (A Nonymous) about 2 months ago

Example from value_semantics gem which uses with as well:

Point = ValueSemantics::Struct.new do
  x Numeric
  y Numeric
end

# An immutable instance
Origin = Point.new(x: 0, y: 0)

right = Origin.with(x: 1.0)
up = Origin.with(y: 1.0)
up_and_right = right.with(y: up.y)

Updated by ufuk (Ufuk Kayserilioglu) about 2 months ago

Since Ruby already has the Object#dup method, I think we don't need to create a new method to duplicate a Data instance with potentially slightly different values.

I think allowing the dup method on Data instances to take optional keyword arguments would be the ideal API for this:

Point = Data.define(:x, :y, :z)
ORIGIN = Point.new(x: 0.0, y: 0.0, z: 0.0)

right = ORIGIN.dup(x: 1.0)
up = ORIGIN.dup(y: 1.0)
up_and_right = right.dup(y: up.y)

Updated by jeremyevans0 (Jeremy Evans) about 2 months ago

ufuk (Ufuk Kayserilioglu) wrote in #note-4:

I think allowing the dup method on Data instances to take optional keyword arguments would be the ideal API for this:

I agree. This is the pattern that Sequel uses for datasets (which are always frozen, as Data instances are), except Sequel uses clone instead of dup (for historical reasons and because it also needs to copy singleton classes).

Updated by Dan0042 (Daniel DeLorme) about 2 months ago

ufuk (Ufuk Kayserilioglu) wrote in #note-4:

I think allowing the dup method on Data instances to take optional keyword arguments would be the ideal API for this

I had a strong visceral "no" reaction to this. To a certain extent it's a logical idea, we are after all creating a kind of copy. But the dup/clone API is so old, so well established and so consistent throughout ruby, changing/extending it would be a mistake.

Updated by ufuk (Ufuk Kayserilioglu) about 1 month ago

... Sequel uses clone instead of dup (for historical reasons and because it also needs to copy singleton classes).

I think copying singleton classes into such copies might NOT be desired behaviour for Data objects. So, it might make more sense to use dup for this. That means, it should work in the following way:

Point = Data.define(:x, :y, :z)
ORIGIN = Point.new(x: 0.0, y: 0.0, z: 0.0)
ORIGIN.singleton_class.define_method(:distance_from_origin) { 0 }

right = ORIGIN.dup(x: 1.0)
up = ORIGIN.dup(y: 1.0)
up_and_right = right.dup(y: up.y)

ORIGIN.distance_from_origin # => 0
right.distance_from_origin # => undefined method `distance_from_origin' for #<data Point x=1.0, y=0.0, z=0.0> (NoMethodError)
up.distance_from_origin # => undefined method `distance_from_origin' for #<data Point x=0.0, y=1.0, z=0.0> (NoMethodError)
up_and_right.distance_from_origin # => undefined method `distance_from_origin' for #<data Point x=1.0, y=1.0, z=0.0> (NoMethodError)

Dan0042 (Daniel DeLorme) wrote in #note-6:

ufuk (Ufuk Kayserilioglu) wrote in #note-4:
... But the dup/clone API is so old, so well established and so consistent throughout ruby, changing/extending it would be a mistake.

I am not sure we would be extending the API in any incompatible way. Calling dup with no arguments would still work as it should, by creating a copy of the original object. If one supplies, optional, keyword arguments to dup, though, then one would get the special copying that is suitable for Data objects. BTW, it should be an error to supply a keyword argument that the Data class does not have as members.

Point = Data.define(:x, :y, :z)
ORIGIN = Point.new(x: 0.0, y: 0.0, z: 0.0)

cloned_origin = ORIGIN.dup # => works as expected
illegal = ORIGIN.dup(w: 1.0) # => ArgumentError

Updated by RubyBugs (A Nonymous) about 1 month ago

it should be an error to supply a keyword argument that the Data class does not have as members

+1 Agree!

Please note, performance work should be planned to ensure that such checking of the keyword args is as fast as possible and avoids unnecessary allocations.

For example, here's a real life pull request which sped up this error checking the Values gem #with method by 2.29x:
https://github.com/ms-ati/Values/commit/c3f00e3e8d67ff5d6470315ad137166177f079e2

Updated by tomstuart (Tom Stuart) 20 days ago

While it’s undeniable that this method duplicates its receiver in some sense, I’d like to add my support for the #with naming choice.

I don’t have any technical objection to using #dup, it’s just a stylistic preference: in the interests of programmer happiness, to me it’s more natural for the name to communicate the intent of the sender (“I want an object just like ORIGIN, except #with an x of 1.0”) rather than the lower-level implementation detail of what that operation involves (“I want a shallow clone of ORIGIN via #allocate & #initialize_copy, except assigning 1.0 as the value of the @x instance variable”).

Perhaps another way of putting this is that I’d estimate most Ruby programmers rarely call Object#dup in the course of their work, whereas I would expect users of Data to need this new “copy with changes” operation somewhat frequently, especially if they’re migrating existing code from Struct. But of course I have no data to support that guess!

Updated by p8 (Petrik de Heus) 20 days ago

I'm seeing some overlap with "Object#with to set and restore attributes around a block" https://bugs.ruby-lang.org/issues/18951

Should we have a single method Object#with_attrs that:

  • returns a copy with the attributes set by default
  • sets and restores the attributes when passed a block? (although that doesn't make sense for the immutable Data)
Origin = Point.new(x: 0, y: 0)
right = Origin.with_attrs(x: 1.0)
right.x # => 1.0

SomeLibrary.with_attrs(enabled: true) do
  SomeLibrary.enabled # => true
end
SomeLibrary.enabled # => false

Updated by nobu (Nobuyoshi Nakada) 19 days ago

  • Description updated (diff)

At the developers meeting today, someone proposed something like this.

point = Point.new(**(Origin.to_h.merge(change)))
point = Point.new(**Origin.to_h, **change)

Updated by tomstuart (Tom Stuart) 19 days ago

nobu (Nobuyoshi Nakada) wrote in #note-11:

At the developers meeting today, someone proposed something like this.

point = Point.new(**Origin.to_h, **change)

Agreed — this is what I do at the moment. There’s no need to #merge explicitly because “last keyword wins”, and although repeating a literal keyword will generate a warning, the same isn’t true when double-splatting the result of #to_h.

Updated by mame (Yusuke Endoh) 19 days ago

@RubyBugs (A Nonymous) I cannot understand the following parts of your first motivative example

# In loops
movements = [
  { x: +0.5 },
  { x: +0.5 },
  { y: -1.0 },
  { x: +0.5 },
]

# position = Point(x: 1.5, y: -1.0)
position = movements.inject(Origin) { |p, move| p.with(**move) }

According to the comment, do you really expect Point.new(x: 0.0+0.5+0.5+0.5, y: 0.0-1.0)? I don't think the proposed operation should mean addition implicitly.

If it is a mistake, could you update the code example?

Updated by ufuk (Ufuk Kayserilioglu) 18 days ago

I just put up a PR implementing this using the dup proposal: https://github.com/ruby/ruby/pull/6766

Here are my objections to other proposals:

  1. point = Point.new(**Origin.to_h, **change) works but is very verbose and not intention revealing, in my opinion. For the reader of this code, it is very hard to understand what the intended operation is.
  2. with_attrs (from https://bugs.ruby-lang.org/issues/18951): That proposal is explicitly about mutating the receiver of with_attrs, which is the exact opposite of what this "copy with changes" operation is trying to do. For that reason, I don't think we should be conflating that with this proposal.
  3. with: While I understand that there is precedent for using with from other languages, I have always found the name to not make it clear that a copy is being made. Since with implies doing something "with" original receiver, it leads less experienced folks to think that the receiver might be modified in the process. I think we have an opportunity to make the operation more clear by giving it a better name.

My reasoning for dup is to make it explicit that a copy is being created (just as how a string copy is created when "foo".dup is called) but that the copy has updated values for the supplied member fields. dup is already an existing protocol for creating shallow copies of objects, and can already be used on Data class instance even without this change, of course without being able to affect the copy in any way. Extending the existing copying protocol of Ruby to optionally add the ability to affect the copy feels like the best way forward to me.

I would be happy to hear feedback from the community on my PR and if a different name than dup is picked in the end, also would be happy to update the PR to conform.

Updated by bdewater (Bart de Water) 8 days ago

I like dup as the method name 👍

tomstuart (Tom Stuart) wrote in #note-9:

Perhaps another way of putting this is that I’d estimate most Ruby programmers rarely call Object#dup in the course of their work, whereas I would expect users of Data to need this new “copy with changes” operation somewhat frequently, especially if they’re migrating existing code from Struct. But of course I have no data to support that guess!

I don't think dup is that rare, IME most Ruby programmers are familiar with it even if they don't use it daily. Some unscientific data points of usage in Rails apps:

Updated by RubyBugs (A Nonymous) 8 days ago

bdewater (Bart de Water) wrote in #note-15:

I like dup as the method name 👍

Is there a way we could get more active Rubyists to weigh in? My sense is that there is a real tension in that:

  • Nearly every major value objects gem use #with
  • Most other language with value objects use #with

On the other hand, there seem to be a number of voices on this Bug thread, who while they don't necessarily currently work with code that uses this exact pattern, feel strongly that overloading the meaning of #dup is a better choice for Ruby. Even the amazing and incomparable @jeremyevans, whose Sequel gem our team depends on as well :)

While it might not generally be the practice of the Ruby community, would we consider a way to get more "working Rubyist" eyes on this question? Perhaps by getting this thread in to the Ruby Weekly News for example?

In the end, having the method is more important than the name. But it does seem important to let voices be heard?

Updated by mame (Yusuke Endoh) 8 days ago

@RubyBugs (A Nonymous) Please check my comment https://bugs.ruby-lang.org/issues/19000#note-13 . A wrong motivation example raises the suspicion that this API is actually confusing to users.

Updated by p8 (Petrik de Heus) 7 days ago

If dup is chosen would it make sense to always allow dup methods to take arguments for consistency?

For example, it would allow the following code in Rails

firm = Firm.first.dup
firm.account = Account.first
assert_queries(2) { firm.save! }

(https://github.com/rails/rails/blob/8e34831f97acd7448fd68a6ac130694079aea951/activerecord/test/cases/autosave_association_test.rb#L277-L279)

to be written as:

firm = Firm.first.dup(account: Account.first)
assert_queries(2) { firm.save! }

Updated by Eregon (Benoit Daloze) 7 days ago

p8 (Petrik de Heus) wrote in #note-18:

If dup is chosen would it make sense to always allow dup methods to take arguments for consistency?

I'm rather against that, it makes the standard dup so much more complicated, and it would most likely be pretty slow in CRuby.
If it's only for Data it's a much smaller concern.

Updated by Dan0042 (Daniel DeLorme) 7 days ago

p8 (Petrik de Heus) wrote in #note-18:

If dup is chosen would it make sense to always allow dup methods to take arguments for consistency?

It would make sense, but it would require everyone who wrote a custom #dup to extend their version to support this new API pattern. Not likely to happen.

On the other hand with #with it would be possible to implement a general version for all objects

def with(attr={})
  attr.each_with_object(self.dup) do |(k,v),obj|
    obj.send("#{k}=", v)
  end
end

(but let's keep in mind this proposal is about Data#with, not Object#with)

Updated by RubyBugs (A Nonymous) 6 days ago

mame (Yusuke Endoh) wrote in #note-17:

@RubyBugs (A Nonymous) Please check my comment https://bugs.ruby-lang.org/issues/19000#note-13 . A wrong motivation example raises the suspicion that this API is actually confusing to users.

Quite right @mame (Yusuke Endoh), thank you for catching that! I provided two examples. The first example is intended to use the previous values in creating the next values, and that part was broken.

How is this?

##
# Example of proposed Data#with
#
# To try this out:
#
#     ruby-install ruby-3.2.0-preview3
#
##

Point = Data.define(:x, :y) do
  # Example only, too slow. Real implementation should eliminate allocations.
  def with(**args)
    raise ArgumentError unless args.keys.all? { |k| members.include? k }
    self.class.new(**(to_h.merge(args)))
  end  
end

Origin = Point.new(x: 0, y: 0)

# Q: How do we make copies that change 1 or more values?
right        = Origin.with(x: 1.0)
up           = Origin.with(y: 1.0)
up_and_right = right.with(y: up.y)

begin
  Origin.with(x: 1, z: 0)
rescue ArgumentError
  puts "#with(z: 0) raised ArgumentError as expected"
end

# In loops
movements = [
  [ :x, +0.5 ],
  [ :x, +0.5 ],
  [ :y, -1.0 ],
  [ :x, +0.5 ],
]

position = movements.inject(Origin) do |p, (field, delta)|
  p.with(field => p.send(field) + delta) 
end

# position = Point(x: 1.5, y: -1.0)

Updated by RubyBugs (A Nonymous) 6 days ago

For comparison, you can plug in the following example code for a faster implementation of #with and it should work the same:

Point = Data.define(:x, :y) do
  # Example only, too slow. Real implementation should eliminate allocations.
  # def with(**args)
  #   raise ArgumentError unless args.keys.all? { |k| members.include? k }
  #   self.class.new(**(to_h.merge(args)))
  # end  

  # A trick to avoid `nil` for defaults, so that `nil` can be a valid value
  DEFAULT = Class.new.new.freeze

  # An example of a faster implementation that could inspire auto-generation
  def with(x: DEFAULT, y: DEFAULT)
    self.class.new(
      x: x.equal?(DEFAULT) ? self.x : x,
      y: y.equal?(DEFAULT) ? self.y : y,
    )
  end  
end
Actions #23

Updated by RubyBugs (A Nonymous) 6 days ago

  • Description updated (diff)

Updated by mame (Yusuke Endoh) 5 days ago

RubyBugs (A Nonymous) wrote in #note-21:

How is this?

Thanks for the update.

Now I have a question. Do you really want to write p.with(field => p.send(field) + delta)? I don't think it is very elegant. It is not very convincing (at least, to me) as a first motivation example.

Also, do you need the ability to update multiple fields at once? Both motivation examples only update a single field. This may be the result of simplifying the motivation example, though.

Looking at these motivation examples, there may be room to consider an API like p.with(field) {|old_value| old_value + delta } or something.

Updated by mame (Yusuke Endoh) 5 days ago

Discussed at the dev meeting.

Regarding method names, @matz (Yukihiro Matsumoto) likes Data#update. However, @ko1 (Koichi Sasada) objects because "update" is used as a destructive vocabulary in Hash#update. @matz (Yukihiro Matsumoto) said Data#with is acceptable; Data#dup is not acceptable.

Updated by ufuk (Ufuk Kayserilioglu) 4 days ago

Thanks for the discussion and the name suggestion. I updated the PR https://github.com/ruby/ruby/pull/6766 to use Data#with. I would be grateful if it could get a review.

However, that raises a question: How should Data#with behave if it is passed no arguments? Semantically, that should mean "create a copy of the current data object with no fields changed", but then that ends up being identical to what Kernel#dup does. Also, with with no arguments does not read that well.

@matz (Yukihiro Matsumoto) and @mame (Yusuke Endoh) Should it be an error?

Example:

Point = Data.define(:x, :y)
origin = Point.new(x: 0, y: 0)
new_origin = origin.with # should this be an error?

Updated by matz (Yukihiro Matsumoto) 4 days ago

I think it should be an error. I am still not 100% sure with is the best name.

Matz.

Updated by ufuk (Ufuk Kayserilioglu) 4 days ago

Thank you @matz (Yukihiro Matsumoto). I will update the implementation to make the no-args case an error, and point people to use dup to make identical shallow clones instead in the error message.

I read the dev meeting notes and saw that you made a point about the main of the operation not being about duplication. While I agree with that, I also feel that makes people think that they are making a cheap operation, when they are actually creating a clone with modified values. I had suggested dup to make that explicit, but happy to go with another name that is chosen that gives a similar sense of an extra clone operation happening under the hood. Another name that was suggested was a dup_with that I also find acceptable personally.

Along the same lines, I feel like update gives the wrong impression that the receiver instance will be updated, so my vote would be against it.

Updated by RubyBugs (A Nonymous) 3 days ago

ufuk (Ufuk Kayserilioglu) wrote in #note-28:

Thank you @matz (Yukihiro Matsumoto). I will update the implementation to make the no-args case an error, and point people to use dup to make identical shallow clones instead in the error message.

Hi Matz and Ufuk, please consider that making no-args an error makes programming around this API dynamically more complicated — to generate a set of changes as a Hash and pass then goes from:

changes = {} # calculated somehow
p = Origin.with(**changes) # p == Origin is ok

to:

changes = {} # calculated somehow
p = unless changes.empty
      Origin.with(**changes)
    else
      Origin
    end

Because dynamically generated changes is an anticipated use for such an API, would we consider allowing calling #with with no arguments to return self?

Semantically one could describe this API behavior as: “an immutable value object with no changes is itself”?

As another reference point: in Scala case class this operation is called #copy, and I think it returns the original object instance in the no-args case.
https://docs.scala-lang.org/tour/case-classes.html

Updated by tomstuart (Tom Stuart) 3 days ago

RubyBugs (A Nonymous) wrote in #note-29:

please consider that making no-args an error makes programming around this API dynamically more complicated

I second this request. In the interests of regularity, it should be fine to ask for a Data instance #with “no changes”, otherwise the programmer has to anticipate & handle that as an edge case unnecessarily if they intend to generically handle changes which originate from elsewhere rather than supplying keyword arguments inline.

Whether doing so returns self or a shallow copy shouldn’t be very important since the instances are immutable, but I agree that returning self would be more semantically predictable, slightly more efficient, and a nice way to differentiate it from the behaviour of #dup.

Updated by ufuk (Ufuk Kayserilioglu) 3 days ago

RubyBugs (A Nonymous) wrote in #note-29:

Hi Matz and Ufuk, please consider that making no-args an error makes programming around this API dynamically more complicated

Indeed, that is true. However, personally, I find a with call with no arguments very hard to read and reason about. It just reads badly as an English construct. It is like saying, "can I have tea with".

Semantically one could describe this API behavior as: “an immutable value object with no changes is itself”?

Indeed this is a good idea for what the behaviour should be but does not alleviate the naming/readability concern that I've raised above.

As another reference point: in Scala case class this operation is called #copy, and I think it returns the original object instance in the no-args case.
https://docs.scala-lang.org/tour/case-classes.html

Indeed, it was along these lines that I'd suggested dup originally. Like I've said before, I am flexible on what the name should be and copy actually addresses the naming concern equally as well as dup in my opinion. It also makes sense to ask for a "copy" with no values changed and readability is not hurt in any way.

Updated by austin (Austin Ziegler) 3 days ago

ufuk (Ufuk Kayserilioglu) wrote in #note-31:

RubyBugs (A Nonymous) wrote in #note-29:

As another reference point: in Scala case class this operation is called #copy, and I think it returns the original object instance in the no-args case.
https://docs.scala-lang.org/tour/case-classes.html

Indeed, it was along these lines that I'd suggested dup originally. Like I've said before, I am flexible on what the name should be and copy actually addresses the naming concern equally as well as dup in my opinion. It also makes sense to ask for a "copy" with no values changed and readability is not hurt in any way.

I think that #copy is OK, but this is more of a (checked?) #merge, right? I think that would be where my question comes in. If origin is a Point containing x and y values, then origin#copy(x: 1, y: 2, z: 3) should do…what? Should it error or just take the values that it knows? Because the shape provided to #copy is no longer a Point shape.

Updated by ufuk (Ufuk Kayserilioglu) 3 days ago

austin (Austin Ziegler) wrote in #note-32:

If origin is a Point containing x and y values, then origin#copy(x: 1, y: 2, z: 3) should do…what? Should it error or just take the values that it knows?

The current implementation in the PR treats unknown keywords passed to this method as an error. I had already made it clear that I thought it should be an error at the end of my message earlier in this thread https://bugs.ruby-lang.org/issues/19000#note-7 and I think we are all in agreement on this point in this thread.

Updated by RubyBugs (A Nonymous) 3 days ago

…treats unknown keywords passed to this method as an error. I had already made it clear that I thought it should be an error at the end of my message earlier in this thread … and I think we are all in agreement on this point in this thread.

+1 agreement on treating unknown keywords as error! :-)

Actions

Also available in: Atom PDF

Like3
Like0Like1Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like1Like0Like0Like0Like0