Feature #18035

Introduce general model/semantic for immutable by default.

Added by ioquatix (Samuel Williams) 3 months ago. Updated 1 day ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
[ruby-core:104560]

Description

It would be good to establish some rules around mutability, immutability, frozen, and deep frozen in Ruby.

Time and time again, I see incorrect assumptions in production code about how this works: constants that aren't really constant, people using #freeze incorrectly, etc.

I don't have any particular preference but:

  • We should establish consistent patterns where possible, e.g.
    • Objects created by new are mutable.
    • Objects created by literal are immutable.

We have problems with how freeze works on composite data types, e.g. Hash#freeze does not affect child keys/values, and the same is true for Array. Do we need to introduce freeze(true) or #deep_freeze or some other method?

Because of this, frozen does not necessarily correspond to immutable. This is an issue which causes real world problems.
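
As a minimal sketch of the gap between frozen and immutable (the constant name CONFIG and its contents are made up for illustration), a frozen Hash rejects changes to itself but still hands out mutable children:

```ruby
# Hash#freeze is shallow: the hash itself rejects changes,
# but the objects it refers to stay mutable.
CONFIG = { hosts: ["a.example", "b.example"] }.freeze

puts CONFIG.frozen?          # true: the hash is frozen
puts CONFIG[:hosts].frozen?  # false: the nested array is not

begin
  CONFIG[:port] = 80         # adding a key to the frozen hash fails
rescue FrozenError => e
  puts "rejected: #{e.class}"
end

CONFIG[:hosts] << "c.example"  # mutating the child succeeds
puts CONFIG[:hosts].length     # 3
```

So the "constant" above still changes at runtime even though both the constant binding and the hash are fixed.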

I also propose codifying this where possible: a statement such as "this class of object is immutable" should be enforced by the language/runtime, e.g.

module Immutable
  def new(...)
    super.freeze
  end
end

class MyImmutableObject
  extend Immutable

  def initialize(x)
    @x = x
  end

  def freeze
    return self if frozen?

    @x.freeze

    super
  end
end

o = MyImmutableObject.new([1, 2, 3])
puts o.frozen?

Finally, this area has an impact on thread- and fiber-safe programming, so it is becoming more relevant, and I believe that the current, rather ad hoc approach is insufficient.

I know that it's non-trivial to retrofit existing code, but maybe it can be done via a magic comment, etc., as we already did for frozen string literals.

Updated by ioquatix (Samuel Williams) 3 months ago

  • Subject changed from Introduce general module for immutable by default. to Introduce general model/semantic for immutable by default.

Fix title.

Updated by duerst (Martin Dürst) 3 months ago

This is mostly just a generic comment that may not be very helpful, but I can only say that I fully agree. Even before talking about parallel stuff (thread/fiber), knowing some object is frozen can be of help when optimizing.

One thing that might be of interest is that in a method chain, a lot of the intermediate objects (e.g. arrays, hashes) may be taken to be immutable because they are just passed to the next method in the chain and never used otherwise (but in this case, it's just the top object that's immutable, not necessarily the components, which may be passed along the chain).

Updated by ioquatix (Samuel Williams) 3 months ago

Regarding method chains, one thing that's always bothered me a bit is this:

def foo(*arguments)
  pp object_id: arguments.object_id, frozen: arguments.frozen?
end

arguments = [1, 2, 3].freeze
pp object_id: arguments.object_id, frozen: arguments.frozen?

foo(*arguments)

I know it's hard to implement this and also retain compatibility, but I feel like the vast majority of allocations done by this style of code are almost always unneeded. We either need some form of escape/mutation analysis, or copy-on-write for Array/Hash (maybe we have it already and I don't know).

Updated by jeremyevans0 (Jeremy Evans) 3 months ago

  • Backport deleted (2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN)
  • Tracker changed from Bug to Feature

Updated by Eregon (Benoit Daloze) 2 months ago

  • Description updated (diff)

Updated by Eregon (Benoit Daloze) 2 months ago

Many things are discussed in the description here.

I think it's important to differentiate shallow frozen (Kernel#frozen?) and deep frozen (= immutable), and not try to change their meaning.
So for example overriding freeze to deep freeze does not seem good.

There was a suggestion for deep_freeze in #17145, which IMHO would be a good addition.

Objects created by literal are immutable.

I don't agree, for instance [] and {} should not be frozen, that would just be counter-productive in many cases.

Maybe CONSTANT = value should .deep_freeze the value, this was discussed with Ractor.make_shareable but that was rejected (#17273).

There is also the question of how to mark a class as creating immutable objects.
And potentially still allow to subclass it, and what it should do with initialize_copy, allocate, etc.
That's illustrated with the Immutable above but otherwise not much discussed.
I think that's probably worth its own ticket, because it's a big enough subject of its own, I'll try to make one.

copy-on-write for Array

That's required for efficient Array#shift so you can assume it's there on all major Ruby implementations.

Updated by ioquatix (Samuel Williams) 28 days ago

Here is a proposed PR which implements the very basics.

https://github.com/ruby/ruby/pull/4879

I'm sure we can expand it to include many of the discussed features (e.g. dup/clone).

Whether we perform a deep freeze is implementation-specific - users can override #freeze to suit the needs of the object.

Updated by Eregon (Benoit Daloze) 28 days ago

Immutable means deeply frozen to me, not just shallow frozen (which is just Kernel#frozen?).

Updated by Dan0042 (Daniel DeLorme) 28 days ago

I don't care much for this 'immutable' stuff, but as long as no backward incompatibility is introduced (like making literals immutable) then I don't mind.

If immutable means deep frozen, one concern I have is about side effects of immutability. If I do MyImmutableObject.new(array) I might not expect array to suddenly become frozen. Should the definition of immutability include making a deep copy of non-immutable objects?
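
To make the concern concrete, here is a sketch that reuses the Immutable module from the description. The class names FreezingWrapper and CopyingWrapper are hypothetical, and the copying variant only makes a shallow copy, so it is an illustration of the trade-off rather than a proposed design:

```ruby
# Same pattern as the Immutable module in the description.
module Immutable
  def new(...)
    super.freeze
  end
end

# Freezes whatever object it was handed (the side effect in question).
class FreezingWrapper
  extend Immutable
  attr_reader :items

  def initialize(items)
    @items = items
  end

  def freeze
    return self if frozen?

    @items.freeze
    super
  end
end

# Hypothetical alternative: copy the argument before it gets frozen.
class CopyingWrapper < FreezingWrapper
  def initialize(items)
    super(items.dup) # shallow copy only; nested objects are still shared
  end
end

shared = [1, 2, 3]
FreezingWrapper.new(shared)
puts shared.frozen? # true: the caller's array was frozen as a side effect

own = [1, 2, 3]
CopyingWrapper.new(own)
puts own.frozen?    # false: only the private copy was frozen
```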

Updated by maciej.mensfeld (Maciej Mensfeld) 28 days ago

Should the definition of immutability include making a deep copy of non-immutable objects?

Either a deep copy, or a requirement that the arguments are frozen in the first place.

I think this is a feature that can bring a lot of benefits for advanced users. Many things in the dry ecosystem are frozen when built (contract classes, for example). Could we extend it to also include class and module definitions, so that once initialized, they cannot be changed?

On one side it would break monkey-patching but at the same time would allow easier (more verbose?) expression of "this is not to be touched" when building libs.

Updated by jeremyevans0 (Jeremy Evans) 28 days ago

maciej.mensfeld (Maciej Mensfeld) alluded to this already, but one thing to consider is that no object in Ruby is truly immutable unless all entries in object.singleton_class.ancestors are also frozen/immutable. Additionally, depending on your definition of immutable, you may want all constants referenced by any method defined in any of object.singleton_class.ancestors to also be frozen/immutable.

Updated by tenderlovemaking (Aaron Patterson) 28 days ago

jeremyevans0 (Jeremy Evans) wrote in #note-11:

maciej.mensfeld (Maciej Mensfeld) alluded to this already, but one thing to consider is that no object in Ruby is truly immutable unless all entries in object.singleton_class.ancestors are also frozen/immutable.

Are they not? It seems like for Arrays they are (I haven't checked other types), so maybe there's some precedent:

x = [1, 2, 3].freeze

Mod = Module.new { def foo; end }

begin
  x.extend(Mod)
rescue FrozenError
  puts "can't extend"
end

begin
  def x.foo; end
rescue FrozenError
  puts "can't def"
end

begin
  y = x.singleton_class
  def y.foo; end
rescue FrozenError
  puts "can't def singleton"
end

Updated by jeremyevans0 (Jeremy Evans) 28 days ago

Apologies for not being more clear. Freezing an object freezes the object's singleton class. However, it doesn't freeze the other ancestors in singleton_class.ancestors:

c = Class.new(Array)
a = c.new
a << 1
a.first # => 1
c.define_method(:first){0}
a.first # => 0

As this shows, an instance of a class is not immutable unless its class and all other ancestors of the singleton class are immutable.

Updated by tenderlovemaking (Aaron Patterson) 28 days ago

Ah right. I think your example is missing a freeze, but I get it. If freezing an instance were to freeze all ancestors of the singleton, wouldn't that extend to Object / BasicObject? I feel like we'd have to stop freezing somewhere because it would be pretty surprising if you can't define a new class or something because someone did [].freeze. Simple statements like FOO = [1].freeze wouldn't work (as Object would get frozen before we could set the constant).
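
For reference, a variant of the example with the freeze added might look like the sketch below (the constant names are made up for illustration). The instance is frozen, yet its observable behavior still changes when its class does:

```ruby
ListClass = Class.new(Array)
LIST = ListClass.new([1]).freeze

puts LIST.frozen? # true
puts LIST.first   # 1

# The class is not frozen, so its methods can still be redefined.
ListClass.define_method(:first) { 0 }

puts LIST.first   # 0: behavior changed although the instance is frozen
puts LIST[0]      # 1: the stored state did not change
```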

Maybe we could figure out a cheap way to copy things so that a mutation to the Class.new from your example wouldn't impact the instance a.

But regardless it seems like gradual introduction would be less surprising. IOW maybe the goal would be to make all references immutable, but that really isn't practical. Instead expand the frozen horizon as much as we can without breaking existing code?

Updated by ioquatix (Samuel Williams) 28 days ago

I would like us to define a model for immutability that has real world use cases and applicability - i.e. useful to developers in actual situations rather than theoretically sound and impossible to implement or impossible to use. Not that I'm saying theoretically sound is not important, just that we have to, as was said above, stop somewhere. Since this proposal includes a new module, it's totally optional. But that module is semantically independent from how we actually implement some kind of deep_freeze. The point of the module is to enforce it in a visible way - as in, this class will always be frozen. Such a design can then be used by the interpreter for constant propagation, etc.

Updated by jeremyevans0 (Jeremy Evans) 28 days ago

tenderlovemaking (Aaron Patterson) wrote in #note-14:

Ah right. I think your example is missing a freeze, but I get it. If freezing an instance were to freeze all ancestors of the singleton, wouldn't that extend to Object / BasicObject? I feel like we'd have to stop freezing somewhere because it would be pretty surprising if you can't define a new class or something because someone did [].freeze. Simple statements like FOO = [1].freeze wouldn't work (as Object would get frozen before we could set the constant).

Correct. Kernel#freeze behavior should not change anyway, it should continue to mean a shallow freeze. This does point out that a #deep_freeze method on an object doesn't result in true immutability. You would have to pick an arbitrary point in the class hierarchy unless you wanted it to freeze all classes. I don't like such an approach.

Maybe we could figure out a cheap way to copy things so that a mutation to the Class.new from your example wouldn't impact the instance a.

Copying method handles into a singleton is one simple idea, but I cannot see how that would work with super, and it would result in a significant performance decrease.

But regardless it seems like gradual introduction would be less surprising. IOW maybe the goal would be to make all references immutable, but that really isn't practical. Instead expand the frozen horizon as much as we can without breaking existing code?

I don't think we should change the semantics of Kernel#freeze. With regard to an Immutable module, I'm neither opposed to it nor in favor of it, but we should recognize that it would not be able to offer true immutability.

ioquatix (Samuel Williams) wrote in #note-15:

I would like us to define a model for immutability that has real world use cases and applicability - i.e. useful to developers in actual situations rather than theoretically sound and impossible to implement or impossible to use. Not that I'm saying theoretically sound is not important, just that we have to, as was said above, stop somewhere. Since this proposal includes a new module, it's totally optional. But that module is semantically independent from how we actually implement some kind of deep_freeze. The point of the module is to enforce it in a visible way - as in, this class will always be frozen. Such a design can then be used by the interpreter for constant propagation, etc.

I don't think Ruby necessarily has an immutability problem currently. You can override #freeze as needed in your classes to implement whatever frozen support you want, and freeze objects inside #initialize to have all instances be frozen (modulo directly calling allocate).
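
A minimal sketch of that approach, with a hypothetical Money class, including the allocate caveat:

```ruby
class Money
  attr_reader :cents

  def initialize(cents)
    @cents = cents
    freeze # every instance created through .new ends up frozen
  end
end

m = Money.new(100)
puts m.frozen?   # true

# .allocate skips #initialize, so this instance is neither
# initialized nor frozen.
raw = Money.allocate
puts raw.frozen?       # false
puts raw.cents.inspect # nil
```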

I have a lot of experience developing libraries that are designed to be frozen after application initialization. Both Sequel and Roda use this approach, either by default or as an option, and freezing results in significant performance improvements in both. I don't believe Ruby's current support for freezing objects is lacking, but I recognize that it could be made easier for users.

If you want to freeze the entire core class hierarchy, you can, and if you do it correctly, nothing breaks. I know this from experience as I run my production web applications with this approach (using https://github.com/jeremyevans/ruby-refrigerator). However, with this approach, the class/module freezing is not implicit due to instance freezing, it's explicit after an application is fully initialized, before accepting requests. The reason to do this is to ensure that nothing in your application is modifying the core classes at runtime.

Updated by ioquatix (Samuel Williams) 27 days ago

Maybe we can collect use cases where such an approach makes sense.

ko1 (Koichi Sasada) changed Process::Status to be frozen by default. What is the logic? What is the problem we are trying to solve by doing this? Is it to make things sharable by Ractor?

Eregon (Benoit Daloze) asserted that we should make as many of the core classes frozen by default. What's the advantage of this?

jeremyevans0 (Jeremy Evans) your general model makes sense to me and I admire your approach to freezing the runtime. Can you explain where the performance advantages come from? Also:

and freeze objects inside #initialize to have all instances be frozen

Doesn't this break sub-classes that perform mutable initialisation?

Updated by jeremyevans0 (Jeremy Evans) 27 days ago

ioquatix (Samuel Williams) wrote in #note-17:

jeremyevans0 (Jeremy Evans) your general model makes sense to me and I admire your approach to freezing the runtime. Can you explain where the performance advantages come from? Also:

Performance advantages come from two basic ideas:

1) If objects are frozen, it opens up additional opportunities for caching them. With mutable objects, caching is very tricky. You can obviously clear caches when you detect mutation of the current object, but if you can mutate the objects held in the cache, then cache invalidation is very hard to get right. Purely immutable objects don't support internal caching, since an immutable cache is worthless, so this approach relies on let's say mostly immutable objects. Sequel datasets use this approach. They are always frozen, and take heavy advantage of caching to reduce the amount of work on repetitive calls. This is what allows you to have call chains like ModelClass.active.recent.first that do almost no allocation in subsequent calls in Sequel, as both the intermediate datasets and the SQL to use for the call are cached after the first call.

2) When freezing a class, you can check if the default implementation of methods has not been modified by checking the owner of the method. If the default implementation of methods has not been modified, you can inline optimized versions for significantly improved performance. Roda uses this approach to improve routing and other aspects of its performance.
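
The owner check in the second point can be sketched as follows. This is not Roda's actual code; the class and method names (BaseHandler, default_call?) are hypothetical and only illustrate the idea:

```ruby
class BaseHandler
  def call(input)
    input.to_s
  end

  # True when the receiver still uses BaseHandler's own #call,
  # i.e. no subclass or included module has replaced it.
  def self.default_call?
    instance_method(:call).owner.equal?(BaseHandler)
  end
end

class PlainHandler < BaseHandler
end

class CustomHandler < BaseHandler
  def call(input)
    input.to_s.upcase
  end
end

puts PlainHandler.default_call?  # true: a fast path would be safe here
puts CustomHandler.default_call? # false: must use the overridden method

# Once the class is frozen, no new override can be added to it,
# so the result of the check stays valid for that class.
PlainHandler.freeze
begin
  PlainHandler.define_method(:call) { |input| input }
rescue FrozenError
  puts "cannot redefine #call on a frozen class"
end
```

Note that freezing PlainHandler alone does not freeze BaseHandler, so a complete solution would also have to consider the ancestors.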

and freeze objects inside #initialize to have all instances be frozen

Doesn't this break sub-classes that perform mutable initialisation?

It doesn't break subclass initialization, as long as the subclass mutates the object in #initialize before calling super instead of after. Alternatively, you can have freeze in the subclass check for partial initialization, and finish initialization before calling super in that case.
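
A sketch of the ordering issue, with hypothetical class names:

```ruby
class FrozenBase
  def initialize(name)
    @name = name
    freeze
  end
end

# Works: the subclass sets its own state before super freezes the object.
class EarlyChild < FrozenBase
  attr_reader :tags

  def initialize(name, tags)
    @tags = tags
    super(name)
  end
end

# Fails: by the time super returns, the object is already frozen.
class LateChild < FrozenBase
  def initialize(name, tags)
    super(name)
    @tags = tags # raises FrozenError
  end
end

puts EarlyChild.new("a", [:x]).frozen? # true

begin
  LateChild.new("b", [:y])
rescue FrozenError
  puts "LateChild cannot assign after super"
end
```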

Updated by ioquatix (Samuel Williams) 23 days ago

I'm happy with the current PR which invokes #freeze after calling #new.

If we can't guarantee full immutability, is this sufficient? Independently we could look at providing #deep_freeze which I think classes would opt into as in def freeze; deep_freeze; end.
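
Ruby has no built-in #deep_freeze, so the following is only a rough sketch of what such a helper could do. The module name DeepFreeze is hypothetical, and the traversal covers only Array, Hash and instance variables (not, for example, Struct members). It deliberately skips classes and modules:

```ruby
module DeepFreeze
  # Recursively freezes obj and everything reachable from it through
  # array elements, hash keys/values and instance variables.
  def self.call(obj, seen = {}.compare_by_identity)
    return obj if obj.is_a?(Module) # never freeze classes/modules
    return obj if seen.key?(obj)    # guard against cycles

    seen[obj] = true

    case obj
    when Hash
      obj.each do |key, value|
        call(key, seen)
        call(value, seen)
      end
    when Array
      obj.each { |element| call(element, seen) }
    end

    obj.instance_variables.each do |ivar|
      call(obj.instance_variable_get(ivar), seen)
    end

    obj.freeze # respects any custom #freeze the object defines
  end
end

data = DeepFreeze.call({ "list" => [[1, 2], +"text"] })
puts data.frozen?            # true
puts data["list"][0].frozen? # true
puts data["list"][1].frozen? # true
```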

If we have a problem with the terminology, what about calling the module Frozen rather than Immutable? However, I think Immutable sends a clearer message about the intention.

Can we make this compatible with Ractor.make_shareable? I think that's a valid use case. As in, I think we should have an interface for immutability which does not depend on Ractor but is compatible with it.

Updated by ioquatix (Samuel Williams) 23 days ago

Looking at Ractor.make_shareable, wouldn't this implementation be a candidate for a potential #deep_freeze implementation? It seems really okay to me on the surface:

irb(main):005:0> Ractor.make_shareable(x)
=> [[1, 2, 3], [2, 3, 4]]
irb(main):006:0> x.frozen?
=> true
irb(main):007:0> x[0].frozen?
=> true
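
A self-contained version of that session, assuming Ruby 3.0 or later (where Ractor is available) and using a made-up constant name:

```ruby
MATRIX = [[1, 2, 3], [2, 3, 4]]

puts MATRIX.frozen?    # false
puts MATRIX[0].frozen? # false

# make_shareable freezes the object and everything reachable from it.
Ractor.make_shareable(MATRIX)

puts MATRIX.frozen?            # true
puts MATRIX[0].frozen?         # true
puts Ractor.shareable?(MATRIX) # true
```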

Updated by Eregon (Benoit Daloze) 17 days ago

I think nobody expects #freeze or #deep_freeze to ever freeze (non-singleton) classes/modules, so IMHO these methods should simply not attempt that (except SomeModule.freeze of course).
It's the difference between state (ivars, and values of these ivars) and behavior (methods of a class).

ioquatix (Samuel Williams) wrote in #note-17:

Eregon (Benoit Daloze) asserted that we should make as many of the core classes frozen by default. What's the advantage of this?

I made an extensive list of immutable classes in core here, as well as the many advantages:
https://gist.github.com/eregon/bce555fa9e9133ed27fbfc1deb181573

I'll copy the advantages here:
Advantages:

  • No need to worry about .allocate-d but not #initialize-d objects => no need to check in every method if the object is #initialize-d
  • internal state/fields can be truly final/const.
  • simpler and faster 1-step allocation since there is no dynamic call to #initialize (instead of .new calling alloc_func and #initialize)
  • Known immutable by construction, no need for extra checks, no need to iterate instance variables since no instance variables
  • Potentially lower footprint due to no instance variables
  • Can be shared between Ractors freely and with no cost
  • Can be shared between different Ruby execution contexts in the same process and even in persisted JIT'd code
  • Easier to reason about both for implementers and users since there is no state
  • Can be freely cached as it will never change

There is a sub-category of classes with .allocate undefined or allocator undefined, and noop initialize, those only have the first 3 advantages, but still better than nothing.

The first advantage is I think quite important as it avoids needing to care about initialized checks for things like klass.allocate.some_method.

IMHO the most valuable advantages of immutable classes are that they are easier to reason about, but also they can be shared between Ractors, execution contexts in the same process (like V8 isolated contexts, I think JRuby also has those, it improves footprint and can improve warmup by JIT'ing once per process and not per context), and also in persisted JIT'd code.
Persisted JIT'd code is a feature being developed in TruffleRuby; it makes it possible to save the JIT'd code of a process and reuse it in subsequent processes.
For classes which have a literal notation, it's quite important they are immutable, otherwise one would need to reallocate one instance per execution context which feels clearly inefficient.

Given the many advantages, I think we should make more core classes immutable or classes with .allocate undefined or allocator undefined, and noop initialize, as much as possible.

To be shareable between execution contexts and persisted JIT'd code they need to have a well known class.
Subclassing is therefore not possible since Ruby classes are stateful.
Anyway it is highly discouraged to subclass core classes so I think that is not much of an issue.


For Process::Status, it's already in the immutable core classes, let's keep it that way.
I don't think making it subclassable is useful.
The way to create an instance for Ruby could be Process::Status.new(*args) and we override that .new to already freeze, or something like Process::Status(*args) or Process.status(*args).


Regarding making user classes immutable, I think one missing piece is this hardcoded list of immutable classes in Kernel#dup and Kernel#clone.
Overriding #dup and #clone in the user class works around it, but then it doesn't work for Kernel.instance_method(:clone).bind_call(obj) as that will actually return a mutable copy!
It's then possible to e.g. call initialize on that mutable copy to mutate it, which breaks the assumption of the author of the class.
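
The bypass can be sketched as below. The class SealedPoint is hypothetical, and the sketch uses Kernel#dup rather than Kernel#clone because #dup always returns an unfrozen copy:

```ruby
class SealedPoint
  attr_reader :x

  def initialize(x)
    @x = x
    freeze
  end

  # The author's workaround: copies are pointless, so return self.
  def dup
    self
  end

  def clone(freeze: nil)
    self
  end
end

pt = SealedPoint.new(1)
puts pt.dup.equal?(pt) # true: the override is used

# Binding Kernel's own #dup skips the override entirely.
copy = Kernel.instance_method(:dup).bind_call(pt)
puts copy.equal?(pt)   # false
puts copy.frozen?      # false: a mutable copy of an "immutable" object

# The mutable copy can now be re-initialized with different state.
copy.send(:initialize, 2)
puts copy.x            # 2
puts pt.x              # 1: the original is untouched
```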

So I think we need a way for a user class to define itself as immutable (extend Immutable is one way, could also be by defining MyClass.immutable?), and for Kernel#dup and Kernel#clone to then use that to just return self.
If a class is marked as immutable it should be guaranteed to be deeply frozen (otherwise it's incorrect to return self for dup/clone), so we should actually deep-freeze after the custom #freeze is called from .new:

def ImmutableClass.new(*args)
  obj = super(*args)
  obj.freeze
  Primitive.deep_freeze(obj) # not a call, some known function of the VM
end

That way we can know this object is truly immutable from the runtime point of view as well and e.g., can be passed to another Ractor.
Primitive.deep_freeze(obj) would set a flag on the object so it's fast to check if the object is immutable later on.

Updated by Eregon (Benoit Daloze) 17 days ago

I forgot to mention, it's also much better if all instances of a class (and potential subclasses) are immutable. If only some of the instances are, it quickly gets confusing and most of the advantages disappear, as the class is no longer truly immutable.
This is currently the case for Range and Regexp, I think, and we should solve that by making all Range and Regexp instances frozen, not just literals.
For String we probably need to keep it supporting both mutable and immutable for compatibility.

Updated by ko1 (Koichi Sasada) 1 day ago

ioquatix (Samuel Williams) wrote in #note-17:

ko1 (Koichi Sasada) changed Process::Status to be frozen by default. What is the logic? What is the problem we are trying to solve by doing this? Is it to make things sharable by Ractor?

Yes.
