Project

General

Profile

Feature #17145

Ractor-aware `Object#deep_freeze`

Added by marcandre (Marc-Andre Lafortune) about 2 months ago. Updated 1 day ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:99885]

Description

I'd like to propose Object#deep_freeze:

Freezes recursively the contents of the receiver (by calling deep_freeze) and
then the receiver itself (by calling freeze).
Values that are shareable via Ractor (e.g. classes) are never frozen this way.

# freezes recursively:
ast = [:hash, [:pair, [:str, 'hello'], [:sym, :world]]].deep_freeze
ast.dig(1, 1) # => [:str, 'hello']
ast.dig(1, 1).compact! # => FrozenError

# does not freeze classes:
[[String]].deep_freeze
String.frozen? # => false

# calls `freeze`:
class Foo
  def freeze
    build_cache!
    puts "Ready for freeze"
    super
  end
  # ...
end
[[[Foo.new]]].deep_freeze # => Outputs "Ready for freeze"

I think a variant deep_freeze! that raises an exception if the result isn't Ractor-shareable would be useful too:

class Fire
  def freeze
    # do not call super
  end
end

x = [Fire.new]
x.deep_freeze! # => "Could not be deeply-frozen: #<Fire:0x00007ff151994748>"

Related issues

Related to Ruby master - Feature #2509: Recursive freezing?Rejectedmatz (Yukihiro Matsumoto)12/21/2009Actions
Related to Ruby master - Feature #17100: Ractor: a proposal for a new concurrent abstraction without thread-safety issuesClosedko1 (Koichi Sasada)Actions
Related to Ruby master - Feature #17274: Ractor.make_shareable(obj)OpenActions
#1

Updated by marcandre (Marc-Andre Lafortune) about 2 months ago

#2

Updated by Eregon (Benoit Daloze) about 2 months ago

  • Related to Feature #17100: Ractor: a proposal for a new concurrent abstraction without thread-safety issues added

Updated by Eregon (Benoit Daloze) about 2 months ago

A dynamic call to freeze causes extra calls, and needs checks that it was indeed frozen.
So for efficiency I think it would be better to mark as frozen internally without a call to freeze on every value.

Also a leaf freeze call could technically mutate the object referencing it and e.g., add a new @ivar, which would make it complicated to ensure everything is frozen
(would need to mark as shallow-frozen first to prevent that, and only as deep-frozen once all contained values are deeply-frozen).

Is there a compelling reason to call a user-defined freeze for every value?

Updated by ko1 (Koichi Sasada) about 2 months ago

One concern about the name "freeze" is, what happens on shareable objects on Ractors.
For example, Ractor objects are shareable and they don't need to freeze to send beyond Ractor boundary.

I also want to introduce Mutable but shareable objects using STM (or something similar) writing protocol (shareable Hash). What happens on deep_freeze?

Updated by Eregon (Benoit Daloze) about 2 months ago

Maybe we should have a method that ensure an object graph is shareable?
Not sure what a good name for that would be (shareable is hard to spell).
So that would noop for special shareable objects like you say, and freeze the rest.
Note that a shareable Hash or so would need to only allow shareable key/value/elements whenever writing to it.

Updated by marcandre (Marc-Andre Lafortune) about 2 months ago

Eregon (Benoit Daloze) wrote in #note-3:

A dynamic call to freeze causes extra calls

Yes

and needs checks that it was indeed frozen.

That won't add any noticeable overhead

Is there a compelling reason to call a user-defined freeze for every value?

Yes. Some objects may need to calculate lazy operations (@cache ||= potentially_long_calculation)

Updated by marcandre (Marc-Andre Lafortune) about 2 months ago

ko1 (Koichi Sasada) wrote in #note-4:

One concern about the name "freeze" is, what happens on shareable objects on Ractors.
For example, Ractor objects are shareable and they don't need to freeze to send beyond Ractor boundary.

I also want to introduce Mutable but shareable objects using STM (or something similar) writing protocol (shareable Hash). What happens on deep_freeze?

I think these objects should stop the propagation. The name make_shareable_via_ractor_by_deep_freezing_what_is_necessary would be more accurate but too long πŸ˜† Maybe ractorable?

Updated by ko1 (Koichi Sasada) about 2 months ago

I think these objects should stop the propagation. The name make_shareable_via_ractor_by_deep_freezing_what_is_necessary would be more accurate but too long πŸ˜†
Maybe ractorable?

For Ractor, it is very clear. Another candidate is to_shareable?

However, leave from Ractor's perspective, it is strange name, like:

CONST = [...].to_shareable

Maybe the author don't want to care about Ractor.
The author want to declare "I don't touch it". So "deep_freeze" is better.

hmmm.

Updated by duerst (Martin DΓΌrst) about 2 months ago

marcandre (Marc-Andre Lafortune) wrote in #note-7:

I think these objects should stop the propagation. The name make_shareable_via_ractor_by_deep_freezing_what_is_necessary would be more accurate but too long πŸ˜† Maybe ractorable?

What about ractor_freeze?

Updated by Eregon (Benoit Daloze) about 1 month ago

Maybe obj.shareable? And that would then return as-is if already shareable, and make it shareable if not (deep freezing until it reaches already-shareable objects)

I don't like anything with "ractor" in the name, that becomes not descriptive of what it does and IMHO looks weird for e.g. gems not specifically caring about Ractor.

How about first having deep_freeze that just freezes everything (except an object's class)?

And then maybe an extra method to deal with already-shareable (if needed), which is probably much less frequently needed.
(e.g. it seems rather uncommon that a module constant contains a Ractor reference)

The intention of deep_freeze is I think clear: make this object and whatever it refers to immutable, so it can be shared freely via reference without any copying.
And it applies to many other things besides just Ractor, so it seems a good functionality to have in general.
For instance, https://github.com/dkubb/ice_nine already exists, can define #deep_freeze and is widely used.

Updated by marcandre (Marc-Andre Lafortune) about 1 month ago

ko1 (Koichi Sasada) wrote in #note-8:

For Ractor, it is very clear. Another candidate is to_shareable

Methods to_... should be reserved for conversion methods, i.e. methods that may return a different object than the receiver. What I am proposing would always return the receiver. to_shareable would only be an acceptable name if it returned a deeply frozen copy, but I don't think that's what we need most.

Eregon (Benoit Daloze) wrote in #note-10:

Maybe obj.shareable?

shareable is good. It's more accurate than deep_freeze if we think of ractor sharable structures.

Another alternative would be freeze(shareable: true). It has the advantage of no possible conflict. It could also raise if it fails to produce a shareable object, which should be rare enough to warrant an exception. Something like shareable: :try could provide a non-raising alternative.

Updated by kirs (Kir Shatrov) 23 days ago

I'd really like to support this change for reasons I've describe in https://bugs.ruby-lang.org/issues/17180

We can take something as simple as URI::parse("http://example.com") as an example. Right now that can't be used from a Ractor because URI::parse refers to a few internal constants like URI::RFC3986_PARSER that are not frozen (and thus not sharable).

If we went with an explicit deep_freeze we might need to update all those constants like URI::RFC3986_PARSER in the standard library to be also call deep_freeze. That doesn't have to be a bad thing but something to be aware of.

Updated by marcandre (Marc-Andre Lafortune) 22 days ago

Maybe we can start with Ractor.shareable(obj)?

The major reason why we need it is that there is no way to create a Ractor sharable recursive structure currently.

child = {type: :foo}
node = {type: bar, children: [child].freeze}.freeze
child[:parent] = node
child.freeze

At this point, everything is frozen, but neither child nor node are Ractor sharable.

Ractor.shareable(node) would check for this, and set both node and child's "shareable" flag; after that Ractor.shareable?(node) would return true.

API-wise, Ractor.shareable(node) might be a good start, or node.freeze(shareable: true), or node.deep_freeze

Updated by ko1 (Koichi Sasada) 5 days ago

I implemented Object#deep_freeze(skip_shareable: false) for trial.
https://github.com/ko1/ruby/pull/new/deep_freeze

  • It doesn't call #freeze, but only set frozen flag.
  • It set frozen bit for any objects such as Binding, Thread, Queue and so on, but no effect except adding ivars, etc.
  • Two pass freezing method
    • (1) collect reachable objects
    • (2) freeze collected objects

This two pass strategy allows errors during collection phase (1). For example, if we introduce unable to "freeze" objects, we can stop collection and raise some error without any side-effect. However, if we call #freeze, we can't cancel intermediate state (some objects are frozen, some are not). I have no idea how to solve it, if we need to call freeze for each instance.

Object#deep_freeze(skip_shareable: true) skips shareable objects to freeze. Maybe it is ractor_freeze, shareable and so on. I don't have strong opinion about this naming, but I try to unify with Object#deep_freeze.

Updated by ko1 (Koichi Sasada) 5 days ago

Maybe we need to introduce new protocol for T_DATA.

For example...:

  • Time should be frozen.
  • Thread, Fiber should not be frozen. At least they can't become shareable objects.
  • Proc and Binding objects are also not frozen (error on lvar assignment for frozen?)
  • ... maybe many others ...

To make conservative, all of them are unable to freeze, or allow freezing but not shareable objects.

Updated by marcandre (Marc-Andre Lafortune) 5 days ago

ko1 (Koichi Sasada) wrote in #note-14:

I implemented Object#deep_freeze(skip_shareable: false) for trial.
https://github.com/ko1/ruby/pull/new/deep_freeze

  • It doesn't call #freeze, but only set frozen flag.
  • It set frozen bit for any objects such as Binding, Thread, Queue and so on, but no effect except adding ivars, etc.
  • Two pass freezing method
    • (1) collect reachable objects
    • (2) freeze collected objects

This two pass strategy allows errors during collection phase (1). For example, if we introduce unable to "freeze" objects, we can stop collection and raise some error without any side-effect. However, if we call #freeze, we can't cancel intermediate state (some objects are frozen, some are not). I have no idea how to solve it, if we need to call freeze for each instance.

Object#deep_freeze(skip_shareable: true) skips shareable objects to freeze. Maybe it is ractor_freeze, shareable and so on. I don't have strong opinion about this naming, but I try to unify with Object#deep_freeze.

This is great! πŸ’ͺ

Two major issues remain:
1) This does not work for recursive structures, in the sense that they will not be marked as Ractor shareable. This is capital, particularly since it is the (only) part of this method that can not be done in pure Ruby.

2) There is no callback mechanism for user classes (since this does not call #freeze). Typical use case would be for expensive calculations that are done lazily. A class may want to pre-build them before caching. There should be some callback mechanism. My opinion is that we should worry about things working well for well-programmed cases before we worry about exceptions (e.g. cases that fail do deeply freeze).
I think that freezing a class "behind its back" by not calling #freeze breaks the contract and could be a source of incompatibility.
I see no incompatibility with the two pass system and calling #freeze on the list of reachable objects. I have no issue with a structure being half frozen if things fail; the developper was ready to deep freeze all of it, so having only half frozen , even though it clearly was not the intention, does not feel like a big issue.

Updated by ko1 (Koichi Sasada) 5 days ago

marcandre (Marc-Andre Lafortune) wrote in #note-16:

1) This does not work for recursive structures, in the sense that they will not be marked as Ractor shareable. This is capital, particularly since it is the (only) part of this method that can not be done in pure Ruby.

Current implementation doesn't mark as shareable, but we can make it. The problem is what happens on half-frozen case... Half-shareable is not acceptable. If half-frozen is acceptable, freezing with traversing, and mark as shareable at second phase is possible.

2) There is no callback mechanism for user classes (since this does not call #freeze). Typical use case would be for expensive calculations that are done lazily. A class may want to pre-build them before caching. There should be some callback mechanism. My opinion is that we should worry about things working well for well-programmed cases before we worry about exceptions (e.g. cases that fail do deeply freeze).
I think that freezing a class "behind its back" by not calling #freeze breaks the contract and could be a source of incompatibility.
I see no incompatibility with the two pass system and calling #freeze on the list of reachable objects. I have no issue with a structure being half frozen if things fail; the developper was ready to deep freeze all of it, so having only half frozen , even though it clearly was not the intention, does not feel like a big issue.

$ gem-codesearch 'def freeze' | gist -p => https://gist.github.com/ko1/63b8d43218884249e782d63c2f27b927

I never realized that so many freeze redefinition are used. Checking the code, some of them freeze attribute objects, which they are frozen with deep_freeze. I can find some cases to calculate lazy (?).

mmm. Maybe this method is used with literals, so there is a chance to optimize for such cases.

I still not sure we can remain the half-frozen state, so I want to ask other comments.

Updated by marcandre (Marc-Andre Lafortune) 5 days ago

ko1 (Koichi Sasada) wrote in #note-17:

I never realized that so many freeze redefinition are used. Checking the code, some of them freeze attribute objects, which they are frozen with deep_freeze. I can find some cases to calculate lazy (?).

Indeed, I imagine that the majority are used to do a deep-freeze, so calling #freeze or not will not make a difference.

An example of lazy computation I can remember writing is a generic memoization module in DeepCover which is then used in different places. Yes, I'm lazy enough that I factorized the @cache ||= pattern πŸ˜†

I still not sure we can remain the half-frozen state, so I want to ask other comments.

How about 3 pass?
1) Collect reachable objects; abort on non-freezable.
2) Call #freeze on reachable objects. Exception due to custom method failing => half frozen state (just write better code πŸ˜…)
3) On success, mark all objects as Ractor shareable.

Updated by Eregon (Benoit Daloze) 4 days ago

ko1 (Koichi Sasada) wrote in #note-14:

I implemented Object#deep_freeze(skip_shareable: false) for trial.

This sounds good to me.

IMHO if something fails in the middle, it's OK for part of it to be frozen.
The intention was to freeze the whole object graph anyway.
The code using the object graph after deep_freeze assumes it was frozen, there shouldn't be any issue if some of it is frozen, the code expected that.
(also such code would only run if the exception is caught)

In some languages with unification (I know of Oz), it's also accepted that if it fails in the middle, then the partial state is still there and not magically undone.

I don't think we should call user methods, that will require a lot more scanning (before & after calling user freeze), and if some userfreeze would create new objects and attach them to the object graph, deep_freeze could never end.
Is there a concrete case where it is problematic to not call user's freeze?

I think the design should allow to do everything in a single pass for efficiency, and notably use the "deep frozen/shareable" bit as a "already visited object" marker.
This is also how I implemented "Deep Sharing" during my PhD in https://eregon.me/blog/assets/research/thread-safe-objects.pdf section 5.3, and I think we should design deep_freeze to allow for the same optimization to be used because it's a large performance gain compared to a generic routine.

Updated by Eregon (Benoit Daloze) 4 days ago

A draft to make this in a single pass (but it's late, I might have missed something):

  • if the object is already deeply-frozen/shareable, return, otherwise:

  • mark an object as frozen if not already (so the reachable objects from it don't change)

  • iterate children

  • mark the object as deeply frozen

but this doesn't handle recursion.

So we could mark as deeply frozen first, and remember to undo that if we cannot freeze some object.
However, is there any object that cannot be frozen? I would think not.

So:

  • if the object is already deeply-frozen/shareable, return, otherwise:

  • mark an object as deeply frozen

  • iterate children and recurse

Since that wouldn't call user methods, there is no need to worry about something observing an object marked as deeply frozen but not actually deeply frozen yet.
However, a different Thread could see that, so it can be a problem without a GIL.

So, I think we could use an extra bit for "visited by deep_freeze":

  • if the object is already deeply-frozen/shareable or deep_freeze-visited, return, otherwise:

  • mark an object as deep_freeze-visited and as frozen if not already (so the reachable objects from it don't change)

  • iterate children and recurse

  • mark an object as deeply frozen

Updated by marcandre (Marc-Andre Lafortune) 4 days ago

Eregon (Benoit Daloze) wrote in #note-19:

Is there a concrete case where it is problematic to not call user's freeze?

I thought I already answered that, but here's a short example:

def Stats
  # ...

  def freeze
    extended_results # build cache if need be
    super
  end

  def extended_results
    @extended_results_cache ||= calc_extended_results
  end

  private
  def calc_extended_results
    # lots of calculations
  end
end

There is currently no way for #frozen? to be true (and @instance ||= to fail) without going through the #freeze method. I think we should not break that, especially if there is no callback mechanism to circumvent this.

Updated by marcandre (Marc-Andre Lafortune) 4 days ago

Looking at def freeze in the top ~400 gems, I found 64 in sequel gem alone, and 28 definitions in the rest πŸ˜….

Excluding sequel, half do deep freeze. The other use cases:

Cache prebuilding:
https://github.com/rails/rails/blob/master/activesupport/lib/active_support/time_with_zone.rb#L503-L507
https://github.com/jeremyevans/sequel/blob/master/lib/sequel/adapters/mysql.rb#L150
https://github.com/mongodb/mongoid/blob/master/lib/mongoid/criteria.rb#L239
https://github.com/sporkmonger/addressable/blob/master/lib/addressable/uri.rb#L846-L858
https://github.com/sporkmonger/addressable/blob/master/lib/addressable/template.rb#247-L250

Instance variable dupping + freeze:
https://github.com/rails/rails/blob/master/activemodel/lib/active_model/attributes.rb#L118
https://github.com/rails/rails/blob/master/activerecord/lib/active_record/core.rb#L494

Another usecase is lazy defined methods (I presume like OpenStruct before the recent changes):
https://github.com/jeremyevans/sequel/blob/master/lib/sequel/plugins/finder.rb#L167
https://github.com/solnic/virtus/blob/master/liblib/virtus/instance_methods.rb#L148

Cases not actually freezing the receiver (?):
https://github.com/mongodb/mongoid/blob/master/lib/mongoid/document.rb#L55-L69
https://github.com/datamapper/dm_core/blob/master/lib/dm-core/support/lazy_array.rb#L300-L313
https://github.com/dtao/safe_yaml/blob/master/lib/safe_yaml/transform/transformation_map.rb#L24
https://github.com/paulelliott/fabrication/blob/master/lib/fabrication/schematic/manager.rb#L23

I've been wanting for a long while to propose an API for caching methods, and that could be made Ractor compatible and would resolve most of these cases, but maybe it's too early to consider it?

I was thinking cache_method :foo could "magically" cache the result of foo on first call (per Ractor), store it and return that in the future, with no possibility to invalidate that cache or guarantee that the method foo couldn't be called a few times in case of race conditions.

Updated by Dan0042 (Daniel DeLorme) 4 days ago

marcandre (Marc-Andre Lafortune) wrote in #note-22:

I've been wanting for a long while to propose an API for caching methods, and that could be made Ractor compatible and would resolve most of these cases

+1
The @var ||= expr idiom works well for most cases of memoization, but breaks down for nil values and frozen objects. It's easy enough to handle these cases with a small memoization library (and in fact I do so in my code). But since frozen objects are now going to be everywhere because of ractor, I think it's time to introduce this in core.

#24

Updated by Eregon (Benoit Daloze) 3 days ago

Updated by Eregon (Benoit Daloze) 3 days ago

marcandre (Marc-Andre Lafortune) Thanks, I didn't get the use-case from the DeepCover above (I just missed the freeze definition).

I think the last variant of #20 works too if we call user freeze instead of internal freeze.
As long we call it before iterating ivars/reachable_objects of that object, we should be fine.

Updated by ko1 (Koichi Sasada) 1 day ago

Sorry I misunderstood this proposal.
Ractor.make_shareable(obj) proposal (#17274) satisfies the functionality proposed here.
(I thought deep_freeze(skip_shareable: false) was the proposal. sorry I skipped to read).

We discussed about the name "deep_freeze", and Matz said deep_freeze should be only for freezing, not related to Ractor. So classes/module should be frozen if [C].deep_freeze. This is why I proposed a Object#deep_freeze(skip_shareable: true) and Ractor.make_shareable(obj).

Anyway maybe the functionality should be okay (calling freeze by make_shareable proposal).

So naming issue is reamained?

  • Object#deep_freeze (matz doesn't like it)
  • Object#deep_freeze(skip_sharable: true) (I don't know how Matz feel. And it is difficult to define Class/Module/... on skip_sharable: false)
  • Ractor.make_shareable(obj) (clear for me, but it is a bit long)
  • Ractor.shareable!(obj) (shorter. is it clear?)
  • Object#shareable! (is it acceptable?)
  • ... other ideas?

Also available in: Atom PDF