Feature #18498
closedIntroduce a public WeakKeysMap that compares by equality
Description
This is a clean take on #16038
Spec¶
Weak keys only¶
After a chat with @Eregon (Benoit Daloze) we believe that what would make the most sense and would be most useful would be
a "WeakKeysMap". Meaning the keys would be weak references, but the values would be strong references.
This behavior is also consistent with the terminology in other popular languages such as Javascript and Java were WeakMap
only have weak keys.
By default it would use equality semantic, just like a regular Hash
. Having an option to compare by indentity instead
could be useful, but not strictly required.
Immediate objects support¶
Many WeakMap implementation in other languages don't accept "immediate" objects (or their equivalent) in the weak slots.
This is because since they are never collected, the weak reference will never possibly expire.
ObjectSpace::WeakMap
currently accept them, but since both keys and values are weak references, there is legitimate use.
However in a WeakKeysMap
, using an immediate object as key should likely raise a TypeError
.
What is less clear is wether BIGNUM
(allocated Integer
) and dynamic symbols (allocated Symbol
) should be accepted.
I believe they shouldn't for consistency, so:
- No immediate objects
- No
Integer
- No
Symbol
member
method to lookup an existing key
For some use case, notably deduplication sets, a member
method that maps st_get_key
would be useful.
def member(key) -> existing key or nil
Not sure if member
is the best possible name, but #key
is already used for looking up a key by value.
That same method could be useful on Hash
and Set
as well, but that would be a distinct feature request.
Naming¶
Possible names:
::WeakMap
::WeakKeysMap
::WeakRef::Map
::WeakRef::WeakMap
::WeakRef::WeakKeysMap
::WeakRef::WeakHash
::WeakRef::WeakKeysHash
My personal, ligthly held, preference goes toward ::WeakRef::WeakKeysMap
.
Use cases¶
WeakSet¶
A WeakKeysMap
would be a good enough primitive to implement a WeakSet
in pure Ruby code, just like Set
is
implemented with a Hash
.
WeakSet
are useful for use cases such as avoiding cycles in an object graph without holding strong references.
Deduplication sets¶
Assuming WeakMap
have the #member
method.
A small variation on the "deduplicating constructors" use case, better suited for smaller but numerous objects.
The difference here is that we first build the object and then lookup for a pre-existing one. This is the
strategy used to deduplicate Active Record schema metadata.
class DeduplicationSet
def initialize
@set = WeakKeysMap.new
end
def dedup(object)
if existing_object = @set.member(object)
existing_object
else
@set[object] = true
object
end
end
end
Third party object extension¶
When you need to record some associated data on third party objects for which you don't control the lifetime.
A WeakMap can be used:
METADATA = WeakKeysMap.new
def do_something(third_party_object)
metadata = (METADATA[third_party_object] ||= Metadata.new)
metadata.foo = "bar"
end
Updated by byroot (Jean Boussier) about 3 years ago
- Related to Feature #16038: Provide a public WeakMap that compares by equality rather than by identity added
Updated by byroot (Jean Boussier) about 3 years ago
- Description updated (diff)
After another quick chat with @Eregon (Benoit Daloze) I removed one of the use cases because I made a mistake, it does require a WeakValuesMap
. Leaving it in comment for posterity
Deduplicating constructors.¶
Can be used by large value object classes to automatically re-use existing instances:
class SomeValueObject
CACHE = WeakKeysMap.new
def self.build(attributes)
CACHE[attributes] ||= new(attributes).freeze # The instance hold the reference
end
def initialize(attributes)
@attributes = attributes
... # compute some expensive or large data
end
...
end
Updated by Eregon (Benoit Daloze) about 3 years ago
This looks great to me.
I prefer WeakKeysMap
. WeakRef::WeakKeysMap
sounds fine too.
Anything with Hash
in the name is IMHO not great because Hash implies ordering and Map no ordering, per well-established concurrent-ruby classes (Concurrent::Hash and Concurrent::Map).
Having WeakKeys
in the name seems best/useful, as there might be a WeakValuesMap
in the future.
I believe WeakKeysMap#dedup
should be a core method (i.e., predefined on WeakKeysMap), that way it can be guaranteed thread-safe on all Ruby implementations with minimal overhead.
Doing it manually with Mutex when defined in Ruby would have a significant overhead.
ObjectSpace::WeakMap is already thread-safe on CRuby, JRuby and TruffleRuby so it only makes sense WeakKeysMap is thread-safe as well.
WeakKeysMap#dedup
also makes that use-case much easier as the user does not need to define dedup
themselves.
I would prefer no compare_by_identity
support for simplicity.
If compare_by_identity
is included, such choice should be made when creating the WeakKeysMap (e.g. WeakKeysMap.new(compare_by_identity: true)
), not later as that causes a whole lot of mess implementation-wise and makes the semantics unclear.
Updated by ko1 (Koichi Sasada) almost 3 years ago
Do you think this Hash
compatible? For example, default_proc
(I think it is too much)?
Updated by ko1 (Koichi Sasada) almost 3 years ago
ko1 (Koichi Sasada) wrote in #note-4:
Do you think this
Hash
compatible? For example,default_proc
(I think it is too much)?
ObjectSpace::WeakMap
is good starting point.
Updated by byroot (Jean Boussier) almost 3 years ago
Do you think this Hash compatible? For example, default_proc (I think it is too much)?
Yes, I'd rather not ask for methods that may cause difficulties in the future.
Updated by Dan0042 (Daniel DeLorme) almost 3 years ago
If ObjectSpace::WeakMap had a #compare_by_identity that could be set to false, wouldn't this be equivalent to this WeakKeysMap? (as long as you use non-GC-able values only)
Updated by byroot (Jean Boussier) almost 3 years ago
as long as you use non-GC-able values only
Then yes. But it's rarely convenient.
Updated by Dan0042 (Daniel DeLorme) almost 3 years ago
byroot (Jean Boussier) wrote in #note-8:
Then yes. But it's rarely convenient.
Can you elaborate? Because the DeduplicationSet example has @set[object] = true
, which is exactly a non-GC-able value.
And the METADATA example is not so convincing since it makes more sense to have metadata per object (identity); I can't imagine a use case where you'd want to share metadata for objects that are eql.
Updated by byroot (Jean Boussier) almost 3 years ago
I can't imagine a use case where you'd want to share metadata for objects that are eql.
As you certainly know, Object#==
defaults to Object#equal?
, so generally equality hashes are workable in scenarios where you actually need identity, the opposite isn't true. Hence why equality by default make sense.
Additionally ObjectSpace::WeakMap
is not really made to be public and is specifically tailored to be the backup store for WeakRef
.
I can't imagine a use case where you'd want to share metadata for objects that are eql.
A common case would be something like CACHE = { "https://example.com/foo" => "HTML_BODY" }
. Various "workers" could share such "http cache" without worrying about the lifecycle of the objects etc.
Same thing if you want to safely store some thread local data: DATA = { Thread.current => {key: "value"}}
.
Updated by matz (Yukihiro Matsumoto) almost 3 years ago
I agree with introducing a map (hash) with each key referenced weakly. It should be in the core since weak references should be tightly integrated with the garbage collector.
Since the current WeakMap
is hard to implement in JRuby, I also agree with making the existing WeakMap
obsolete (gradually).
We need to determine the detail:
-
Where to place the class: we have several options. I vote for putting it under
ObjectSpace
as the currentWeakMap
-
The name of the class: OP proposed
WeakKeysMap
. I preferWeakKeyMap
probably because my native language (Japanese) does not have noun plurality. But considering the class is "a map with each key referenced weakly", stillWeakKeyMap
looks better to me. -
Behavior: we should start with minimal set of methods. How about:
ObjectSpace::WeakKeyMap#[](key)
ObjectSpace::WeakKeyMap#[]=(key, val)
ObjectSpace::WeakKeyMap#getkey(key)
ObjectSpace::WeakKeyMap#inspect #=> #<ObjectSpace::WeakKeyMap size=100>
Matz.
Updated by byroot (Jean Boussier) almost 3 years ago
Thank you Matz!
I vote for putting it under ObjectSpace
I think it's fine. It's an advanced class, no need to pollute the main namespace with it.
we should start with minimal set of methods.
Agreed. A couple points though:
- I agree that
#each
isn't desirable, and tricky to implement correctly. - However
#keys
would be simpler to implement, and very useful for debugging purposes. If we can't introspect that is in the Hash, some bugs might be hard to resolve.
From the meeting notes:
mame: “Third party object extension” might be impossible without compare_by_identity. We need to confirm the OP if it is okay
I don't think it's a problem in practice. Except for "value objects" ==
usually has identity semantic. I wouldn't object to a compare_by_identity
option, but I don't think it's a dealbreaker if it's not there.
ko1: key(key)?
That would have been my preferred name too, but it's already defined on Hash
with a different semantic, so I think it would be confusing. #getkey(key)
is a good compromise I think.
Updated by byroot (Jean Boussier) almost 3 years ago
I took a stab at implementing the current spec: https://github.com/ruby/ruby/pull/5570
Methods¶
ObjectSpace::WeakKeyMap#[](key)
ObjectSpace::WeakKeyMap#[]=(key, val)
ObjectSpace::WeakKeyMap#getkey(key)
ObjectSpace::WeakKeyMap#size
ObjectSpace::WeakKeyMap#clear
ObjectSpace::WeakKeyMap#inspect #=> #<ObjectSpace::WeakKeyMap size=100>
For #[]=
:¶
- Trying to use an immediate as a key fail with
ArgumentError
- For consistency,
BigNum
and dynamic Symbol also raiseArgumentError
- Using an object without a
#hash
method raiseNoMethodError
like a regular hash.
Notes¶
For simplicity, I took a shortcut. When a key is finalized we can no longer hash it, so I traverse the table in O(N)
and delete entries pointing to the same object.
I'm not certain yet what's the best solution for this, but I assume we can record the hash inside a secondary table
with identity semantic.
But this is good enough to experiment with the defined interface and see if there is problems with the spec itself.
Updated by mame (Yusuke Endoh) almost 3 years ago
What is #size
method needed for? It may be error-prune to depend on a return value of the method because the size may change immediately after the method is called. For debugging, #inspect
includes the size information. For memory usage analysis, ObjectSpace.memsize_of
would be a better API.
Updated by byroot (Jean Boussier) almost 3 years ago
@mame (Yusuke Endoh), I think you convinced me, I'll remove it.
Updated by matz (Yukihiro Matsumoto) about 2 years ago
LGTM. Probably too late for 3.2. For 3.3 maybe?
Matz.
Updated by byroot (Jean Boussier) almost 2 years ago
- Status changed from Open to Closed
Applied in changeset git|2a5354e59324cb296a423c73ec15ff9191086964.
Implement ObjectSpace::WeakKeyMap basic allocator
[Feature #18498]