Project

General

Profile

Actions

Feature #18498

open

Introduce a public WeakKeysMap that compares by equality

Added by byroot (Jean Boussier) 4 months ago. Updated 3 months ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:107190]

Description

This is a clean take on #16038

Spec

Weak keys only

After a chat with @Eregon (Benoit Daloze) we believe that what would make the most sense and would be most useful would be
a "WeakKeysMap". Meaning the keys would be weak references, but the values would be strong references.
This behavior is also consistent with the terminology in other popular languages such as Javascript and Java were WeakMap only have weak keys.

By default it would use equality semantic, just like a regular Hash. Having an option to compare by indentity instead
could be useful, but not strictly required.

Immediate objects support

Many WeakMap implementation in other languages don't accept "immediate" objects (or their equivalent) in the weak slots.
This is because since they are never collected, the weak reference will never possibly expire.

ObjectSpace::WeakMap currently accept them, but since both keys and values are weak references, there is legitimate use.

However in a WeakKeysMap, using an immediate object as key should likely raise a TypeError.

What is less clear is wether BIGNUM (allocated Integer) and dynamic symbols (allocated Symbol) should be accepted.

I believe they shouldn't for consistency, so:

  • No immediate objects
  • No Integer
  • No Symbol

member method to lookup an existing key

For some use case, notably deduplication sets, a member method that maps st_get_key would be useful.

def member(key) -> existing key or nil

Not sure if member is the best possible name, but #key is already used for looking up a key by value.

That same method could be useful on Hash and Set as well, but that would be a distinct feature request.

Naming

Possible names:

  • ::WeakMap
  • ::WeakKeysMap
  • ::WeakRef::Map
  • ::WeakRef::WeakMap
  • ::WeakRef::WeakKeysMap
  • ::WeakRef::WeakHash
  • ::WeakRef::WeakKeysHash

My personal, ligthly held, preference goes toward ::WeakRef::WeakKeysMap.

Use cases

WeakSet

A WeakKeysMap would be a good enough primitive to implement a WeakSet in pure Ruby code, just like Set is
implemented with a Hash.

WeakSet are useful for use cases such as avoiding cycles in an object graph without holding strong references.

Deduplication sets

Assuming WeakMap have the #member method.

A small variation on the "deduplicating constructors" use case, better suited for smaller but numerous objects.
The difference here is that we first build the object and then lookup for a pre-existing one. This is the
strategy used to deduplicate Active Record schema metadata.

class DeduplicationSet
  def initialize
    @set = WeakKeysMap.new
  end

  def dedup(object)
    if existing_object = @set.member(object)
      existing_object
    else
      @set[object] = true
      object
    end
  end
end

Third party object extension

When you need to record some associated data on third party objects for which you don't control the lifetime.
A WeakMap can be used:

METADATA = WeakKeysMap.new
def do_something(third_party_object)
  metadata = (METADATA[third_party_object] ||= Metadata.new)
  metadata.foo = "bar"
end

Related issues 1 (0 open1 closed)

Related to Ruby master - Feature #16038: Provide a public WeakMap that compares by equality rather than by identity ClosedActions
Actions #1

Updated by byroot (Jean Boussier) 4 months ago

  • Related to Feature #16038: Provide a public WeakMap that compares by equality rather than by identity added

Updated by byroot (Jean Boussier) 4 months ago

  • Description updated (diff)

After another quick chat with @Eregon (Benoit Daloze) I removed one of the use cases because I made a mistake, it does require a WeakValuesMap. Leaving it in comment for posterity

Deduplicating constructors.

Can be used by large value object classes to automatically re-use existing instances:

class SomeValueObject
  CACHE = WeakKeysMap.new

  def self.build(attributes)
    CACHE[attributes] ||= new(attributes).freeze # The instance hold the reference
  end

  def initialize(attributes)
    @attributes = attributes
    ... # compute some expensive or large data
  end

  ...
end

Updated by Eregon (Benoit Daloze) 4 months ago

This looks great to me.

I prefer WeakKeysMap. WeakRef::WeakKeysMap sounds fine too.
Anything with Hash in the name is IMHO not great because Hash implies ordering and Map no ordering, per well-established concurrent-ruby classes (Concurrent::Hash and Concurrent::Map).
Having WeakKeys in the name seems best/useful, as there might be a WeakValuesMap in the future.

I believe WeakKeysMap#dedup should be a core method (i.e., predefined on WeakKeysMap), that way it can be guaranteed thread-safe on all Ruby implementations with minimal overhead.
Doing it manually with Mutex when defined in Ruby would have a significant overhead.
ObjectSpace::WeakMap is already thread-safe on CRuby, JRuby and TruffleRuby so it only makes sense WeakKeysMap is thread-safe as well.
WeakKeysMap#dedup also makes that use-case much easier as the user does not need to define dedup themselves.

I would prefer no compare_by_identity support for simplicity.
If compare_by_identity is included, such choice should be made when creating the WeakKeysMap (e.g. WeakKeysMap.new(compare_by_identity: true)), not later as that causes a whole lot of mess implementation-wise and makes the semantics unclear.

Updated by ko1 (Koichi Sasada) 3 months ago

Do you think this Hash compatible? For example, default_proc (I think it is too much)?

Updated by ko1 (Koichi Sasada) 3 months ago

ko1 (Koichi Sasada) wrote in #note-4:

Do you think this Hash compatible? For example, default_proc (I think it is too much)?

ObjectSpace::WeakMap is good starting point.

Updated by byroot (Jean Boussier) 3 months ago

Do you think this Hash compatible? For example, default_proc (I think it is too much)?

Yes, I'd rather not ask for methods that may cause difficulties in the future.

Updated by Dan0042 (Daniel DeLorme) 3 months ago

If ObjectSpace::WeakMap had a #compare_by_identity that could be set to false, wouldn't this be equivalent to this WeakKeysMap? (as long as you use non-GC-able values only)

Updated by byroot (Jean Boussier) 3 months ago

as long as you use non-GC-able values only

Then yes. But it's rarely convenient.

Updated by Dan0042 (Daniel DeLorme) 3 months ago

byroot (Jean Boussier) wrote in #note-8:

Then yes. But it's rarely convenient.

Can you elaborate? Because the DeduplicationSet example has @set[object] = true, which is exactly a non-GC-able value.

And the METADATA example is not so convincing since it makes more sense to have metadata per object (identity); I can't imagine a use case where you'd want to share metadata for objects that are eql.

Updated by byroot (Jean Boussier) 3 months ago

I can't imagine a use case where you'd want to share metadata for objects that are eql.

As you certainly know, Object#== defaults to Object#equal?, so generally equality hashes are workable in scenarios where you actually need identity, the opposite isn't true. Hence why equality by default make sense.

Additionally ObjectSpace::WeakMap is not really made to be public and is specifically tailored to be the backup store for WeakRef.

I can't imagine a use case where you'd want to share metadata for objects that are eql.

A common case would be something like CACHE = { "https://example.com/foo" => "HTML_BODY" }. Various "workers" could share such "http cache" without worrying about the lifecycle of the objects etc.

Same thing if you want to safely store some thread local data: DATA = { Thread.current => {key: "value"}}.

Updated by matz (Yukihiro Matsumoto) 3 months ago

I agree with introducing a map (hash) with each key referenced weakly. It should be in the core since weak references should be tightly integrated with the garbage collector.

Since the current WeakMap is hard to implement in JRuby, I also agree with making the existing WeakMap obsolete (gradually).

We need to determine the detail:

  • Where to place the class: we have several options. I vote for putting it under ObjectSpace as the current WeakMap
  • The name of the class: OP proposed WeakKeysMap. I prefer WeakKeyMap probably because my native language (Japanese) does not have noun plurality. But considering the class is "a map with each key referenced weakly", still WeakKeyMap looks better to me.
  • Behavior: we should start with minimal set of methods. How about:
    • ObjectSpace::WeakKeyMap#[](key)
    • ObjectSpace::WeakKeyMap#[]=(key, val)
    • ObjectSpace::WeakKeyMap#getkey(key)
    • ObjectSpace::WeakKeyMap#inspect #=> #<ObjectSpace::WeakKeyMap size=100>

Matz.

Updated by byroot (Jean Boussier) 3 months ago

Thank you Matz!

I vote for putting it under ObjectSpace

I think it's fine. It's an advanced class, no need to pollute the main namespace with it.

we should start with minimal set of methods.

Agreed. A couple points though:

  • I agree that #each isn't desirable, and tricky to implement correctly.
  • However #keys would be simpler to implement, and very useful for debugging purposes. If we can't introspect that is in the Hash, some bugs might be hard to resolve.

From the meeting notes:

mame: “Third party object extension” might be impossible without compare_by_identity. We need to confirm the OP if it is okay

I don't think it's a problem in practice. Except for "value objects" == usually has identity semantic. I wouldn't object to a compare_by_identity option, but I don't think it's a dealbreaker if it's not there.

ko1: key(key)?

That would have been my preferred name too, but it's already defined on Hash with a different semantic, so I think it would be confusing. #getkey(key) is a good compromise I think.

Updated by byroot (Jean Boussier) 3 months ago

I took a stab at implementing the current spec: https://github.com/ruby/ruby/pull/5570

Methods

  • ObjectSpace::WeakKeyMap#[](key)
  • ObjectSpace::WeakKeyMap#[]=(key, val)
  • ObjectSpace::WeakKeyMap#getkey(key)
  • ObjectSpace::WeakKeyMap#size
  • ObjectSpace::WeakKeyMap#clear
  • ObjectSpace::WeakKeyMap#inspect #=> #<ObjectSpace::WeakKeyMap size=100>

For #[]=:

  • Trying to use an immediate as a key fail with ArgumentError
  • For consistency, BigNum and dynamic Symbol also raise ArgumentError
  • Using an object without a #hash method raise NoMethodError like a regular hash.

Notes

For simplicity, I took a shortcut. When a key is finalized we can no longer hash it, so I traverse the table in O(N) and delete entries pointing to the same object.

I'm not certain yet what's the best solution for this, but I assume we can record the hash inside a secondary table with identity semantic.

But this is good enough to experiment with the defined interface and see if there is problems with the spec itself.

Updated by mame (Yusuke Endoh) 3 months ago

What is #size method needed for? It may be error-prune to depend on a return value of the method because the size may change immediately after the method is called. For debugging, #inspect includes the size information. For memory usage analysis, ObjectSpace.memsize_of would be a better API.

Updated by byroot (Jean Boussier) 3 months ago

@mame (Yusuke Endoh), I think you convinced me, I'll remove it.

Actions

Also available in: Atom PDF