Feature #5392

Symbol GC

Added by Kurt Stephens over 2 years ago. Updated 5 months ago.

[ruby-core:39881]
Status:Closed
Priority:Normal
Assignee:Narihiro Nakamura
Category:-
Target version:next minor

Description

I looked more into Symbol GC. The biggest problem is that IDs are not VALUEs. My outburst at RubyConf was based on my stupid assumption that they were -- I was trying to attack the problem using WeakRefs.

If IDs were VALUEs and Symbols were allocated like any other Object, the existing GC mark and root machinery (including C stack root scans) would take care of it, with an additional sweep of the global_symbol lookup tables.

However, the remaining issue is IDs stored in globals. No matter what, IDs stored in C globals will need to be rb_gc_register_address(VALUE*) roots -- this means CRuby API/contract changes.
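To make the globals problem concrete, here is a minimal toy model in C. It is not CRuby code: `Obj`, `toy_alloc`, `toy_register_address`, `toy_gc` and the two globals are hypothetical names, and `toy_register_address` only mimics the shape of `rb_gc_register_address(VALUE*)`. The sketch assumes a collector that marks only from registered addresses (no C stack scan is modeled), so an object held in an unregistered global is reclaimed while the global still points at it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy heap object: "live" means allocated, "marked" means reached by the GC. */
typedef struct { int live; int marked; } Obj;

#define HEAP_SIZE 8
#define MAX_ROOTS 8
static Obj heap[HEAP_SIZE];
static Obj **roots[MAX_ROOTS];   /* addresses of registered globals */
static size_t nroots;

/* Hand out the first free heap slot, or NULL if the heap is full. */
Obj *toy_alloc(void) {
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        if (!heap[i].live) {
            heap[i].live = 1;
            heap[i].marked = 0;
            return &heap[i];
        }
    }
    return NULL;
}

/* Register the ADDRESS of a global, so the collector sees whatever
   the global holds at collection time. */
void toy_register_address(Obj **addr) {
    assert(nroots < MAX_ROOTS);
    roots[nroots++] = addr;
}

/* Mark from registered roots, then sweep. Returns the number reclaimed. */
size_t toy_gc(void) {
    size_t freed = 0;
    for (size_t i = 0; i < nroots; i++)
        if (*roots[i]) (*roots[i])->marked = 1;
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        if (!heap[i].live) continue;
        if (heap[i].marked) heap[i].marked = 0;   /* survivor: reset for next cycle */
        else { heap[i].live = 0; freed++; }       /* unreachable: reclaim */
    }
    return freed;
}

Obj *registered_global;     /* stands in for an ID global registered as a root */
Obj *unregistered_global;   /* stands in for an ID global written under today's contract */

/* One collection with one registered and one unregistered global. */
size_t demo_global_roots(void) {
    registered_global = toy_alloc();
    unregistered_global = toy_alloc();
    toy_register_address(&registered_global);
    return toy_gc();
}
```

In this model `demo_global_roots()` reclaims exactly one object, the one referenced only from `unregistered_global`, which is left dangling. That is the situation existing extensions would be in if IDs became collectable and their globals were not registered.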

Adding a standalone ID mark table and a rb_gc_mark_id() function will not fix the problem of lone IDs on the C stack.

What was the original reason to distinguish Symbol IDs from Object VALUEs, besides making lexer tokens simple to map?
Would changing IDs to be allocated VALUE objects simplify internals anyway? This change could also allow Anonymous Symbols and Anonymous Methods.

-- Kurt Stephens

History

#1 Updated by Konstantin Haase over 2 years ago

How would you ensure identity? Do a search on every Symbol creation? Keep a hash map?

On Oct 3, 2011, at 09:41, Kurt Stephens wrote:

Issue #5392 has been reported by Kurt Stephens.



#2 Updated by Kurt Stephens over 2 years ago

Konstantin Haase wrote:

How would you ensure identity? Do a search on every Symbol creation? Keep a hash map?

Unless I misunderstand your question, we would ensure identity with the same mechanism that exists now: a String->Symbol hash map. The difference is that the hash map is pruned of dead Symbols during GC sweep. If available, WeakRefs and RefQueues would reduce the cost.
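As a rough illustration of that idea, the following toy C sketch keeps a string-to-symbol hash map and prunes unmarked entries in a sweep pass. It is not CRuby's implementation: `Sym`, `sym_lookup`, `sym_intern`, `sym_mark` and `sym_sweep` are hypothetical names, and the mark phase is assumed to be driven by some external collector that calls `sym_mark` on every symbol it reaches.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy symbol: a heap-allocated object with a mark bit, chained per bucket. */
typedef struct Sym {
    char *name;
    int marked;
    struct Sym *next;
} Sym;

#define NBUCKETS 64
static Sym *table[NBUCKETS];

static unsigned hash_name(const char *s) {
    unsigned h = 5381u;
    while (*s) h = h * 33u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Return the interned symbol for name, or NULL if none is currently live. */
Sym *sym_lookup(const char *name) {
    for (Sym *s = table[hash_name(name)]; s; s = s->next)
        if (strcmp(s->name, name) == 0) return s;
    return NULL;
}

/* Identity: equal strings map to the same Sym for as long as it is live. */
Sym *sym_intern(const char *name) {
    Sym *s = sym_lookup(name);
    if (s) return s;
    unsigned h = hash_name(name);
    s = malloc(sizeof *s);
    assert(s != NULL);
    s->name = malloc(strlen(name) + 1);
    assert(s->name != NULL);
    strcpy(s->name, name);
    s->marked = 0;
    s->next = table[h];
    table[h] = s;
    return s;
}

/* Called by the (assumed) mark phase for every reachable symbol. */
void sym_mark(Sym *s) { s->marked = 1; }

/* Sweep: unlink and free unmarked symbols, clear marks on survivors.
   Returns the number of symbols pruned from the table. */
size_t sym_sweep(void) {
    size_t freed = 0;
    for (int i = 0; i < NBUCKETS; i++) {
        Sym **link = &table[i];
        while (*link) {
            Sym *s = *link;
            if (s->marked) {
                s->marked = 0;
                link = &s->next;
            } else {
                *link = s->next;
                free(s->name);
                free(s);
                freed++;
            }
        }
    }
    return freed;
}
```

Interning "foo" and "bar", marking only "foo" and sweeping removes "bar" from the table while "foo" keeps its identity. Interning "bar" again afterwards would simply allocate a fresh symbol, which is only safe if nothing still holds the old one.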

#3 Updated by Yusuke Endoh about 2 years ago

  • Status changed from Open to Assigned
  • Assignee set to Narihiro Nakamura

#4 Updated by Yusuke Endoh over 1 year ago

  • Target version set to next minor

#5 Updated by Narihiro Nakamura 5 months ago

  • Status changed from Assigned to Closed

Duplicate of #7791.
