Feature #17147
Status: closed
New method to get frozen strings from String objects
Description
Object deserializers (such as JSON and MessagePack) instantiate many String objects as keys of Hash objects, and many of those keys come from a fixed set of names. (So the number of distinct keys is bounded.)
In such use cases, the deserializer generates many String instances. These hurt VM performance (mainly through GC pressure), especially when the objects stay in memory for a long time.
If we can de-duplicate those instances at instantiation time, we can reduce the performance impact of object instantiation. This can be achieved if we have a C API to generate frozen strings.
On the other hand, if we have a Ruby method to get frozen strings from strings, we can implement object deserializers in Ruby. That should be valuable for many Ruby users because of future MJIT optimization (and the method can be used from C extension modules too).
So, in general, a Ruby method to get frozen (de-duplicated) strings would be valuable and could improve Ruby performance considerably, since deserializers (JSON, MessagePack) are used everywhere.
Updated by tagomoris (Satoshi Tagomori) about 4 years ago
I don't care about the name of that method, but here are some examples in case the discussion stalls for lack of options:
- String#frozen_string
- String#as_frozen_string
- ObjectSpace.get_frozen_string(str)
Updated by shyouhei (Shyouhei Urabe) about 4 years ago
Understand the needs. Not sure if what is needed is actually the concept called “frozen” though.
Updated by byroot (Jean Boussier) about 4 years ago
@tagomoris (Satoshi Tagomori) I've been advocating for exposing the fstring family of functions for exactly this. We load a lot of data from flat files, and it costs us a lot of memory allocation and then CPU to deduplicate the strings. I was planning to submit patches to MessagePack once these APIs are available.
However, on the Ruby side, I'm not sure what your proposal does differently from String#-@.
Updated by naruse (Yui NARUSE) about 4 years ago
- Is duplicate of Feature #13077: [PATCH] introduce String#fstring method added
Updated by naruse (Yui NARUSE) about 4 years ago
- Related to Feature #13381: [PATCH] Expose rb_fstring and its family to C extensions added
Updated by naruse (Yui NARUSE) about 4 years ago
- Status changed from Open to Closed
The feature is provided by -str.
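For readers unfamiliar with -str, here is a minimal sketch of what String#-@ does. It assumes Ruby 2.5 or later (where the method deduplicates); the variable names and the "user_id" key are only illustrative.

```ruby
# Two distinct, unfrozen String objects with the same contents,
# as a deserializer might produce for a repeated Hash key.
a = String.new("user_id")
b = String.new("user_id")

a.equal?(b)    # false: two separate objects

# String#-@ returns a frozen, deduplicated string with the same contents.
fa = -a
fb = -b

fa.frozen?     # true
fa.equal?(fb)  # true: both calls return the same object
```

Note that a and b were both allocated before -@ could deduplicate them.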
Updated by shyouhei (Shyouhei Urabe) about 4 years ago
- Status changed from Closed to Feedback
Also not sure whether String#-@ saves the OP’s situation, though. The method dedups string contents but has nothing to do with GC pressure.
Can you test if String#-@ works?
Updated by byroot (Jean Boussier) about 4 years ago
Also not sure whether String#-@ saves the OP’s situation, though
String#-@ doesn't, as it's too late (the string has already been allocated). But exposing rb_fstring() would; at least in some specific use cases it could drastically reduce allocations.
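The "too late" point can be illustrated with a rough sketch that counts allocations around a simulated key-reading loop. The helper name allocated_objects, the buffer contents, and the loop size are all assumptions made for illustration; this is not how msgpack-ruby or any real deserializer is implemented. Ruby 2.5 or later is assumed.

```ruby
# Hypothetical helper: number of objects allocated while the block runs.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

buffer = "user_id:42"  # stand-in for raw input bytes
n = 1_000
keys = []

count = allocated_objects do
  n.times do
    raw = buffer.byteslice(0, 7)  # allocates a fresh String for every key
    keys << -raw                  # deduplicates only after that allocation
  end
end

keys.uniq(&:object_id).size  # 1: every stored key is the same frozen object
count >= n                   # true: the temporary strings were still allocated
```

So String#-@ reduces the number of strings that stay alive, but each key still costs one short-lived allocation, which is the part that building the interned string directly from the bytes in C could avoid.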
Updated by tagomoris (Satoshi Tagomori) about 4 years ago
Thank you for the feedback! I overlooked the String#-@ method. It looks worth trying, so I'll evaluate that option on the msgpack-ruby workload (and possibly Fluentd).