Object deserializers (such as JSON and MessagePack) instantiate many String objects as keys of Hash objects, and many of those keys come from a limited set of names. (So the total number of distinct keys is bounded.)
In this use case, the deserializer generates many String instances. These hurt VM performance, mainly through GC pressure, especially when the objects stay in memory for a long time.
If we can de-duplicate those instances at instantiation time, we can reduce the performance impact of object instantiation. This can be achieved if we have a C API to generate frozen strings.
On the other hand, if we have a Ruby method to get frozen strings from strings, we can implement object deserializers in Ruby. That should be valuable for many Ruby users because of future MJIT optimization (and the method can be used from C extension modules too).
So, in general, a Ruby method to get frozen (de-duplicated) strings would be valuable and could improve Ruby performance considerably. Deserializers (JSON, MessagePack) are used everywhere.
@tagomoris I've been advocating for exposing the fstring family of functions for exactly this. We load a lot of data from flat files, and it costs us a lot of memory allocation and then CPU time to deduplicate the strings. I was planning to submit patches to MessagePack once these APIs are available.
However, on the Ruby side, I'm not sure what your proposal does differently from String#-@.
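For reference, here is a minimal sketch of what String#-@ already offers on the Ruby side, assuming Ruby 2.5 or later (where it de-duplicates). The helper name `intern_key` is hypothetical and only wraps the existing method.

```ruby
# Hypothetical helper: return the frozen, de-duplicated copy of a key.
# It only wraps String#-@ (assumes Ruby 2.5+).
def intern_key(str)
  -str
end

# Two separately allocated strings with the same content, as a
# deserializer would produce for a repeated Hash key.
key_a = String.new("user_id")
key_b = String.new("user_id")

interned_a = intern_key(key_a)
interned_b = intern_key(key_b)

puts "distinct before: #{!key_a.equal?(key_b)}"
puts "shared after:    #{interned_a.equal?(interned_b)}"
puts "frozen:          #{interned_a.frozen?}"
```

Both interned keys refer to one frozen object, so only a single copy of each distinct key needs to stay in memory.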
I'm also not sure whether String#-@ helps in the OP's situation, though.
String#-@ doesn't, as it's too late (the string was already allocated). But exposing rb_fstring() would; at least in some specific use cases, it could drastically reduce allocations.
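A rough sketch of the "too late" point, assuming CRuby and Ruby 2.5 or later: every key is still allocated as a temporary String before String#-@ can de-duplicate it. The helpers `allocations_during` and `decode_keys` are hypothetical names used only for illustration; `GC.stat(:total_allocated_objects)` is the existing counter.

```ruby
# Hypothetical helper: number of objects allocated while the block runs.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical stand-in for a deserializer that reads the same key n times.
# Each iteration allocates a fresh String, then de-duplicates it with -@.
def decode_keys(n)
  n.times.map { -String.new("user_id") }
end

keys = nil
allocated = allocations_during { keys = decode_keys(1_000) }

puts "distinct key objects retained: #{keys.uniq(&:object_id).size}"
puts "objects allocated meanwhile:   #{allocated}"
```

Only one key object is retained, but at least one temporary String per key is still allocated and left for the GC. That temporary allocation is what a C-level API such as rb_fstring() could avoid.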
Thank you for the feedback! I had not considered the String#-@ method. It looks worth trying, so I'll evaluate that option on the workload of msgpack-ruby (and possibly Fluentd).