Feature #17147

New method to get frozen strings from String objects

Added by tagomoris (Satoshi TAGOMORI) about 2 months ago. Updated about 2 months ago.

Status: Feedback
Priority: Normal
Assignee: -
Target version: -
[ruby-core:99916]

Description

Object deserializers (such as JSON and MessagePack) instantiate many String objects as keys of Hash objects, and many of those keys come from a fixed set of names. (So the number of distinct keys is bounded.)
In such a use case, the deserializer generates many String instances with identical contents. These hurt VM performance (mainly through GC pressure), especially when the objects stay in memory for a long time.

If we can de-duplicate those instances at instantiation time, we can reduce the performance impact of object instantiation. This could be achieved with a C API that generates frozen strings.
On the other hand, if we had Ruby methods to get frozen strings from strings, we could implement object deserializers in Ruby. That should be valuable for many Ruby users given future MJIT optimization (and such a method could be used from C extension modules too).

So, in general, a Ruby method to get frozen (de-duplicated) strings would be valuable and could improve Ruby performance significantly, since deserializers (JSON, MessagePack) are used everywhere.
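
The duplication described above can be illustrated with a minimal sketch. `decode_key` is a hypothetical stand-in for the part of a deserializer that builds a key String from input bytes; it is not taken from JSON or msgpack-ruby.

```ruby
# Hypothetical stand-in for a deserializer's key decoding step:
# every call builds a fresh String from the raw bytes.
def decode_key(bytes)
  bytes.pack("C*").force_encoding(Encoding::UTF_8)
end

name_bytes = "name".bytes

# Simulate decoding the same key name from 1,000 records.
keys = Array.new(1_000) { decode_key(name_bytes) }

distinct_contents = keys.uniq.size
distinct_objects  = keys.map(&:object_id).uniq.size

puts "distinct contents: #{distinct_contents}"
puts "distinct objects:  #{distinct_objects}"
```

All 1,000 strings have the same contents, yet each is a separate object that the GC has to track for as long as it stays referenced.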


Related issues

Related to Ruby master - Feature #13381: [PATCH] Expose rb_fstring and its family to C extensions (Assigned)
Is duplicate of Ruby master - Feature #13077: [PATCH] introduce String#fstring method (Closed)

Updated by tagomoris (Satoshi TAGOMORI) about 2 months ago

I don't care about the name of that method, but here are some examples in case the discussion stalls for lack of options:

  • String#frozen_string
  • String#as_frozen_string
  • ObjectSpace.get_frozen_string(str)

Updated by shyouhei (Shyouhei Urabe) about 2 months ago

I understand the need. I'm not sure, though, that what is needed is actually the concept called “frozen”.

Updated by byroot (Jean Boussier) about 2 months ago

tagomoris (Satoshi TAGOMORI), I've been advocating for exposing the fstring family of functions for exactly this. We load a lot of data from flat files, and it costs us a lot of memory allocation and then CPU to deduplicate it. I was planning to submit patches to MessagePack once these APIs are available.

However, on the Ruby side, I'm not sure what your proposal does differently from String#-@.

Updated by naruse (Yui NARUSE) about 2 months ago

  • Is duplicate of Feature #13077: [PATCH] introduce String#fstring method added

Updated by naruse (Yui NARUSE) about 2 months ago

  • Related to Feature #13381: [PATCH] Expose rb_fstring and its family to C extensions added

Updated by naruse (Yui NARUSE) about 2 months ago

  • Status changed from Open to Closed

The feature is provided by -str.
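
As a minimal sketch of what `-str` does, assuming CRuby 2.5 or later (where String#-@ returns a frozen, de-duplicated string), two separately built strings with equal contents map to the same object:

```ruby
# Two distinct, unfrozen strings with the same contents,
# built at run time rather than taken from one literal.
a = "na" + "me"
b = "name".dup

x = -a
y = -b

puts x.equal?(y)   # whether both calls returned the very same object
puts x.frozen?     # whether the returned string is frozen
puts a.frozen?     # whether the receiver itself was modified
```

The receiver is left unfrozen; only the returned string is the shared, frozen instance.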

Updated by shyouhei (Shyouhei Urabe) about 2 months ago

  • Status changed from Closed to Feedback

Not also sure if String#-@ saves the OP’s situation, though. The method dedups string contents but does nothing about GC pressure.

Can you test if String#-@ works?

Updated by byroot (Jean Boussier) about 2 months ago

> Not also sure if String#-@ saves the OP’s situation, though

String#-@ doesn't, as it's too late (the string was already allocated). But exposing rb_fstring() would; at least in some specific use cases, it could drastically reduce allocations.
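
The "too late" point can be observed with a rough sketch on CRuby. The buffer contents are made up for illustration, and this is an allocation count, not a benchmark: each key is first sliced out of the buffer as a temporary String and only then de-duplicated.

```ruby
# Made-up input buffer; the first 4 bytes are the repeated key "name".
buffer = "name=alice;name=bob;name=carol".freeze

before = GC.stat(:total_allocated_objects)

# Slice the key out of the buffer (allocates a temporary String),
# then de-duplicate it with String#-@.
keys = Array.new(1_000) { -buffer.byteslice(0, 4) }

allocated = GC.stat(:total_allocated_objects) - before
distinct_objects = keys.map(&:object_id).uniq.size

puts "distinct key objects retained: #{distinct_objects}"
puts "objects allocated meanwhile:   #{allocated}"
```

Only one key object is retained, but a temporary String was still allocated for every iteration, so the allocation cost remains even though long-term memory use drops.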

Updated by tagomoris (Satoshi TAGOMORI) about 2 months ago

Thank you for the feedback! I hadn't considered the String#-@ method. It looks worth trying, so I'll evaluate that option on the workload of msgpack-ruby (and possibly Fluentd).
