Project

General

Profile

Actions

Feature #13381

closed

[PATCH] Expose rb_fstring and its family to C extensions

Feature #13381: [PATCH] Expose rb_fstring and its family to C extensions

Added by eagletmt (Kohei Suzuki) over 8 years ago. Updated almost 5 years ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:80447]


Related issues 2 (0 open2 closed)

Related to Ruby - Feature #17147: New method to get frozen strings from String objectsFeedbackActions
Has duplicate Ruby - Feature #16029: Expose fstring related APIs to C-extensionsClosedActions

Updated by ko1 (Koichi Sasada) over 8 years ago Actions #1

  • Status changed from Open to Feedback

I can understand use cases but we shouldn't expose the name "fstring".

Updated by eagletmt (Kohei Suzuki) over 8 years ago Actions #2 [ruby-core:80451]

OK, I've read comments of #13077.

What do you think of renaming fstring to "deduped" string? "Deduped" strings are implicitly frozen.

  • Rename rb_fstring to rb_str_deduped
  • Rename rb_fstring_new to rb_str_deduped_new
  • Rename rb_fstring_cstr to rb_str_deduped_cstr
  • Rename rb_fstring_enc_new to rb_enc_str_deduped_new
  • Rename rb_fstring_enc_cstr to rb_enc_str_deduped_cstr
    • I think enc should come first for consistency

Updated by ko1 (Koichi Sasada) over 8 years ago Actions #3 [ruby-core:81703]

  • Status changed from Feedback to Assigned
  • Assignee set to ko1 (Koichi Sasada)

Updated by nobu (Nobuyoshi Nakada) over 8 years ago Actions #4 [ruby-core:81704]

  • Assignee deleted (ko1 (Koichi Sasada))

How about fixed_str?

Updated by k0kubun (Takashi Kokubun) almost 6 years ago Actions #5

  • Has duplicate Feature #16029: Expose fstring related APIs to C-extensions added

Updated by ko1 (Koichi Sasada) over 5 years ago Actions #6 [ruby-core:99199]

Nobu and I discussed the name, we want to choose rb_str_pool terminology.

  • rb_str_pool prefix

    • rb_str_pool(VALUE)
    • rb_str_pool_buff(const char *ptr, long len)
    • rb_str_pool_cstr(const char *ptr)
    • rb_enc_str_pool_buff(const char *ptr, long len, rb_encoding *enc)
    • rb_enc_str_pool_cstr(const char *ptr, rb_encoding *enc)
  • FYI: rb_str_new series:

    • rb_str_new(const char *ptr, long len)
    • rb_str_new_cstr(const char *ptr)
    • rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
    • rb_enc_str_new_cstr(const char *ptr, rb_encoding *enc)

If there is no comments, we will merge it soon.

Updated by Eregon (Benoit Daloze) over 5 years ago Actions #7 [ruby-core:99200]

I'd suggest buff => buffer, I don't think it's worth abbreviating that.
BTW, there is rb_alloc_tmp_buffer so buff would be inconsistent.

Is rb_str_pool(VALUE) useful?
I think it could be rb_str_uminus() (or just rb_funcall(str, rb_intern("-@"))).
And then we could be consistent with rb_str_new*, and drop confusing _buff suffixes (not really buffer (=> sounds mutable), just a C string with known length).

So I think this is better:

rb_str_uminus(VALUE) // or just rb_funcall(str, rb_intern("-@"))
rb_str_pool(const char *ptr, long len)
rb_str_pool_cstr(const char *ptr)
rb_enc_str_pool(const char *ptr, long len, rb_encoding *enc)
rb_enc_str_pool_cstr(const char *ptr, rb_encoding *enc)

Updated by nobu (Nobuyoshi Nakada) over 5 years ago Actions #8 [ruby-core:99236]

@ko1 (Koichi Sasada) suggests rb_str_pool_value(VALUE) for the first type.

As these may not create a new object, I thought of using pool as a verb instead of new however, it doesn't feel fit much.
Does anyone have other ideas?

Updated by byroot (Jean Boussier) over 5 years ago Actions #9 [ruby-core:99237]

Does anyone have other ideas?

A common term for this in intern / interning / interned. However that terminology is already used for symbols (rb_str_intern).

This would look like:

rb_interned_str(VALUE)
rb_interned_str(const char *ptr, long len)
rb_interned_str_cstr(const char *ptr)
rb_enc_interned_str(const char *ptr, long len, rb_encoding *enc)
rb_enc_interned_str_cstr(const char *ptr, rb_encoding *enc)

Updated by ko1 (Koichi Sasada) about 5 years ago Actions #10 [ruby-core:99564]

However that terminology is already used for symbols (rb_str_intern).

Yes. This is why we didn't propose it.

Updated by naruse (Yui NARUSE) about 5 years ago Actions #11

  • Related to Feature #17147: New method to get frozen strings from String objects added

Updated by ko1 (Koichi Sasada) about 5 years ago Actions #12 [ruby-core:100083]

I'm now positive "interned_str" instead of "interned".

Updated by byroot (Jean Boussier) about 5 years ago Actions #13 [ruby-core:100084]

Would you like a patch?

Updated by Dan0042 (Daniel DeLorme) about 5 years ago Actions #14 [ruby-core:100114]

Any of the earlier suggestions were good, such as "pool" or "deduped". But while "interned" is technically correct, it will lead to confusion with symbols. It's better to keep a separate terminology imho.

Updated by byroot (Jean Boussier) almost 5 years ago Actions #15

  • Status changed from Assigned to Closed

Applied in changeset git|ef19fb111a8c8bf1a71d46e6fcf34b227e086845.


Expose the rb_interned_str_* family of functions

Fixes [Feature #13381]

Updated by byroot (Jean Boussier) almost 5 years ago Actions #16 [ruby-core:100935]

After trying to use the new functions in json and messagepack I realized we overlooked something entirely.

The internal rb_fstring_* will build new strings with str_new_static, so will directly re-use the string pointer that was provided. That's pretty much unusable, and unsafe.

I submitted another pull request to fix this: https://github.com/ruby/ruby/pull/3786

Updated by nobu (Nobuyoshi Nakada) almost 5 years ago Actions #17 [ruby-core:101082]

It sounds like that fstring doesn't match that purpose.
Is it really better to divert fstring than separated string pools?

Updated by byroot (Jean Boussier) almost 5 years ago Actions #18 [ruby-core:101102]

It sounds like that fstring doesn't match that purpose.

I'm not sure why it wouldn't. Ultimately the prupose is the same than String#-@, but from the C API and by passing a char *.

Is it really better to divert fstring than separated string pools?

It would be way less efficient, consider the following case:

objects = JSON.load_file('path/to.json') # [{"field": 1}, {"field": 2}, ...]
objects.map { |o| o['field'] }

Here some_field since it is a literal is part of the fstring table. As of Ruby 2.7 the json extension has to rb_str_new() many times, before calling hash_aset which will then deduplicate these strings.

In some use cases like ours, this generates a huge amount of GC pressure that could be avoided if the json extension (and some others) could directly lookup interned strings from a char pointer.

Updated by byroot (Jean Boussier) almost 5 years ago Actions #19 [ruby-core:101103]

@alanwu (Alan Wu) just pointed to me that the confusion might come from the fact that this ticket need isn't quite the same than the one I originally opened: https://bugs.ruby-lang.org/issues/16029

They are close but not quite duplicates. @eagletmt wanted a fast API to convert static C strings into Ruby strings, I wanted an API for basically rb_str_new_frozen() that would deduplicate at the same time without having to allocate.

Updated by byroot (Jean Boussier) almost 5 years ago Actions #20 [ruby-core:101104]

Also if that helps, here's how it would be used by json: https://github.com/flori/json/pull/451

Actions

Also available in: PDF Atom