Project

General

Profile

Feature #13381

[PATCH] Expose rb_fstring and its family to C extensions

Added by eagletmt (Kohei Suzuki) over 3 years ago. Updated 7 days ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:80447]


Related issues

Related to Ruby master - Feature #17147: New method to get frozen strings from String objectsFeedbackActions
Has duplicate Ruby master - Feature #16029: Expose fstring related APIs to C-extensionsClosedActions
#1

Updated by ko1 (Koichi Sasada) over 3 years ago

  • Status changed from Open to Feedback

I can understand use cases but we shouldn't expose the name "fstring".

Updated by eagletmt (Kohei Suzuki) over 3 years ago

OK, I've read comments of #13077.

What do you think of renaming fstring to "deduped" string? "Deduped" strings are implicitly frozen.

  • Rename rb_fstring to rb_str_deduped
  • Rename rb_fstring_new to rb_str_deduped_new
  • Rename rb_fstring_cstr to rb_str_deduped_cstr
  • Rename rb_fstring_enc_new to rb_enc_str_deduped_new
  • Rename rb_fstring_enc_cstr to rb_enc_str_deduped_cstr
    • I think enc should come first for consistency

Updated by ko1 (Koichi Sasada) over 3 years ago

  • Assignee set to ko1 (Koichi Sasada)
  • Status changed from Feedback to Assigned

Updated by nobu (Nobuyoshi Nakada) over 3 years ago

  • Assignee deleted (ko1 (Koichi Sasada))

How about fixed_str?

#5

Updated by k0kubun (Takashi Kokubun) 11 months ago

  • Has duplicate Feature #16029: Expose fstring related APIs to C-extensions added

Updated by ko1 (Koichi Sasada) 4 months ago

Nobu and I discussed the name, we want to choose rb_str_pool terminology.

  • rb_str_pool prefix

    • rb_str_pool(VALUE)
    • rb_str_pool_buff(const char *ptr, long len)
    • rb_str_pool_cstr(const char *ptr)
    • rb_enc_str_pool_buff(const char *ptr, long len, rb_encoding *enc)
    • rb_enc_str_pool_cstr(const char *ptr, rb_encoding *enc)
  • FYI: rb_str_new series:

    • rb_str_new(const char *ptr, long len)
    • rb_str_new_cstr(const char *ptr)
    • rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
    • rb_enc_str_new_cstr(const char *ptr, rb_encoding *enc)

If there is no comments, we will merge it soon.

Updated by Eregon (Benoit Daloze) 4 months ago

I'd suggest buff => buffer, I don't think it's worth abbreviating that.
BTW, there is rb_alloc_tmp_buffer so buff would be inconsistent.

Is rb_str_pool(VALUE) useful?
I think it could be rb_str_uminus() (or just rb_funcall(str, rb_intern("-@"))).
And then we could be consistent with rb_str_new*, and drop confusing _buff suffixes (not really buffer (=> sounds mutable), just a C string with known length).

So I think this is better:

rb_str_uminus(VALUE) // or just rb_funcall(str, rb_intern("-@"))
rb_str_pool(const char *ptr, long len)
rb_str_pool_cstr(const char *ptr)
rb_enc_str_pool(const char *ptr, long len, rb_encoding *enc)
rb_enc_str_pool_cstr(const char *ptr, rb_encoding *enc)

Updated by nobu (Nobuyoshi Nakada) 4 months ago

ko1 (Koichi Sasada) suggests rb_str_pool_value(VALUE) for the first type.

As these may not create a new object, I thought of using pool as a verb instead of new however, it doesn't feel fit much.
Does anyone have other ideas?

Updated by byroot (Jean Boussier) 4 months ago

Does anyone have other ideas?

A common term for this in intern / interning / interned. However that terminology is already used for symbols (rb_str_intern).

This would look like:

rb_interned_str(VALUE)
rb_interned_str(const char *ptr, long len)
rb_interned_str_cstr(const char *ptr)
rb_enc_interned_str(const char *ptr, long len, rb_encoding *enc)
rb_enc_interned_str_cstr(const char *ptr, rb_encoding *enc)

Updated by ko1 (Koichi Sasada) 3 months ago

However that terminology is already used for symbols (rb_str_intern).

Yes. This is why we didn't propose it.

#11

Updated by naruse (Yui NARUSE) 3 months ago

  • Related to Feature #17147: New method to get frozen strings from String objects added

Updated by ko1 (Koichi Sasada) 2 months ago

I'm now positive "interned_str" instead of "interned".

Updated by byroot (Jean Boussier) 2 months ago

Would you like a patch?

Updated by Dan0042 (Daniel DeLorme) 2 months ago

Any of the earlier suggestions were good, such as "pool" or "deduped". But while "interned" is technically correct, it will lead to confusion with symbols. It's better to keep a separate terminology imho.

#15

Updated by byroot (Jean Boussier) 8 days ago

  • Status changed from Assigned to Closed

Applied in changeset git|ef19fb111a8c8bf1a71d46e6fcf34b227e086845.


Expose the rb_interned_str_* family of functions

Fixes [Feature #13381]

Updated by byroot (Jean Boussier) 7 days ago

After trying to use the new functions in json and messagepack I realized we overlooked something entirely.

The internal rb_fstring_* will build new strings with str_new_static, so will directly re-use the string pointer that was provided. That's pretty much unusable, and unsafe.

I submitted another pull request to fix this: https://github.com/ruby/ruby/pull/3786

Also available in: Atom PDF