Bug #21842: Encoding of rb_interned_str

Added by herwin (Herwin W) about 24 hours ago. Updated about 17 hours ago.

Status: Open
Assignee: -
Target version: -
ruby -v: ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [x86_64-linux], but seen on 3.0 - 4.1-dev
[ruby-core:124579]

Description

This is one of the API methods to get an fstring. The documentation in the source says the following:

/**
 * Identical to rb_str_new(), except it returns an infamous "f"string.  What is
 * a  fstring?  Well  it is  a special  subkind of  strings that  is immutable,
 * deduped globally, and managed by our GC.   It is much like a Symbol (in fact
 * Symbols  are dynamic  these days  and are  backended using  fstrings).  This
 * concept has been  silently introduced at some point in  2.x era.  Since then
 * it  gained  wider acceptance  in  the  core.   Starting from  3.x  extension
 * libraries can also generate ones.
 *
 * @param[in]  ptr           A memory region of `len` bytes length.
 * @param[in]  len           Length  of  `ptr`,  in bytes,  not  including  the
 *                           terminating NUL character.
 * @exception  rb_eArgError  `len` is negative.
 * @return     A  found or  created instance  of ::rb_cString,  of `len`  bytes
 *             length, of  "binary" encoding,  whose contents are  identical to
 *             that of `ptr`.
 * @pre        At  least  `len` bytes  of  continuous  memory region  shall  be
 *             accessible via `ptr`.
 */
VALUE rb_interned_str(const char *ptr, long len);

I tried to create some specs for them (https://github.com/ruby/spec/pull/1327), but instead of binary encoding, the string is actually encoded as US-ASCII. This may result in some weird behaviour if the input contains bytes that are not valid in US-ASCII (the following is more an observation of the current behaviour):

it "support binary strings that are invalid in ASCII encoding" do
  str = "foo\x81bar\x82baz".b
  result = @s.rb_interned_str(str, str.bytesize)
  result.encoding.should == Encoding::US_ASCII
  result.should == str.dup.force_encoding(Encoding::US_ASCII)
  result.should_not.valid_encoding?
end

So it seems to me that either the implementation or the documentation is incorrect.

(rb_interned_str_cstr has the same behaviour, it's pretty much the same thing except using a null terminator instead of an explicit length argument).
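
The practical difference between the two encodings can be seen without the C API. The following is a minimal pure-Ruby sketch, not a call to rb_interned_str; `encoding_report` is a hypothetical helper introduced only for illustration. It tags the same bytes as BINARY and as US-ASCII and compares the results:

```ruby
# Pure-Ruby illustration; it does not call rb_interned_str itself.
# encoding_report is a hypothetical helper, not part of any API.
def encoding_report(bytes, encoding)
  str = bytes.dup.force_encoding(encoding)
  { encoding: str.encoding, valid: str.valid_encoding?, bytes: str.bytes }
end

raw = "foo\x81bar\x82baz".b

binary   = encoding_report(raw, Encoding::BINARY)
us_ascii = encoding_report(raw, Encoding::US_ASCII)

puts binary[:valid]                      # true: every byte is valid in BINARY
puts us_ascii[:valid]                    # false: 0x81 and 0x82 are outside 7-bit ASCII
puts binary[:bytes] == us_ascii[:bytes]  # true: only the encoding tag differs

# Same bytes, different encodings, non-ASCII content: the strings compare unequal.
puts raw == raw.dup.force_encoding(Encoding::US_ASCII)  # false
```

This is why the spec above has to force the expected string to US-ASCII before comparing it with the result.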


Related issues 1 (0 open, 1 closed)

Related to Ruby - Feature #13381: [PATCH] Expose rb_fstring and its family to C extensions (Closed)

Updated by herwin (Herwin W) about 24 hours ago Actions #1

  • ruby -v set to ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [x86_64-linux], but seen on 3.0 - 4.1-dev

Updated by byroot (Jean Boussier) about 22 hours ago Actions #2

  • Related to Feature #13381: [PATCH] Expose rb_fstring and its family to C extensions added

Updated by byroot (Jean Boussier) about 22 hours ago Actions #3 [ruby-core:124580]

Hum, good find. So the function was exposed as a result of [Feature #13381], before that the function was internal.

In that ticket we didn't discuss the default encoding, but it might be fair to assume it should have been BINARY (aka ASCII-8BIT) like rb_str_new*.

The function was later documented in https://github.com/ruby/ruby/commit/091faca99ca and assumed to default to ASCII-8BIT.

At first glance I'd say it makes sense to treat this as a bug and change the default encoding.

On the other hand, one could argue that interned binary strings don't make that much sense.

I don't have a strong opinion either way.

Updated by Eregon (Benoit Daloze) about 21 hours ago Actions #4 [ruby-core:124581]

From https://github.com/truffleruby/truffleruby/issues/4018#issuecomment-3549329873, it seems everyone's expectation is that it returns a BINARY String, like rb_str_new().
@byroot (Jean Boussier) Could you make a PR to fix it?

Updated by byroot (Jean Boussier) about 20 hours ago Actions #6

  • Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN to 3.2: WONTFIX, 3.3: REQUIRED, 3.4: REQUIRED, 4.0: REQUIRED

Updated by byroot (Jean Boussier) about 18 hours ago Actions #7 [ruby-core:124584]

Fix merged (Redmine seems to be lagging behind, but will probably pick it up).

Backport PRs:

Updated by nobu (Nobuyoshi Nakada) about 17 hours ago · Edited Actions #8 [ruby-core:124585]

I think it should be US-ASCII for 7-bit-only strings, as is the case for Symbols.
GH-15894
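
One reading of this suggestion, sketched in pure Ruby: pick US-ASCII when every byte is 7-bit and BINARY otherwise, mirroring how Symbols report their encoding. `proposed_encoding` is a hypothetical helper and only an interpretation of the comment above; it is not taken from GH-15894.

```ruby
# proposed_encoding is a hypothetical helper sketching the suggested rule;
# it is an assumption about the proposal, not code from the actual patch.
def proposed_encoding(bytes)
  bytes.b.ascii_only? ? Encoding::US_ASCII : Encoding::BINARY
end

puts proposed_encoding("foo") == Encoding::US_ASCII        # true: 7-bit only
puts proposed_encoding("foo\x81bar") == Encoding::BINARY   # true: contains a high byte

# Symbols already behave this way:
puts "foo".b.to_sym.encoding == Encoding::US_ASCII         # true
puts "foo\x81".b.to_sym.encoding == Encoding::BINARY       # true
```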
