Actions
Bug #21842
openEncoding of rb_interned_str
Bug #21842:
Encoding of rb_interned_str
Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [x86_64-linux], but seen on 3.0 - 4.1-dev
Description
This is one of the API methods to get an fstring. The documentation in the source says the following:
/**
* Identical to rb_str_new(), except it returns an infamous "f"string. What is
* a fstring? Well it is a special subkind of strings that is immutable,
* deduped globally, and managed by our GC. It is much like a Symbol (in fact
* Symbols are dynamic these days and are backended using fstrings). This
* concept has been silently introduced at some point in 2.x era. Since then
* it gained wider acceptance in the core. Starting from 3.x extension
* libraries can also generate ones.
*
* @param[in] ptr A memory region of `len` bytes length.
* @param[in] len Length of `ptr`, in bytes, not including the
* terminating NUL character.
* @exception rb_eArgError `len` is negative.
* @return A found or created instance of ::rb_cString, of `len` bytes
* length, of "binary" encoding, whose contents are identical to
* that of `ptr`.
* @pre At least `len` bytes of continuous memory region shall be
* accessible via `ptr`.
*/
VALUE rb_interned_str(const char *ptr, long len);
I tried to create some specs for them (https://github.com/ruby/spec/pull/1327), but instead of binary encoding, the string is actually encoded as US-ASCII. This may result is some weird behaviour if the input contains bytes that are not valid in US-ASCII (the following is more an observation of the current behaviour)
it "support binary strings that are invalid in ASCII encoding" do
str = "foo\x81bar\x82baz".b
result = @s.rb_interned_str(str, str.bytesize)
result.encoding.should == Encoding::US_ASCII
result.should == str.dup.force_encoding(Encoding::US_ASCII)
result.should_not.valid_encoding?
end
So it seems to me like either the implementation of the documentation is incorrect.
(rb_interned_str_cstr has the same behaviour, it's pretty much the same thing except using a null terminator instead of an explicit length argument).
Actions