Feature #15940


Coerce symbols' internal fstrings to UTF-8 rather than ASCII to better share memory with string literals

Added by byroot (Jean Boussier) over 5 years ago. Updated 8 months ago.

Status: Assigned
Target version: -
[ruby-core:93250]

Description

Patch: https://github.com/ruby/ruby/pull/2242

It's not uncommon for symbols to have literal string counterparts, e.g.

class User
  attr_accessor :name

  def as_json
    { 'name' => name }
  end
end

Since the default source encoding is UTF-8, and symbols coerce their internal fstring to US-ASCII when possible, the snippet above actually keeps two instances of "name" in the fstring registry: one in US-ASCII, the other in UTF-8.
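
The duplication can be observed from Ruby itself with String#-@, which returns the frozen, deduplicated (interned) copy of a string. This is a minimal sketch assuming Ruby 2.5 or later, where -@ deduplicates. It builds the US-ASCII and UTF-8 copies of "name" by hand rather than calling :name.to_s, so it behaves the same whether or not the running Ruby includes this patch.

```ruby
# Two byte-identical "name" strings tagged with different encodings.
ascii_name = "name".encode(Encoding::US_ASCII) # the encoding a symbol's fstring gets today
utf8_name  = "name".dup                        # UTF-8, like a literal in a UTF-8 source file

# String#-@ returns the interned copy from the fstring registry.
interned_ascii = -ascii_name
interned_utf8  = -utf8_name

# Interning the same bytes with the same encoding again yields the very same object...
puts interned_utf8.equal?(-("name".dup))  # prints true
# ...but the US-ASCII and UTF-8 copies are separate registry entries.
puts interned_ascii.equal?(interned_utf8) # prints false
puts [interned_ascii.encoding, interned_utf8.encoding].inspect
# prints [#<Encoding:US-ASCII>, #<Encoding:UTF-8>]
```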

Since UTF-8 is a strict superset of ASCII, storing symbols' fstrings as UTF-8 instead makes no significant difference, but in most cases it allows them to share the fstring of the equivalent string literal.
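
The superset argument can be checked directly: an ASCII-only string has the same bytes under either encoding, and Ruby already compares and hashes such strings as equal. In the sketch below (again building the US-ASCII copy by hand), the only difference left is the encoding tag, which is what keeps the fstring registry from sharing the two.

```ruby
ascii = "name".encode(Encoding::US_ASCII)
utf8  = "name".dup

# Identical bytes: an ASCII byte sequence is also valid UTF-8 with the same meaning.
puts ascii.bytes == utf8.bytes                                  # prints true
puts ascii.dup.force_encoding(Encoding::UTF_8).valid_encoding? # prints true

# Ruby already treats ASCII-only strings as interchangeable across the two encodings.
puts ascii == utf8                                # prints true
puts ascii.eql?(utf8) && ascii.hash == utf8.hash  # prints true

table = { utf8 => :found }
puts table[ascii]                                 # prints found

# What differs is only the encoding tag.
puts ascii.encoding == utf8.encoding              # prints false
```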

The only notable behavioral change is in Symbol#to_s.

Previously, :name.to_s.encoding would be #<Encoding:US-ASCII>; after this patch it is #<Encoding:UTF-8>. I can't foresee any significant compatibility impact of this change on existing code.
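
One way to gauge the compatibility impact is to compare how strings derived from each version of :name.to_s behave. The sketch below is illustrative and not taken from the patch: the two stand-ins are built by hand, so it runs the same on any Ruby.

```ruby
# Stand-ins for :name.to_s, built by hand so the example does not depend on
# which behavior the running Ruby has.
before_patch = "name".encode(Encoding::US_ASCII) # current: US-ASCII
after_patch  = "name".dup                        # proposed: UTF-8 (source encoding)

# Mixing either one with UTF-8 text never raises Encoding::CompatibilityError
# for ASCII-only content; Ruby picks a compatible encoding for the result.
puts((before_patch + "_id").encoding)    # prints US-ASCII
puts((after_patch + "_id").encoding)     # prints UTF-8
puts((before_patch + "\u00e9").encoding) # prints UTF-8

# The resulting strings still compare equal; only the encoding tag differs.
puts((before_patch + "_id") == (after_patch + "_id")) # prints true
```

Code that checks str.encoding explicitly, or that relies on US-ASCII propagating through + as in the first line, would observe the change; comparison, hashing, and concatenation with UTF-8 text behave the same.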

However, several ruby/spec specs assert this behavior, and I don't know whether they can be changed: https://github.com/ruby/spec/commit/a73a1c11f13590dccb975ba4348a04423c009453

If this specification cannot be changed, we could instead change the encoding of the String returned by Symbol#to_s, e.g. in Ruby pseudocode:

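def to_s
  # fstr: the symbol's internal fstring, which is UTF-8 after this patch
  str = fstr.dup # unfrozen copy, so the shared fstring keeps its encoding
  str.force_encoding(Encoding::US_ASCII) if str.ascii_only?
  str
end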
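
A runnable version of the fallback pseudocode can be written as a standalone function. compat_symbol_to_s is a hypothetical name, and the symbol's internal fstring is simulated with String#-@ (Ruby 2.5 or later); this is a sketch of the idea, not the patch's C implementation.

```ruby
# Hypothetical helper standing in for Symbol#to_s under the fallback proposal:
# the symbol keeps a shared UTF-8 fstring internally, and only the string
# handed back to callers is re-tagged US-ASCII when its content allows it.
def compat_symbol_to_s(fstr)
  str = fstr.dup # unfrozen copy; the shared fstring is untouched
  str.force_encoding(Encoding::US_ASCII) if str.ascii_only?
  str
end

shared = -("name".dup) # simulates the symbol's interned UTF-8 fstring
result = compat_symbol_to_s(shared)

puts result.encoding # prints US-ASCII
puts shared.encoding # prints UTF-8
puts compat_symbol_to_s(-("caf\u00e9".dup)).encoding # prints UTF-8
```

Symbol#to_s already returns a new, unfrozen String on every call, so re-tagging that copy leaves the shared fstring, and therefore the memory sharing with string literals, intact.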