Feature #22130: Add a new YARV instruction for a `String.new` fast path

Added by tenderlovemaking (Aaron Patterson) 5 days ago. Updated 1 day ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:125838]

Description

I would like to introduce a new YARV instruction, opt_string_new. It's similar to opt_new, but it is specialized for strings.

Today, String defines its own new method. We do this because people can call new with a capacity, like this:

s = String.new(capacity: 1234)

We want to pass the capacity to the GC so that we can ask the GC to possibly allocate a "right sized" object that includes the underlying string buffer. If we didn't implement new, then we would be forced to allocate a regular 40-byte slot as well as a malloc buffer for the string.
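
One rough way to observe the effect of the capacity argument from Ruby is to compare ObjectSpace.memsize_of for strings created with and without it. This is only a sketch: the exact byte counts depend on the Ruby version, platform, and GC configuration, and memsize_of reports a total, so it does not show whether the buffer lives in a right-sized slot or in a separate malloc buffer.

```ruby
require "objspace"

# Both strings are empty; only the reserved capacity differs.
empty    = String.new
reserved = String.new(capacity: 1234)

# memsize_of reports the memory attributed to the object, including its buffer.
puts "String.new:                 #{ObjectSpace.memsize_of(empty)} bytes"
puts "String.new(capacity: 1234): #{ObjectSpace.memsize_of(reserved)} bytes"
puts "both empty? #{empty.empty? && reserved.empty?}"
```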

There are a few downsides to the current implementation. First, users can subclass String and expect the signature they define on initialize to be the same signature that new accepts. For example:

class CoolString < String
  def initialize(is_cool:)
    @is_cool = is_cool
    super(encoding: "UTF-8")
  end
end

CoolString.new(is_cool: true)

To handle this, String's implementation of new must check that the receiver is String itself, and if it is not, forward the call.
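
The receiver check described above can be modeled in plain Ruby. This is a hypothetical sketch, not the actual C implementation: ModelString, CoolModel, and the @path marker are invented for illustration, and the "fast path" here only records that it was taken rather than doing any right-sized allocation.

```ruby
# Hypothetical model of the dispatch inside String.new (not the real C code).
class ModelString
  attr_reader :capacity, :path

  def self.new(*args, **kwargs, &block)
    if equal?(ModelString)
      # Receiver is the base class itself: this is where a right-sized
      # allocation using kwargs[:capacity] could happen.
      obj = allocate
      obj.instance_variable_set(:@path, :fast)
      obj.send(:initialize, *args, **kwargs, &block)
      obj
    else
      # Receiver is a subclass: forward to Class#new, which calls the
      # subclass's own initialize with whatever signature it defines.
      super
    end
  end

  def initialize(capacity: 0)
    @path ||= :forwarded
    @capacity = capacity
  end
end

class CoolModel < ModelString
  attr_reader :is_cool

  def initialize(is_cool:)
    @is_cool = is_cool
    super(capacity: 16)
  end
end

base = ModelString.new(capacity: 123)
cool = CoolModel.new(is_cool: true)
puts "#{base.class}: path=#{base.path} capacity=#{base.capacity}"
puts "#{cool.class}: path=#{cool.path} capacity=#{cool.capacity} is_cool=#{cool.is_cool}"
```

The subclass never has to accept capacity: in new, because the forwarded call goes straight to its own initialize.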

The user can call super from initialize, and they expect the string to be set up in the normal fashion (setting the encoding, etc.). That means the implementation of rb_str_s_new ends up very similar to rb_str_init, so we have a lot of duplicated code.

The other downside is that since new can accept keyword arguments, we end up with an extra hash allocation when calling the C method.
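
A rough way to look for the extra allocation is to count allocated objects per call with GC.stat(:total_allocated_objects). This is a measurement sketch only: allocations_per_call is a hypothetical helper, and the numbers it prints depend on the Ruby build and on whether a JIT is enabled, so no particular result is assumed here.

```ruby
# Hypothetical helper: average number of objects allocated per block call.
def allocations_per_call(iterations = 10_000)
  before = GC.stat(:total_allocated_objects)
  iterations.times { yield }
  after = GC.stat(:total_allocated_objects)
  (after - before) / iterations.to_f
end

plain         = allocations_per_call { String.new }
with_capacity = allocations_per_call { String.new(capacity: 123) }

puts format("String.new:                %.2f objects/call", plain)
puts format("String.new(capacity: 123): %.2f objects/call", with_capacity)
```

If the keyword-argument call reports roughly one more object per call than the plain call, that is consistent with a hash being allocated for the keywords.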

I would like to propose adding an opt_string_new instruction that does the "right sized allocation" and then calls initialize on the instance. For example, when we compile code like String.new(capacity: 123), we can know where "capacity" will be stored on the stack at compile time. Since we have the capacity, we can emit an opt_string_new instruction that allocates the string and then delegates to initialize (which we'll rewrite in Ruby).

To make this more concrete, here are the iseqs today:

ruby --dump=insns -e'String.new(capacity: 123)'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,25)>
0000 opt_getconstant_path                   <ic:0 String>             (   1)[Li]
0002 putnil
0003 swap
0004 putobject                              123
0006 opt_new                                <calldata!mid:new, argc:1, kw:[#<Symbol:0x000000000086110c>], KWARG>, 13
0009 opt_send_without_block                 <calldata!mid:initialize, argc:1, kw:[#<Symbol:0x000000000086110c>], FCALL|KWARG>
0011 jump                                   16
0013 opt_send_without_block                 <calldata!mid:new, argc:1, kw:[#<Symbol:0x000000000086110c>], KWARG>
0015 swap
0016 pop
0017 leave
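
The same listing can be produced from inside a Ruby process with RubyVM::InstructionSequence, which is handy for checking which instructions a given build emits. This sketch is CRuby-only, and the instruction names it prints depend on the Ruby version being run.

```ruby
# CRuby-only: RubyVM is not available on other Ruby implementations.
source = "String.new(capacity: 123)"
iseq   = RubyVM::InstructionSequence.compile(source)
disasm = iseq.disasm

# Each instruction line starts with a four-digit offset followed by the name.
instruction_names = disasm.lines.filter_map { |line| line[/\A\d{4} (\S+)/, 1] }

puts disasm
puts "instructions: #{instruction_names.join(', ')}"
```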

Here is what I'm proposing:

ruby --dump=insns -e'String.new(capacity: 123)'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,25)>
0000 opt_getconstant_path                   <ic:0 String>             (   1)[Li]
0002 putnil
0003 swap
0004 putobject                              123
0006 opt_string_new                         <calldata!mid:new, argc:1, kw:[#<Symbol:0x000000000085f10c>], KWARG>, 13, 0
0010 pop
0011 jump                                   16
0013 opt_send_without_block                 <calldata!mid:new, argc:1, kw:[#<Symbol:0x000000000085f10c>], KWARG>
0015 swap
0016 pop
0017 leave

I've made a WIP pull request here: https://github.com/ruby/ruby/pull/17482

Here are a few benchmark results comparing against Ruby's master branch.

Interpreter + String.new (iterations / s, higher is better):

ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new'
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +PRISM [arm64-darwin25]
          String.new:    19.811M i/s (± 0.6%, GC  6.3%)

ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +PRISM [arm64-darwin25]
          String.new:    44.473M i/s (± 0.9%, GC 14.4%)


Summary
  ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby ran
    2.24 ± 0.02 times faster than ruby /Users/aaron/.rubies/arm64/master/bin/ruby

Interpreter + String.new(capacity: 123) (iterations / s, higher is better):

ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new(capacity: 123)'
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +PRISM [arm64-darwin25]
String.new(capacity: 123):    10.179M i/s (± 1.1%, GC 17.7%)

ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +PRISM [arm64-darwin25]
String.new(capacity: 123):    35.924M i/s (± 0.4%, GC 20.2%)


Summary
  ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby ran
    3.53 ± 0.04 times faster than ruby /Users/aaron/.rubies/arm64/master/bin/ruby

Interpreter + String.new(encoding: "UTF-8") (iterations / s, higher is better):

ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new(encoding: "UTF-8")'
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +PRISM [arm64-darwin25]
String.new(encoding: "UTF-8"):     8.282M i/s (± 2.5%, GC 12.0%)

ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +PRISM [arm64-darwin25]
String.new(encoding: "UTF-8"):     7.770M i/s (± 1.6%, GC  2.7%)


Summary
  ruby /Users/aaron/.rubies/arm64/master/bin/ruby ran
    1.07 ± 0.03 times faster than ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby

The first two cases are clear wins with this patch. Passing only an encoding is slightly slower in this run (master is about 1.07 times faster), but the two are very close, and allocations decrease (GC time drops from 12.0% to 2.7%). Here are the same benchmarks with YJIT enabled:
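
For readers without the ips tool used above, a similar iterations-per-second figure can be approximated with only the standard library. This is a sketch, not the harness used for the numbers in this ticket: iterations_per_second is a hypothetical helper, it does no warmup or error estimation, and its results will not match the figures above exactly.

```ruby
# Hypothetical helper: run the block repeatedly for roughly `duration` seconds
# and return the observed iterations per second.
def iterations_per_second(duration: 1.0)
  count    = 0
  start    = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  deadline = start + duration
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    # Batch the calls so the clock is not read on every iteration.
    1_000.times { yield }
    count += 1_000
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  count / elapsed
end

rate = iterations_per_second(duration: 0.2) { String.new(capacity: 123) }
puts format("String.new(capacity: 123): %.3fM i/s", rate / 1_000_000.0)
```

Running the same script under two Ruby builds, with and without --yjit, gives a comparison in the same spirit as the tables in this ticket.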

YJIT + String.new (iterations / s, higher is better):

ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new' --yjit
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +YJIT +PRISM [arm64-darwin25]
          String.new:    31.008M i/s (± 0.8%, GC  9.8%)

ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +YJIT +PRISM [arm64-darwin25]
          String.new:    97.603M i/s (± 0.8%, GC 30.9%)


Summary
  ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby ran
    3.15 ± 0.04 times faster than ruby /Users/aaron/.rubies/arm64/master/bin/ruby

YJIT + String.new(capacity: 123) (iterations / s, higher is better):

ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new(capacity: 123)' --yjit
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +YJIT +PRISM [arm64-darwin25]
String.new(capacity: 123):    12.986M i/s (± 1.3%, GC 22.2%)

ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +YJIT +PRISM [arm64-darwin25]
String.new(capacity: 123):    77.827M i/s (± 0.4%, GC 42.0%)


Summary
  ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby ran
    5.99 ± 0.08 times faster than ruby /Users/aaron/.rubies/arm64/master/bin/ruby

YJIT + String.new(encoding: "UTF-8") (iterations / s, higher is better):

ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new(encoding: "UTF-8")' --yjit
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +YJIT +PRISM [arm64-darwin25]
String.new(encoding: "UTF-8"):     9.909M i/s (± 0.5%, GC 13.9%)

ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +YJIT +PRISM [arm64-darwin25]
String.new(encoding: "UTF-8"):    11.916M i/s (± 0.7%, GC  3.8%)


Summary
  ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby ran
    1.20 ± 0.01 times faster than ruby /Users/aaron/.rubies/arm64/master/bin/ruby

The first two benchmarks are much faster than on the master branch, and the last benchmark is faster because I moved initialize to Ruby. I've implemented this in ZJIT too, but I'm not posting those numbers because they are very similar to YJIT for this microbenchmark.
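
As a cross-check on the summaries, the speedups can be recomputed from the reported iterations per second. The figures below are copied from the benchmark output in this ticket (in millions of iterations per second); the layout of the results hash is just one convenient arrangement.

```ruby
# [master, new-in-ruby] in millions of iterations per second, as reported above.
results = {
  "interpreter" => {
    'String.new'                    => [19.811, 44.473],
    'String.new(capacity: 123)'     => [10.179, 35.924],
    'String.new(encoding: "UTF-8")' => [8.282, 7.770],
  },
  "YJIT" => {
    'String.new'                    => [31.008, 97.603],
    'String.new(capacity: 123)'     => [12.986, 77.827],
    'String.new(encoding: "UTF-8")' => [9.909, 11.916],
  },
}

# Speedup is new-in-ruby divided by master; values below 1.0 mean master won.
speedups = results.to_h do |mode, cases|
  [mode, cases.transform_values { |master, patched| (patched / master).round(2) }]
end

speedups.each do |mode, cases|
  cases.each do |expression, ratio|
    puts format("%-12s %-32s %.2fx", mode, expression, ratio)
  end
end
```

The ratios agree with the Summary lines; the interpreter encoding case comes out at 0.94x, which is the same result as master being about 1.07 times faster.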
