Feature #22130: Add a new YARV instruction for a `String.new` fast path - Ruby - Ruby Issue Tracking System

Added by tenderlovemaking (Aaron Patterson) 4 days ago. Updated about 12 hours ago.

Status: Open

[ruby-core:125838]

Description

I would like to introduce a new YARV instruction, opt_string_new. It's similar to opt_new, but it is specialized for strings.

Today, we define the new method on String because people can call new with a capacity like this:

s = String.new(capacity: 1234)

We want to pass the capacity to the GC so that we can ask it to allocate, when possible, a "right sized" object that includes the underlying string buffer. If we didn't implement new, we would be forced to allocate a regular 40-byte slot as well as a malloc buffer for the string.
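
As a rough illustration of what the capacity argument changes, the sketch below compares an empty string with one that reserves a buffer. It relies on `ObjectSpace.memsize_of`, which is a CRuby-specific estimate; the exact numbers vary by Ruby version and platform and are not taken from this issue.

```ruby
require "objspace"

# A default empty string and one that reserves room for 1234 bytes.
small = String.new
big = String.new(capacity: 1234)

# Both strings are empty; only the reserved buffer differs.
puts small.bytesize
puts big.bytesize

# memsize_of is an implementation-specific estimate of the memory
# held by each object (slot plus any separately allocated buffer).
puts ObjectSpace.memsize_of(small)
puts ObjectSpace.memsize_of(big)
```

The capacity never changes the contents of the string, only how much memory is reserved for later writes.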

There are a few downsides to the current implementation. First, users can subclass String and expect the signature they define on initialize to be the signature that new accepts. For example:

class CoolString < String
  def initialize(is_cool:)
    @is_cool = is_cool
    super(encoding: "UTF-8")
  end
end

CoolString.new(is_cool: true)

In order to handle this, the new implementation on String must check that the receiver is String, and if not, it forwards the call.
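
The receiver check can be modelled in plain Ruby. This is only a sketch of the dispatch described above, not the actual C code: `string_new_model` and `cool?` are hypothetical names, and the fast path simply delegates to the real `String.new`.

```ruby
class CoolString < String
  def initialize(is_cool:)
    @is_cool = is_cool
    super(encoding: "UTF-8")
  end

  def cool?
    @is_cool
  end
end

# Hypothetical Ruby model of the dispatch; the real check lives in C.
def string_new_model(klass, *args, **kwargs)
  if klass.equal?(String)
    # Receiver is exactly String, so String's own keywords apply.
    String.new(*args, **kwargs)
  else
    # Subclass: allocate, then forward every argument to its initialize.
    obj = klass.allocate
    obj.send(:initialize, *args, **kwargs)
    obj
  end
end

plain = string_new_model(String, capacity: 64)
cool = string_new_model(CoolString, is_cool: true)
puts plain.class
puts cool.class
```

The subclass branch never inspects the keywords, which is why a subclass is free to define its own initialize signature.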

The user can call super from initialize, and they expect the string to be set up in the normal fashion (setting the encoding, etc). That means that the implementation of rb_str_s_new is very similar to rb_str_init (we have a lot of duplicated code).

The other downside is that, since new can accept keyword arguments, we end up with an extra hash allocation when calling the C method.
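
One way to observe allocation counts on a given build is to diff `GC.stat(:total_allocated_objects)` around a loop. This is a minimal sketch; the `allocations` helper is a hypothetical name, and whether the keyword call shows extra objects depends on the Ruby version, so no expected numbers are given.

```ruby
# Count objects allocated while the block runs (CRuby's GC.stat counter).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

n = 1_000
plain = allocations { n.times { String.new } }
with_kw = allocations { n.times { String.new(capacity: 123) } }

puts "String.new:                #{plain} objects"
puts "String.new(capacity: 123): #{with_kw} objects"
```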

I would like to propose adding an opt_string_new instruction that does the "right sized allocation" and then calls initialize on the instance. For example, when we compile code like String.new(capacity: 123), we can know where "capacity" will be stored on the stack at compile time. Since we have the capacity, we can emit an opt_string_new instruction that allocates the string and then delegates to initialize (which we'll rewrite in Ruby).

To make this more concrete, here are the iseqs today:

> ruby --dump=insns -e'String.new(capacity: 123)'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,25)>
0000 opt_getconstant_path                   <ic:0 String>             (   1)[Li]
0002 putnil
0003 swap
0004 putobject                              123
0006 opt_new                                <calldata!mid:new, argc:1, kw:[#<Symbol:0x000000000086110c>], KWARG>, 13
0009 opt_send_without_block                 <calldata!mid:initialize, argc:1, kw:[#<Symbol:0x000000000086110c>], FCALL|KWARG>
0011 jump                                   16
0013 opt_send_without_block                 <calldata!mid:new, argc:1, kw:[#<Symbol:0x000000000086110c>], KWARG>
0015 swap
0016 pop
0017 leave
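
The same kind of listing can be produced from inside a Ruby process with `RubyVM::InstructionSequence`, which is specific to CRuby. The instruction names printed depend on the Ruby build in use, so the output may not match the listing above.

```ruby
# Compile the call site and print its YARV instructions.
iseq = RubyVM::InstructionSequence.compile("String.new(capacity: 123)")
disasm = iseq.disasm
puts disasm
```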

Here is what I'm proposing:

> ruby --dump=insns -e'String.new(capacity: 123)'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,25)>
0000 opt_getconstant_path                   <ic:0 String>             (   1)[Li]
0002 putnil
0003 swap
0004 putobject                              123
0006 opt_string_new                         <calldata!mid:new, argc:1, kw:[#<Symbol:0x000000000085f10c>], KWARG>, 13, 0
0010 pop
0011 jump                                   16
0013 opt_send_without_block                 <calldata!mid:new, argc:1, kw:[#<Symbol:0x000000000085f10c>], KWARG>
0015 swap
0016 pop
0017 leave

I've made a WIP pull request here: https://github.com/ruby/ruby/pull/17482

Here are a few benchmark results comparing against Ruby's master branch.

Interpreter + String.new (iterations / s, higher is better):

ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new'
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +PRISM [arm64-darwin25]
          String.new:    19.811M i/s (± 0.6%, GC  6.3%)

ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +PRISM [arm64-darwin25]
          String.new:    44.473M i/s (± 0.9%, GC 14.4%)


Summary
  ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby ran
    2.24 ± 0.02 times faster than ruby /Users/aaron/.rubies/arm64/master/bin/ruby

Interpreter + String.new(capacity: 123) (iterations / s, higher is better):

ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new(capacity: 123)'
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +PRISM [arm64-darwin25]
String.new(capacity: 123):    10.179M i/s (± 1.1%, GC 17.7%)

ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +PRISM [arm64-darwin25]
String.new(capacity: 123):    35.924M i/s (± 0.4%, GC 20.2%)


Summary
  ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby ran
    3.53 ± 0.04 times faster than ruby /Users/aaron/.rubies/arm64/master/bin/ruby

Interpreter + String.new(encoding: "UTF-8") (iterations / s, higher is better):

ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new(encoding: "UTF-8")'
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +PRISM [arm64-darwin25]
String.new(encoding: "UTF-8"):     8.282M i/s (± 2.5%, GC 12.0%)

ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +PRISM [arm64-darwin25]
String.new(encoding: "UTF-8"):     7.770M i/s (± 1.6%, GC  2.7%)


Summary
  ruby /Users/aaron/.rubies/arm64/master/bin/ruby ran
    1.07 ± 0.03 times faster than ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby

The first two cases easily win with this patch. Passing only an encoding may be slightly slower (master ran 1.07x faster here), but the two are very close, and the allocations are decreased. Here are the same benchmarks but with YJIT enabled:

YJIT + String.new (iterations / s, higher is better):

ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new' --yjit
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +YJIT +PRISM [arm64-darwin25]
          String.new:    31.008M i/s (± 0.8%, GC  9.8%)

ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +YJIT +PRISM [arm64-darwin25]
          String.new:    97.603M i/s (± 0.8%, GC 30.9%)


Summary
  ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby ran
    3.15 ± 0.04 times faster than ruby /Users/aaron/.rubies/arm64/master/bin/ruby

YJIT + String.new(capacity: 123) (iterations / s, higher is better):

ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new(capacity: 123)' --yjit
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +YJIT +PRISM [arm64-darwin25]
String.new(capacity: 123):    12.986M i/s (± 1.3%, GC 22.2%)

ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +YJIT +PRISM [arm64-darwin25]
String.new(capacity: 123):    77.827M i/s (± 0.4%, GC 42.0%)


Summary
  ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby ran
    5.99 ± 0.08 times faster than ruby /Users/aaron/.rubies/arm64/master/bin/ruby

YJIT + String.new(encoding: "UTF-8") (iterations / s, higher is better):

ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new(encoding: "UTF-8")' --yjit
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +YJIT +PRISM [arm64-darwin25]
String.new(encoding: "UTF-8"):     9.909M i/s (± 0.5%, GC 13.9%)

ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +YJIT +PRISM [arm64-darwin25]
String.new(encoding: "UTF-8"):    11.916M i/s (± 0.7%, GC  3.8%)


Summary
  ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby ran
    1.20 ± 0.01 times faster than ruby /Users/aaron/.rubies/arm64/master/bin/ruby

The first two benchmarks are much faster than the master branch, and the last benchmark is faster because I moved initialize to Ruby. I've implemented this in ZJIT too, but I'm not going to post the numbers because they are very similar to YJIT for this micro benchmark.
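
For anyone without the ips tool used above, a rough comparison of the three call shapes can be sketched with the standard library's Benchmark module. This is an assumption-laden stand-in, not the author's harness: the iteration count is arbitrary, the lambda call adds overhead to every case, and the absolute numbers will not match the results posted here.

```ruby
require "benchmark"

iterations = 200_000

# The three call shapes measured in this issue.
cases = {
  "String.new" => -> { String.new },
  "String.new(capacity: 123)" => -> { String.new(capacity: 123) },
  'String.new(encoding: "UTF-8")' => -> { String.new(encoding: "UTF-8") },
}

# Convert wall-clock time for each case into iterations per second.
results = cases.transform_values do |callable|
  elapsed = Benchmark.realtime { iterations.times { callable.call } }
  iterations / elapsed
end

results.each do |label, ips|
  puts format("%-32s %12.0f i/s", label, ips)
end
```

Run it once with each Ruby build (and once with --yjit) to compare them.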

Updated by ko1 (Koichi Sasada) 4 days ago
#1 [ruby-core:125840]

Does it affect app performance?

Updated by tenderlovemaking (Aaron Patterson) 3 days ago
#2 [ruby-core:125845]

ko1 (Koichi Sasada) wrote in #note-1:

Does it affect app performance?

I don't think it slows down any applications, since all cases of String.new are faster. But I also think it's rare for an application to call String.new, so I doubt there is any performance improvement in railsbench, for example.

The reason I'm interested in this is because we have an FFI extension that takes a maximum sized string buffer as input, writes bytes to it, and then sets the length.

This isn't the exact code (the real code is linked from the original post), but it looks like this:

def decompress(input)
  metadata = Lz4FlexExt.get_decompression_metadata(input)
  expected_size = metadata & 0xffffffff

  output = String.new(capacity: expected_size)
  Lz4FlexExt.decompress_payload_into(input, data_offset, expected_size, output)
  output
end

The FFI code writes bytes into the output buffer and sets the length (it doesn't always use the whole buffer). Profiling the above code, we found String.new to be the bottleneck (when input is small), and that String.new(capacity: xxx) is very slow compared to calling rb_str_buf_new, for example. I want to keep as much code in Ruby as possible, so I want to speed up String.new(capacity: xxx) instead of calling another C function.
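
The buffer pattern can be sketched in pure Ruby without the extension. In this sketch `write_payload_into` is a hypothetical stand-in for the native writer and `copy_with_presized_buffer` is an illustrative wrapper; neither is part of the library discussed here.

```ruby
# Hypothetical stand-in for the native writer: appends at most `limit`
# bytes of `payload` to `buffer` and returns how many bytes it wrote.
def write_payload_into(buffer, payload, limit)
  chunk = payload.byteslice(0, limit)
  buffer << chunk
  chunk.bytesize
end

def copy_with_presized_buffer(input, expected_size)
  # Reserve the maximum size up front so the write need not grow the buffer.
  output = String.new(capacity: expected_size, encoding: Encoding::BINARY)
  write_payload_into(output, input, expected_size)
  output
end

result = copy_with_presized_buffer("hello world".b, 5)
puts result.bytesize
```

The caller only pays for String.new(capacity:) plus the write, which is why the cost of String.new dominates when the input is small.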

Here are some benchmark results for the library. The first one is using String.new(capacity:) and compares the master branch vs my proposal. The second is comparing the proposal vs using a C extension.

1. Effect of the patch: `new-in-ruby` ÷ `master` (both using `String.new(capacity:)`)

operation    <1 KiB   1–64 KiB   ≥64 KiB
compress     1.33x    1.33x      1.32x
decompress   1.34x    1.04x      0.95x

2. Patched `String.new(capacity:)` vs C extension: on `new-in-ruby`, YJIT

operation    <1 KiB   1–64 KiB   ≥64 KiB
compress     1.01x    1.01x      1.00x
decompress   1.00x    0.99x      1.02x

My proposal speeds up the Ruby case by ~1.34x for small payloads, and it performs about the same as using a C extension.

Updated by tenderlovemaking (Aaron Patterson) 3 days ago
#3 [ruby-core:125846]

BTW, this proposal also fixes a possible regression (though I don't think anyone cares about this case).

Given this code:

class String
  def initialize foo:
    p "hi"
    super()
  end
end

String.new(foo: 123)

Ruby 3.2:

> ruby -v test.rb
ruby 3.2.10 (2026-01-14 revision a3a6d25788) [arm64-darwin25]
test.rb:2: warning: method redefined; discarding old initialize
"hi"

Ruby 4.0:

> ruby -v test.rb
ruby 4.0.5 (2026-05-20 revision 64336ffd0e) +PRISM [arm64-darwin25]
test.rb:2: warning: method redefined; discarding old initialize
test.rb:8:in 'String.new': unknown keyword: :foo (ArgumentError)

    caller: test.rb:8
    | String.new(foo: 123)
            ^^^^
	from test.rb:8:in '<main>'

Updated by headius (Charles Nutter) 3 days ago
#4 [ruby-core:125847]

Is your benchmark published somewhere?

Updated by tenderlovemaking (Aaron Patterson) 3 days ago
#5 [ruby-core:125848]

headius (Charles Nutter) wrote in #note-4:

Is your benchmark published somewhere?

Ya, we're working with them here

Updated by byroot (Jean Boussier) 3 days ago
#6 [ruby-core:125860]

I support this. When working with low-level code that needs to buffer IOs for parsing, it's very useful to be in control of the buffer size.

However, what I found is that the overhead of argument handling when calling String.new(capacity: ..., encoding: ...) often wastes more performance than what is gained by right sizing the buffer.

The problem is the same with Hash.new(capacity:).

Updated by ko1 (Koichi Sasada) about 13 hours ago
#7 [ruby-core:125868]

How about introducing String.new_buffer(capacity), or some other name, if it is important for performance?

Updated by byroot (Jean Boussier) about 12 hours ago
#8 [ruby-core:125871]

How about introducing String.new_buffer(capacity)

That was the original proposal back in [Feature #12024], but String.new(**) was considered more composable.

Personally, I must say that if we can make the existing API faster, it's much more convenient for gems and such, as we can just accept it's a bit slower on older rubies, rather than needing some respond_to? or method_defined? switch.
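
To make the trade-off concrete, this is roughly the kind of switch a gem would need if a separate method were added. `String.new_buffer` is the hypothetical API from the suggestion above and does not exist in any released Ruby, so the fallback branch is the one that runs today; `new_string_buffer` is an illustrative helper name.

```ruby
# Prefer the hypothetical String.new_buffer when the running Ruby
# defines it, otherwise fall back to the existing keyword API.
def new_string_buffer(capacity)
  if String.respond_to?(:new_buffer)
    String.new_buffer(capacity)
  else
    String.new(capacity: capacity)
  end
end

buffer = new_string_buffer(4096)
buffer << "payload"
puts buffer
```

If String.new(capacity:) itself gets faster, the helper and its branch are unnecessary and callers keep a single code path.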
