Feature #22130
Open
Add a new YARV instruction for a `String.new` fast path
Description
I would like to introduce a new YARV instruction, opt_string_new. It's similar to opt_new, but it is specialized for strings.
Today, String defines its own new method. We do that because people can call new with a capacity, like this:
s = String.new(capacity: 1234)
We want to pass the capacity to the GC so that we can ask the GC to possibly allocate a "right sized" object that includes the underlying string buffer. If we didn't implement new, we would be forced to allocate a regular 40-byte slot as well as a malloc buffer for the string.
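As a rough illustration of what the capacity changes, here is a minimal sketch that uses ObjectSpace.memsize_of from the objspace standard library. The exact byte counts depend on the Ruby version and platform, so the sketch only prints and compares the two sizes rather than assuming specific numbers:

```ruby
require "objspace"

# An empty string with no requested capacity.
plain = String.new
# An empty string that asks for room for 1234 bytes up front.
sized = String.new(capacity: 1234)

plain_bytes = ObjectSpace.memsize_of(plain)
sized_bytes = ObjectSpace.memsize_of(sized)

# Both strings are empty; only the reserved storage differs.
puts "String.new:                 #{plain_bytes} bytes"
puts "String.new(capacity: 1234): #{sized_bytes} bytes"
```

Both strings report a bytesize of 0; the capacity only affects how much storage is reserved for later writes.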
There are a few downsides to the current implementation. First, users can subclass String and expect new to accept the same signature they define on initialize. For example:
class CoolString < String
  def initialize(is_cool:)
    @is_cool = is_cool
    super(encoding: "UTF-8")
  end
end
CoolString.new(is_cool: true)
In order to handle this, String's implementation of new must check whether the receiver is String itself and, if it is not, forward the call.
The user can call super from initialize, and they expect the string to be set up in the normal fashion (setting the encoding, etc.). That means the implementation of rb_str_s_new is very similar to rb_str_init, so we have a lot of duplicated code.
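The receiver check and forwarding described above can be modeled in plain Ruby. This is only a sketch of the control flow, not the C code: Buffer, CoolBuffer, and the path attribute are hypothetical names invented for illustration, and a plain class stands in for String.

```ruby
# Hypothetical stand-in for String; none of these names come from the Ruby source.
class Buffer
  attr_reader :capacity, :path

  def self.new(*args, **kwargs, &block)
    obj = allocate
    # The receiver check: only Buffer itself may take the specialized path.
    path = equal?(Buffer) ? :fast : :forwarded
    obj.instance_variable_set(:@path, path)
    # Either way, the instance's own initialize runs with the caller's arguments.
    obj.send(:initialize, *args, **kwargs, &block)
    obj
  end

  def initialize(capacity: 0)
    @capacity = capacity
  end
end

# A subclass whose initialize has a different signature, like CoolString.
class CoolBuffer < Buffer
  attr_reader :is_cool

  def initialize(is_cool:)
    @is_cool = is_cool
    super(capacity: 16)
  end
end

direct = Buffer.new(capacity: 8)
cool = CoolBuffer.new(is_cool: true)
puts direct.path # prints "fast"
puts cool.path   # prints "forwarded"
```

The subclass keeps its own keyword signature, and super still reaches the base initialize, which is the behavior the duplicated C code has to preserve.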
The other downside is that, since new can accept keyword arguments, we end up with an extra hash allocation when calling the C method.
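One way to observe allocation counts from Ruby is to compare GC.stat(:total_allocated_objects) before and after a loop. The helper below, allocations_per_call, is a hypothetical name for this sketch, and the numbers it prints will vary by Ruby version and JIT settings, so no specific output is assumed:

```ruby
# Average number of objects allocated per call of the given block.
# allocations_per_call is a hypothetical helper written for this sketch.
def allocations_per_call(iterations = 10_000)
  before = GC.stat(:total_allocated_objects)
  i = 0
  while i < iterations
    yield
    i += 1
  end
  after = GC.stat(:total_allocated_objects)
  (after - before).fdiv(iterations)
end

no_args = allocations_per_call { String.new }
with_capacity = allocations_per_call { String.new(capacity: 123) }

puts "String.new:                #{no_args} objects per call"
puts "String.new(capacity: 123): #{with_capacity} objects per call"
```

If the keyword call reports more objects per call than the bare call, the difference is a candidate for the extra hash mentioned above.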
I would like to propose adding an opt_string_new instruction that does the "right sized allocation" and then calls initialize on the instance. For example, when we compile code like String.new(capacity: 123), we can know where "capacity" will be stored on the stack at compile time. Since we have the capacity, we can emit an opt_string_new instruction that allocates the string and then delegates to initialize (which we'll rewrite in Ruby).
To make this more concrete, here are the iseqs today:
ruby --dump=insns -e'String.new(capacity: 123)'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,25)>
0000 opt_getconstant_path <ic:0 String> ( 1)[Li]
0002 putnil
0003 swap
0004 putobject 123
0006 opt_new <calldata!mid:new, argc:1, kw:[#<Symbol:0x000000000086110c>], KWARG>, 13
0009 opt_send_without_block <calldata!mid:initialize, argc:1, kw:[#<Symbol:0x000000000086110c>], FCALL|KWARG>
0011 jump 16
0013 opt_send_without_block <calldata!mid:new, argc:1, kw:[#<Symbol:0x000000000086110c>], KWARG>
0015 swap
0016 pop
0017 leave
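The same kind of listing can be produced from inside a script with RubyVM::InstructionSequence, which is specific to CRuby. This is a small sketch; the instructions it prints depend on the Ruby version running it, so it may not match the dump above exactly:

```ruby
source = "String.new(capacity: 123)"

# Compile the snippet and ask for its human-readable disassembly.
iseq = RubyVM::InstructionSequence.compile(source)
listing = iseq.disasm
puts listing

# Pull out just the instruction names (the word after each 4-digit offset).
instruction_names = listing.lines.filter_map { |line| line[/\A\d{4}\s+(\w+)/, 1] }
puts instruction_names.join(", ")
```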
Here is what I'm proposing:
> ruby --dump=insns -e'String.new(capacity: 123)'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,25)>
0000 opt_getconstant_path <ic:0 String> ( 1)[Li]
0002 putnil
0003 swap
0004 putobject 123
0006 opt_string_new <calldata!mid:new, argc:1, kw:[#<Symbol:0x000000000085f10c>], KWARG>, 13, 0
0010 pop
0011 jump 16
0013 opt_send_without_block <calldata!mid:new, argc:1, kw:[#<Symbol:0x000000000085f10c>], KWARG>
0015 swap
0016 pop
0017 leave
I've made a WIP pull request here: https://github.com/ruby/ruby/pull/17482
Here are a few benchmark results comparing against Ruby's master branch.
Interpreter + String.new (iterations / s, higher is better):
ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new'
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +PRISM [arm64-darwin25]
String.new: 19.811M i/s (± 0.6%, GC 6.3%)
ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +PRISM [arm64-darwin25]
String.new: 44.473M i/s (± 0.9%, GC 14.4%)
Summary
ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby ran
2.24 ± 0.02 times faster than ruby /Users/aaron/.rubies/arm64/master/bin/ruby
Interpreter + String.new(capacity: 123) (iterations / s, higher is better):
ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new(capacity: 123)'
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +PRISM [arm64-darwin25]
String.new(capacity: 123): 10.179M i/s (± 1.1%, GC 17.7%)
ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +PRISM [arm64-darwin25]
String.new(capacity: 123): 35.924M i/s (± 0.4%, GC 20.2%)
Summary
ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby ran
3.53 ± 0.04 times faster than ruby /Users/aaron/.rubies/arm64/master/bin/ruby
Interpreter + String.new(encoding: "UTF-8") (iterations / s, higher is better):
ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new(encoding: "UTF-8")'
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +PRISM [arm64-darwin25]
String.new(encoding: "UTF-8"): 8.282M i/s (± 2.5%, GC 12.0%)
ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +PRISM [arm64-darwin25]
String.new(encoding: "UTF-8"): 7.770M i/s (± 1.6%, GC 2.7%)
Summary
ruby /Users/aaron/.rubies/arm64/master/bin/ruby ran
1.07 ± 0.03 times faster than ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby
The first two cases easily win with this patch. Passing only an encoding may be slightly slower, but the two results are very close (and allocations are reduced). Here are the same benchmarks but with YJIT enabled:
YJIT + String.new (iterations / s, higher is better):
ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new' --yjit
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +YJIT +PRISM [arm64-darwin25]
String.new: 31.008M i/s (± 0.8%, GC 9.8%)
ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +YJIT +PRISM [arm64-darwin25]
String.new: 97.603M i/s (± 0.8%, GC 30.9%)
Summary
ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby ran
3.15 ± 0.04 times faster than ruby /Users/aaron/.rubies/arm64/master/bin/ruby
YJIT + String.new(capacity: 123) (iterations / s, higher is better):
ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new(capacity: 123)' --yjit
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +YJIT +PRISM [arm64-darwin25]
String.new(capacity: 123): 12.986M i/s (± 1.3%, GC 22.2%)
ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +YJIT +PRISM [arm64-darwin25]
String.new(capacity: 123): 77.827M i/s (± 0.4%, GC 42.0%)
Summary
ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby ran
5.99 ± 0.08 times faster than ruby /Users/aaron/.rubies/arm64/master/bin/ruby
YJIT + String.new(encoding: "UTF-8") (iterations / s, higher is better):
ips --ruby /Users/aaron/.rubies/arm64/master/bin/ruby --ruby $(which ruby) -e 'String.new(encoding: "UTF-8")' --yjit
ruby 4.1.0dev (2026-06-24T21:17:33Z master bb75c2893a) +YJIT +PRISM [arm64-darwin25]
String.new(encoding: "UTF-8"): 9.909M i/s (± 0.5%, GC 13.9%)
ruby 4.1.0dev (2026-06-25T19:50:18Z new-in-ruby 52a8c02e69) +YJIT +PRISM [arm64-darwin25]
String.new(encoding: "UTF-8"): 11.916M i/s (± 0.7%, GC 3.8%)
Summary
ruby /Users/aaron/.rubies/arm64/new-in-ruby/bin/ruby ran
1.20 ± 0.01 times faster than ruby /Users/aaron/.rubies/arm64/master/bin/ruby
The first two benchmarks are much faster than the master branch, and the last benchmark is faster because I moved initialize to Ruby. I've implemented this in ZJIT too, but I'm not going to post the numbers because they are very similar to YJIT for this microbenchmark.
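For anyone who wants a rough comparison without the ips tool used above, here is a minimal sketch built on the benchmark standard library. The iterations_per_second helper is a hypothetical name, it reports no error margin or GC share, and its numbers are not comparable to the figures above:

```ruby
require "benchmark"

# Runs the block `iterations` times and returns calls per second.
# iterations_per_second is a hypothetical helper written for this sketch.
def iterations_per_second(iterations = 200_000)
  elapsed = Benchmark.realtime do
    i = 0
    while i < iterations
      yield
      i += 1
    end
  end
  iterations / elapsed
end

results = {
  "String.new" => iterations_per_second { String.new },
  "String.new(capacity: 123)" => iterations_per_second { String.new(capacity: 123) },
  'String.new(encoding: "UTF-8")' => iterations_per_second { String.new(encoding: "UTF-8") },
}

results.each do |label, ips|
  puts format("%-32s %.3fM i/s", label, ips / 1_000_000.0)
end
```

Running the script once with each Ruby build, with and without --yjit, gives a coarse version of the comparison.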
Updated by ko1 (Koichi Sasada) 4 days ago
Does it affect app performance?
Updated by tenderlovemaking (Aaron Patterson) 3 days ago
ko1 (Koichi Sasada) wrote in #note-1:
Does it affect app performance?
I don't think it slows down any applications, since all cases of String.new are faster. But I also think it's rare for an application to call String.new, so I doubt there is any performance improvement in railsbench, for example.
I'm interested in this because we have an FFI extension that takes a maximum-sized string buffer as input, writes bytes to it, and then sets the length.
This isn't the exact code, but it looks like this (the real code is here:
def decompress(input)
  metadata = Lz4FlexExt.get_decompression_metadata(input)
  expected_size = metadata & 0xffffffff
  output = String.new(capacity: expected_size)
  Lz4FlexExt.decompress_payload_into(input, data_offset, expected_size, output)
  output
end
The FFI code writes bytes into the output buffer and sets the length (it doesn't always use the whole buffer). Profiling the above code, we found String.new to be the bottleneck when input is small, and found that String.new(capacity: xxx) is very slow compared to calling rb_str_buf_new, for example. I want to keep as much code in Ruby as possible, so I want to speed up String.new(capacity: xxx) instead of calling another C function.
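The "allocate a sized buffer, let something else fill it" pattern can be tried without the extension. In this sketch StringIO#read with an output buffer stands in for the FFI call; it is an assumption chosen because it also writes into a caller-supplied string and sets its length, and it is not how Lz4FlexExt works internally:

```ruby
require "stringio"

# Stand-in for the compressed input; the real code reads from an FFI call.
payload = StringIO.new("hello world")

# Reserve the maximum size we might need, as decompress does.
expected_size = 64
output = String.new(capacity: expected_size)

# The callee writes into the caller's buffer and sets its length.
payload.read(5, output)

puts output          # prints "hello"
puts output.bytesize # prints 5
```

Only 5 of the 64 reserved bytes are used here, which mirrors the case where the callee does not fill the whole buffer.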
Here are some benchmark results for the library. The first one is using String.new(capacity:) and compares the master branch vs my proposal. The second is comparing the proposal vs using a C extension.
1. Effect of the patch: new-in-ruby ÷ master (both using String.new(capacity:))
| operation | <1 KiB | 1–64 KiB | ≥64 KiB |
|---|---|---|---|
| compress | 1.33x | 1.33x | 1.32x |
| decompress | 1.34x | 1.04x | 0.95x |
2. Patched String.new(capacity:) vs C extension, on new-in-ruby, YJIT
| operation | <1 KiB | 1–64 KiB | ≥64 KiB |
|---|---|---|---|
| compress | 1.01x | 1.01x | 1.00x |
| decompress | 1.00x | 0.99x | 1.02x |
My proposal speeds up the Ruby case by ~1.34x for small payloads, and it performs about the same as using a C extension.
Updated by tenderlovemaking (Aaron Patterson) 3 days ago
BTW, this proposal also fixes a possible regression (though I don't think anyone cares about this case).
Given this code:
class String
  def initialize foo:
    p "hi"
    super()
  end
end

String.new(foo: 123)
Ruby 3.2:
> ruby -v test.rb
ruby 3.2.10 (2026-01-14 revision a3a6d25788) [arm64-darwin25]
test.rb:2: warning: method redefined; discarding old initialize
"hi"
Ruby 4.0:
> ruby -v test.rb
ruby 4.0.5 (2026-05-20 revision 64336ffd0e) +PRISM [arm64-darwin25]
test.rb:2: warning: method redefined; discarding old initialize
test.rb:8:in 'String.new': unknown keyword: :foo (ArgumentError)
caller: test.rb:8
| String.new(foo: 123)
^^^^
from test.rb:8:in '<main>'
Updated by headius (Charles Nutter) 3 days ago
Is your benchmark published somewhere?
Updated by tenderlovemaking (Aaron Patterson) 3 days ago
Updated by byroot (Jean Boussier) 3 days ago
I support this. When working with low-level code that needs to buffer IOs for parsing, it's very useful to be in control of the buffer size.
However, what I found is that the overhead of argument handling when calling String.new(capacity: ..., encoding: ...) often wastes more performance than is gained by right-sizing the buffer.
The problem is the same with Hash.new(capacity:).
Updated by ko1 (Koichi Sasada) about 13 hours ago
How about introducing String.new_buffer(capacity), or some other name, if it is important for performance?
Updated by byroot (Jean Boussier) about 12 hours ago
How about introducing String.new_buffer(capacity)
That was the original proposal back in [Feature #12024], but String.new(**) was considered more composable.
Personally, I must say that if we can make the existing API faster, it's much more convenient for gems and such, as we can just accept it's a bit slower on older rubies, rather than needing some respond_to? or method_defined? switch.
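For comparison, this is roughly what the feature-detection switch would look like in a gem if a separate method were added. String.new_buffer is only the name floated earlier in this thread and does not exist in any released Ruby; make_buffer is a hypothetical helper:

```ruby
# make_buffer is a hypothetical helper. String.new_buffer is the name
# suggested in this thread, not an existing method.
def make_buffer(capacity)
  if String.respond_to?(:new_buffer)
    # Would only run on a Ruby that added the proposed method.
    String.new_buffer(capacity)
  else
    # Works on every Ruby that supports the capacity keyword.
    String.new(capacity: capacity)
  end
end

buffer = make_buffer(4096)
puts buffer.bytesize # prints 0 on a Ruby without String.new_buffer
```

With the fast path proposed here, the else branch alone is enough, and older Rubies simply run it a bit slower.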