Feature #21254

Inlining Class#new

Added by tenderlovemaking (Aaron Patterson) 7 days ago. Updated 5 days ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:121565]

Description

We would like to propose inlining YARV bytecode for speeding up object allocations, specifically inlining the Class#new method. In order to support inlining this method, we would like to introduce a new YARV instruction opt_new. This instruction will allocate an object if the default allocator is not overridden, otherwise it will jump to a “slow path” for calling a method.

Class#new especially benefits from inlining for two reasons:

  1. Calling initialize directly means we don't need to allocate a temporary hash for keyword arguments
  2. We are able to use an inline cache when calling the initialize method

The patch can be found here, but please find implementation details below.

Implementation Details

This patch modifies the compiler to emit special instructions when it sees a callsite that uses “new”. Before this patch, calling Object.new would result in bytecode like this:

ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path                   <ic:0 Object>             (   1)[Li]
0002 opt_send_without_block                 <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave

With this patch, the bytecode looks like this:

./ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path                   <ic:0 Object>             (   1)[Li]
0002 putnil
0003 swap
0004 opt_new                                <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block                 <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump                                   14
0011 opt_send_without_block                 <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave

The new opt_new instruction checks whether or not the new implementation is the default “allocator” implementation. If it is the default allocator, then the instruction will allocate the object and call initialize passing parameters to initialize but not to new. If the method is not the default allocator implementation, it will jump to the normal method dispatch instructions.
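The decision described above can be approximated in plain Ruby. This is only a behavioural sketch, not the VM implementation: `model_opt_new` is a hypothetical helper, and comparing `Method#owner` against `Class` is an assumed stand-in for the VM's check that `new` is still the default allocator.

```ruby
# Hypothetical model of opt_new; the real instruction lives in the VM.
def model_opt_new(klass, *args, **kwargs, &block)
  if klass.method(:new).owner == Class
    # Fast path: allocate, then call initialize directly.
    # Arguments go to initialize, not to new.
    obj = klass.allocate
    obj.send(:initialize, *args, **kwargs, &block)
    obj
  else
    # Slow path: `new` was overridden, so fall back to normal dispatch.
    klass.new(*args, **kwargs, &block)
  end
end

class Point
  attr_reader :x, :y

  def initialize(x:, y:)
    @x = x
    @y = y
  end
end

class Pooled
  # Overrides the default allocator, so the model takes the slow path.
  def self.new(*)
    :from_pool
  end
end

point = model_opt_new(Point, x: 1, y: 2)
p [point.x, point.y]     # fast path
p model_opt_new(Pooled)  # slow path
```

The model cannot show the two actual savings (no temporary keyword hash, inline cache on initialize), since those happen inside the VM; it only shows which branch is taken.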

Performance Improvements

This patch improves performance of all allocations that use the normal “new” method for allocation. Here are two examples (all of these benchmarks compare Ruby 3.4.2 against Ruby master with inlining patch):

A simple Object.new in a hot loop improves by about 24%:

hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
  Time (mean ± σ):     436.6 ms ±   3.3 ms    [User: 432.3 ms, System: 3.8 ms]
  Range (min … max):   430.5 ms … 442.6 ms    10 runs
 
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
  Time (mean ± σ):     351.1 ms ±   3.6 ms    [User: 347.4 ms, System: 3.3 ms]
  Range (min … max):   343.9 ms … 357.4 ms    10 runs
 
Summary
  ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end' ran
    1.24 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'

Using a single keyword argument is improved by about 72%:

> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
  Time (mean ± σ):      1.082 s ±  0.007 s    [User: 1.074 s, System: 0.008 s]
  Range (min … max):    1.071 s …  1.091 s    10 runs
 
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
  Time (mean ± σ):     627.6 ms ±   4.8 ms    [User: 622.6 ms, System: 4.5 ms]
  Range (min … max):   622.1 ms … 637.2 ms    10 runs
 
Summary
  ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
    1.72 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'

The performance increase depends on the number and type of parameters passed to initialize. For example, an initialize method that takes 3 parameters can see a speed improvement of ~3x:

aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
  def initialize a:, b:, c:
  end
end
i = 0
while i < 10_000_000
  Foo.new(a: 1, b: 2, c: 3)
  Foo.new(a: 1, b: 2, c: 3)
  Foo.new(a: 1, b: 2, c: 3)
  i += 1
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> hyperfine "ruby --disable-gems test.rb" "./ruby --disable-gems test.rb"
Benchmark 1: ruby --disable-gems test.rb
  Time (mean ± σ):      3.700 s ±  0.033 s    [User: 3.681 s, System: 0.018 s]
  Range (min … max):    3.636 s …  3.751 s    10 runs
 
Benchmark 2: ./ruby --disable-gems test.rb
  Time (mean ± σ):      1.182 s ±  0.013 s    [User: 1.173 s, System: 0.008 s]
  Range (min … max):    1.165 s …  1.203 s    10 runs
 
Summary
  ./ruby --disable-gems test.rb ran
    3.13 ± 0.04 times faster than ruby --disable-gems test.rb
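The speedup figures quoted for these benchmarks are the ratio of the mean run times. As a quick check, the arithmetic can be reproduced from the hyperfine means above (the hash keys are just labels chosen for this sketch):

```ruby
# Mean wall-clock times in seconds, copied from the hyperfine output:
# [Ruby 3.4.2, Ruby master with the inlining patch]
runs = {
  "Object.new"            => [0.4366, 0.3511],
  "Hash.new(capacity: 0)" => [1.082, 0.6276],
  "Foo.new(a:, b:, c:)"   => [3.700, 1.182],
}

# Speedup = time before / time after, rounded as hyperfine reports it.
speedups = runs.transform_values { |before, after| (before / after).round(2) }

speedups.each { |name, ratio| puts "#{name}: #{ratio}x" }
```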

One factor in the performance increase for keyword arguments is that inlining is able to eliminate the hash allocation when calling “through” the C implementation of Class#new:

aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
  def initialize a:, b:, c:
  end
end
def allocs
  x = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - x
end
def test; allocs { Foo.new(a: 1, b: 2, c: 3) }; end
test
p test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
2
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
1

Memory Increase

Of course, this patch is not “free”. Inlining the method call adds extra YARV instructions. We estimate this patch increases new call sites by about 112 bytes (656 - 544 in the example below):

aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
  def initialize
  end
end
def test
  Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:test)))
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
656

We’ve tested this in Shopify’s monolith, comparing Ruby 3.4.2 and Ruby 3.5+inlining, and it seems to increase total ISEQ memsize by about 3.8 MB (roughly a 0.5% increase in ISEQ size):

irb(main):001> 737191972 - 733354388
=> 3837584

However, Ruby 3.5 has more overall ISEQ objects than Ruby 3.4.2:

aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-inline.txt
  789545 sizes-inline.txt
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-3.4.txt
  789479 sizes-3.4.txt

Total heap size as reported by memsize increases by only about 1 MB:

irb(main):001> 3981075617 - 3979926505
=> 1149112
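To put the absolute numbers in proportion, here is the same arithmetic as relative increases. The sketch assumes the larger value in each subtraction above is the Ruby 3.5 + inlining measurement; `percent_increase` is a helper defined only for this example.

```ruby
# Totals in bytes, taken from the irb sessions above.
iseq_before = 733_354_388    # total ISEQ memsize, Ruby 3.4.2 (assumed)
iseq_after  = 737_191_972    # total ISEQ memsize, Ruby 3.5 + inlining (assumed)
heap_before = 3_979_926_505  # total heap memsize, Ruby 3.4.2 (assumed)
heap_after  = 3_981_075_617  # total heap memsize, Ruby 3.5 + inlining (assumed)

# Relative growth in percent, rounded to the given number of digits.
def percent_increase(before, after, digits)
  ((after - before) * 100.0 / before).round(digits)
end

iseq_pct    = percent_increase(iseq_before, iseq_after, 2)
heap_pct    = percent_increase(heap_before, heap_after, 3)
extra_iseqs = 789_545 - 789_479  # line counts of the two size dumps

puts "ISEQ memsize: +#{iseq_after - iseq_before} bytes (#{iseq_pct}%)"
puts "Heap memsize: +#{heap_after - heap_before} bytes (#{heap_pct}%)"
puts "Extra ISEQ objects on 3.5: #{extra_iseqs}"
```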

Changes to caller

This patch changes caller reporting in the initialize method:

aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"

class Foo
  def initialize
    puts caller
  end
end

def test
  Foo.new
end

test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:10:in 'Class#new'
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'

As you can see in the above output, the Class#new frame is eliminated. I'm not sure if anyone really cares about this frame. We've tested this patch in Shopify's CI, and didn't find any code that depends on this callstack. However, this patch did require changes to ERB for emitting warnings.
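For code that does inspect the call stack inside initialize, one way to behave the same with or without the Class#new frame is to skip that frame by label rather than by position. The following is a minimal sketch; `first_frame_outside_new`, `NEW_FRAME_LABELS`, and `Widget` are hypothetical names introduced for illustration.

```ruby
# Labels of the C-implemented Class#new frame: "Class#new" on Ruby 3.4,
# plain "new" on older versions (which also matches Ruby methods named new).
NEW_FRAME_LABELS = ["Class#new", "new"].freeze

# Hypothetical helper: the first frame that is not the Class#new frame.
def first_frame_outside_new(locations)
  locations.find { |loc| !NEW_FRAME_LABELS.include?(loc.label) }
end

class Widget
  attr_reader :created_in

  def initialize
    # caller_locations(1) starts at whoever called initialize.
    @created_in = first_frame_outside_new(caller_locations(1)).base_label
  end
end

def build_widget
  Widget.new
end

puts build_widget.created_in
```

Because the helper filters by label, it reports `build_widget` whether or not the Class#new frame is present in the backtrace.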

That said, eliminating the frame also has the side-effect of making some of our allocation tracing tools a little more useful:

aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"

class Foo
  def test
    Object.new
  end
end

ObjectSpace.trace_object_allocations do
  obj = Foo.new.test
  puts ObjectSpace.allocation_class_path(obj)
  puts ObjectSpace.allocation_method_id(obj)
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Class
new
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-07T19:40:59Z inline-new 2cf0efa18e) +PRISM [arm64-darwin24]
Foo
test

Before inlining, ObjectSpace would report the allocation class path and method id as Class#new, which isn't very helpful. With the inlining patch, we can see that the object is allocated in Foo#test.

Summary

I think the overall memory increase is modest, and the change to caller is acceptable, especially given the performance increase this patch provides.

Updated by ko1 (Koichi Sasada) 6 days ago

Does swap still remain?

Updated by tenderlovemaking (Aaron Patterson) 6 days ago

ko1 (Koichi Sasada) wrote in #note-1:

Does swap still remain?

I made a patch to remove swap but it makes Coverage tests break. I think we can eliminate the instruction but it will take a little more time.

Updated by Earlopain (Earlopain _) 6 days ago

As you can see in the above output, the Class#new frame is eliminated. I'm not sure if anyone really cares about this frame

Sorry if this is a dumb question, but wouldn't this also affect warn in general, similar to what you did for erb?

class Foo
  def initialize
    warn "don't call me like this!", uplevel: 1
  end
end

def bar
  Foo.new
end

bar

It currently points to Foo.new, would this change this? I wanted to try this out on the playground link myself but it seems broken ):

<internal:gem_prelude>:2:in 'Kernel#require': Not supported @ rb_check_realpath_internal - /usr/local/lib/ruby/3.5.0+0/rubygems.rb (Errno::ENOTSUP)
from <internal:gem_prelude>:2:in '<internal:gem_prelude>'

Updated by tenderlovemaking (Aaron Patterson) 6 days ago

Earlopain (Earlopain _) wrote in #note-3:

As you can see in the above output, the Class#new frame is eliminated. I'm not sure if anyone really cares about this frame

Sorry if this is a dumb question, but wouldn't this also affect warn in general, similar to what you did for erb?

Not a dumb question. :)

class Foo
  def initialize
    warn "don't call me like this!", uplevel: 1
  end
end

def bar
  Foo.new
end

bar

It currently points to Foo.new, would this change this? I wanted to try this out on the playground link myself but it seems broken ):

It doesn't impact this case exactly, but I think it could impact something. The example you gave behaves the same way with or without inlining:

aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
  def initialize
    warn "don't call me like this!", uplevel: 1
  end
end

def bar
  Foo.new
end

bar
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:8: warning: don't call me like this!
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./miniruby -v test.rb
ruby 3.5.0dev (2025-04-08T15:56:43Z inline-new a9a45360ce) +PRISM [arm64-darwin24]
test.rb:8: warning: don't call me like this!

However, I think the warning uplevel will skip C frames when counting. Since Class#new was a C frame, it would have been skipped by the uplevel anyway. I have to double check the implementation, but I think that's what is going on.

Updated by tenderlovemaking (Aaron Patterson) 6 days ago

Btw, @ko1 (Koichi Sasada) came up with this idea, so I want to say thanks to him.

Updated by jez (Jake Zimmerman) 5 days ago

@tenderlovemaking (Aaron Patterson) Question about an extension to the current implementation.

We have a fair amount of code that looks like this:

class HoldsEvenNumbers
  def initialize(even)
    @even = even
  end

  private_class_method :new

  def self.make(n)
    return nil unless n.even?
    
    self.new(n)
  end
end

I believe that in this snippet, the call to self.new(n) would not be equal to rb_class_new_instance_pass_kw when doing the vm_method_cfunc_is search, because there will have been another method entry created by the call to private_class_method :new.
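The extra method entry can be observed from Ruby itself. In this sketch (`EvenBox` is a hypothetical stand-in for the class above), the singleton class lists its own private `new` after `private_class_method :new`, which is consistent with a separate entry shadowing Class#new; the snippet does not show the entry's internal type.

```ruby
class EvenBox
  def initialize(even)
    @even = even
  end

  private_class_method :new

  def self.make(n)
    return nil unless n.even?

    self.new(n)
  end
end

# The singleton class now carries its own private `new` entry...
p EvenBox.singleton_class.private_instance_methods(false).include?(:new)

# ...so external callers are rejected, while the factory still works.
begin
  EvenBox.new(2)
rescue NoMethodError => e
  puts e.class
end

p EvenBox.make(2).class
p EvenBox.make(3)
```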

I'm curious: could we add a second check after this check such that if the method entry is VM_METHOD_TYPE_ZSUPER, we keep searching to find the super method that would be called and see if the super method is rb_class_new_instance_pass_kw? Plus also inline the logic to check whether the private method call is okay in that second condition (before the final deoptimization jump).

Updated by tenderlovemaking (Aaron Patterson) 5 days ago

jez (Jake Zimmerman) wrote in #note-6:

I'm curious: could we add a second check after this check such that if the method entry is VM_METHOD_TYPE_ZSUPER, we keep searching to find the super method that would be called and see if the super method is rb_class_new_instance_pass_kw? Plus also inline the logic to check whether the private method call is okay in that second condition (before the final deoptimization jump).

Yes, I think we can add a special check for ZSUPER methods, but only for call sites where the receiver is self:

class A
  private_class_method :new

  def self.make
    new # fast path
  end
end

class B
  private_class_method :new

  def self.make
    self.new # fast path
  end
end

class C
  def self.make m
    m.new # fast path depends on m
  end
end

C.make(Object) # Fast Path
C.make(A) # Slow Path (exception)
C.make(B) # Slow Path (also exception)

I made a patch for it here, but I haven't tested it in CI yet.

Updated by tenderlovemaking (Aaron Patterson) 5 days ago

tenderlovemaking (Aaron Patterson) wrote in #note-7:

I made a patch for it here, but I haven't tested it in CI yet.

@jhawthorn (John Hawthorn) pointed out a problem to me with this patch that I didn't think about.

If we consider this code:

class A
  private_class_method :new

  def self.make
    new # fast path
  end
end
A.make
A.make

When we try to look up the new method, it will fill out the inline cache with the ZSUPER entry. But since the ZSUPER entry won't pass the rb_class_new_instance_pass_kw check, we'll end up looking up the new method from the superclass, which will fill out the inline cache with the method from the superclass.

On the next call to A.make, it will miss the cache again because the receiver is A, basically repeating the above steps. In this case the inline cache will keep ping-ponging between the ZSUPER method and the superclass's method, never hitting.
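The ping-pong can be illustrated with a toy model. This is not how the VM's inline cache is implemented; it assumes a single-entry cache keyed by the class it was filled for, and every name here (`ToyCache`, `toy_call_new`, the symbols) is made up for the sketch.

```ruby
# Toy model only: one cache slot, remembering which class it was filled for.
ToyCache = Struct.new(:klass, :entry)

def toy_call_new(cache, receiver_class, stats)
  if cache.klass == receiver_class
    stats[:hits] += 1
  else
    stats[:misses] += 1
    # The lookup on the receiver finds the ZSUPER entry first.
    cache.klass = receiver_class
    cache.entry = :zsuper_new
  end

  if cache.entry == :zsuper_new
    # ZSUPER is not the default allocator, so look again from the
    # superclass; that second lookup overwrites the same cache slot.
    cache.klass = :superclass_of_receiver
    cache.entry = :default_new
  end

  cache.entry
end

cache = ToyCache.new(nil, nil)
stats = { hits: 0, misses: 0 }
5.times { toy_call_new(cache, :A, stats) }
p stats
```

In this model every call ends with the slot filled for the superclass, so the next call with receiver A misses again and the hit counter never moves.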

Maybe we can figure out a way to do this in the future, but I'm not sure if it's a good idea right now. This particular case might actually be better handled by the JIT than the interpreter.
