Ruby Issue Tracking System https://bugs.ruby-lang.org/ 2022-05-11T21:04:47Z
Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=975632022-05-11T21:04:47Ztenderlovemaking (Aaron Patterson)tenderlove@ruby-lang.org
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/97563/diff?detail_id=62492">diff</a>)</li></ul> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=975642022-05-11T23:54:18Ztenderlovemaking (Aaron Patterson)tenderlove@ruby-lang.org
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/97564/diff?detail_id=62493">diff</a>)</li></ul> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=975662022-05-12T01:24:01Zko1 (Koichi Sasada)
<ul></ul><p>Great patch.<br>
I'm looking forward to seeing evaluation results.</p>
<p>Questions:</p>
<ul>
<li>how to use parent id?</li>
<li>how to find next id with additional ivar?</li>
</ul> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=975672022-05-12T12:41:05Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><blockquote>
<p>We want object shapes to be enabled on 32 bit systems and 64 bit systems so that limits us to the bottom 32 bits of the Object header.</p>
</blockquote>
<p>Might be a silly question, but how popular are 32bits systems these days? Would it be acceptable to make objects a bit bigger on 32 bits systems so that both 32bits and 64bits Ruby have a 32bit shape IDs?</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=975702022-05-12T15:56:14Zjemmai (Jemma Issroff)
<ul></ul><p>Thanks for the feedback, Koichi.</p>
<p>ko1 (Koichi Sasada) wrote in <a href="#note-3">#note-3</a>:</p>
<blockquote>
<ul>
<li>how to use parent id?</li>
</ul>
</blockquote>
<p>The <code>rb_shape</code> type is as follows:</p>
<pre><code class="C syntaxhl" data-language="C"><span class="k">struct</span> <span class="n">rb_shape</span> <span class="p">{</span>
<span class="n">VALUE</span> <span class="n">flags</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">rb_shape</span> <span class="o">*</span> <span class="n">parent</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">rb_id_table</span> <span class="o">*</span> <span class="n">edges</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">rb_id_table</span> <span class="o">*</span> <span class="n">iv_table</span><span class="p">;</span>
<span class="n">ID</span> <span class="n">edge_name</span><span class="p">;</span>
<span class="p">};</span>
</code></pre>
<p><code>parent</code> is a pointer stored on the shape itself, and we use it primarily for GC.</p>
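<p>To make the roles of <code>parent</code>, <code>edges</code> and <code>edge_name</code> concrete, here is a minimal Ruby model of the struct above. It is only a sketch: the <code>Shape</code> class and its <code>transition</code> method are hypothetical names, a plain Hash stands in for <code>rb_id_table</code>, and <code>flags</code> and <code>iv_table</code> are left out.</p>

```ruby
# Hypothetical Ruby model of the C struct above; not the CRuby implementation.
class Shape
  attr_reader :parent, :edges, :edge_name

  def initialize(parent = nil, edge_name = nil)
    @parent = parent        # shape this one was derived from (nil for the root)
    @edge_name = edge_name  # ivar name whose addition led to this shape
    @edges = {}             # outgoing transitions: ivar name => child shape
  end

  # Follow the existing transition for `name`, or create one if none exists.
  def transition(name)
    @edges[name] ||= Shape.new(self, name)
  end
end

root = Shape.new
ab   = root.transition(:@a).transition(:@b)
ab2  = root.transition(:@a).transition(:@b)
ba   = root.transition(:@b).transition(:@a)

puts ab.equal?(ab2)  # same ivars in the same order reuse the same shape
puts ab.equal?(ba)   # the same ivars in a different order end at another shape
```

<p>In this model, looking up an existing transition is a single table lookup on the current shape, and a new shape is only created when that lookup fails.</p>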
<blockquote>
<ul>
<li>how to find next id with additional ivar?</li>
</ul>
</blockquote>
<p>When we add a new ivar, if there is no transition we have to find a new ID. Right now we're doing a linear scan of available IDs. Once we're confident in shape ID GC, we'll switch the algorithm to something more efficient like using a bitmap</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=975712022-05-12T15:59:17Ztenderlovemaking (Aaron Patterson)tenderlove@ruby-lang.org
<ul></ul><p>byroot (Jean Boussier) wrote in <a href="#note-4">#note-4</a>:</p>
<blockquote>
<blockquote>
<p>We want object shapes to be enabled on 32 bit systems and 64 bit systems so that limits us to the bottom 32 bits of the Object header.</p>
</blockquote>
<p>Might be a silly question, but how popular are 32bits systems these days? Would it be acceptable to make objects a bit bigger on 32 bits systems so that both 32bits and 64bits Ruby have a 32bit shape IDs?</p>
</blockquote>
<p>I'm not sure how popular 32 bit systems are, especially where high performance is a requirement. But I do think the broader question of "how much work should we do to support 32 bit systems?" is something we should think about. For now I think we can make shapes work well on 32 bit machines so I'm not too worried about it at this point (though it definitely would be easier if we had more than 16 bits 😆)</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=975732022-05-12T18:19:20Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><blockquote>
<p>it definitely would be easier if we had more than 16 bits</p>
</blockquote>
<p>Yeah, my worry is that while the Shopify monolith is probably among the bigger codebases, ~40k out of ~65k is really not that much leeway.</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=975742022-05-12T18:46:32Ztenderlovemaking (Aaron Patterson)tenderlove@ruby-lang.org
<ul></ul><p>byroot (Jean Boussier) wrote in <a href="#note-7">#note-7</a>:</p>
<blockquote>
<blockquote>
<p>it definitely would be easier if we had more than 16 bits</p>
</blockquote>
<p>Yeah, my worry is that while the Shopify monolith is probably among the bigger codebases, ~40k out of ~65k is really not that much leeway.</p>
</blockquote>
<p>When we measured on Shopify core it was before we had started implementing shape GC. There were ~40k shapes, but that was the total number of shapes ever seen. Hopefully with shape GC we'll see a lower number of live shapes.</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=975762022-05-12T20:05:11Zmasterleep (Bill Lipa)dojo@masterleep.com
<ul></ul><p>If you call memoized methods in a different order, would that cause instances of the same class to have multiple shapes?</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=975772022-05-12T20:20:34Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><blockquote>
<p>If you call memoized methods in a different order, would that cause instances of the same class to have multiple shapes?</p>
</blockquote>
<p>Yes.</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=975782022-05-13T01:06:59Zko1 (Koichi Sasada)
<ul></ul><p>jemmai (Jemma Issroff) wrote in <a href="#note-5">#note-5</a>:</p>
<blockquote>
<p>When we add a new ivar, if there is no transition we have to find a new ID. Right now we're doing a linear scan of available IDs. Once we're confident in shape ID GC, we'll switch the algorithm to something more efficient like using a bitmap</p>
</blockquote>
<p>Ah, my question was how to find an existing transition.</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=975792022-05-13T01:11:02Zko1 (Koichi Sasada)
<ul></ul><blockquote>
<p>struct rb_id_table * edges;</p>
</blockquote>
<p>and I understand <code>edges</code> manages the next transitions. Thanks.</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=986862022-08-17T17:14:12Zjemmai (Jemma Issroff)
<ul><li><strong>File</strong> <a href="/attachments/9341">object-shapes.patch</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/9341/object-shapes.patch">object-shapes.patch</a> added</li></ul><a name="Object-Shapes-Update"></a>
<h1 >Object Shapes Update<a href="#Object-Shapes-Update" class="wiki-anchor">¶</a></h1>
<p>We are writing with an update on the Object Shapes implementation, and to ask what needs to be done before we can merge our work. I have continued work on this alongside Aaron and Eileen.</p>
<a name="Code-changes"></a>
<h2 >Code changes<a href="#Code-changes" class="wiki-anchor">¶</a></h2>
<p><a href="https://github.com/ruby/ruby/pull/6248" class="external">These</a> are our proposed code changes to implement Object Shapes in CRuby.</p>
<p>This patch adds an object shape implementation. Each object has a shape, which represents attributes of the object, such as which slots ivars are stored in and whether objects are frozen or not. The inline caches are updated to use shape IDs as the key, rather than the class of the object. This means we don't have to read the class from the object to check IC validity. It also allows more cache hits in some cases, and will allow JITs to optimize instance variable reading and writing.</p>
<p>The patch currently limits the number of available shape IDs to 65,536 (using 16 bits). We created a new IMEMO type that represents the shape, so shapes can be garbage collected. Collected shape IDs can be reused later.</p>
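<p>The fixed ID space and the reuse of collected IDs can be sketched as follows. This is an illustration under assumptions, not the patch's code: <code>ShapeIdPool</code> and its methods are hypothetical, and the linear scan for a free ID follows what was described earlier in this thread, which may not match the current patch.</p>

```ruby
# Hypothetical model of a fixed-size shape ID pool; names are illustrative only.
class ShapeIdPool
  def initialize(capacity)
    @slots = Array.new(capacity)  # nil marks a free ID
  end

  # Linear scan for the first free ID; returns nil when the pool is exhausted.
  def allocate(shape)
    id = @slots.index(nil)
    return nil if id.nil?
    @slots[id] = shape
    id
  end

  # Called when a shape is collected, making its ID available again.
  def release(id)
    @slots[id] = nil
  end

  # Shape ID -> shape object is a plain array index.
  def lookup(id)
    @slots[id]
  end
end

pool = ShapeIdPool.new(1 << 16)   # 65,536 IDs, matching the 16-bit limit
a = pool.allocate(:shape_a)
b = pool.allocate(:shape_b)
pool.release(a)
c = pool.allocate(:shape_c)       # picks up the ID that was just released
puts [a, b, c].inspect
```

<p>A bitmap of free IDs, as mentioned earlier in the thread, would replace the scan in <code>allocate</code> without changing the rest of the interface.</p>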
<a name="CPU-performance"></a>
<h2 >CPU performance<a href="#CPU-performance" class="wiki-anchor">¶</a></h2>
<p>We measured performance with microbenchmarks, <a href="https://github.com/k0kubun/railsbench" class="external">RailsBench</a>, and <a href="https://github.com/Shopify/yjit-bench" class="external">YJIT bench</a>. Here are the performance metrics we gathered.</p>
<p>These are all microbenchmarks which measure ivar performance:</p>
<pre><code>$ make benchmark ITEM=vm_ivar
compare-ruby: ruby 3.2.0dev (2022-08-16T15:58:56Z master ac890ec062) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-08-16T20:12:55Z object-shapes-prot.. 872fa488c3) [arm64-darwin21]
# Iteration per second (i/s)
| |compare-ruby|built-ruby|
|:--------------------------|-----------:|---------:|
|vm_ivar | 98.231M| 102.161M|
| | -| 1.04x|
|vm_ivar_embedded_obj_init | 33.351M| 33.331M|
| | 1.00x| -|
|vm_ivar_extended_obj_init | 25.055M| 26.265M|
| | -| 1.05x|
|vm_ivar_generic_get | 18.374M| 17.215M|
| | 1.07x| -|
|vm_ivar_generic_set | 12.361M| 14.537M|
| | -| 1.18x|
|vm_ivar_of_class | 8.378M| 8.928M|
| | -| 1.07x|
|vm_ivar_of_class_set | 9.485M| 10.264M|
| | -| 1.08x|
|vm_ivar_set | 89.411M| 91.632M|
| | -| 1.02x|
|vm_ivar_init_subclass | 6.104M| 12.928M|
| | -| 2.12x|
</code></pre>
<p>To address the outliers above:</p>
<ul>
<li>
<code>vm_ivar_generic_set</code> is faster because this patch adds inline caches to generic ivars, which did not exist previously</li>
<li>
<code>vm_ivar_init_subclass</code> is significantly faster because, with shapes, subclasses can hit caches (as class is no longer part of the cache key)</li>
</ul>
<p>Object Shapes and Ruby master perform roughly the same on <a href="https://github.com/k0kubun/railsbench" class="external">RailsBench</a>.</p>
<p>In the following measurement, Ruby master served 1852.1 requests per second, while Object Shapes served 1842.7 requests per second, a difference of about 0.5%.</p>
<pre><code>$ RAILS_ENV=production bin/bench
ruby 3.2.0dev (2022-08-15T14:00:03Z master 0264424d58) [arm64-darwin21]
1852.1
</code></pre>
<pre><code>$ RAILS_ENV=production bin/bench
ruby 3.2.0dev (2022-08-15T15:20:22Z object-shapes-prot.. d3dbefd6cd) [arm64-darwin21]
1842.7
</code></pre>
<a name="Memory-performance"></a>
<h2 >Memory performance<a href="#Memory-performance" class="wiki-anchor">¶</a></h2>
<p>Each Ruby object contains a shape ID, which is an index into an array, so we can easily look up the shape object for a given ID. Currently, this is a fixed-size array that stores pointers to all active shapes (or NULL for IDs that are not yet in use). The array is ~64k * sizeof(uintptr_t) (about 512 KB with 8-byte pointers) and is currently a fixed-size overhead for the Ruby process.</p>
<p>Running an empty Ruby script, we can see this overhead. For instance:</p>
<p>On Ruby master:</p>
<pre><code>$ /usr/bin/time -l ruby -v -e' '
ruby 3.2.0dev (2022-08-15T14:00:03Z master 0264424d58) [arm64-darwin21]
28639232 maximum resident set size
</code></pre>
<p>With the shapes branch:</p>
<pre><code>$ /usr/bin/time -l ./ruby -v -e' '
ruby 3.2.0dev (2022-08-15T15:20:22Z object-shapes-prot.. d3dbefd6cd) [arm64-darwin21]
28917760 maximum resident set size
</code></pre>
<p>This is roughly a 0.97% memory increase on an empty Ruby script. Because the overhead is a fixed size, it would represent an even smaller relative increase on bigger Ruby processes.</p>
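<p>The arithmetic behind these figures can be checked with a few lines of Ruby. The RSS values are copied from the two runs above; the 8-byte pointer size is an assumption that holds for a 64-bit build such as the arm64-darwin21 one shown.</p>

```ruby
# Figures copied from the measurements above; 8-byte pointers are an assumption
# (true for a 64-bit build).
shape_ids     = 1 << 16
pointer_bytes = 8
table_bytes   = shape_ids * pointer_bytes

rss_master = 28_639_232
rss_shapes = 28_917_760
delta      = rss_shapes - rss_master
percent    = (delta * 100.0 / rss_master).round(2)

puts "table size: #{table_bytes} bytes (#{table_bytes / 1024} KiB)"
puts "RSS delta:  #{delta} bytes (#{delta / 1024} KiB)"
puts "increase:   #{percent}%"
```

<p>The measured difference in resident set size comes out smaller than the nominal size of the table; the measurements above do not say why.</p>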
<a name="YJIT-Statistics"></a>
<h2 >YJIT Statistics<a href="#YJIT-Statistics" class="wiki-anchor">¶</a></h2>
<p>We also ran YJIT-bench and got the following results:</p>
<p>On Ruby master:</p>
<pre><code>end_time="2022-08-17 09:31:36 PDT (-0700)"
yjit_opts=""
ruby_version="ruby 3.2.0dev (2022-08-16T15:58:56Z master ac890ec062) [x86_64-linux]"
git_branch="master"
git_commit="ac890ec062"
------------- ----------- ---------- --------- ---------- ----------- ------------
bench interp (ms) stddev (%) yjit (ms) stddev (%) interp/yjit yjit 1st itr
30k_ifelse 2083.0 0.1 203.6 0.0 10.23 0.80
30k_methods 5140.1 0.0 476.7 0.1 10.78 3.95
activerecord 188.1 0.1 99.5 0.2 1.89 1.23
binarytrees 804.8 0.1 409.2 1.1 1.97 1.93
cfunc_itself 232.5 2.4 43.3 1.5 5.36 5.34
chunky_png 2316.9 0.2 757.3 0.3 3.06 2.86
erubi 412.1 0.4 281.3 1.0 1.46 1.47
erubi_rails 31.1 2.2 17.4 2.7 1.78 0.33
fannkuchredux 11414.6 0.2 2773.5 1.3 4.12 1.00
fib 591.8 1.1 41.7 4.5 14.20 13.93
getivar 234.2 3.1 23.5 0.1 9.95 1.00
hexapdf 4755.7 1.0 2517.3 3.0 1.89 1.51
keyword_args 520.7 0.6 54.6 0.2 9.55 9.24
lee 2274.1 0.2 1133.3 0.2 2.01 1.98
liquid-render 296.7 0.3 139.3 2.8 2.13 1.46
mail 212.9 0.1 127.9 0.1 1.66 0.72
nbody 225.4 0.2 78.3 0.2 2.88 2.70
optcarrot 14592.1 0.7 4072.8 0.3 3.58 3.43
psych-load 3947.8 0.0 2075.5 0.1 1.90 1.88
railsbench 2826.0 0.6 1774.4 1.9 1.59 1.26
respond_to 424.3 0.2 154.5 3.1 2.75 2.76
rubykon 22545.1 0.4 6993.5 1.3 3.22 3.24
setivar 185.9 5.6 97.0 0.0 1.92 1.00
str_concat 123.1 0.9 28.6 2.0 4.31 3.35
------------- ----------- ---------- --------- ---------- ----------- ------------
Legend:
- interp/yjit: ratio of interp/yjit time. Higher is better. Above 1 represents a speedup.
- 1st itr: ratio of interp/yjit time for the first benchmarking iteration.
</code></pre>
<p>With the shapes branch:</p>
<pre><code>end_time="2022-08-16 13:56:32 PDT (-0700)"
yjit_opts=""
ruby_version="ruby 3.2.0dev (2022-08-15T18:35:34Z object-shapes-prot.. 51a23756c3) [x86_64-linux]"
git_branch="object-shapes-prototyping"
git_commit="51a23756c3"
------------- ----------- ---------- --------- ---------- ----------- ------------
bench interp (ms) stddev (%) yjit (ms) stddev (%) interp/yjit yjit 1st itr
30k_ifelse 2135.2 0.0 340.1 0.1 6.28 0.95
30k_methods 5180.7 0.0 906.2 0.1 5.72 3.56
activerecord 189.2 0.1 174.5 0.1 1.08 0.83
binarytrees 783.2 1.0 438.7 2.5 1.79 1.82
cfunc_itself 225.2 1.6 44.0 0.6 5.11 5.01
chunky_png 2394.9 0.2 1657.0 0.2 1.45 1.44
erubi 418.1 0.5 284.3 1.1 1.47 1.45
erubi_rails 31.6 1.5 26.2 2.1 1.21 0.34
fannkuchredux 12208.5 0.1 2821.6 0.4 4.33 0.99
fib 565.7 0.3 41.3 0.1 13.69 13.59
getivar 247.6 0.1 244.9 2.0 1.01 1.02
hexapdf 4961.0 1.6 4926.1 0.9 1.01 0.94
keyword_args 499.7 0.8 57.0 0.4 8.77 8.65
lee 2360.0 0.6 2138.6 0.6 1.10 1.11
liquid-render 294.7 0.7 274.9 1.4 1.07 0.91
mail 216.6 0.1 157.7 0.7 1.37 0.70
nbody 232.7 0.2 237.2 0.5 0.98 0.99
optcarrot 15095.8 0.7 18309.2 0.5 0.82 0.83
psych-load 4174.5 0.1 3707.9 0.1 1.13 1.13
railsbench 2923.7 0.8 2548.4 1.4 1.15 0.98
respond_to 409.2 0.3 162.6 1.7 2.52 2.52
rubykon 22554.1 0.7 20160.6 0.9 1.12 1.10
setivar 249.6 0.1 169.5 0.1 1.47 0.99
str_concat 137.8 0.8 29.0 2.4 4.75 3.50
------------- ----------- ---------- --------- ---------- ----------- ------------
Legend:
- interp/yjit: ratio of interp/yjit time. Higher is better. Above 1 represents a speedup.
- 1st itr: ratio of interp/yjit time for the first benchmarking iteration.
</code></pre>
<p>We are seeing some variations in YJIT benchmark numbers, and are working on addressing them.</p>
<a name="32-bit-architectures"></a>
<h2 >32 bit architectures<a href="#32-bit-architectures" class="wiki-anchor">¶</a></h2>
<p>We're storing the shape ID for T_OBJECT types in the top 32 bits of the flags field (sharing space with the ractor ID). Consequently, 32 bit machines do not benefit from this patch: they always miss on inline caches.</p>
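<p>As a rough illustration of why a 32 bit flags word has no room for the ID, the sketch below packs a 16-bit shape ID into the upper half of a 64-bit integer. The constant names, the helper methods and the exact bit offset are assumptions made for the example; the real layout, including where the ractor ID sits, is defined by the patch and is not reproduced here.</p>

```ruby
# Illustrative only: the exact bit layout is an assumption, not taken from the patch.
SHAPE_ID_BITS   = 16
SHAPE_ID_OFFSET = 32   # assumed: lowest bits of the upper 32-bit half
SHAPE_ID_MASK   = ((1 << SHAPE_ID_BITS) - 1) << SHAPE_ID_OFFSET

# Return a copy of `flags` with the shape ID field replaced.
def set_shape_id(flags, shape_id)
  (flags & ~SHAPE_ID_MASK) | (shape_id << SHAPE_ID_OFFSET)
end

# Extract the shape ID field from `flags`.
def get_shape_id(flags)
  (flags & SHAPE_ID_MASK) >> SHAPE_ID_OFFSET
end

flags = 0x2001                      # arbitrary low flag bits
flags = set_shape_id(flags, 40_000)
puts get_shape_id(flags)

# Truncating to 32 bits, as a 32 bit flags word would, loses the ID entirely.
puts get_shape_id(flags & 0xFFFF_FFFF)
```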
<a name="Instance-variables-with-ID-0"></a>
<h2 >Instance variables with ID == 0<a href="#Instance-variables-with-ID-0" class="wiki-anchor">¶</a></h2>
<p>This is minor, but we also do not support instance variables whose ID is 0 because the outgoing edge tables are <code>id_table</code>s which do not support <code>0</code> as a key. There is <a href="https://github.com/ruby/ruby/blob/ac890ec0624e3d8a44d85d67127bc94322caa34e/test/-ext-/marshal/test_internal_ivar.rb#L9-L21" class="external">one test for this feature</a>, and we have marked it as pending in this patch.</p>
<a name="Merging"></a>
<h2 >Merging<a href="#Merging" class="wiki-anchor">¶</a></h2>
<p>We think this feature is ready to merge. Please give us feedback, and let us know if it is possible to merge now. If it's not possible, please let us know what needs to be improved so that we can merge.</p>
<a name="Future-work"></a>
<h2 >Future work<a href="#Future-work" class="wiki-anchor">¶</a></h2>
<p>We plan to work next on speeding up the class instance variables. We will implement caching for this, and see the full benefits of object shapes in this case.</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=986872022-08-17T18:05:34Zmaximecb (Maxime Chevalier-Boisvert)maxime.chevalierboisvert@shopify.com
<ul></ul><blockquote>
<p>These are our proposed code changes to implement Object Shapes in CRuby.</p>
</blockquote>
<p>I think it would be a good idea to open a draft pull request so that it's easier to look at the diff and comment on it.</p>
<blockquote>
<p>We also ran YJIT-bench and got the following results:</p>
</blockquote>
<p>Can you guys investigate a bit why there are large slowdowns with YJIT?</p>
<p>I believe you said you had written code to make use of object shapes in YJIT. I would have expected the performance difference to be small. Since it's so big,<br>
I suspect that maybe there are a large number of side-exits happening, or something like that.</p>
<p>We could pair over it tomorrow if that's helpful. It's probably not difficult to fix.</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=986882022-08-17T19:03:33Zjemmai (Jemma Issroff)
<ul></ul><blockquote>
<p>I think it would be a good idea to open a draft pull request so that it's easier to look at the diff and comment on it.</p>
</blockquote>
<p>Good idea. <a href="https://github.com/ruby/ruby/pull/6248" class="external">Here is a draft PR</a>; I have updated the link above as well.</p>
<blockquote>
<p>Can you guys investigate a bit why there are large slowdowns with YJIT?</p>
</blockquote>
<p>We realized we accidentally ran these in debug mode. We will get new numbers and repost them, sorry about that!</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=986892022-08-17T19:51:47Zjemmai (Jemma Issroff)
<ul></ul><p>After running the YJIT benchmarks in release mode, we found that setting instance variables is, indeed, slower on our branch than the master branch.</p>
<p>YJIT is not exiting. YJIT is calling out to <code>rb_vm_setinstancevariable</code> to set instance variables. We disassembled this function on both master and our branch and it looks like the compiler is emitting unfortunate machine code on our branch. It looks like we are spilling data to the stack where we weren't in master.</p>
<p><img src="https://user-images.githubusercontent.com/1988560/185229672-d0d7e1e5-9897-4673-a9e4-8460165cefce.png" alt=""></p>
<p>On the left is the machine code in our branch, on the right is the machine code for master. It looks like the machine code on our branch is spilling data to the stack and we think this is why there's a speed difference. We can work on changing the C code to get more efficient machine code.</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=986902022-08-17T20:32:43Zmaximecb (Maxime Chevalier-Boisvert)maxime.chevalierboisvert@shopify.com
<ul></ul><p>It's unfortunate that there are spills there and there might be ways to reduce that by reorganizing the code a bit, but I would expect the performance impact of the spills to be relatively small, because in practice in most kinds of software, there are many times more ivar reads than ivar writes, and those spills are a drop in the bucket.</p>
<p>Also, if you look at the getivar benchmark before and after, you can see that the speedup went from 9.95 to 1.01. That suggests that we're probably side-exiting or not running the getinstancevariable code in YJIT for some reason. IMO getinstancevariable should be the first place to look.</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=986922022-08-18T04:50:45Znaruse (Yui NARUSE)naruse@airemix.jp
<ul></ul><p>As a general principle, since Ruby is used in many production environments by many companies, a non-optional feature needs to be production ready.</p>
<p>That means it shouldn't have downsides in CPU performance, memory consumption, and so on. Since YJIT is already considered a production feature, slowing down YJIT is also a downside. Also note that increased code complexity is a downside.</p>
<p>A critical downside needs to be fixed, but a small downside can be accepted if it comes with a larger upside. For example, code complexity is acceptable when it brings enough of a performance improvement to justify the complexity.</p>
<p>As far as I understand, Object Shapes is being introduced to improve performance, and it is complex. Therefore it needs to show a performance improvement that justifies the code complexity before we merge it into ruby-master.</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=987212022-08-18T15:40:51ZDan0042 (Daniel DeLorme)
<ul></ul><p>Thank you for this important work. In particular I think shapes will be very useful in the future to improve the performance of keyword arguments.</p>
<p>The following comments/nitpicks may be irrelevant (or I may have misunderstood the code completely), but I had to get them off my brain:</p>
<p>(1) It would be nice to convert the magic numbers into constants. So maybe we could have <code>#define SHAPE_BITS 16</code> and <code>#define SHAPE_MASK ((1<<SHAPE_BITS)-1)</code> and then other constants that derive from that, and maybe it would be possible to compile a version with 17-bit shapes.</p>
<p>(2) It looks to me like <code>rb_shape_get_iv_index</code> is really critical to performance, but the linked-list structure has bad performance characteristics for locality of memory access. If an object has 6 ivars you need to read 7 non-contiguous structs for <em>each</em> ivar access. It should be much faster if each shape struct had the full list of IDs in the shape. With a SIMD instruction it would be super-fast to find the index for a given ID, but even without SIMD the memory locality would be much better.</p>
<p>(3) The frozen flag is represented as a different root shape, but this results in many extra shapes that are not used by any object:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">,</span><span class="n">c</span><span class="p">,</span><span class="n">d</span><span class="p">)</span>
<span class="vi">@a</span><span class="p">,</span><span class="vi">@b</span><span class="p">,</span><span class="vi">@c</span><span class="p">,</span><span class="vi">@d</span> <span class="o">=</span> <span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">,</span><span class="n">c</span><span class="p">,</span><span class="n">d</span>
<span class="nb">freeze</span>
<span class="k">end</span>
</code></pre>
<pre><code>0(root) -> 2(@a) -> 3(@a,@b) -> 4(@a,@b,@c) -> 5(@a,@b,@c,@d)
1(frozen) -> 6(@a) -> 7(@a,@b) -> 8(@a,@b,@c) -> 9(@a,@b,@c,@d)
</code></pre>
<p>If the frozen flag was represented by a leaf node, this would use fewer shapes. (It would also mirror the order of operations and the fact that after freezing it's not possible to add more ivars.)</p>
<pre><code>0(root) -> 1(@a) -> 2(@a,@b) -> 3(@a,@b,@c) -> 4(@a,@b,@c,@d)
|
5(@a,@b,@c,@d,frozen)
</code></pre> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=987252022-08-18T17:38:16Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul></ul><p>Dan0042 (Daniel DeLorme) wrote in <a href="#note-19">#note-19</a>:</p>
<blockquote>
<p>In particular I think shapes will be very useful in the future to improve the performance of keyword arguments.</p>
</blockquote>
<p>Can you explain how shapes could improve performance of keyword arguments? Nobody else has mentioned the possibility yet, and I'm not sure how they are related.</p>
<blockquote>
<p>(3) The frozen flag is represented as a different root shape, but this results in many extra shapes that are not used by any object:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">,</span><span class="n">c</span><span class="p">,</span><span class="n">d</span><span class="p">)</span>
<span class="vi">@a</span><span class="p">,</span><span class="vi">@b</span><span class="p">,</span><span class="vi">@c</span><span class="p">,</span><span class="vi">@d</span> <span class="o">=</span> <span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">,</span><span class="n">c</span><span class="p">,</span><span class="n">d</span>
<span class="nb">freeze</span>
<span class="k">end</span>
</code></pre>
<pre><code>0(root) -> 2(@a) -> 3(@a,@b) -> 4(@a,@b,@c) -> 5(@a,@b,@c,@d)
1(frozen) -> 6(@a) -> 7(@a,@b) -> 8(@a,@b,@c) -> 9(@a,@b,@c,@d)
</code></pre>
</blockquote>
<p>If an object is frozen, you cannot add instance variables to it. I don't see how you could generate shapes 6-9 in your example. I would guess the implementation already works similarly to the leaf node approach you suggested.</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=987282022-08-18T18:06:51Ztenderlovemaking (Aaron Patterson)tenderlove@ruby-lang.org
<ul></ul><p>Dan0042 (Daniel DeLorme) wrote in <a href="#note-19">#note-19</a>:</p>
<blockquote>
<p>Thank you for this important work. In particular I think shapes will be very useful in the future to improve the performance of keyword arguments.</p>
</blockquote>
<p>I don't think this will have any impact on keyword arguments.</p>
<blockquote>
<p>(2) It looks to me like <code>rb_shape_get_iv_index</code> is really critical to performance, but the linked-list structure has bad performance characteristics for locality of memory access. If an object has 6 ivars you need to read 7 non-contiguous structs for <em>each</em> ivar access. It should be much faster if each shape struct had the full list of IDs in the shape. With a SIMD instruction it would be super-fast to find the index for a given ID, but even without SIMD the memory locality would be much better.</p>
</blockquote>
<p><code>rb_shape_get_iv_index</code> is rarely called because the index is stored in the inline cache. We only need to call this when the cache misses. It does make cache misses expensive in terms of CPU, but memory is reduced.</p>
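<p>The cost of a cache miss can be pictured with a small model of the parent walk. This is a sketch under assumptions: <code>ShapeLink</code>, <code>build_shape</code> and <code>iv_index</code> are hypothetical names, and the real <code>rb_shape_get_iv_index</code> may be organised differently.</p>

```ruby
# Hypothetical model: each shape records only the ivar added by its incoming edge.
ShapeLink = Struct.new(:parent, :edge_name, :iv_count)

# Build the chain root -> names[0] -> names[1] -> ... and return the last shape.
def build_shape(names)
  root = ShapeLink.new(nil, nil, 0)
  names.reduce(root) { |parent, name| ShapeLink.new(parent, name, parent.iv_count + 1) }
end

# Walk from the current shape towards the root until the ivar is found.
# Returns [index, shapes_visited], or [nil, shapes_visited] when absent.
def iv_index(shape, name)
  visited = 0
  while shape
    visited += 1
    return [shape.iv_count - 1, visited] if shape.edge_name == name
    shape = shape.parent
  end
  [nil, visited]
end

leaf = build_shape(%i[@a @b @c @d @e @f])
index, visited = iv_index(leaf, :@a)
puts "index=#{index} visited=#{visited}"
```

<p>In this model the walk is longest for the ivar that was set first, which is why keeping the resulting index in the inline cache matters: a hit skips the walk entirely.</p>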
<blockquote>
<p>(3) The frozen flag is represented as a different root shape, but this results in many extra shapes that are not used by any object:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">,</span><span class="n">c</span><span class="p">,</span><span class="n">d</span><span class="p">)</span>
<span class="vi">@a</span><span class="p">,</span><span class="vi">@b</span><span class="p">,</span><span class="vi">@c</span><span class="p">,</span><span class="vi">@d</span> <span class="o">=</span> <span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">,</span><span class="n">c</span><span class="p">,</span><span class="n">d</span>
<span class="nb">freeze</span>
<span class="k">end</span>
</code></pre>
<pre><code>0(root) -> 2(@a) -> 3(@a,@b) -> 4(@a,@b,@c) -> 5(@a,@b,@c,@d)
1(frozen) -> 6(@a) -> 7(@a,@b) -> 8(@a,@b,@c) -> 9(@a,@b,@c,@d)
</code></pre>
</blockquote>
<p>This case can't happen because you can't add ivars after an object has been frozen.</p>
<p>We have a "frozen root shape" just as an optimization for objects that go from the root shape and are immediately frozen. This happens, for example, when using the <code>frozen_string_literal</code> directive. Objects that are not T_OBJECT, T_CLASS, or T_MODULE store their shape id in the gen iv table. We didn't want to make a geniv table for every object that goes from root -> frozen (again think of the number of frozen string literals in an application), so we pre-allocate that shape, then assign it "at birth" (using the frozen bit to indicate that we're using the "frozen root shape" singleton).</p>
<blockquote>
<p>If the frozen flag was represented by a leaf node, this would use fewer shapes. (It would also mirror the order of operations and the fact that after freezing it's not possible to add more ivars.)</p>
<pre><code>0(root) -> 1(@a) -> 2(@a,@b) -> 3(@a,@b,@c) -> 4(@a,@b,@c,@d)
|
5(@a,@b,@c,@d,frozen)
</code></pre>
</blockquote>
<p>It's currently implemented this way. :)</p>
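<p>A small model of "frozen as a leaf transition" might look like the following. It is a sketch only: <code>FreezableShape</code>, its methods and the <code>:__frozen__</code> edge key are invented for the example and are not identifiers from the patch.</p>

```ruby
# Hypothetical model; :__frozen__ is an invented edge key, not a CRuby identifier.
class FreezableShape
  attr_reader :parent, :edge_name

  def initialize(parent = nil, edge_name = nil, frozen = false)
    @parent = parent
    @edge_name = edge_name
    @frozen = frozen
    @edges = {}
  end

  def frozen_shape?
    @frozen
  end

  # Adding an ivar follows or creates an edge, unless this shape is frozen.
  def add_ivar(name)
    raise FrozenError, "can't add #{name} to a frozen shape" if @frozen
    @edges[name] ||= FreezableShape.new(self, name)
  end

  # Freezing is one more transition, so a frozen shape always ends a path.
  def freeze_transition
    return self if @frozen
    @edges[:__frozen__] ||= FreezableShape.new(self, :__frozen__, true)
  end
end

root   = FreezableShape.new
abcd   = %i[@a @b @c @d].reduce(root) { |shape, name| shape.add_ivar(name) }
frozen = abcd.freeze_transition
puts frozen.frozen_shape?
```

<p>For the <code>initialize</code> example quoted above, this model creates six shapes in total (the root, four ivar shapes and one frozen leaf), the same count as the second diagram.</p>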
<p>naruse (Yui NARUSE) wrote in <a href="#note-18">#note-18</a>:</p>
<blockquote>
<p>As a general principle, since Ruby is used in many production environments by many companies, a non-optional feature needs to be production ready.</p>
<p>That means it shouldn't have downsides in CPU performance, memory consumption, and so on. Since YJIT is already considered a production feature, slowing down YJIT is also a downside. Also note that increased code complexity is a downside.</p>
<p>A critical downside needs to be fixed, but a small downside can be accepted if it comes with a larger upside. For example, code complexity is acceptable when it brings enough of a performance improvement to justify the complexity.</p>
<p>As far as I understand, Object Shapes is being introduced to improve performance, and it is complex. Therefore it needs to show a performance improvement that justifies the code complexity before we merge it into ruby-master.</p>
</blockquote>
<p>We're working to fix the YJIT cases. I think we can see higher speeds in YJIT with Object Shapes than with the current caching mechanisms, as the machine code YJIT generates can be much simpler.</p>
<p>For the non-JIT case, we're seeing good improvements in some benchmarks like this:</p>
<pre><code>|vm_ivar_init_subclass | 6.104M| 12.928M|
| | -| 2.12x|
</code></pre>
<p>This is a case where shapes can hit inline caches but the current mechanism cannot, like this code:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="k">class</span> <span class="nc">A</span>
<span class="k">def</span> <span class="nf">initialize</span>
<span class="vi">@a</span> <span class="o">=</span> <span class="mi">1</span>
<span class="vi">@b</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">class</span> <span class="nc">B</span> <span class="o"><</span> <span class="no">A</span><span class="p">;</span> <span class="k">end</span>
<span class="kp">loop</span> <span class="k">do</span>
<span class="no">A</span><span class="p">.</span><span class="nf">new</span>
<span class="no">B</span><span class="p">.</span><span class="nf">new</span>
<span class="k">end</span>
</code></pre>
<p>Our current cache cannot hit because the classes change, but shapes <em>can</em> hit because the shapes are the same.</p>
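<p>The difference can be simulated in plain Ruby. The sketch below is not how the VM's inline caches are implemented: <code>InlineCache</code> and <code>shape_key</code> are hypothetical, a single-entry cache stands in for the cache at one ivar access site, and the ordered list of ivar names stands in for a shape ID.</p>

```ruby
# Hypothetical single-entry inline cache; the key is whatever the caller passes in.
class InlineCache
  attr_reader :hits, :misses

  def initialize
    @key = nil
    @hits = 0
    @misses = 0
  end

  def check(key)
    if @key == key
      @hits += 1
    else
      @misses += 1
      @key = key   # refill the cache with the latest key
    end
  end
end

class A
  def initialize
    @a = 1
    @b = 1
  end
end
class B < A; end

# Stand-in for a shape ID: the ordered list of ivar names an object holds.
def shape_key(obj)
  obj.instance_variables
end

by_class = InlineCache.new
by_shape = InlineCache.new

10.times do
  [A.new, B.new].each do |obj|
    by_class.check(obj.class)
    by_shape.check(shape_key(obj))
  end
end

puts "class-keyed: #{by_class.hits} hits, #{by_class.misses} misses"
puts "shape-keyed: #{by_shape.hits} hits, #{by_shape.misses} misses"
```

<p>Because the class alternates on every check, the class-keyed cache is refilled each time, while the shape-keyed cache only misses on the very first check.</p>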
<p>As for complexity, I think the strategy is easier to understand than our current caching mechanism and it actually simplifies cache checks. We only need to compare shape id, not class serial + frozen status. Code complexity is hard to measure though, and I admit this patch is pretty big. 😅</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=987292022-08-18T18:22:51Zchrisseaton (Chris Seaton)chris@chrisseaton.com
<ul></ul><p>Reference keyword arguments - what we're doing is using the same idea of shapes, but applying them to the keyword arguments hash to be able to extract keyword arguments without any control-flow. We can implement keyword arguments like this with just a single machine-word comparison overhead over positional arguments.</p>
<p><a href="https://www.youtube.com/watch?v=RVqY1FRUm_8" class="external">https://www.youtube.com/watch?v=RVqY1FRUm_8</a></p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=987322022-08-18T18:49:23ZDan0042 (Daniel DeLorme)
<ul></ul><p>tenderlovemaking (Aaron Patterson) wrote in <a href="#note-21">#note-21</a>:</p>
<blockquote>
<p>It's currently implemented this way. :)</p>
</blockquote>
<p>Ok, thank you for the explanation. I got completely confused by the role of the frozen root shape. Sorry for the noise.</p>
<p>jeremyevans0 (Jeremy Evans) wrote in <a href="#note-20">#note-20</a>:</p>
<blockquote>
<p>Can you explain how shapes could improve performance of keyword arguments? Nobody else has mentioned the possibility yet, and I'm not sure how they are related.</p>
</blockquote>
<p>Well, I believe shapes can be considered a general-purpose mechanism to store "named tuples" or "hash of symbols" kind of data structures.</p>
<p>Keyword arguments like <code>foo(a:1, b:2)</code> are already pretty efficient because they use a strategy similar to shapes (VM_CALL_KWARG) where keys {a,b} are stored once per-callsite and values [1,2] are passed on the stack (IIRC).</p>
<p>But as soon as you have a splat you need to allocate a hash on the heap. (VM_CALL_KW_SPLAT)</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="n">h</span> <span class="o">=</span> <span class="p">{</span><span class="n">c</span><span class="p">:</span><span class="mi">3</span><span class="p">,</span> <span class="n">d</span><span class="p">:</span><span class="mi">4</span><span class="p">}</span>
<span class="n">foo</span><span class="p">(</span><span class="n">a</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span><span class="mi">2</span><span class="p">,</span> <span class="o">**</span><span class="n">h</span><span class="p">)</span>
</code></pre>
<p>With shapes you could start with {a,b} and then add the hash keys to get shape {a,b,c,d} and pass all values [1,2,3,4] plus the shape id on the stack. No need to allocate a hash and associated st_table/ar_table. I think.</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=987332022-08-18T18:50:22Zchrisseaton (Chris Seaton)chris@chrisseaton.com
<ul></ul><blockquote>
<p>With shapes you could start with {a,b} and then add the hash keys to get shape {a,b,c,d} and pass all values [1,2,3,4] plus the shape id on the stack.</p>
</blockquote>
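<p>A conceptual sketch of the quoted idea, in plain Ruby. It only models the bookkeeping (the real benefit would come from doing this inside the VM without allocating a hash); <code>KeyShapes</code> and its methods are hypothetical names, not an existing API.</p>

```ruby
# Hypothetical registry that interns ordered key lists as "shapes".
class KeyShapes
  def initialize
    @ids = {}   # frozen ordered key list => shape id
    @keys = []  # shape id => frozen ordered key list
  end

  # Return the id for this exact ordered key list, creating it if needed.
  def id_for(keys)
    frozen = keys.dup.freeze
    @ids[frozen] ||= begin
      @keys << frozen
      @keys.size - 1
    end
  end

  # Appending keys to an existing shape yields another interned shape.
  def extend_shape(id, extra_keys)
    id_for(@keys[id] + extra_keys)
  end

  def keys(id)
    @keys[id]
  end
end

shapes = KeyShapes.new
call_site_shape = shapes.id_for(%i[a b])  # keys known at the call site
h = { c: 3, d: 4 }

# Caller: extend the shape with the splatted keys and pass values positionally.
full_shape = shapes.extend_shape(call_site_shape, h.keys)
values = [1, 2, *h.values]

# Callee: a single shape-id comparison guards a cached key => index map.
cached_shape = full_shape
cached_index = shapes.keys(full_shape).each_with_index.to_h
c = values[cached_index[:c]] if full_shape == cached_shape

puts shapes.keys(full_shape).inspect
puts c
```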
<p>Yes that's the idea in the video I linked.</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=991472022-09-15T19:23:17Zjemmai (Jemma Issroff)
<ul></ul><a name="Summary"></a>
<h1 >Summary<a href="#Summary" class="wiki-anchor">¶</a></h1>
<p>The implementation has been updated to solve some performance problems and simplify both source code and generated code.</p>
<p>Results from multiple benchmarks, including microbenchmarks, RailsBench, and YJIT Bench, show that the performance of Object Shapes on this branch is equivalent to, if not better than, the existing ivar implementation.</p>
<p>We have also cut the memory overhead to half of what it was previously.</p>
<p>Overall, code complexity has been decreased, memory overhead is small and performance is better or on par with <code>master</code>. We think this is a good state for this feature to be merged. We would like to get feedback from Ruby committers on <a href="https://github.com/ruby/ruby/pull/6386" class="external">this PR</a> or in this issue for that reason.</p>
<a name="Details"></a>
<h1 >Details<a href="#Details" class="wiki-anchor">¶</a></h1>
<p>Since our previous update, we have made the following changes:</p>
<ul>
<li>The shape ID is now 32 bits on 64 bit machines when debugging is disabled. This gives us significantly more shapes in most cases. When debugging is enabled, Ractor check mode is also enabled. Ractor check mode <a href="https://github.com/ruby/ruby/blob/abb1a273319e1cac8736253421a821d014f32ed7/ractor_core.h#L290-L304" class="external">stores the Ractor ID in the flags bits</a>, and shapes need to use that space too. Given this constraint, when debug mode is enabled shapes only consume 16 bits, leaving the other 16 bits for Ractor IDs.</li>
<li>The shape ID is now stored in the upper 32 bits of the flags field. This ensures a consistent location for finding the shape id. All objects allocated from the heap will have their shape id stored in the top 32 bits. This simplifies looking up the shape id for a given object, which reduces code complexity. It also simplifies the machine code emitted from the JIT.</li>
<li>On platforms that support mmap, we now use mmap to allocate the shape list. Since mmap lazily maps to physical pages, this allows us to lazily allocate space for the shape list.</li>
</ul>
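<p>The "upper 32 bits of the flags field" layout can be illustrated with plain integer arithmetic. This is a sketch assuming a 64-bit flags word with debugging disabled; the constant and method names are hypothetical and do not correspond to CRuby macros.</p>

```ruby
# Hypothetical model: a 64-bit flags word whose upper 32 bits hold the shape id.
SHAPE_BITS  = 32
SHAPE_SHIFT = 32
LOW_MASK    = (1 << SHAPE_SHIFT) - 1

# Read the shape id from its fixed location in the flags word.
def shape_id(flags)
  flags >> SHAPE_SHIFT
end

# Return a new flags word with the shape id replaced and the low bits kept.
def with_shape_id(flags, id)
  unless id.between?(0, (1 << SHAPE_BITS) - 1)
    raise ArgumentError, "shape id out of range"
  end
  (flags & LOW_MASK) | (id << SHAPE_SHIFT)
end

flags = 0x2001                     # arbitrary low flag bits for illustration
flags = with_shape_id(flags, 88)
puts shape_id(flags)
puts flags & LOW_MASK
```

<p>Because the id always sits at the same offset, reading it is one shift regardless of the object's type.</p>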
<a name="CPU-performance"></a>
<h1 >CPU performance<a href="#CPU-performance" class="wiki-anchor">¶</a></h1>
<p>These are our updated performance results:</p>
<a name="Microbenchmarks"></a>
<h2 >Microbenchmarks<a href="#Microbenchmarks" class="wiki-anchor">¶</a></h2>
<p>We ran microbenchmarks comparing master to object shapes and got the following results:</p>
<pre><code>$ make benchmark ITEM=vm_ivar
...
compare-ruby: ruby 3.2.0dev (2022-09-13T06:44:29Z master 316b44df09) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-09-13T11:25:59Z object-shapes-prot.. ef42354c33) [arm64-darwin21]
# Iteration per second (i/s)
| |compare-ruby|built-ruby|
|:--------------------------|-----------:|---------:|
|vm_ivar | 100.020M| 108.705M|
| | -| 1.09x|
|vm_ivar_embedded_obj_init | 33.584M| 33.415M|
| | 1.01x| -|
|vm_ivar_extended_obj_init | 26.073M| 26.635M|
| | -| 1.02x|
|vm_ivar_generic_get | 16.599M| 18.103M|
| | -| 1.09x|
|vm_ivar_generic_set | 12.505M| 18.616M|
| | -| 1.49x|
|vm_ivar_get | 8.533| 8.566|
| | -| 1.00x|
|vm_ivar_get_uninitialized | 81.117M| 79.294M|
| | 1.02x| -|
|vm_ivar_lazy_set | 1.921| 1.949|
| | -| 1.01x|
|vm_ivar_of_class | 8.359M| 9.094M|
| | -| 1.09x|
|vm_ivar_of_class_set | 10.678M| 10.331M|
| | 1.03x| -|
|vm_ivar_set | 90.398M| 92.034M|
| | -| 1.02x|
|vm_ivar_set_on_instance | 14.269| 14.307|
| | -| 1.00x|
|vm_ivar_init_subclass | 6.048M| 13.029M|
| | -| 2.15x|
</code></pre>
<p>Class instance variables have never benefited from inline caching, so they always take the non-cached slow path. Object shapes made the slow path slower, so the <code>vm_ivar_of_class_set</code> benchmark slowdown is expected.</p>
<p>As a follow-up to object shapes, <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/11657">@jhawthorn (John Hawthorn)</a> is planning to re-implement instance variables on classes to use arrays instead of <code>st_table</code>s. With his change, class instance variables will be able to realize the benefits of object shapes by taking the cached, fast path.</p>
<a name="Railsbench"></a>
<h2 >Railsbench<a href="#Railsbench" class="wiki-anchor">¶</a></h2>
<p>We ran Railsbench 10 times on master and shapes, and got the following results:</p>
<img height="700px" src="https://i.imgur.com/BMvML1Q.png" alt="RailsBench Box Plot">
<p>Master:</p>
<pre><code>ruby 3.2.0dev (2022-09-13T06:44:29Z master 316b44df09) [arm64-darwin21]
Request per second: 1898.3 [#/s] (mean)
Request per second: 1890.8 [#/s] (mean)
Request per second: 1894.7 [#/s] (mean)
Request per second: 1885.0 [#/s] (mean)
Request per second: 1868.2 [#/s] (mean)
Request per second: 1884.2 [#/s] (mean)
Request per second: 1860.6 [#/s] (mean)
Request per second: 1902.1 [#/s] (mean)
Request per second: 1927.1 [#/s] (mean)
Request per second: 1894.8 [#/s] (mean)
</code></pre>
<p>This averages to <code>1890.58</code> requests per second.</p>
<p>Shapes:</p>
<pre><code>ruby 3.2.0dev (2022-09-13T11:25:59Z object-shapes-prot.. ef42354c33) [arm64-darwin21]
Request per second: 1901.9 [#/s] (mean)
Request per second: 1900.1 [#/s] (mean)
Request per second: 1903.1 [#/s] (mean)
Request per second: 1902.2 [#/s] (mean)
Request per second: 1905.2 [#/s] (mean)
Request per second: 1903.0 [#/s] (mean)
Request per second: 1910.7 [#/s] (mean)
Request per second: 1916.3 [#/s] (mean)
Request per second: 1905.6 [#/s] (mean)
Request per second: 1894.4 [#/s] (mean)
</code></pre>
<p>This averages to <code>1904.25</code> requests per second.</p>
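<p>For reference, the two averages can be recomputed from the ten runs listed for each build. The variable names are only for illustration.</p>

```ruby
# Requests per second, copied from the Master and Shapes runs above.
master = [1898.3, 1890.8, 1894.7, 1885.0, 1868.2,
          1884.2, 1860.6, 1902.1, 1927.1, 1894.8]
shapes = [1901.9, 1900.1, 1903.1, 1902.2, 1905.2,
          1903.0, 1910.7, 1916.3, 1905.6, 1894.4]

mean = ->(xs) { (xs.sum / xs.size).round(2) }

master_mean = mean.call(master)
shapes_mean = mean.call(shapes)
# Relative difference of the means, in percent.
gain_percent = ((shapes_mean / master_mean - 1) * 100).round(2)

puts "master: #{master_mean} (min #{master.min}, max #{master.max})"
puts "shapes: #{shapes_mean} (min #{shapes.min}, max #{shapes.max})"
puts "difference: #{gain_percent}%"
```

<p>The difference between the means is under one percent, and the shapes runs fall in a narrower range than the master runs.</p>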
<a name="YJIT-Bench"></a>
<h2 >YJIT Bench<a href="#YJIT-Bench" class="wiki-anchor">¶</a></h2>
<p>We ran YJIT Bench on master and shapes, excluding benchmarks which do not measure instance variables, and got the following results:</p>
<p>Master:</p>
<pre><code>ruby_version="ruby 3.2.0dev (2022-09-13T06:44:29Z master 316b44df09) [x86_64-linux]"
git_branch="master"
git_commit="316b44df09"
------------- ----------- ---------- --------- ---------- ----------- ------------
bench interp (ms) stddev (%) yjit (ms) stddev (%) interp/yjit yjit 1st itr
activerecord 115.5 0.3 74.7 2.1 1.55 1.31
chunky_png 748.0 0.1 500.9 0.1 1.49 1.46
erubi 260.4 0.6 202.8 1.0 1.28 1.25
erubi_rails 18.4 2.0 13.2 3.5 1.39 0.50
getivar 91.9 1.4 23.1 0.2 3.98 0.98
hexapdf 2183.0 0.9 1479.4 3.0 1.48 1.31
liquid-render 144.3 0.4 87.6 1.5 1.65 1.32
mail 125.6 0.2 104.5 0.3 1.20 0.81
optcarrot 5013.9 0.7 2227.7 0.5 2.25 2.20
psych-load 1804.1 0.1 1333.0 0.0 1.35 1.35
railsbench 1933.3 1.1 1442.5 1.6 1.34 1.20
rubykon 9838.4 0.4 4915.1 0.2 2.00 2.11
setivar 64.9 1.4 27.9 3.1 2.32 1.01
</code></pre>
<p>Shapes:</p>
<pre><code>ruby_version="ruby 3.2.0dev (2022-09-13T11:25:59Z object-shapes-prot.. ef42354c33) [x86_64-linux]"
git_branch="object-shapes-prototyping"
git_commit="ef42354c33"
------------- ----------- ---------- --------- ---------- ----------- ------------
bench interp (ms) stddev (%) yjit (ms) stddev (%) interp/yjit yjit 1st itr
activerecord 118.6 0.1 76.4 0.2 1.55 1.27
chunky_png 760.5 0.2 488.8 0.3 1.56 1.52
erubi 252.4 0.6 199.9 1.0 1.26 1.25
erubi_rails 18.5 2.5 13.7 3.3 1.35 0.53
getivar 89.8 1.2 23.3 0.0 3.85 1.00
hexapdf 2364.9 1.0 1649.2 2.8 1.43 1.30
liquid-render 147.3 0.6 90.3 1.7 1.63 1.30
mail 128.7 0.3 106.0 0.2 1.21 0.82
optcarrot 5170.7 0.8 2681.7 0.3 1.93 1.88
psych-load 1786.4 0.1 1480.2 0.0 1.21 1.20
railsbench 1988.9 0.8 1482.4 1.7 1.34 1.23
rubykon 9729.6 1.2 4841.3 1.7 2.01 2.17
setivar 61.6 0.3 32.2 0.1 1.91 1.00
</code></pre>
<p>Here are comparison numbers between the two measurements. Each value is the shapes time divided by the master time in ms, so a value above 1 means shapes is slower and a value below 1 means shapes is faster:</p>
<pre><code>bench interp (shapes / master) yjit (shapes / master)
activerecord 1.03 1.02
chunky_png 1.02 0.98
erubi 0.97 0.99
erubi_rails 1.01 1.04
getivar 0.98 1.01
hexapdf 1.08 1.11
liquid-render 1.02 1.03
mail 1.02 1.01
optcarrot 1.03 1.20
psych-load 0.99 1.11
railsbench 1.03 1.03
rubykon 0.99 0.98
setivar 0.95 1.15
</code></pre>
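<p>The ratios can be reproduced from the two tables by dividing the shapes timing by the master timing for each benchmark. The sketch below does this for four rows; the hash names are hypothetical.</p>

```ruby
# [interp ms, yjit ms] copied from the Master and Shapes tables for four rows.
master = {
  "erubi"     => [260.4, 202.8],
  "hexapdf"   => [2183.0, 1479.4],
  "optcarrot" => [5013.9, 2227.7],
  "setivar"   => [64.9, 27.9],
}
shapes = {
  "erubi"     => [252.4, 199.9],
  "hexapdf"   => [2364.9, 1649.2],
  "optcarrot" => [5170.7, 2681.7],
  "setivar"   => [61.6, 32.2],
}

# Ratio of elapsed time: shapes ms divided by master ms.
ratios = master.to_h do |bench, (m_interp, m_yjit)|
  s_interp, s_yjit = shapes.fetch(bench)
  [bench, [(s_interp / m_interp).round(2), (s_yjit / m_yjit).round(2)]]
end

ratios.each do |bench, (interp, yjit)|
  puts format("%-10s interp %.2f  yjit %.2f", bench, interp, yjit)
end
```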
<a name="Memory-consumption"></a>
<h1 >Memory consumption<a href="#Memory-consumption" class="wiki-anchor">¶</a></h1>
<p>Due to the mmap change, our memory consumption has decreased since our last update. Measuring execution of an empty script over 10 runs, we saw an average consumption of 29,107,814 bytes on master, and 29,273,293 bytes with object shapes. This means on an empty script, shapes has a memory increase of roughly 0.6%. We expect this difference to represent a significantly lower percentage memory increase on larger, production-scale applications.</p>
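<p>The relative overhead follows directly from the two byte counts quoted above:</p>

```ruby
# Average bytes for an empty script, as reported above.
master_bytes = 29_107_814
shapes_bytes = 29_273_293

delta_bytes = shapes_bytes - master_bytes
percent = (delta_bytes.to_f / master_bytes * 100).round(2)

puts "extra bytes: #{delta_bytes}"
puts "increase: #{percent}%"
```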
<a name="Code-complexity"></a>
<h1 >Code complexity<a href="#Code-complexity" class="wiki-anchor">¶</a></h1>
<p>We have reduced overall code complexity by:</p>
<ul>
<li>Removing the <code>iv_index</code> table on the class and replacing it with a shape tree which can be used independent of object types</li>
<li>Reducing the checks in the set instance variable and get instance variable cached code paths by removing the frozen check, class serial check and <code>vm_ic_entry_p</code> check in favor of a shape check.</li>
</ul>
<p>The code below is the fast code path (cache hit case) for setting instance variables. (Assertions and debug counters removed for clarity.)</p>
<p>Master:</p>
<pre><code class="c syntaxhl" data-language="c"><span class="k">if</span> <span class="p">(</span><span class="n">LIKELY</span><span class="p">(</span><span class="o">!</span><span class="n">RB_OBJ_FROZEN_RAW</span><span class="p">(</span><span class="n">obj</span><span class="p">)))</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">LIKELY</span><span class="p">(</span>
<span class="p">(</span><span class="n">vm_ic_entry_p</span><span class="p">(</span><span class="n">ic</span><span class="p">)</span> <span class="o">&&</span> <span class="n">ic</span><span class="o">-></span><span class="n">entry</span><span class="o">-></span><span class="n">class_serial</span> <span class="o">==</span> <span class="n">RCLASS_SERIAL</span><span class="p">(</span><span class="n">RBASIC</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span><span class="o">-></span><span class="n">klass</span><span class="p">))))</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">UNLIKELY</span><span class="p">(</span><span class="n">index</span> <span class="o">>=</span> <span class="n">ROBJECT_NUMIV</span><span class="p">(</span><span class="n">obj</span><span class="p">)))</span> <span class="p">{</span>
<span class="n">rb_init_iv_list</span><span class="p">(</span><span class="n">obj</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">VALUE</span> <span class="o">*</span><span class="n">ptr</span> <span class="o">=</span> <span class="n">ROBJECT_IVPTR</span><span class="p">(</span><span class="n">obj</span><span class="p">);</span>
<span class="n">RB_OBJ_WRITE</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="o">&</span><span class="n">ptr</span><span class="p">[</span><span class="n">index</span><span class="p">],</span> <span class="n">val</span><span class="p">);</span>
<span class="k">return</span> <span class="n">val</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre>
<p>Shapes:</p>
<pre><code class="c syntaxhl" data-language="c"><span class="n">shape_id_t</span> <span class="n">shape_id</span> <span class="o">=</span> <span class="n">ROBJECT_SHAPE_ID</span><span class="p">(</span><span class="n">obj</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">shape_id</span> <span class="o">==</span> <span class="n">source_shape_id</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">dest_shape_id</span> <span class="o">!=</span> <span class="n">shape_id</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">UNLIKELY</span><span class="p">(</span><span class="n">index</span> <span class="o">>=</span> <span class="n">ROBJECT_NUMIV</span><span class="p">(</span><span class="n">obj</span><span class="p">)))</span> <span class="p">{</span>
<span class="n">rb_init_iv_list</span><span class="p">(</span><span class="n">obj</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">ROBJECT_SET_SHAPE_ID</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">dest_shape_id</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">VALUE</span> <span class="o">*</span><span class="n">ptr</span> <span class="o">=</span> <span class="n">ROBJECT_IVPTR</span><span class="p">(</span><span class="n">obj</span><span class="p">);</span>
<span class="n">RB_OBJ_WRITE</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="o">&</span><span class="n">ptr</span><span class="p">[</span><span class="n">index</span><span class="p">],</span> <span class="n">val</span><span class="p">);</span>
<span class="k">return</span> <span class="n">val</span><span class="p">;</span>
<span class="p">}</span>
</code></pre>
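<p>The control flow of the shapes fast path can be mirrored in a few lines of Ruby. This is a hypothetical model for reading along, not the VM code: <code>Obj</code>, <code>SetCache</code> and <code>cached_ivar_set</code> are made-up names, and buffer resizing is left to Ruby's array.</p>

```ruby
# Hypothetical stand-ins for an object header and an inline cache entry.
Obj = Struct.new(:shape_id, :ivars)
SetCache = Struct.new(:source_shape_id, :dest_shape_id, :index)

# Returns val on a cache hit, or :miss when the slow path would be needed.
def cached_ivar_set(obj, cache, val)
  return :miss unless obj.shape_id == cache.source_shape_id

  # A different destination shape means this write adds a new ivar,
  # so the object transitions to the destination shape.
  obj.shape_id = cache.dest_shape_id if cache.dest_shape_id != obj.shape_id
  obj.ivars[cache.index] = val
  val
end

add_a   = SetCache.new(0, 1, 0) # first "@a = ..." on an empty object: 0 -> 1
write_a = SetCache.new(1, 1, 0) # later "@a = ..." once @a exists: stays at 1

obj = Obj.new(0, [])
first  = cached_ivar_set(obj, add_a, 10)   # hit, transitions to shape 1
second = cached_ivar_set(obj, write_a, 20) # hit, no transition
stale  = cached_ivar_set(obj, add_a, 30)   # miss: object is no longer shape 0

puts [first, second, stale, obj.shape_id, obj.ivars].inspect
```

<p>Note that one comparison of the shape id covers the whole guard in this model; there is no separate frozen or class check.</p>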
<ul>
<li>Reducing the number of instructions for instance variable access in YJIT. Here are the x86_64 assembly code instructions generated by YJIT which represent guard comparisons for getting instance variables:</li>
</ul>
<p>Master:</p>
<pre><code># guard known class
0x55cfff14857a: movabs rcx, 0x7f3e1fceb0f8
0x55cfff148584: cmp qword ptr [rax + 8], rcx
0x55cfff148588: jne 0x55d007137491
0x55cfff14858e: mov rax, qword ptr [r13 + 0x18]
# Is the IV in range?
0x55cfff148592: cmp qword ptr [rax + 0x10], 0
0x55cfff148597: jbe 0x55d007137446
# guard embedded getivar
# Is object embedded?
0x55cfff14859d: test word ptr [rax], 0x2000
0x55cfff1485a2: je 0x55d0071374aa
</code></pre>
<p>Shapes:</p>
<pre><code># guard shape, embedded, and T_*
0x55a89f8c7cff: mov rcx, qword ptr [rax]
0x55a89f8c7d02: movabs r11, 0xffff00000000201f
0x55a89f8c7d0c: and rcx, r11
0x55a89f8c7d0f: movabs r11, 0x58000000002001
0x55a89f8c7d19: cmp rcx, r11
0x55a89f8c7d1c: jne 0x55a8a78b5d4e
</code></pre>
<p>The shapes code has one comparison and one memory read whereas the master code has three comparisons and four memory reads.</p>
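<p>The single comparison works because the mask selects several fields of the flags word at once. The sketch below replays the guard with the two constants from the listing above. Reading the top 16 bits as the shape id, <code>0x2000</code> as the embedded bit, and <code>0x1f</code> as the type bits is an interpretation based on the comments in the two listings; the method name is hypothetical.</p>

```ruby
# Constants taken from the shapes listing above.
MASK     = 0xffff00000000201f
EXPECTED = 0x58000000002001

# One masked comparison stands in for the shape, embedded, and type checks.
def guard_passes?(flags)
  (flags & MASK) == EXPECTED
end

# Assumed field layout for this listing (see the hedge in the text above).
shape_bits    = 0x58 << 48
embedded_bit  = 0x2000
type_bits     = 0x01
unrelated_bit = 0x20 # a bit outside the mask, ignored by the guard

matching     = shape_bits | embedded_bit | type_bits | unrelated_bit
other_shape  = (0x59 << 48) | embedded_bit | type_bits
not_embedded = shape_bits | type_bits

puts guard_passes?(matching)
puts guard_passes?(other_shape)
puts guard_passes?(not_embedded)
```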
<a name="Conclusion"></a>
<h1 >Conclusion<a href="#Conclusion" class="wiki-anchor">¶</a></h1>
<p>As we said at the beginning of this update, based on the metrics and rationale described above, we think object shapes is ready to merge. Please give us feedback <a href="https://github.com/ruby/ruby/pull/6386" class="external">on our PR</a> or this issue.</p>
<p>Thank you!</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=991702022-09-16T19:55:13Zmaximecb (Maxime Chevalier-Boisvert)maxime.chevalierboisvert@shopify.com
<ul></ul><p>The performance numbers look good and I'm very happy with the improvements that you've made wrt shape ids being 32 bits. It's also nice to see that the generated code for property accesses in YJIT is shorter than before.</p>
<p>There seems to be a (likely minor) bug in the PR that should be fixed before we merge. Besides that, the PR looks to me like it's in a mergeable state and delivers on what was promised.</p> Ruby master - Feature #18776: Object Shapeshttps://bugs.ruby-lang.org/issues/18776?journal_id=993462022-09-26T18:55:06Zjemmai (Jemma Issroff)
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul><p>Applied in changeset <a class="changeset" title="This commit implements the Object Shapes technique in CRuby. Object Shapes is used for accessing..." href="https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/9ddfd2ca004d1952be79cf1b84c52c79a55978f4">git|9ddfd2ca004d1952be79cf1b84c52c79a55978f4</a>.</p>
<hr>
<p>This commit implements the Object Shapes technique in CRuby.</p>
<p>Object Shapes is used for accessing instance variables and representing the<br>
"frozenness" of objects. Object instances have a "shape" and the shape<br>
represents some attributes of the object (currently which instance variables are<br>
set and the "frozenness"). Shapes form a tree data structure, and when a new<br>
instance variable is set on an object, that object "transitions" to a new shape<br>
in the shape tree. Each shape has an ID that is used for caching. The shape<br>
structure is independent of class, so objects of different types can have the<br>
same shape.</p>
<p>For example:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="k">class</span> <span class="nc">Foo</span>
<span class="k">def</span> <span class="nf">initialize</span>
<span class="c1"># Starts with shape id 0</span>
<span class="vi">@a</span> <span class="o">=</span> <span class="mi">1</span> <span class="c1"># transitions to shape id 1</span>
<span class="vi">@b</span> <span class="o">=</span> <span class="mi">1</span> <span class="c1"># transitions to shape id 2</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">class</span> <span class="nc">Bar</span>
<span class="k">def</span> <span class="nf">initialize</span>
<span class="c1"># Starts with shape id 0</span>
<span class="vi">@a</span> <span class="o">=</span> <span class="mi">1</span> <span class="c1"># transitions to shape id 1</span>
<span class="vi">@b</span> <span class="o">=</span> <span class="mi">1</span> <span class="c1"># transitions to shape id 2</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="n">foo</span> <span class="o">=</span> <span class="no">Foo</span><span class="p">.</span><span class="nf">new</span> <span class="c1"># `foo` has shape id 2</span>
<span class="n">bar</span> <span class="o">=</span> <span class="no">Bar</span><span class="p">.</span><span class="nf">new</span> <span class="c1"># `bar` has shape id 2</span>
</code></pre>
<p>Both <code>foo</code> and <code>bar</code> instances have the same shape because they both set<br>
instance variables of the same name in the same order.</p>
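<p>The transition behaviour can be modelled with a small tree in plain Ruby. This is an illustrative sketch, not the CRuby data structure: <code>ShapeTree</code> and its members are hypothetical names, and frozenness is not modelled.</p>

```ruby
# Hypothetical shape tree: each node is reached by setting one more ivar.
class ShapeTree
  Shape = Struct.new(:id, :parent, :ivar_name, :children)

  attr_reader :root

  def initialize
    @shapes = []
    @root = new_shape(nil, nil) # shape id 0: no ivars set
  end

  # Follow the edge for ivar_name out of shape, creating it on first use.
  def transition(shape, ivar_name)
    shape.children[ivar_name] ||= new_shape(shape, ivar_name)
  end

  private

  def new_shape(parent, ivar_name)
    shape = Shape.new(@shapes.size, parent, ivar_name, {})
    @shapes << shape
    shape
  end
end

tree = ShapeTree.new
set_in_order = ->(names) { names.reduce(tree.root) { |s, n| tree.transition(s, n) } }

foo = set_in_order.call(%i[@a @b]) # like Foo#initialize
bar = set_in_order.call(%i[@a @b]) # like Bar#initialize
baz = set_in_order.call(%i[@b @a]) # same names, different order

puts foo.id
puts bar.id
puts baz.id
```

<p>In this model <code>foo</code> and <code>bar</code> end on the same node, while setting the same names in the opposite order walks a different branch and ends on a different shape.</p>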
<p>This technique can help to improve inline cache hits as well as generate more<br>
efficient machine code in JIT compilers.</p>
<p>This commit also adds some methods for debugging shapes on objects. See<br>
<code>RubyVM::Shape</code> for more details.</p>
<p>For more context on Object Shapes, see [Feature: <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: Object Shapes (Closed)" href="https://bugs.ruby-lang.org/issues/18776">#18776</a>]</p>
<p>Co-Authored-By: Aaron Patterson <a href="mailto:tenderlove@ruby-lang.org" class="email">tenderlove@ruby-lang.org</a><br>
Co-Authored-By: Eileen M. Uchitelle <a href="mailto:eileencodes@gmail.com" class="email">eileencodes@gmail.com</a><br>
Co-Authored-By: John Hawthorn <a href="mailto:john@hawthorn.email" class="email">john@hawthorn.email</a></p>