<p><strong>Ruby Issue Tracking System</strong>: Ruby master - Feature #16837: Can we make Ruby 3.0 as fast as Ruby 2.7 with the new assertions? (<a href="https://bugs.ruby-lang.org/issues/16837">https://bugs.ruby-lang.org/issues/16837</a>)</p>
<p><strong>k0kubun (Takashi Kokubun)</strong>, 2020-05-07T08:12:57Z, <a href="https://bugs.ruby-lang.org/issues/16837?journal_id=85412">journal 85412</a></p>
<ul><li><strong>Tracker</strong> changed from <i>Bug</i> to <i>Feature</i></li><li><strong>Backport</strong> deleted (<del><i>2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN</i></del>)</li></ul>
<p><strong>k0kubun (Takashi Kokubun)</strong>, 2020-05-07T08:16:30Z, <a href="https://bugs.ruby-lang.org/issues/16837?journal_id=85414">journal 85414</a></p>
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/85414/diff?detail_id=56975">diff</a>)</li></ul>
<p><strong>k0kubun (Takashi Kokubun)</strong>, 2020-05-07T08:17:18Z, <a href="https://bugs.ruby-lang.org/issues/16837?journal_id=85415">journal 85415</a></p>
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/85415/diff?detail_id=56976">diff</a>)</li></ul>
<p><strong>shyouhei (Shyouhei Urabe)</strong>, 2020-05-07T09:14:02Z, <a href="https://bugs.ruby-lang.org/issues/16837?journal_id=85422">journal 85422</a></p>
<ul></ul><p>I would like to suggest that if a user really favors speed over sanity checks, they should just compile everything with <code>-DNDEBUG</code>. This has been <em>standard</em> C practice since long before Ruby's birth.</p>
<p><strong>shyouhei (Shyouhei Urabe)</strong>, 2020-05-07T09:22:17Z, <a href="https://bugs.ruby-lang.org/issues/16837?journal_id=85423">journal 85423</a></p>
<ul></ul><p>Some analysis of the slowdown.</p>
<p>Judging from the generated binary and the <code>perf</code> output, the slowdown comes from some functions no longer being inlined. This may depend on the compiler, but for me <code>rb_array_len()</code> is one such victim:</p>
<pre><code>zsh % gdb -batch -ex 'file miniruby' -ex 'disassemble rb_array_len'
Dump of assembler code for function rb_array_len:
0x0000000000295540 <+0>: push %rbx
0x0000000000295541 <+1>: mov %rdi,%rbx
0x0000000000295544 <+4>: test $0x7,%bl
0x0000000000295547 <+7>: jne 0x2955be <rb_array_len+126>
0x0000000000295549 <+9>: mov %rbx,%rax
0x000000000029554c <+12>: and $0xfffffffffffffff7,%rax
0x0000000000295550 <+16>: je 0x2955be <rb_array_len+126>
0x0000000000295552 <+18>: mov (%rbx),%rax
0x0000000000295555 <+21>: mov %eax,%edx
0x0000000000295557 <+23>: and $0x1f,%edx
0x000000000029555a <+26>: mov $0x7,%ecx
0x000000000029555f <+31>: cmp $0x7,%edx
0x0000000000295562 <+34>: jne 0x295585 <rb_array_len+69>
0x0000000000295564 <+36>: test $0x2000,%eax ;; <- This is `RB_FL_ANY_RAW(a, RARRAY_EMBED_FLAG)`
0x0000000000295569 <+41>: jne 0x295571 <rb_array_len+49>
0x000000000029556b <+43>: mov 0x10(%rbx),%rax ;; <-
0x000000000029556f <+47>: pop %rbx ;; <- This is `return RARRAY(a)->as.heap.len;`
0x0000000000295570 <+48>: retq ;; <-
0x0000000000295571 <+49>: cmp $0x7,%ecx
0x0000000000295574 <+52>: jne 0x2955a2 <rb_array_len+98>
0x0000000000295576 <+54>: test $0x2000,%eax
0x000000000029557b <+59>: je 0x2955ea <rb_array_len+170>
0x000000000029557d <+61>: shr $0xf,%eax ;; <-
0x0000000000295580 <+64>: and $0x3,%eax ;; <- This is `return RARRAY_EMBED_LEN(a);`
0x0000000000295583 <+67>: pop %rbx ;; <-
0x0000000000295584 <+68>: retq ;; <-
0x0000000000295585 <+69>: mov %rbx,%rdi
0x0000000000295588 <+72>: mov $0x7,%esi
0x000000000029558d <+77>: callq 0xcaea2 <rb_check_type>
0x0000000000295592 <+82>: mov (%rbx),%rax
0x0000000000295595 <+85>: mov %eax,%ecx
0x0000000000295597 <+87>: and $0x1f,%ecx
0x000000000029559a <+90>: cmp $0x1b,%rcx
0x000000000029559e <+94>: jne 0x295564 <rb_array_len+36>
0x00000000002955a0 <+96>: jmp 0x2955cb <rb_array_len+139>
0x00000000002955a2 <+98>: mov %rbx,%rdi
0x00000000002955a5 <+101>: mov $0x7,%esi
0x00000000002955aa <+106>: callq 0xcaea2 <rb_check_type>
0x00000000002955af <+111>: mov (%rbx),%rax
0x00000000002955b2 <+114>: mov %eax,%ecx
0x00000000002955b4 <+116>: and $0x1f,%ecx
0x00000000002955b7 <+119>: cmp $0x1b,%ecx
0x00000000002955ba <+122>: jne 0x295576 <rb_array_len+54>
0x00000000002955bc <+124>: jmp 0x2955cb <rb_array_len+139>
0x00000000002955be <+126>: mov %rbx,%rdi
0x00000000002955c1 <+129>: mov $0x7,%esi
0x00000000002955c6 <+134>: callq 0xcaea2 <rb_check_type>
0x00000000002955cb <+139>: lea 0x142fe(%rip),%rdi # 0x2a98d0
0x00000000002955d2 <+146>: lea 0x1432f(%rip),%rdx # 0x2a9908
0x00000000002955d9 <+153>: lea 0x14337(%rip),%rcx # 0x2a9917
0x00000000002955e0 <+160>: mov $0xea,%esi
0x00000000002955e5 <+165>: callq 0xcad86 <rb_assert_failure>
0x00000000002955ea <+170>: lea 0x14338(%rip),%rdi # 0x2a9929
0x00000000002955f1 <+177>: lea 0x1436d(%rip),%rdx # 0x2a9965
0x00000000002955f8 <+184>: lea 0x14377(%rip),%rcx # 0x2a9976
0x00000000002955ff <+191>: mov $0x79,%esi
0x0000000000295604 <+196>: callq 0xcad86 <rb_assert_failure>
End of assembler dump.
</code></pre>
<p>Here, the assertions practically never fail, so their conditional jumps are almost always predicted correctly and cost close to nothing. They are not what slows things down. The problem is the unreachable branches: if you read the assembly, you can see that almost two thirds of the function above is never reached. Those paths bloat the generated code significantly, so <code>rb_array_len</code> is now considered too big to be inlined, at least by my compiler.</p>
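<p>To make the hot/cold split concrete, here is a rough C model of what the annotated instructions above compute. It is a hypothetical simplification, not Ruby's header code: the <code>toy_*</code> names, the struct layout, and the single type check are assumptions read off the disassembly (flags word at offset 0, heap length at offset <code>0x10</code>, embed flag <code>0x2000</code>, embedded length in bits 15 and 16), and the special-constant checks at the top of the real function are omitted.</p>

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified layout implied by the disassembly. */
typedef struct {
    unsigned long flags;  /* low 5 bits: type tag; higher bits: embed flag and length */
    unsigned long klass;  /* unused here; only places heap_len at offset 0x10 on LP64 */
    long heap_len;        /* meaningful only when the embed flag is clear */
} toy_array;

#define TOY_T_MASK      0x1fUL    /* and  $0x1f,%edx    */
#define TOY_T_ARRAY     0x07UL    /* cmp  $0x7,%edx     */
#define TOY_EMBED_FLAG  0x2000UL  /* test $0x2000,%eax  */
#define TOY_EMBED_SHIFT 15        /* shr  $0xf,%eax     */
#define TOY_EMBED_MASK  0x3UL     /* and  $0x3,%eax     */

/* Stand-in for the cold calls (rb_check_type / rb_assert_failure):
 * practically never executed, but still emitted into the function body. */
static void toy_assert_failure(const char *expr)
{
    fprintf(stderr, "assertion failed: %s\n", expr);
    abort();
}

/* Hot path only: roughly the part of rb_array_len that actually runs. */
static long toy_array_len_fast(const toy_array *a)
{
    if (a->flags & TOY_EMBED_FLAG)
        return (long)((a->flags >> TOY_EMBED_SHIFT) & TOY_EMBED_MASK);
    return a->heap_len;
}

/* Same result, plus the kind of check that adds never-taken branches. */
static long toy_array_len_checked(const toy_array *a)
{
    if ((a->flags & TOY_T_MASK) != TOY_T_ARRAY)
        toy_assert_failure("(flags & TOY_T_MASK) == TOY_T_ARRAY");
    return toy_array_len_fast(a);
}
```

<p>Only the embed test and the two returns are ever executed for valid arrays; the type check and the call to <code>toy_assert_failure</code> are the kind of never-taken code that still counts towards the function's size when the compiler decides whether to inline it.</p>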
<p>An obvious ad-hoc remedy is to add <code>__attribute__((__always_inline__))</code> to everything. But I don't think that's a good idea, because what gets inlined and what doesn't depends heavily on the compiler, its version, the target architecture, and almost everything else.</p>
<p><strong>shyouhei (Shyouhei Urabe)</strong>, 2020-05-07T09:38:22Z, <a href="https://bugs.ruby-lang.org/issues/16837?journal_id=85424">journal 85424</a></p>
<ul></ul><p>If you recompile everything with <code>./configure cppflags=-DNDEBUG</code>, those assertions are eliminated, which lets compilers inline <code>rb_array_len</code> again.</p>
<p><strong>shevegen (Robert A. Heiler)</strong>, 2020-05-07T12:48:56Z, <a href="https://bugs.ruby-lang.org/issues/16837?journal_id=85425">journal 85425</a></p>
<ul></ul><p>I have a question concerning one point mentioned above.</p>
<p>k0kubun wrote:</p>
<blockquote>
<p>Provide .so for an assertion-enabled mode? (ko1's idea)</p>
</blockquote>
<p>Could someone briefly explain the general idea behind this? I assume that with a .so file the Ruby user would have to require/load that file, but what would be the perceived benefits and disadvantages of doing so?</p>
<p><strong>k0kubun (Takashi Kokubun)</strong>, 2020-05-07T17:08:05Z, <a href="https://bugs.ruby-lang.org/issues/16837?journal_id=85432">journal 85432</a></p>
<ul></ul><blockquote>
<p>I would like to suggest that if a user really favors speed over sanity checks, they should just compile everything with -DNDEBUG. This has been standard C practice since long before Ruby's birth.</p>
</blockquote>
<p>Got it. I'll consider using <code>-DNDEBUG</code> on the benchmark servers at least. Also, maybe it's worth noting this in NEWS for those who package Ruby for performance-sensitive use?</p>
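<p>For reference, this is what <code>-DNDEBUG</code> does to the standard <code>assert()</code> macro. The sketch defines <code>NDEBUG</code> by hand, which has the same effect as passing <code>-DNDEBUG</code> for that part of the file. The function names are hypothetical, and the side effect inside <code>assert()</code> is deliberate, only to make the difference observable.</p>

```c
/* Equivalent to compiling this part with -DNDEBUG: every assert() after
 * this #include expands to ((void)0), so its argument is never evaluated. */
#define NDEBUG
#include <assert.h>

static int assert_runs_with_ndebug(void)
{
    int evaluated = 0;
    /* Side effect inside assert() is deliberate here, only to make the
     * difference observable; never do this in real code. */
    assert(++evaluated > 0);  /* removed by the preprocessor */
    return evaluated;         /* still 0 */
}

/* <assert.h> may be included again: assert() is redefined according to the
 * current state of NDEBUG each time it is included (C standard, 7.2). */
#undef NDEBUG
#include <assert.h>

static int assert_runs_without_ndebug(void)
{
    int evaluated = 0;
    assert(++evaluated > 0);  /* checked at run time, failure path compiled in */
    return evaluated;         /* 1 */
}
```

<p><code>assert_runs_with_ndebug()</code> returns 0 and <code>assert_runs_without_ndebug()</code> returns 1: with <code>NDEBUG</code> defined, the whole check, failure path included, disappears from the generated code.</p>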
<blockquote>
<p>An obvious ad-hoc remedy is to add <code>__attribute__((__always_inline__))</code> to everything. But I don't think that's a good idea, because what gets inlined and what doesn't depends heavily on the compiler, its version, the target architecture, and almost everything else.</p>
</blockquote>
<p>Agreed. While it's not a good idea to always inline <em>everything</em>, some functions may be worth considering.</p>
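<p>As an illustration of forcing inlining selectively rather than everywhere, a portable wrapper might look like the following. <code>TOY_FORCE_INLINE</code> and <code>toy_clamp_len</code> are hypothetical names; <code>__attribute__((__always_inline__))</code> is the GCC/Clang attribute mentioned above and <code>__forceinline</code> is the MSVC keyword. As noted in the thread, whether forcing actually helps still depends on the compiler and the target.</p>

```c
/* Hypothetical portability macro: force inlining where the compiler offers
 * a way to do so, otherwise fall back to a plain inline hint. */
#if defined(__GNUC__) || defined(__clang__)
#  define TOY_FORCE_INLINE inline __attribute__((__always_inline__))
#elif defined(_MSC_VER)
#  define TOY_FORCE_INLINE __forceinline
#else
#  define TOY_FORCE_INLINE inline
#endif

/* A tiny accessor on a hot path: a reasonable candidate for forced inlining,
 * unlike large functions, where forcing would only bloat every caller. */
static TOY_FORCE_INLINE long toy_clamp_len(long len, long capa)
{
    return len < capa ? len : capa;
}
```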
<blockquote>
<p>I assume for a .so file the ruby user would have to require/load that file</p>
</blockquote>
<p>His idea was to install the .so file under the Ruby prefix by default and add a <code>--debug-xxx</code> option to load it.</p>
<p><strong>k0kubun (Takashi Kokubun)</strong>, 2020-05-07T21:02:02Z, <a href="https://bugs.ruby-lang.org/issues/16837?journal_id=85446">journal 85446</a></p>
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-5 priority-4 priority-default closed" href="/issues/16840">Bug #16840</a>: Decrease in Hash#[]= performance with object keys</i> added</li></ul>
<p><strong>nobu (Nobuyoshi Nakada)</strong>, 2020-05-08T02:46:56Z, <a href="https://bugs.ruby-lang.org/issues/16837?journal_id=85451">journal 85451</a></p>
<ul></ul><p>It is not only the assertions: some optimizations can no longer be applied either.</p>
<p>For instance, <code>rb_str_new_cstr</code> was defined as follows in 2.7,</p>
<pre><code class="C syntaxhl" data-language="C"><span class="cp">#define rb_str_new_cstr(str) RB_GNUC_EXTENSION_BLOCK( \
(__builtin_constant_p(str)) ? \
rb_str_new_static((str), (long)strlen(str)) : \
rb_str_new_cstr(str) \
)
</span></code></pre>
<p>and <code>rb_str_new_cstr("...")</code> was expected to compile to <code>rb_str_new_static("...", 3)</code>.</p>
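<p>The 2.7 trick works because the macro expands at the call site, where <code>__builtin_constant_p</code> still sees the caller's string literal. A minimal self-contained sketch of the same pattern, with hypothetical <code>toy_*</code> stand-ins for <code>rb_str_new_static</code> and <code>rb_str_new_cstr</code> and assuming GCC or Clang for the builtin:</p>

```c
#include <string.h>

/* Hypothetical stand-in for a Ruby string: records how it was built. */
typedef struct {
    const char *ptr;
    long len;
    int is_static;  /* 1: wraps the literal in place, 0: built at run time */
} toy_str;

/* Stand-in for rb_str_new_static(): the caller supplies the length. */
static toy_str toy_str_new_static(const char *ptr, long len)
{
    toy_str s = { ptr, len, 1 };
    return s;
}

/* Stand-in for the out-of-line rb_str_new_cstr(): measures the string at
 * run time (a real implementation would also copy the bytes). */
static toy_str toy_str_new_cstr(const char *ptr)
{
    toy_str s = { ptr, (long)strlen(ptr), 0 };
    return s;
}

/* 2.7-style dispatch. Because this is a macro, __builtin_constant_p sees the
 * caller's expression, so a literal takes the static path and an optimizing
 * compiler can fold strlen() to a constant. The inner toy_str_new_cstr(str)
 * is not re-expanded and calls the function defined above. */
#if defined(__GNUC__)
#  define toy_str_new_cstr(str) \
       (__builtin_constant_p(str) ? toy_str_new_static((str), (long)strlen(str)) \
                                  : toy_str_new_cstr(str))
#endif
```

<p>As in the 2.7 macro, the function is defined before the macro of the same name, so its own definition is not rewritten by the preprocessor.</p>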
<p>Below is the master version.</p>
<pre><code class="C syntaxhl" data-language="C"><span class="k">static</span> <span class="kr">inline</span> <span class="n">VALUE</span>
<span class="nf">ruby3_str_new_cstr</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">str</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="cm">/* constexpr */</span> <span class="p">(</span><span class="o">!</span> <span class="n">RUBY3_CONSTANT_P</span><span class="p">(</span><span class="n">str</span><span class="p">))</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">rb_str_new_cstr</span><span class="p">(</span><span class="n">str</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">else</span> <span class="p">{</span>
<span class="kt">long</span> <span class="n">len</span> <span class="o">=</span> <span class="n">ruby3_strlen</span><span class="p">(</span><span class="n">str</span><span class="p">);</span>
<span class="k">return</span> <span class="n">rb_str_new_static</span><span class="p">(</span><span class="n">str</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre>
<p>As <code>str</code> is an argument variable and <code>RUBY3_CONSTANT_P(str)</code> is always false here, the <code>_static</code> function is never used (with Apple clang 11.0.3 and gcc 10.1.0-RC-20200430_0).</p>
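<p>Since GCC documents that <code>__builtin_constant_p</code> never returns 1 for a string constant reaching it through an inline function's argument, one possible workaround (a hedged sketch, not necessarily the fix Ruby adopted) is to ask the question in a macro at the call site and pass the answer into the inline function. <code>TOY_CONSTANT_P</code>, <code>toy_str_new_cstr_impl</code> and the other <code>toy_*</code> names are hypothetical:</p>

```c
#include <string.h>

/* Hypothetical stand-in for a Ruby string: records how it was built. */
typedef struct {
    const char *ptr;
    long len;
    int is_static;  /* 1: wraps the literal in place, 0: built at run time */
} toy_str;

static toy_str toy_str_new_static(const char *ptr, long len)
{
    toy_str s = { ptr, len, 1 };
    return s;
}

/* Run-time path (a real implementation would also copy the bytes). */
static toy_str toy_str_new_runtime(const char *ptr)
{
    toy_str s = { ptr, (long)strlen(ptr), 0 };
    return s;
}

#if defined(__GNUC__)
#  define TOY_CONSTANT_P(expr) __builtin_constant_p(expr)
#else
#  define TOY_CONSTANT_P(expr) 0
#endif

/* The inline function no longer applies the builtin to its own parameter;
 * it receives the answer computed at the call site. */
static inline toy_str toy_str_new_cstr_impl(const char *str, int is_constant)
{
    if (is_constant)
        return toy_str_new_static(str, (long)strlen(str));
    return toy_str_new_runtime(str);
}

/* Hypothetical front end: the builtin runs where the caller's literal is visible. */
#define toy_str_new_cstr(str) toy_str_new_cstr_impl((str), TOY_CONSTANT_P(str))
```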
<p>I'm not sure how much this particular case affects overall performance, but there may be more missed optimizations like this.</p>
<p><strong>shyouhei (Shyouhei Urabe)</strong>, 2020-05-08T07:34:18Z, <a href="https://bugs.ruby-lang.org/issues/16837?journal_id=85453">journal 85453</a></p>
<ul></ul><p>nobu (Nobuyoshi Nakada) wrote in <a href="#note-10">#note-10</a>:</p>
<blockquote>
<p>As <code>str</code> is an argument variable and <code>RUBY3_CONSTANT_P(str)</code> is always false here,</p>
</blockquote>
<p>Well, thank you for pointing this out. As I wrote in <code>include/ruby/3/constant_p.h</code>, you can apply <code>__builtin_constant_p</code> to an inline function argument, so I thought that <code>RUBY3_CONSTANT_P(str)</code> was not always false. However, <a href="https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html" class="external">https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html</a> says:</p>
<blockquote>
<p>You may use this built-in function in either a macro or an inline function. However, if you use it in an inlined function and pass an argument of the function as the argument to the built-in, GCC never returns 1 when you call the inline function with a string constant or ...</p>
</blockquote>
<p>In the particular case of <code>ruby3_str_new_cstr()</code>, the argument is a string, so there is no chance. This is in fact wrong, and we have to fix it.</p>
<p><strong>naruse (Yui NARUSE)</strong>, 2020-05-11T16:46:39Z, <a href="https://bugs.ruby-lang.org/issues/16837?journal_id=85502">journal 85502</a></p>
<ul></ul><p>I want Ruby 2.8/3.0 to be faster than 2.7 by default.<br>
NDEBUG is not acceptable.<br>
I think Microsoft's <code>_DEBUG</code> approach is more reasonable.</p>
<p><strong>shyouhei (Shyouhei Urabe)</strong>, 2020-05-12T07:34:06Z, <a href="https://bugs.ruby-lang.org/issues/16837?journal_id=85517">journal 85517</a></p>
<ul></ul><p>naruse (Yui NARUSE) wrote in <a href="#note-12">#note-12</a>:</p>
<blockquote>
<p>NDEBUG is not acceptable.</p>
</blockquote>
<p>NDEBUG is not my invention. Please file a bug report upstream (ISO/IEC JTC1/SC22/WG14).</p>
<p>I'm not against defining it by default, though.</p>
<p><strong>ko1 (Koichi Sasada)</strong>, 2020-05-25T18:21:39Z, <a href="https://bugs.ruby-lang.org/issues/16837?journal_id=85787">journal 85787</a></p>
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul><p>Applied in changeset <a class="changeset" title="Use RUBY_DEBUG instead of NDEBUG Assertions in header files slows down an interpreter, so they s..." href="https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/21991e6ca59274e41a472b5256bd3245f6596c90">git|21991e6ca59274e41a472b5256bd3245f6596c90</a>.</p>
<hr>
<p>Use RUBY_DEBUG instead of NDEBUG</p>
<p>Assertions in header files slow down the interpreter, so they should be<br>
turned off by default (simple <code>make</code>). To enable them, define the macro<br>
<code>RUBY_DEBUG=1</code> (e.g. <code>make cppflags=-DRUBY_DEBUG</code>, or use <code>#define</code> at<br>
the very beginning of the file). Note that even if <code>NDEBUG=1</code> is defined,<br>
<code>RUBY_DEBUG=1</code> enables all assertions.<br>
[Feature <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: Can we make Ruby 3.0 as fast as Ruby 2.7 with the new assertions? (Closed)" href="https://bugs.ruby-lang.org/issues/16837">#16837</a>]<br>
related: <a href="https://github.com/ruby/ruby/pull/3120" class="external">https://github.com/ruby/ruby/pull/3120</a></p>
<p><code>assert()</code> calls in MRI's *.c files are not disabled by <code>RUBY_DEBUG=0</code>;<br>
they can be disabled only with <code>NDEBUG=1</code>. So please consider using<br>
<code>RUBY_ASSERT()</code> if you want them disabled when <code>RUBY_DEBUG=0</code>.</p>
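<p>The opt-in scheme described in this changeset (assertions compiled out by default, enabled by defining a debug macro regardless of <code>NDEBUG</code>) can be sketched as follows. <code>TOY_DEBUG</code>, <code>TOY_ASSERT</code>, <code>toy_assert_failure</code> and <code>toy_checked_div</code> are hypothetical stand-ins; Ruby's actual <code>RUBY_DEBUG</code>/<code>RUBY_ASSERT</code> definitions are not reproduced here.</p>

```c
#include <stdio.h>
#include <stdlib.h>

/* Off unless the build asks for it, e.g. -DTOY_DEBUG (defines it as 1).
 * NDEBUG is deliberately not consulted. */
#ifndef TOY_DEBUG
#  define TOY_DEBUG 0
#endif

#if TOY_DEBUG
/* Cold failure path: only compiled into debug builds. */
static void toy_assert_failure(const char *expr, const char *file, int line)
{
    fprintf(stderr, "%s:%d: assertion failed: %s\n", file, line, expr);
    abort();
}
/* Enabled: evaluate the condition and report the failing expression. */
#  define TOY_ASSERT(expr) \
       ((expr) ? (void)0 : toy_assert_failure(#expr, __FILE__, __LINE__))
#else
/* Disabled (the default): the condition is not evaluated and no failure
 * branch is emitted into the caller. */
#  define TOY_ASSERT(expr) ((void)0)
#endif

/* Example caller: with TOY_DEBUG=0 only the division remains. */
static long toy_checked_div(long a, long b)
{
    TOY_ASSERT(b != 0);
    return a / b;
}
```

<p>Building with <code>-DTOY_DEBUG</code> turns the checks back on, mirroring <code>make cppflags=-DRUBY_DEBUG</code>, while defining or omitting <code>NDEBUG</code> has no effect on <code>TOY_ASSERT</code>.</p>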