Feature #16837
closed
Can we make Ruby 3.0 as fast as Ruby 2.7 with the new assertions?
Added by k0kubun (Takashi Kokubun) over 5 years ago. Updated over 5 years ago.
Description
Problem
How can we make Ruby 3.0 as fast as (or faster than) Ruby 2.7?
Background
- Split ruby.h https://github.com/ruby/ruby/pull/2991 added some new assertions
- While it has been helpful for revealing various bugs, it also made some Ruby programs notably slower, especially Optcarrot https://benchmark-driver.github.io/benchmarks/optcarrot/commits.html
Possible approaches
I have no strong preference yet. Here are some random ideas:
- Optimize the assertion code somehow
- Enable the new assertions only on CIs, at least ones in hot spots
- Not sure which places have large impact on Optcarrot yet
- Make some other not-so-important assertions CI-only to offset the impact from new ones
- Provide .so for an assertion-enabled mode? (ko1's idea)
I hope people will add more ideas in comments on this ticket.
Updated by k0kubun (Takashi Kokubun) over 5 years ago
#1
- Tracker changed from Bug to Feature
- Backport deleted (2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN)
Updated by k0kubun (Takashi Kokubun) over 5 years ago
#2
- Description updated (diff)
Updated by k0kubun (Takashi Kokubun) over 5 years ago
#3
- Description updated (diff)
Updated by shyouhei (Shyouhei Urabe) over 5 years ago
#4
[ruby-core:98182]
I would like to suggest that if a user really favors speed over sanity checks, they should just compile everything with -DNDEBUG. This has been the standard C practice since long before Ruby's birth.
Updated by shyouhei (Shyouhei Urabe) over 5 years ago
#5
[ruby-core:98183]
Some analysis of the slowdown.
Looking at the generated binary and perf output, the slowdown happens because some functions are no longer inlined. This might depend on the compiler, but for me rb_array_len() is one such victim:
zsh % gdb -batch -ex 'file miniruby' -ex 'disassemble rb_array_len'
Dump of assembler code for function rb_array_len:
0x0000000000295540 <+0>: push %rbx
0x0000000000295541 <+1>: mov %rdi,%rbx
0x0000000000295544 <+4>: test $0x7,%bl
0x0000000000295547 <+7>: jne 0x2955be <rb_array_len+126>
0x0000000000295549 <+9>: mov %rbx,%rax
0x000000000029554c <+12>: and $0xfffffffffffffff7,%rax
0x0000000000295550 <+16>: je 0x2955be <rb_array_len+126>
0x0000000000295552 <+18>: mov (%rbx),%rax
0x0000000000295555 <+21>: mov %eax,%edx
0x0000000000295557 <+23>: and $0x1f,%edx
0x000000000029555a <+26>: mov $0x7,%ecx
0x000000000029555f <+31>: cmp $0x7,%edx
0x0000000000295562 <+34>: jne 0x295585 <rb_array_len+69>
0x0000000000295564 <+36>: test $0x2000,%eax ;; <- This is `RB_FL_ANY_RAW(a, RARRAY_EMBED_FLAG)`
0x0000000000295569 <+41>: jne 0x295571 <rb_array_len+49>
0x000000000029556b <+43>: mov 0x10(%rbx),%rax ;; <-
0x000000000029556f <+47>: pop %rbx ;; <- This is `return RARRAY(a)->as.heap.len;`
0x0000000000295570 <+48>: retq ;; <-
0x0000000000295571 <+49>: cmp $0x7,%ecx
0x0000000000295574 <+52>: jne 0x2955a2 <rb_array_len+98>
0x0000000000295576 <+54>: test $0x2000,%eax
0x000000000029557b <+59>: je 0x2955ea <rb_array_len+170>
0x000000000029557d <+61>: shr $0xf,%eax ;; <-
0x0000000000295580 <+64>: and $0x3,%eax ;; <- This is `return RARRAY_EMBED_LEN(a);`
0x0000000000295583 <+67>: pop %rbx ;; <-
0x0000000000295584 <+68>: retq ;; <-
0x0000000000295585 <+69>: mov %rbx,%rdi
0x0000000000295588 <+72>: mov $0x7,%esi
0x000000000029558d <+77>: callq 0xcaea2 <rb_check_type>
0x0000000000295592 <+82>: mov (%rbx),%rax
0x0000000000295595 <+85>: mov %eax,%ecx
0x0000000000295597 <+87>: and $0x1f,%ecx
0x000000000029559a <+90>: cmp $0x1b,%rcx
0x000000000029559e <+94>: jne 0x295564 <rb_array_len+36>
0x00000000002955a0 <+96>: jmp 0x2955cb <rb_array_len+139>
0x00000000002955a2 <+98>: mov %rbx,%rdi
0x00000000002955a5 <+101>: mov $0x7,%esi
0x00000000002955aa <+106>: callq 0xcaea2 <rb_check_type>
0x00000000002955af <+111>: mov (%rbx),%rax
0x00000000002955b2 <+114>: mov %eax,%ecx
0x00000000002955b4 <+116>: and $0x1f,%ecx
0x00000000002955b7 <+119>: cmp $0x1b,%ecx
0x00000000002955ba <+122>: jne 0x295576 <rb_array_len+54>
0x00000000002955bc <+124>: jmp 0x2955cb <rb_array_len+139>
0x00000000002955be <+126>: mov %rbx,%rdi
0x00000000002955c1 <+129>: mov $0x7,%esi
0x00000000002955c6 <+134>: callq 0xcaea2 <rb_check_type>
0x00000000002955cb <+139>: lea 0x142fe(%rip),%rdi # 0x2a98d0
0x00000000002955d2 <+146>: lea 0x1432f(%rip),%rdx # 0x2a9908
0x00000000002955d9 <+153>: lea 0x14337(%rip),%rcx # 0x2a9917
0x00000000002955e0 <+160>: mov $0xea,%esi
0x00000000002955e5 <+165>: callq 0xcad86 <rb_assert_failure>
0x00000000002955ea <+170>: lea 0x14338(%rip),%rdi # 0x2a9929
0x00000000002955f1 <+177>: lea 0x1436d(%rip),%rdx # 0x2a9965
0x00000000002955f8 <+184>: lea 0x14377(%rip),%rcx # 0x2a9976
0x00000000002955ff <+191>: mov $0x79,%esi
0x0000000000295604 <+196>: callq 0xcad86 <rb_assert_failure>
End of assembler dump.
Here, assertions practically never fail. This means the jumps are predicted almost 100% of the time, so they are close to no-ops and don't slow things down by themselves. The problem is the unreachable branches. If you read the assembly, you can see that almost 2/3 of the above function is never reached. Those branches blow up the generated binary significantly. rb_array_len is thus now considered too big to be inlined, by my compiler at least.
An obvious ad-hoc remedy is to supply __attribute__((__always_inline__)) for everything. But I don't think that's a good idea, because what is inlined and what is not depends very much on compilers, versions, target architectures, and almost everything.
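For illustration only, here is a minimal sketch of what that ad-hoc remedy looks like on a small accessor. The struct, flag, and function names (toy_array, TOY_EMBED_FLAG, toy_array_len, TOY_FORCE_INLINE) are hypothetical and are not Ruby's real definitions; the only real construct is the GCC/Clang always_inline attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an array header; not Ruby's real RArray layout. */
struct toy_array {
    unsigned flags;   /* TOY_EMBED_FLAG set: length lives in `embed_len` */
    long heap_len;
    long embed_len;
};

#define TOY_EMBED_FLAG 0x1u

/* GCC/Clang extension: inline regardless of the compiler's size heuristics.
 * Other compilers fall back to a plain `static inline` hint. */
#if defined(__GNUC__) || defined(__clang__)
# define TOY_FORCE_INLINE static inline __attribute__((__always_inline__))
#else
# define TOY_FORCE_INLINE static inline
#endif

TOY_FORCE_INLINE long
toy_array_len(const struct toy_array *a)
{
    /* Sanity check: its failure path is emitted into every inlined copy. */
    assert(a != NULL);
    return (a->flags & TOY_EMBED_FLAG) ? a->embed_len : a->heap_len;
}
```

Forcing inlining this way keeps the call overhead away but copies the assertion's failure path into every call site, which is one reason applying it everywhere is questionable.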
Updated by shyouhei (Shyouhei Urabe) over 5 years ago
#6
[ruby-core:98184]
If you recompile everything using ./configure cppflags=-DNDEBUG, then those assertions are eliminated, to let compilers inline rb_array_len again.
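As a reminder of the standard C behaviour being relied on, the sketch below shows how assert() disappears when NDEBUG is defined before <assert.h> is included. The helper names (assertions_enabled, checked_div) are made up for this example and do not exist in Ruby.

```c
#include <assert.h>

/* Hypothetical helper: reports whether assert() is active in this
 * translation unit. */
static inline int
assertions_enabled(void)
{
#ifdef NDEBUG
    return 0;  /* built with -DNDEBUG: assert(x) expands to ((void)0) */
#else
    return 1;  /* default build: assert(x) evaluates x, aborts if false */
#endif
}

static inline long
checked_div(long num, long den)
{
    assert(den != 0);  /* compiled out entirely under -DNDEBUG */
    return num / den;
}
```

Compiling the same file once with and once without -DNDEBUG and comparing the disassembly of checked_div should show the check and its failure branch only in the build without the flag.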
Updated by shevegen (Robert A. Heiler) over 5 years ago
#7
[ruby-core:98185]
I have a question concerning one point mentioned above.
k0kubun wrote:
> Provide .so for an assertion-enabled mode? (ko1's idea)
Could someone briefly explain the general idea behind this? I assume for a .so
file the ruby user would have to require/load that file, but what may be the
perceived benefits/disadvantages for doing so?
Updated by k0kubun (Takashi Kokubun) over 5 years ago
#8
[ruby-core:98194]
> I would like to suggest that if a user really favors speed over sanity checks, they should just compile everything with -DNDEBUG. This has been the standard C practice since long before Ruby's birth.
Got it. I'll consider using -DNDEBUG in benchmark servers at least. Also maybe it's worth noting it in NEWS for those who package Ruby for performance-sensitive usages?
> An obvious ad-hoc remedy is to supply __attribute__((__always_inline__)) for everything. But I don't think that's a good idea, because what is inlined and what is not depends very much on compilers, versions, target architectures, and almost everything.
Agreed. While it's not a good idea to always inline everything, some functions may be worth considering, though.
> I assume for a .so file the ruby user would have to require/load that file
His idea was to install the .so file into the Ruby prefix by default and add a --debug-xxx option to load it.
Updated by k0kubun (Takashi Kokubun) over 5 years ago
#9
- Related to Bug #16840: Decrease in Hash#[]= performance with object keys added
Updated by nobu (Nobuyoshi Nakada) over 5 years ago
#10
[ruby-core:98212]
It's not only the assertions; some optimizations can no longer be applied either.
For instance, rb_str_new_cstr was defined as follows in 2.7:
#define rb_str_new_cstr(str) RB_GNUC_EXTENSION_BLOCK( \
(__builtin_constant_p(str)) ? \
rb_str_new_static((str), (long)strlen(str)) : \
rb_str_new_cstr(str) \
)
and rb_str_new_cstr("...") was expected to be compiled as rb_str_new_static("...", 3).
Below is the master version.
static inline VALUE
ruby3_str_new_cstr(const char *str)
{
if /* constexpr */ (! RUBY3_CONSTANT_P(str)) {
return rb_str_new_cstr(str);
}
else {
long len = ruby3_strlen(str);
return rb_str_new_static(str, len);
}
}
As str is an argument variable and RUBY3_CONSTANT_P(str) is always false here, the _static function is never used (in Apple clang 11.0.3 and gcc 10.1.0-RC-20200430_0).
I'm not sure how much this particular case affects overall performance, but there might be more lost optimizations like this one.
Updated by shyouhei (Shyouhei Urabe) over 5 years ago
#11
[ruby-core:98214]
nobu (Nobuyoshi Nakada) wrote in #note-10:
> As str is an argument variable and RUBY3_CONSTANT_P(str) is always false here,
Well, thank you for pointing this out. As I wrote in include/ruby/3/constant_p.h, you can apply __builtin_constant_p to an inline function argument, so I thought that RUBY3_CONSTANT_P(str) was not always false. However, https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html says:
> You may use this built-in function in either a macro or an inline function. However, if you use it in an inlined function and pass an argument of the function as the argument to the built-in, GCC never returns 1 when you call the inline function with a string constant or ...
In the particular case of ruby3_str_new_cstr(), the argument is a string, so there is no chance of it being treated as a constant. The current code is in fact wrong, and we have to fix it.
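The difference between the two forms can be sketched in isolation. Everything named below (len_static, len_dynamic, TOY_CONSTANT_P, TOY_LEN_MACRO, toy_len_inline) is a hypothetical stand-in, not Ruby's API; the two helper functions merely mimic the rb_str_new_static / rb_str_new_cstr split by returning a length. Which branch the compiler actually picks is compiler- and optimization-dependent, so the sketch only guarantees that both forms compute the same result.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the "static" and "dynamic" constructors:
 * each just returns the length it was given or computed. */
static long len_static(const char *s, long len) { (void)s; return len; }
static long len_dynamic(const char *s) { assert(s != NULL); return (long)strlen(s); }

#if defined(__GNUC__) || defined(__clang__)
# define TOY_CONSTANT_P(x) __builtin_constant_p(x)
#else
# define TOY_CONSTANT_P(x) 0   /* assumption: no equivalent built-in */
#endif

/* Macro form: `str` is substituted textually, so the built-in is applied
 * to the literal itself when the caller passes one. */
#define TOY_LEN_MACRO(str) \
    (TOY_CONSTANT_P(str) ? len_static((str), (long)strlen(str)) \
                         : len_dynamic(str))

/* Inline-function form: the built-in is applied to the parameter `str`,
 * which is the situation the GCC documentation warns about. */
static inline long
toy_len_inline(const char *str)
{
    if (!TOY_CONSTANT_P(str)) {
        return len_dynamic(str);
    }
    return len_static(str, (long)strlen(str));
}
```

Inspecting the generated code for TOY_LEN_MACRO("abc") and toy_len_inline("abc") with the compiler in question is the reliable way to see which path was chosen.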
Updated by naruse (Yui NARUSE) over 5 years ago
#12
[ruby-core:98264]
I want Ruby 2.8/3.0 to be faster than 2.7 by default.
NDEBUG is not acceptable.
I think Microsoft's _DEBUG approach is more reasonable.
Updated by shyouhei (Shyouhei Urabe) over 5 years ago
#13
[ruby-core:98277]
naruse (Yui NARUSE) wrote in #note-12:
> NDEBUG is not acceptable.
NDEBUG is not my invention. Please file a bug report to upstream (ISO/IEC JTC1/SC22/WG14).
I'm not against defining it by default, though.
Updated by ko1 (Koichi Sasada) over 5 years ago
#14
- Status changed from Open to Closed
Applied in changeset git|21991e6ca59274e41a472b5256bd3245f6596c90.
Use RUBY_DEBUG instead of NDEBUG
Assertions in header files slow down the interpreter, so they should be
turned off by default (simple make). To enable them, define the macro
RUBY_DEBUG=1 (e.g. make cppflags=-DRUBY_DEBUG, or use #define at
the very beginning of the file). Note that even if NDEBUG=1 is defined,
RUBY_DEBUG=1 enables all assertions.
[Feature #16837]
related: https://github.com/ruby/ruby/pull/3120
assert() lines in MRI *.c are not disabled even if RUBY_DEBUG=0;
they can be disabled with NDEBUG=1. So please consider using
RUBY_ASSERT() if you want to disable them when RUBY_DEBUG=0.
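The opt-in scheme described in this commit message can be sketched roughly as follows. This is not the actual RUBY_ASSERT implementation; TOY_DEBUG, TOY_ASSERT, and toy_first are hypothetical names used only to show an assertion macro that is off by default and independent of NDEBUG, next to a plain assert() that still follows NDEBUG.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch modelled on the RUBY_DEBUG idea: off unless the
 * build asks for it, e.g. with -DTOY_DEBUG. */
#ifndef TOY_DEBUG
# define TOY_DEBUG 0
#endif

#if TOY_DEBUG
/* Enabled: report the failing expression and abort, regardless of NDEBUG. */
# define TOY_ASSERT(expr) \
    ((expr) ? (void)0 : \
     (fprintf(stderr, "%s:%d: assertion failed: %s\n", \
              __FILE__, __LINE__, #expr), abort()))
#else
/* Disabled (the default): the expression is not evaluated at all. */
# define TOY_ASSERT(expr) ((void)0)
#endif

static inline long
toy_first(const long *items, long len)
{
    TOY_ASSERT(len > 0);     /* controlled by TOY_DEBUG only */
    assert(items != NULL);   /* plain assert(): controlled by NDEBUG only */
    return items[0];
}
```

With this layout a default build pays nothing for TOY_ASSERT, while a debug or CI build can turn it on with a single preprocessor flag.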