Project

General

Profile

Actions

Feature #16837

closed

Can we make Ruby 3.0 as fast as Ruby 2.7 with the new assertions?

Added by k0kubun (Takashi Kokubun) almost 4 years ago. Updated almost 4 years ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:98174]

Description

Problem

How can we make Ruby 3.0 as fast as (or faster than) Ruby 2.7?

Background

Possible approaches

I have no strong preference yet. Here are some random ideas:

  • Optimize the assertion code somehow
  • Enable the new assertions only on CIs, at least ones in hot spots
    • Not sure which places have large impact on Optcarrot yet
  • Make some other not-so-important assertions CI-only to offset the impact from new ones
  • Provide .so for an assertion-enabled mode? (ko1's idea)

I hope people will comment more ideas in this ticket.


Related issues 1 (0 open1 closed)

Related to Ruby master - Bug #16840: Decrease in Hash#[]= performance with object keysClosedActions
Actions #1

Updated by k0kubun (Takashi Kokubun) almost 4 years ago

  • Tracker changed from Bug to Feature
  • Backport deleted (2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN)
Actions #2

Updated by k0kubun (Takashi Kokubun) almost 4 years ago

  • Description updated (diff)
Actions #3

Updated by k0kubun (Takashi Kokubun) almost 4 years ago

  • Description updated (diff)

Updated by shyouhei (Shyouhei Urabe) almost 4 years ago

I would like to suggest that if a user really favor speed over sanity check, they should just compiler everything with -DNDEBUG. This has been the standard C manner since long before Ruby's birth.

Updated by shyouhei (Shyouhei Urabe) almost 4 years ago

Some analysis of the slowdown.

Looking at the generated binary and perf output, the slowdown is because some functions are not inlined. Might depend on compilers, but for me rb_array_len() is one of such victim:

zsh % gdb -batch -ex 'file miniruby' -ex 'disassemble rb_array_len'
Dump of assembler code for function rb_array_len:
   0x0000000000295540 <+0>:     push   %rbx
   0x0000000000295541 <+1>:     mov    %rdi,%rbx
   0x0000000000295544 <+4>:     test   $0x7,%bl
   0x0000000000295547 <+7>:     jne    0x2955be <rb_array_len+126>
   0x0000000000295549 <+9>:     mov    %rbx,%rax
   0x000000000029554c <+12>:    and    $0xfffffffffffffff7,%rax
   0x0000000000295550 <+16>:    je     0x2955be <rb_array_len+126>
   0x0000000000295552 <+18>:    mov    (%rbx),%rax
   0x0000000000295555 <+21>:    mov    %eax,%edx
   0x0000000000295557 <+23>:    and    $0x1f,%edx
   0x000000000029555a <+26>:    mov    $0x7,%ecx
   0x000000000029555f <+31>:    cmp    $0x7,%edx
   0x0000000000295562 <+34>:    jne    0x295585 <rb_array_len+69>
   0x0000000000295564 <+36>:    test   $0x2000,%eax                 ;; <- This is `RB_FL_ANY_RAW(a, RARRAY_EMBED_FLAG)`
   0x0000000000295569 <+41>:    jne    0x295571 <rb_array_len+49>
   0x000000000029556b <+43>:    mov    0x10(%rbx),%rax              ;; <-
   0x000000000029556f <+47>:    pop    %rbx                         ;; <- This is `return RARRAY(a)->as.heap.len;`
   0x0000000000295570 <+48>:    retq                                ;; <-
   0x0000000000295571 <+49>:    cmp    $0x7,%ecx
   0x0000000000295574 <+52>:    jne    0x2955a2 <rb_array_len+98>
   0x0000000000295576 <+54>:    test   $0x2000,%eax
   0x000000000029557b <+59>:    je     0x2955ea <rb_array_len+170>
   0x000000000029557d <+61>:    shr    $0xf,%eax                    ;; <-
   0x0000000000295580 <+64>:    and    $0x3,%eax                    ;; <- This is `return RARRAY_EMBED_LEN(a);`
   0x0000000000295583 <+67>:    pop    %rbx                         ;; <-
   0x0000000000295584 <+68>:    retq                                ;; <-
   0x0000000000295585 <+69>:    mov    %rbx,%rdi
   0x0000000000295588 <+72>:    mov    $0x7,%esi
   0x000000000029558d <+77>:    callq  0xcaea2 <rb_check_type>
   0x0000000000295592 <+82>:    mov    (%rbx),%rax
   0x0000000000295595 <+85>:    mov    %eax,%ecx
   0x0000000000295597 <+87>:    and    $0x1f,%ecx
   0x000000000029559a <+90>:    cmp    $0x1b,%rcx
   0x000000000029559e <+94>:    jne    0x295564 <rb_array_len+36>
   0x00000000002955a0 <+96>:    jmp    0x2955cb <rb_array_len+139>
   0x00000000002955a2 <+98>:    mov    %rbx,%rdi
   0x00000000002955a5 <+101>:   mov    $0x7,%esi
   0x00000000002955aa <+106>:   callq  0xcaea2 <rb_check_type>
   0x00000000002955af <+111>:   mov    (%rbx),%rax
   0x00000000002955b2 <+114>:   mov    %eax,%ecx
   0x00000000002955b4 <+116>:   and    $0x1f,%ecx
   0x00000000002955b7 <+119>:   cmp    $0x1b,%ecx
   0x00000000002955ba <+122>:   jne    0x295576 <rb_array_len+54>
   0x00000000002955bc <+124>:   jmp    0x2955cb <rb_array_len+139>
   0x00000000002955be <+126>:   mov    %rbx,%rdi
   0x00000000002955c1 <+129>:   mov    $0x7,%esi
   0x00000000002955c6 <+134>:   callq  0xcaea2 <rb_check_type>
   0x00000000002955cb <+139>:   lea    0x142fe(%rip),%rdi        # 0x2a98d0
   0x00000000002955d2 <+146>:   lea    0x1432f(%rip),%rdx        # 0x2a9908
   0x00000000002955d9 <+153>:   lea    0x14337(%rip),%rcx        # 0x2a9917
   0x00000000002955e0 <+160>:   mov    $0xea,%esi
   0x00000000002955e5 <+165>:   callq  0xcad86 <rb_assert_failure>
   0x00000000002955ea <+170>:   lea    0x14338(%rip),%rdi        # 0x2a9929
   0x00000000002955f1 <+177>:   lea    0x1436d(%rip),%rdx        # 0x2a9965
   0x00000000002955f8 <+184>:   lea    0x14377(%rip),%rcx        # 0x2a9976
   0x00000000002955ff <+191>:   mov    $0x79,%esi
   0x0000000000295604 <+196>:   callq  0xcad86 <rb_assert_failure>
End of assembler dump.

Here, assertions practically never fail. This means jumps are 100% predicted (almost no-op). They don't slow things. The problem is those unreachable branches. If you can read the assembly you see almost 2/3 of the above function just never reach. They blow the generated binary up significantly. rb_array_len is thus now considered too big to be inlined, to my compiler at least.

An obvious ad-hoc remedy is to supply __attribute__((__always_inline__)) for everything. But I don't think that's a good idea, because what is inlined and what is not depends very much on compilers, versions, target architectures, and almost everything.

Updated by shyouhei (Shyouhei Urabe) almost 4 years ago

If you recompile everything using ./configure cppflags=-DNDEBUG, then those assertions are eliminated, to let compilers inline rb_array_len again.

Updated by shevegen (Robert A. Heiler) almost 4 years ago

I have a question concerning one point mentioned above.

k0kubun wrote:

Provide .so for an assertion-enabled mode? (ko1's idea)

Could someone briefly explain the general idea behind this? I assume for a .so
file the ruby user would have to require/load that file, but what may be the
perceived benefits/disadvantages for doing so?

Updated by k0kubun (Takashi Kokubun) almost 4 years ago

I would like to suggest that if a user really favor speed over sanity check, they should just compiler everything with -DNDEBUG. This has been the standard C manner since long before Ruby's birth.

Got it. I'll consider using -DNDEBUG in benchmark servers at least. Also maybe it's worth noting it in NEWS for those who package Ruby for performance-sensitive usages?

An obvious ad-hoc remedy is to supply __attribute__((__always_inline__)) for everything. But I don't think that's a good idea, because what is inlined and what is not depends very much on compilers, versions, target architectures, and almost everything.

Agreed. While it's not a good idea to always inline everything, some may be worth a consideration though.

I assume for a .so file the ruby user would have to require/load that file

His idea was to install the .so file to Ruby prefix by default and add a --debug-xxx option to load it.

Actions #9

Updated by k0kubun (Takashi Kokubun) almost 4 years ago

  • Related to Bug #16840: Decrease in Hash#[]= performance with object keys added

Updated by nobu (Nobuyoshi Nakada) almost 4 years ago

Not only assertions, some optimizations can no longer be applied.

For instance, rb_str_new_cstr was defined as following in 2.7,

#define rb_str_new_cstr(str) RB_GNUC_EXTENSION_BLOCK(	\
    (__builtin_constant_p(str)) ?		\
	rb_str_new_static((str), (long)strlen(str)) : \
	rb_str_new_cstr(str)			\
)

and rb_str_new_cstr("...") has been expected to be compiled as rb_str_new_static("...", 3).

The below is the master version.

static inline VALUE
ruby3_str_new_cstr(const char *str)
{
    if /* constexpr */ (! RUBY3_CONSTANT_P(str)) {
        return rb_str_new_cstr(str);
    }
    else {
        long len = ruby3_strlen(str);
        return rb_str_new_static(str, len);
    }
}

As str is an argument variable and RUBY3_CONSTANT_P(str) is always false here, _static function is never used (in Apple clang 11.0.3 and gcc 10.1.0-RC-20200430_0).

I'm uncertain how this particular case affects the whole performance though, similar un-optimizations might be more.

Updated by shyouhei (Shyouhei Urabe) almost 4 years ago

nobu (Nobuyoshi Nakada) wrote in #note-10:

As str is an argument variable and RUBY3_CONSTANT_P(str) is always false here,

Well, thank you pointing this out. As I wrote in include/ruby/3/constant_p.h, you can apply __builtin_constant_p to an inline function argument. I thought that RUBY3_CONSTANT_P(str) is not always false. However https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html says:

You may use this built-in function in either a macro or an inline function. However, if you use it in an inlined function and pass an argument of the function as the argument to the built-in, GCC never returns 1 when you call the inline function with a string constant or ...

In this ruby3_str_new_cstr()'s particular case, the argument is a string. There is no chance. This is in fact wrong. We have to fix.

Updated by naruse (Yui NARUSE) almost 4 years ago

I want Ruby 2.8/3.0 is faster than 2.7 by default.
NDEBUG is not acceptable.
I think Microsoft's _DEBUG approach is more reasonable.

Updated by shyouhei (Shyouhei Urabe) almost 4 years ago

naruse (Yui NARUSE) wrote in #note-12:

NDEBUG is not acceptable.

NDEBUG is not my invention. Please file a bug report to upstream (ISO/IEC JTC1/SC22/WG14).

I'm not against defining it by default, though.

Actions #14

Updated by ko1 (Koichi Sasada) almost 4 years ago

  • Status changed from Open to Closed

Applied in changeset git|21991e6ca59274e41a472b5256bd3245f6596c90.


Use RUBY_DEBUG instead of NDEBUG

Assertions in header files slows down an interpreter, so they should be
turned off by default (simple make). To enable them, define a macro
RUBY_DEBUG=1 (e.g. make cppflags=-DRUBY_DEBUG or use #define at
the very beggining of the file. Note that even if NDEBUG=1 is defined,
RUBY_DEBUG=1 enables all assertions.
[Feature #16837]
related: https://github.com/ruby/ruby/pull/3120

assert() lines in MRI *.c is not disabled even if RUBY_DEBUG=0 and
it can be disabled with NDEBUG=1. So please consider to use
RUBY_ASSERT() if you want to disable them when RUBY_DEBUG=0.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0