Bug #17799

open

Seg fault in rb_class_clear_method_cache

Added by stanhu (Stan Hu) 5 months ago. Updated 5 months ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
[ruby-core:103434]

Description

Recently our CI tests have been intermittently failing with segmentation faults at random points, such as:

/builds/gitlab-org/security/gitlab/spec/support/shared_examples/requests/api/issues/merge_requests_count_shared_examples.rb:3: [BUG] Segmentation fault at 0x0000000000000000
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0042 p:0003 s:0237 e:000236 TOP    /builds/gitlab-org/security/gitlab/spec/support/shared_examples/requests/api/issues/merge_requests_count_shared_examples.rb:3 [FINISH]
c:0041 p:---- s:0234 e:000233 CFUNC  :require
c:0040 p:0012 s:0229 e:000228 BLOCK  /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.r
c:0039 p:0070 s:0226 e:000225 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/loaded_features_index.rb:
c:0038 p:0025 s:0214 e:000213 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.r
c:0037 p:0055 s:0208 e:000207 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.r
c:0036 p:0006 s:0201 e:000200 BLOCK  /builds/gitlab-org/security/gitlab/spec/spec_helper.rb:71 [FINISH]
c:0035 p:---- s:0197 e:000196 CFUNC  :each
c:0034 p:0563 s:0193 e:000192 TOP    /builds/gitlab-org/security/gitlab/spec/spec_helper.rb:71 [FINISH]
c:0033 p:---- s:0187 e:000186 CFUNC  :require
c:0032 p:0007 s:0182 e:000181 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:2112
c:0031 p:0008 s:0173 e:000172 BLOCK  /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:1574 [FINISH]
c:0030 p:---- s:0169 e:000168 CFUNC  :each
c:0029 p:0042 s:0165 e:000164 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:1574
c:0028 p:0048 s:0159 e:000158 BLOCK  /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration_options.rb:113 [FINISH]
c:0027 p:---- s:0155 e:000154 CFUNC  :each
c:0026 p:0019 s:0151 e:000150 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration_options.rb:112
c:0025 p:0005 s:0145 e:000144 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration_options.rb:22
c:0024 p:0035 s:0140 e:000139 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:132
c:0023 p:0007 s:0134 e:000133 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:99
c:0022 p:0007 s:0128 e:000127 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:86
c:0021 p:0065 s:0122 e:000121 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:71
c:0020 p:0020 s:0114 e:000113 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:45
c:0019 p:0025 s:0109 e:000108 TOP    /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/exe/rspec:4 [FINISH]
c:0018 p:---- s:0106 e:000105 CFUNC  :load
c:0017 p:0112 s:0101 e:000100 TOP    /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/bin/rspec:23 [FINISH]
c:0016 p:---- s:0096 e:000095 CFUNC  :load
c:0015 p:0107 s:0091 e:000090 METHOD /usr/local/lib/ruby/2.7.0/bundler/cli/exec.rb:63
c:0014 p:0071 s:0083 e:000082 METHOD /usr/local/lib/ruby/2.7.0/bundler/cli/exec.rb:28
c:0013 p:0024 s:0078 e:000077 METHOD /usr/local/lib/ruby/2.7.0/bundler/cli.rb:476
c:0012 p:0054 s:0073 e:000072 METHOD /usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/command.rb:27
c:0011 p:0040 s:0065 e:000064 METHOD /usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/invocation.rb:127
c:0010 p:0239 s:0058 e:000057 METHOD /usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor.rb:399
c:0009 p:0008 s:0045 e:000044 METHOD /usr/local/lib/ruby/2.7.0/bundler/cli.rb:30
c:0008 p:0066 s:0040 e:000039 METHOD /usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/base.rb:476
c:0007 p:0008 s:0033 e:000032 METHOD /usr/local/lib/ruby/2.7.0/bundler/cli.rb:24
c:0006 p:0109 s:0028 e:000027 BLOCK  /usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/libexec/bundle:46
c:0005 p:0002 s:0022 e:000021 METHOD /usr/local/lib/ruby/2.7.0/bundler/friendly_errors.rb:123
c:0004 p:0111 s:0017 E:001838 TOP    /usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/libexec/bundle:34 [FINISH]
c:0003 p:---- s:0013 e:000012 CFUNC  :load
c:0002 p:0112 s:0008 E:002100 EVAL   /usr/local/bin/bundle:23 [FINISH]
c:0001 p:0000 s:0003 E:001040 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
/usr/local/bin/bundle:23:in `<main>'
/usr/local/bin/bundle:23:in `load'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/libexec/bundle:34:in `<top (required)>'
/usr/local/lib/ruby/2.7.0/bundler/friendly_errors.rb:123:in `with_friendly_errors'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/libexec/bundle:46:in `block in <top (required)>'
/usr/local/lib/ruby/2.7.0/bundler/cli.rb:24:in `start'
/usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/base.rb:476:in `start'
/usr/local/lib/ruby/2.7.0/bundler/cli.rb:30:in `dispatch'
/usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor.rb:399:in `dispatch'
/usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
/usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
/usr/local/lib/ruby/2.7.0/bundler/cli.rb:476:in `exec'
/usr/local/lib/ruby/2.7.0/bundler/cli/exec.rb:28:in `run'
/usr/local/lib/ruby/2.7.0/bundler/cli/exec.rb:63:in `kernel_load'
/usr/local/lib/ruby/2.7.0/bundler/cli/exec.rb:63:in `load'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/bin/rspec:23:in `<top (required)>'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/bin/rspec:23:in `load'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/exe/rspec:4:in `<top (required)>'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:45:in `invoke'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:71:in `run'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:86:in `run'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:99:in `setup'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:132:in `configure'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration_options.rb:22:in `configure'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration_options.rb:112:in `process_options_into'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration_options.rb:112:in `each'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration_options.rb:113:in `block in process_options_into'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:1574:in `requires='
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:1574:in `each'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:1574:in `block in requires='
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:2112:in `load_file_handling_errors'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:2112:in `require'
/builds/gitlab-org/security/gitlab/spec/spec_helper.rb:71:in `<top (required)>'
/builds/gitlab-org/security/gitlab/spec/spec_helper.rb:71:in `each'
/builds/gitlab-org/security/gitlab/spec/spec_helper.rb:71:in `block in <top (required)>'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:31:in `require'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:22:in `require_with_bootsnap_lfi'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/loaded_features_index.rb:92:in `register'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:23:in `block in require_with_bootsnap_lfi'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:23:in `require'
/builds/gitlab-org/security/gitlab/spec/support/shared_examples/requests/api/issues/merge_requests_count_shared_examples.rb:3:in `<top (required)>'

-- Machine register context ------------------------------------------------
 RIP: 0x00007fba9179f8fb RBP: 0x00007ffdb2bc4dc0 RSP: 0x00007ffdb2bc3d40
 RAX: 0x0000565411171e60 RBX: 0x0000000000000000 RCX: 0x0000000004bf1491
 RDX: 0x00007ffdb2bc4dc0 RDI: 0x00005654110bc550 RSI: 0x00007fba9179f8c0
  R8: 0x0000565406728098  R9: 0x00007fba91124170 R10: 0x0000565406726010
 R11: 0x00007fba91124170 R12: 0x00007fba9179f8c0 R13: 0x0000000004bd5abc
 R14: 0x000056543d860c70 R15: 0x0000565435cff1e0 EFL: 0x0000000000010246

-- Other runtime information -----------------------------------------------

We managed to generate a core file from this seg fault:

$ docker run -v /tmp/bugs:/bugs -it registry.gitlab.com/gitlab-org/gitlab-build-images:ruby-2.7.2.patched-golang-1.14-git-2.31-lfs-2.9-chrome-89-node-14.15-yarn-1.22-postgresql-12-graphicsmagick-1.3.36 bash
root@25a81975afab:/bugs# mkdir -p /builds/gitlab-org/security/gitlab/
root@25a81975afab:/bugs# cd /builds/gitlab-org/security/gitlab/
root@25a81975afab:/builds/gitlab-org/security/gitlab# unzip /bugs/cache.zip
Archive:  /bugs/cache.zip
   creating: vendor/gitaly-ruby/
   creating: vendor/gitaly-ruby/ruby/
   creating: vendor/gitaly-ruby/ruby/2.7.0/
   creating: vendor/gitaly-ruby/ruby/2.7.0/bin/
  inflating: vendor/gitaly-ruby/ruby/2.7.0/bin/codera
<snip>
root@25a81975afab:/bugs# gdb /usr/local/bin/ruby --core core.bundle.1618331218.363
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/bin/ruby...done.

warning: core file may not match specified executable file.
[New LWP 363]
[New LWP 533]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/bin/rspec -Ispec -rspec_he'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7fba90f65740 (LWP 363))]
(gdb) t a a bt

Thread 2 (Thread 0x7fba87c62700 (LWP 533)):
#0  0x00007fba91056916 in __GI_ppoll (fds=fds@entry=0x7fba87b616d8, nfds=nfds@entry=1, timeout=<optimized out>, timeout@entry=0x7fba87b616e0, sigmask=sigmask@entry=0x0)
    at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x00007fba91771890 in rb_sigwait_sleep (th=th@entry=0x5654120da230, sigwait_fd=sigwait_fd@entry=3, rel=rel@entry=0x7fba87b61790) at hrtime.h:148
#2  0x00007fba91772599 in native_sleep (th=0x5654120da230, rel=0x7fba87b61790) at thread_pthread.c:2099
#3  0x00007fba91775e2f in sleep_hrtime (fl=2, rel=<optimized out>, th=0x5654120da230) at thread.c:1303
#4  rb_thread_wait_for (time=...) at thread.c:1351
#5  0x00007fba916e10e0 in rb_f_sleep (argc=1, argv=0x7fba87b61d58, _=<optimized out>) at process.c:4886
#6  0x00007fba917a4c39 in vm_call_cfunc_with_frame (empty_kw_splat=<optimized out>, cd=0x56540b8b7a80, calling=<optimized out>, reg_cfp=0x7fba87c61ca0, ec=0x5654120da410) at vm_insnhelper.c:2514
#7  vm_call_cfunc (ec=0x5654120da410, reg_cfp=0x7fba87c61ca0, calling=<optimized out>, cd=0x56540b8b7a80) at vm_insnhelper.c:2539
#8  0x00007fba917bd6bc in vm_call_method_each_type (ec=0x5654120da410, cfp=0x7fba87c61ca0, calling=0x7fba87b61a00, cd=0x56540b8b7a80) at vm_insnhelper.c:2925
#9  0x00007fba917bde55 in vm_call_method_each_type (cd=<optimized out>, calling=<optimized out>, cfp=<optimized out>, ec=<optimized out>) at vm_insnhelper.c:3026
#10 vm_call_method (ec=0x5654120da410, cfp=0x7fba87c61ca0, calling=<optimized out>, cd=<optimized out>) at vm_insnhelper.c:3053
#11 0x00007fba917b0072 in vm_sendish (block_handler=<optimized out>, method_explorer=<optimized out>, cd=<optimized out>, reg_cfp=<optimized out>, ec=<optimized out>) at vm_insnhelper.c:4023
#12 vm_exec_core (ec=0x7fba87b616d8, initial=1) at insns.def:801
#13 0x00007fba917b5b8c in rb_vm_exec (ec=0x5654120da410, mjit_enable_p=1) at vm.c:1920
#14 0x00007fba917b729c in invoke_iseq_block_from_c (me=0x0, is_lambda=<optimized out>, cref=0x0, passed_block_handler=0, kw_splat=<optimized out>, argv=<optimized out>, argc=1, self=94918931276240,
    captured=<optimized out>, ec=0x5654120da410) at vm.c:1116
#15 invoke_block_from_c_proc (me=0x0, is_lambda=<optimized out>, passed_block_handler=0, kw_splat=<optimized out>, argv=<optimized out>, argc=1, self=94918931276240, proc=0x5654120da410,
    ec=0x5654120da410) at vm.c:1216
#16 vm_invoke_proc (passed_block_handler=0, kw_splat=<optimized out>, argv=<optimized out>, argc=1, self=94918931276240, proc=0x5654120da410, ec=0x5654120da410) at vm.c:1238
#17 rb_vm_invoke_proc (ec=0x5654120da410, proc=proc@entry=0x5654135f2920, argc=1, argv=<optimized out>, kw_splat=<optimized out>, passed_block_handler=passed_block_handler@entry=0) at vm.c:1259
#18 0x00007fba9177447d in thread_do_start (th=0x5654120da230) at thread.c:697
#19 0x00007fba917764ff in thread_start_func_2 (th=0x5654120da230, stack_start=<optimized out>) at thread.c:745
#20 0x00007fba91776a44 in thread_start_func_1 (th_ptr=<optimized out>) at thread_pthread.c:969
#21 0x00007fba912fefa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#22 0x00007fba910614cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7fba90f65740 (LWP 363)):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fba90f8a535 in __GI_abort () at abort.c:79
#2  0x00007fba9157275b in die () at error.c:664
#3  rb_bug_for_fatal_signal (default_sighandler=0x0, sig=sig@entry=11, ctx=ctx@entry=0x565406831a00, fmt=fmt@entry=0x7fba91808f8b "Segmentation fault at %p") at error.c:664
#4  0x00007fba917314db in sigsegv (sig=11, info=0x565406831b30, ctx=0x565406831a00) at signal.c:946
#5  <signal handler called>
#6  rb_class_clear_method_cache (klass=0, arg=140439281334464) at vm.c:362
#7  0x00007fba9159b33d in rb_class_foreach_subclass (arg=8, f=<optimized out>, klass=<optimized out>) at class.c:122
#8  rb_class_detach_module_subclasses (klass=<optimized out>) at class.c:147
#9  0x0000000000000000 in ?? ()
(gdb)

This seg fault seems to have occurred in rb_class_clear_method_cache, perhaps at https://github.com/ruby/ruby/blob/5445e0435260b449decf2ac16f9d09bae3cafe72/vm.c#L362?


Files

job.log (1.93 MB): Full job log. stanhu (Stan Hu), 04/13/2021 05:23 PM

Updated by xtkoba (Tee KOBAYASHI) 5 months ago

My observation is that the function rb_class_clear_method_cache is called with argument klass=0, which clearly causes a null pointer dereference at vm_method.c:66. There seems to be a bug elsewhere in the code (possibly in a third-party C extension) that wrongly assigns 0 to the class serial.

I apologize if this duplicates an earlier post of mine that was mistakenly deleted.

Updated by xtkoba (Tee KOBAYASHI) 5 months ago

My explanation in #note-1 is partly incorrect. What is wrongly 0 is the value of the class itself, not the class serial.

Updated by stanhu (Stan Hu) 5 months ago

xtkoba (Tee KOBAYASHI) wrote in #note-2:

My explanation in #note-1 is partly incorrect. What is wrongly 0 is the value of the class itself, not the class serial.

Thanks. I haven't been able to reproduce the problem with optimizations turned off (-O0) or with the patch below applied. I'm not sure why that would be. Is it possible that another thread is modifying the class definitions, and that we need the volatile keyword to keep the compiler from optimizing out the lookups?

diff --git a/class.c b/class.c
index c866d1d727..37ff3c5ade 100644
--- a/class.c
+++ b/class.c
@@ -27,6 +27,7 @@
 #include "ruby/st.h"
 #include "constant.h"
 #include "vm_core.h"
+#include "vm_debug.h"
 #include "id_table.h"
 #include <ctype.h>

@@ -119,6 +120,12 @@ rb_class_foreach_subclass(VALUE klass, void (*f)(VALUE, VALUE), VALUE arg)
     while (cur) {
    VALUE curklass = cur->klass;
    cur = cur->next;
+
+   if (curklass == 0) {
+       fprintf(stderr, "=== Detected NULL subclass:\n");
+       dp(curklass);
+   }
+
    f(curklass, arg);
     }
 }

Before

void
rb_class_foreach_subclass(VALUE klass, void (*f)(VALUE, VALUE), VALUE arg)
{
    rb_subclass_entry_t *cur = RCLASS_EXT(klass)->subclasses;

    /* do not be tempted to simplify this loop into a for loop, the order of
       operations is important here if `f` modifies the linked list */
    while (cur) {
        VALUE curklass = cur->klass;
        cur = cur->next;
        f(curklass, arg);
    }
}

0000000000000cf0 <rb_class_foreach_subclass>:
     cf0:   41 54                   push   %r12
     cf2:   55                      push   %rbp
     cf3:   53                      push   %rbx
     cf4:   48 8b 47 18             mov    0x18(%rdi),%rax
     cf8:   48 8b 58 28             mov    0x28(%rax),%rbx
     cfc:   48 85 db                test   %rbx,%rbx
     cff:   74 21                   je     d22 <rb_class_foreach_subclass+0x32>
     d01:   49 89 f4                mov    %rsi,%r12
     d04:   48 89 d5                mov    %rdx,%rbp
     d07:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
     d0e:   00 00
     d10:   48 8b 3b                mov    (%rbx),%rdi
     d13:   48 8b 5b 08             mov    0x8(%rbx),%rbx
     d17:   48 89 ee                mov    %rbp,%rsi
     d1a:   41 ff d4                callq  *%r12
     d1d:   48 85 db                test   %rbx,%rbx
     d20:   75 ee                   jne    d10 <rb_class_foreach_subclass+0x20>
     d22:   5b                      pop    %rbx
     d23:   5d                      pop    %rbp
     d24:   41 5c                   pop    %r12
     d26:   c3                      retq
     d27:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
     d2e:   00 00

After

void
rb_class_foreach_subclass(VALUE klass, void (*f)(VALUE, VALUE), VALUE arg)
{
    rb_subclass_entry_t *cur = RCLASS_EXT(klass)->subclasses;

    /* do not be tempted to simplify this loop into a for loop, the order of
       operations is important here if `f` modifies the linked list */
    while (cur) {
        VALUE curklass = cur->klass;
        cur = cur->next;

        if (curklass == 0) {
                fprintf(stderr, "=== DETECTED null module class:\n");
                dp(curklass);
        }

        f(curklass, arg);
    }
}

0000000000000cf0 <rb_class_foreach_subclass>:
     cf0:   41 55                   push   %r13
     cf2:   41 54                   push   %r12
     cf4:   4c 8d 2d 00 00 00 00    lea    0x0(%rip),%r13        # cfb <rb_class_foreach_subclass+0xb>
     cfb:   55                      push   %rbp
     cfc:   53                      push   %rbx
     cfd:   49 89 f4                mov    %rsi,%r12
     d00:   48 89 d5                mov    %rdx,%rbp
     d03:   48 83 ec 08             sub    $0x8,%rsp
     d07:   48 8b 47 18             mov    0x18(%rdi),%rax
     d0b:   48 8b 58 28             mov    0x28(%rax),%rbx
     d0f:   48 85 db                test   %rbx,%rbx
     d12:   74 1b                   je     d2f <rb_class_foreach_subclass+0x3f>
     d14:   0f 1f 40 00             nopl   0x0(%rax)
     d18:   48 8b 3b                mov    (%rbx),%rdi
     d1b:   48 8b 5b 08             mov    0x8(%rbx),%rbx
     d1f:   48 85 ff                test   %rdi,%rdi
     d22:   74 1c                   je     d40 <rb_class_foreach_subclass+0x50>
     d24:   48 89 ee                mov    %rbp,%rsi
     d27:   41 ff d4                callq  *%r12
     d2a:   48 85 db                test   %rbx,%rbx
     d2d:   75 e9                   jne    d18 <rb_class_foreach_subclass+0x28>
     d2f:   48 83 c4 08             add    $0x8,%rsp
     d33:   5b                      pop    %rbx
     d34:   5d                      pop    %rbp
     d35:   41 5c                   pop    %r12
     d37:   41 5d                   pop    %r13
     d39:   c3                      retq
     d3a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
     d40:   48 8b 0d 00 00 00 00    mov    0x0(%rip),%rcx        # d47 <rb_class_foreach_subclass+0x57>
     d47:   ba 20 00 00 00          mov    $0x20,%edx
     d4c:   be 01 00 00 00          mov    $0x1,%esi
     d51:   4c 89 ef                mov    %r13,%rdi
     d54:   e8 00 00 00 00          callq  d59 <rb_class_foreach_subclass+0x69>
     d59:   48 8d 15 00 00 00 00    lea    0x0(%rip),%rdx        # d60 <rb_class_foreach_subclass+0x70>
     d60:   31 c9                   xor    %ecx,%ecx
     d62:   31 f6                   xor    %esi,%esi
     d64:   bf ff ff ff ff          mov    $0xffffffff,%edi
     d69:   e8 00 00 00 00          callq  d6e <rb_class_foreach_subclass+0x7e>
     d6e:   31 ff                   xor    %edi,%edi
     d70:   48 89 ee                mov    %rbp,%rsi
     d73:   41 ff d4                callq  *%r12
     d76:   48 85 db                test   %rbx,%rbx
     d79:   75 9d                   jne    d18 <rb_class_foreach_subclass+0x28>
     d7b:   48 83 c4 08             add    $0x8,%rsp
     d7f:   5b                      pop    %rbx
     d80:   5d                      pop    %rbp
     d81:   41 5c                   pop    %r12
     d83:   41 5d                   pop    %r13
     d85:   c3                      retq
     d86:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
     d8d:   00 00 00
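
The comment inside rb_class_foreach_subclass warns that the order of operations matters when `f` modifies the linked list. As a minimal sketch of why, the standalone C below walks a list whose callback frees the node being visited. The type `entry_t` and the functions `push`, `foreach_entry` and `sum_and_free` are hypothetical stand-ins written for illustration; they are not Ruby's real structures or API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a subclass list entry; not Ruby's real layout. */
typedef struct entry {
    int value;
    struct entry *next;
} entry_t;

/* Prepend a new node and return the new head. */
entry_t *push(entry_t *head, int value)
{
    entry_t *e = malloc(sizeof(*e));
    assert(e != NULL);
    e->value = value;
    e->next = head;
    return e;
}

/* Walk the list, reading `next` BEFORE the callback runs, so the
 * callback is free to unlink or free the node it was handed. */
void foreach_entry(entry_t *head, void (*f)(entry_t *, void *), void *arg)
{
    entry_t *cur = head;
    while (cur) {
        entry_t *visiting = cur;
        cur = cur->next;      /* advance first */
        f(visiting, arg);     /* `visiting` may be freed in here */
    }
}

/* Callback that accumulates the value and then frees the node. */
void sum_and_free(entry_t *e, void *arg)
{
    *(int *)arg += e->value;
    free(e);
}
```

Calling `foreach_entry(head, sum_and_free, &sum)` on a three-node list visits every node exactly once even though each node is freed during its own visit. A loop that read `cur->next` after the callback returned would read freed memory instead.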

Updated by stanhu (Stan Hu) 5 months ago

I don't see any compiler optimizations that would explain why the behavior would be any different. I was hoping to see something optimized out (as described in the volatile section in https://ruby-hacking-guide.github.io/gc.html), but both assembly dumps contain the same code for the loop:

    ;; Set arg 1 to be cur->klass
     d18:   48 8b 3b                mov    (%rbx),%rdi
    ;; cur = cur->next;
     d1b:   48 8b 5b 08             mov    0x8(%rbx),%rbx
    ;; if (curklass == 0)
     d1f:   48 85 ff                test   %rdi,%rdi
     d22:   74 1c                   je     d40 <rb_class_foreach_subclass+0x50>
    ;; Set arg 2 for f(curklass, arg)
     d24:   48 89 ee                mov    %rbp,%rsi
    ;; f(curklass, arg)
     d27:   41 ff d4                callq  *%r12
    ;; while (cur)

Full notes:

Before

0000000000000cf0 <rb_class_foreach_subclass>:
     cf0:   41 54                   push   %r12
     cf2:   55                      push   %rbp
     cf3:   53                      push   %rbx
    ;; Store RCLASS_EXT(klass) => rax
     cf4:   48 8b 47 18             mov    0x18(%rdi),%rax
    ;; rb_subclass_entry_t *cur = RCLASS_EXT(klass)->subclasses => rbx
     cf8:   48 8b 58 28             mov    0x28(%rax),%rbx
    ;; while (cur)
     cfc:   48 85 db                test   %rbx,%rbx
     cff:   74 21                   je     d22 <rb_class_foreach_subclass+0x32>
    ;; Store f => r12
     d01:   49 89 f4                mov    %rsi,%r12
    ;; Store arg => RBP
     d04:   48 89 d5                mov    %rdx,%rbp
     d07:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
     d0e:   00 00
    ;; Set arg 1 to be cur->klass
     d10:   48 8b 3b                mov    (%rbx),%rdi
    ;; cur = cur->next;
     d13:   48 8b 5b 08             mov    0x8(%rbx),%rbx
    ;; Set arg 2 for f(curklass, arg)
     d17:   48 89 ee                mov    %rbp,%rsi
    ;; f(curklass, arg)
     d1a:   41 ff d4                callq  *%r12
    ;; while (cur)
     d1d:   48 85 db                test   %rbx,%rbx
     d20:   75 ee                   jne    d10 <rb_class_foreach_subclass+0x20>
     d22:   5b                      pop    %rbx
     d23:   5d                      pop    %rbp
     d24:   41 5c                   pop    %r12
     d26:   c3                      retq
     d27:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
     d2e:   00 00

After

0000000000000cf0 <rb_class_foreach_subclass>:
     cf0:   41 55                   push   %r13
     cf2:   41 54                   push   %r12
     cf4:   4c 8d 2d 00 00 00 00    lea    0x0(%rip),%r13        # cfb <rb_class_foreach_subclass+0xb>
     cfb:   55                      push   %rbp
     cfc:   53                      push   %rbx
    ;; Store f => r12
     cfd:   49 89 f4                mov    %rsi,%r12
     d00:   48 89 d5                mov    %rdx,%rbp
     d03:   48 83 ec 08             sub    $0x8,%rsp
    ;; Store RCLASS_EXT(klass) => rax
     d07:   48 8b 47 18             mov    0x18(%rdi),%rax
    ;; rb_subclass_entry_t *cur = RCLASS_EXT(klass)->subclasses => rbx
     d0b:   48 8b 58 28             mov    0x28(%rax),%rbx
    ;; while (cur)
     d0f:   48 85 db                test   %rbx,%rbx
     d12:   74 1b                   je     d2f <rb_class_foreach_subclass+0x3f>
     d14:   0f 1f 40 00             nopl   0x0(%rax)
    ;; Set arg 1 to be cur->klass
     d18:   48 8b 3b                mov    (%rbx),%rdi
    ;; cur = cur->next;
     d1b:   48 8b 5b 08             mov    0x8(%rbx),%rbx
    ;; if (curklass == 0)
     d1f:   48 85 ff                test   %rdi,%rdi
     d22:   74 1c                   je     d40 <rb_class_foreach_subclass+0x50>
    ;; Set arg 2 for f(curklass, arg)
     d24:   48 89 ee                mov    %rbp,%rsi
    ;; f(curklass, arg)
     d27:   41 ff d4                callq  *%r12
    ;; while (cur)
     d2a:   48 85 db                test   %rbx,%rbx
     d2d:   75 e9                   jne    d18 <rb_class_foreach_subclass+0x28>
     d2f:   48 83 c4 08             add    $0x8,%rsp
     d33:   5b                      pop    %rbx
     d34:   5d                      pop    %rbp
     d35:   41 5c                   pop    %r12
     d37:   41 5d                   pop    %r13
     d39:   c3                      retq
     d3a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
     d40:   48 8b 0d 00 00 00 00    mov    0x0(%rip),%rcx        # d47 <rb_class_foreach_subclass+0x57>
     d47:   ba 20 00 00 00          mov    $0x20,%edx
     d4c:   be 01 00 00 00          mov    $0x1,%esi
     d51:   4c 89 ef                mov    %r13,%rdi
    ;; fprintf(stderr, "=== DETECTED null module class:\n")
     d54:   e8 00 00 00 00          callq  d59 <rb_class_foreach_subclass+0x69>
     d59:   48 8d 15 00 00 00 00    lea    0x0(%rip),%rdx        # d60 <rb_class_foreach_subclass+0x70>
     d60:   31 c9                   xor    %ecx,%ecx
     d62:   31 f6                   xor    %esi,%esi
     d64:   bf ff ff ff ff          mov    $0xffffffff,%edi
    ;; dp(curklass)
     d69:   e8 00 00 00 00          callq  d6e <rb_class_foreach_subclass+0x7e>
    ;; Set arg 1 to 0
     d6e:   31 ff                   xor    %edi,%edi
    ;; Set arg 2 for f(curklass, arg)
     d70:   48 89 ee                mov    %rbp,%rsi
    ;; f(curklass, arg)
     d73:   41 ff d4                callq  *%r12
     d76:   48 85 db                test   %rbx,%rbx
     d79:   75 9d                   jne    d18 <rb_class_foreach_subclass+0x28>
     d7b:   48 83 c4 08             add    $0x8,%rsp
     d7f:   5b                      pop    %rbx
     d80:   5d                      pop    %rbp
     d81:   41 5c                   pop    %r12
     d83:   41 5d                   pop    %r13
     d85:   c3                      retq
     d86:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
     d8d:   00 00 00

Updated by xtkoba (Tee KOBAYASHI) 5 months ago

It is possible that the inserted lines affect other parts of the code.

FYI, I once struggled with a similar problem (#17540), which I am not sure is relevant.

Updated by stanhu (Stan Hu) 5 months ago

xtkoba (Tee KOBAYASHI) wrote in #note-5:

It is possible that the inserted lines affect other parts of the code.

FYI, I once struggled with a similar problem (#17540), which I am not sure is relevant.

Thanks! I wonder if this is a similar strict aliasing issue.
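
For readers unfamiliar with the term, here is a minimal sketch of what a strict aliasing violation looks like in C and the well-defined alternative. It is a generic illustration only: nothing here is taken from the Ruby source, whether aliasing is involved in this crash is still an open question, and the helper name `float_bits` is made up. The expected bit pattern in the note after the code assumes a 32-bit IEEE-754 `float`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Well-defined: copy the object representation of a float into a uint32_t.
 * memcpy may read any object as bytes, so no aliasing rule is broken. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof(bits) == sizeof(f));   /* assumption: float is 32 bits wide */
    memcpy(&bits, &f, sizeof(bits));
    return bits;
}

/* For contrast, the following reads a float object through an lvalue of an
 * incompatible type, which is undefined behavior. An optimizing compiler is
 * allowed to assume it never happens and reorder or drop loads and stores:
 *
 *     uint32_t bits = *(uint32_t *)&f;
 */
```

With IEEE-754 floats, `float_bits(1.0f)` yields `0x3f800000`. GCC and Clang accept `-fno-strict-aliasing` to disable optimizations based on this rule, which can help test whether a crash depends on it.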

One thing I don't understand is this part of the stack trace:

#4  0x00007fba917314db in sigsegv (sig=11, info=0x565406831b30, ctx=0x565406831a00) at signal.c:946
#5  <signal handler called>
#6  rb_class_clear_method_cache (klass=0, arg=140439281334464) at vm.c:362
#7  0x00007fba9159b33d in rb_class_foreach_subclass (arg=8, f=<optimized out>, klass=<optimized out>) at class.c:122
#8  rb_class_detach_module_subclasses (klass=<optimized out>) at class.c:147
#9  0x0000000000000000 in ?? ()

How does rb_class_detach_module_subclasses ever call rb_class_clear_method_cache? From https://github.com/ruby/ruby/blob/5445e0435260b449decf2ac16f9d09bae3cafe72/class.c#L147:


static void
class_detach_module_subclasses(VALUE klass, VALUE arg)
{
    rb_class_remove_from_module_subclasses(klass);
}

void
rb_class_detach_module_subclasses(VALUE klass)
{
    rb_class_foreach_subclass(klass, class_detach_module_subclasses, Qnil);
}

void
rb_class_remove_from_module_subclasses(VALUE klass)
{
    rb_subclass_entry_t *entry;

    if (RCLASS_EXT(klass)->module_subclasses) {
        entry = *RCLASS_EXT(klass)->module_subclasses;
        *RCLASS_EXT(klass)->module_subclasses = entry->next;

        if (entry->next) {
            RCLASS_EXT(entry->next->klass)->module_subclasses = RCLASS_EXT(klass)->module_subclasses;
        }

        xfree(entry);
    }

    RCLASS_EXT(klass)->module_subclasses = NULL;
}

Is there some hook that I'm not seeing, or is this call indicative of some incorrect function pointer?
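
The unlink logic in rb_class_remove_from_module_subclasses quoted above relies on each class holding the address of the pointer that points at its list entry. A minimal standalone sketch of that pattern follows. The types `owner_t` and `node_t` and the functions `list_add` and `list_remove` are hypothetical names chosen for illustration; only the shape of the removal mirrors the quoted function.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node node_t;

/* Hypothetical stand-in for a class: it remembers the address of the
 * pointer (list head or a predecessor's `next`) that points at its node. */
typedef struct owner {
    node_t **slot;
} owner_t;

/* Hypothetical stand-in for a subclass list entry. */
struct node {
    owner_t *owner;
    node_t *next;
};

/* Insert a node for `o` at the head of the list. */
void list_add(node_t **head, owner_t *o)
{
    node_t *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->owner = o;
    n->next = *head;
    if (n->next) {
        /* the old head is now reached through n->next */
        n->next->owner->slot = &n->next;
    }
    *head = n;
    o->slot = head;
}

/* Unlink and free the node for `o` without walking the list. */
void list_remove(owner_t *o)
{
    if (o->slot) {
        node_t *n = *o->slot;
        *o->slot = n->next;                 /* bypass n */
        if (n->next) {
            n->next->owner->slot = o->slot; /* successor inherits the slot */
        }
        free(n);
    }
    o->slot = NULL;
}
```

Because every step dereferences the owner first, passing a null owner (the analogue of klass=0 in the backtrace) would fault on the very first load, which is consistent with a crash on entry to the callback.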

Updated by xtkoba (Tee KOBAYASHI) 5 months ago

The stack trace seems to be corrupted, possibly by the signal trampoline. Some kinds of bugs do corrupt the call stack (e.g. longjmp with an incorrect jump buffer), but I have no idea whether that is the case here.

Updated by stanhu (Stan Hu) 5 months ago

xtkoba (Tee KOBAYASHI) wrote in #note-7:

The stack trace seems to be corrupted, possibly by the signal trampoline. Some kinds of bugs do corrupt the call stack (e.g. longjmp with an incorrect jump buffer), but I have no idea whether that is the case here.

Ok, I analyzed the assembly code some more, and I wonder if this is indeed a strict aliasing problem. Earlier I had been looking at rb_class_foreach_subclass, but with -O3 enabled the compiler inlines both it and the callback, so the function effectively looks like:

void rb_class_detach_module_subclasses(VALUE klass)
{
    rb_subclass_entry_t *cur = RCLASS_EXT(klass)->subclasses;
    rb_subclass_entry_t *entry;

    /* do not be tempted to simplify this loop into a for loop, the order of
       operations is important here if `f` modifies the linked list */
    while (cur) {
        VALUE curklass = cur->klass;
        cur = cur->next;

        /* inlined body of rb_class_remove_from_module_subclasses(curklass) */
        if (RCLASS_EXT(curklass)->module_subclasses) {
            entry = *RCLASS_EXT(curklass)->module_subclasses;
            *RCLASS_EXT(curklass)->module_subclasses = entry->next;

            if (entry->next) {
                RCLASS_EXT(entry->next->klass)->module_subclasses = RCLASS_EXT(curklass)->module_subclasses;
            }

            xfree(entry);
        }

        RCLASS_EXT(curklass)->module_subclasses = NULL;
    }
}

Ruby compiler defaults

0000000000000da0 <rb_class_detach_module_subclasses>:
{
     da0:   55                      push   %rbp
     da1:   53                      push   %rbx
     da2:   48 83 ec 08             sub    $0x8,%rsp
    rb_subclass_entry_t *cur = RCLASS_EXT(klass)->subclasses;
     da6:   48 8b 47 18             mov    0x18(%rdi),%rax
     daa:   48 8b 58 28             mov    0x28(%rax),%rbx
    while (cur) {
     dae:   48 85 db                test   %rbx,%rbx
     db1:   74 4d                   je     e00 <rb_class_detach_module_subclasses+0x60>
     db3:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
    VALUE curklass = cur->klass;
     db8:   48 8b 2b                mov    (%rbx),%rbp
    cur = cur->next;
     dbb:   48 8b 5b 08             mov    0x8(%rbx),%rbx
    if (RCLASS_EXT(klass)->module_subclasses) {
     dbf:   48 8b 55 18             mov    0x18(%rbp),%rdx
     dc3:   48 8b 42 38             mov    0x38(%rdx),%rax
     dc7:   48 85 c0                test   %rax,%rax
     dca:   74 27                   je     df3 <rb_class_detach_module_subclasses+0x53>
    entry = *RCLASS_EXT(klass)->module_subclasses;
     dcc:   48 8b 38                mov    (%rax),%rdi
    *RCLASS_EXT(klass)->module_subclasses = entry->next;
     dcf:   48 8b 57 08             mov    0x8(%rdi),%rdx
     dd3:   48 89 10                mov    %rdx,(%rax)
    if (entry->next) {
     dd6:   48 8b 57 08             mov    0x8(%rdi),%rdx
     dda:   48 85 d2                test   %rdx,%rdx
     ddd:   74 0b                   je     dea <rb_class_detach_module_subclasses+0x4a>
        RCLASS_EXT(entry->next->klass)->module_subclasses = RCLASS_EXT(klass)->module_subclasses;
     ddf:   48 8b 12                mov    (%rdx),%rdx
     de2:   48 8b 52 18             mov    0x18(%rdx),%rdx
     de6:   48 89 42 38             mov    %rax,0x38(%rdx)
    xfree(entry);
     dea:   e8 00 00 00 00          callq  def <rb_class_detach_module_subclasses+0x4f>
     def:   48 8b 55 18             mov    0x18(%rbp),%rdx
    while (cur) {
     df3:   48 85 db                test   %rbx,%rbx
    RCLASS_EXT(klass)->module_subclasses = NULL;
     df6:   48 c7 42 38 00 00 00    movq   $0x0,0x38(%rdx)
     dfd:   00
    while (cur) {
     dfe:   75 b8                   jne    db8 <rb_class_detach_module_subclasses+0x18>
}
     e00:   48 83 c4 08             add    $0x8,%rsp
     e04:   5b                      pop    %rbx
     e05:   5d                      pop    %rbp
     e06:   c3                      retq
     e07:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
     e0e:   00 00

With -fno-strict-aliasing

0000000000000de0 <rb_class_detach_module_subclasses>:
{
     de0:   55                      push   %rbp
     de1:   53                      push   %rbx
     de2:   48 83 ec 08             sub    $0x8,%rsp
    rb_subclass_entry_t *cur = RCLASS_EXT(klass)->subclasses;
     de6:   48 8b 47 18             mov    0x18(%rdi),%rax
     dea:   48 8b 58 28             mov    0x28(%rax),%rbx
    while (cur) {
     dee:   48 85 db                test   %rbx,%rbx
     df1:   74 55                   je     e48 <rb_class_detach_module_subclasses+0x68>
     df3:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
    VALUE curklass = cur->klass;
     df8:   48 8b 2b                mov    (%rbx),%rbp
    cur = cur->next;
     dfb:   48 8b 5b 08             mov    0x8(%rbx),%rbx
    if (RCLASS_EXT(klass)->module_subclasses) {
     dff:   48 8b 45 18             mov    0x18(%rbp),%rax
     e03:   48 8b 50 38             mov    0x38(%rax),%rdx
     e07:   48 85 d2                test   %rdx,%rdx
     e0a:   74 2f                   je     e3b <rb_class_detach_module_subclasses+0x5b>
    entry = *RCLASS_EXT(klass)->module_subclasses;
     e0c:   48 8b 3a                mov    (%rdx),%rdi
    *RCLASS_EXT(klass)->module_subclasses = entry->next;
     e0f:   48 8b 47 08             mov    0x8(%rdi),%rax
     e13:   48 89 02                mov    %rax,(%rdx)
    if (entry->next) {
     e16:   48 8b 47 08             mov    0x8(%rdi),%rax
     e1a:   48 85 c0                test   %rax,%rax
     e1d:   74 13                   je     e32 <rb_class_detach_module_subclasses+0x52>
        RCLASS_EXT(entry->next->klass)->module_subclasses = RCLASS_EXT(klass)->module_subclasses;
     e1f:   48 8b 55 18             mov    0x18(%rbp),%rdx
     e23:   48 8b 00                mov    (%rax),%rax
     e26:   48 8b 52 38             mov    0x38(%rdx),%rdx
     e2a:   48 8b 40 18             mov    0x18(%rax),%rax
     e2e:   48 89 50 38             mov    %rdx,0x38(%rax)
    xfree(entry);
     e32:   e8 00 00 00 00          callq  e37 <rb_class_detach_module_subclasses+0x57>
     e37:   48 8b 45 18             mov    0x18(%rbp),%rax
    while (cur) {
     e3b:   48 85 db                test   %rbx,%rbx
    RCLASS_EXT(klass)->module_subclasses = NULL;
     e3e:   48 c7 40 38 00 00 00    movq   $0x0,0x38(%rax)
     e45:   00
    while (cur) {
     e46:   75 b0                   jne    df8 <rb_class_detach_module_subclasses+0x18>
}
     e48:   48 83 c4 08             add    $0x8,%rsp
     e4c:   5b                      pop    %rbx
     e4d:   5d                      pop    %rbp
     e4e:   c3                      retq
     e4f:   90                      nop

The key line in question is (https://github.com/ruby/ruby/blob/5445e0435260b449decf2ac16f9d09bae3cafe72/class.c#L103):

RCLASS_EXT(entry->next->klass)->module_subclasses = RCLASS_EXT(klass)->module_subclasses;

In the first case, we see:

            RCLASS_EXT(entry->next->klass)->module_subclasses = RCLASS_EXT(klass)->module_subclasses;
     ddf:       48 8b 12                mov    (%rdx),%rdx
     de2:       48 8b 52 18             mov    0x18(%rdx),%rdx
     de6:       48 89 42 38             mov    %rax,0x38(%rdx)

However, in the second case, with strict aliasing disabled (-fno-strict-aliasing), we see:

            RCLASS_EXT(entry->next->klass)->module_subclasses = RCLASS_EXT(klass)->module_subclasses;
     e1f:       48 8b 55 18             mov    0x18(%rbp),%rdx
     e23:       48 8b 00                mov    (%rax),%rax
     e26:       48 8b 52 38             mov    0x38(%rdx),%rdx
     e2a:       48 8b 40 18             mov    0x18(%rax),%rax
     e2e:       48 89 50 38             mov    %rdx,0x38(%rax)

What's the difference? In the first case, the assembly is shorter because RCLASS_EXT(klass)->module_subclasses was already loaded into rax (at dc3), and the compiler just reuses that value for the store.

However, in the second case, the assembly reloads RCLASS_EXT(klass)->module_subclasses from memory, which seems like a good idea because the store through it on the earlier line (https://github.com/ruby/ruby/blob/5445e0435260b449decf2ac16f9d09bae3cafe72/class.c#L100) might, as far as the compiler can tell, have modified it? Though I would hope that the pointer address should remain the same. Looking at the assembly output with the fprintf patch above, I'm seeing the same optimization applied there too, so this might just be a red herring. I'll have to see if I can reproduce with just -fno-strict-aliasing.
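
To make the reuse-versus-reload question concrete, here is a small model of the same statement order, under the assumption that the list is well formed (each module_subclasses pointer holds the address of the slot that points at its own entry). The structs and names (sub_entry_t, class_ext_t, remove_from_list, demo_reuse_equals_reload) are simplified, hypothetical stand-ins and do not match Ruby's real layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct class_ext;

/* Hypothetical stand-in for rb_subclass_entry_t. `klass_ext` plays the
 * role of RCLASS_EXT(entry->klass). */
typedef struct sub_entry {
    struct class_ext *klass_ext;
    struct sub_entry *next;
} sub_entry_t;

/* Hypothetical stand-in for the class extension struct. */
typedef struct class_ext {
    sub_entry_t **module_subclasses;  /* address of the slot pointing at our entry */
} class_ext_t;

/* Same statement order as rb_class_remove_from_module_subclasses.
 * Returns 1 if ext->module_subclasses held the same value before and
 * after the store through it, i.e. reusing the cached value and
 * reloading it from memory would give the same result. */
static int remove_from_list(class_ext_t *ext)
{
    int unchanged = 1;

    if (ext->module_subclasses) {
        sub_entry_t **cached = ext->module_subclasses;  /* what -O3 keeps in a register */
        sub_entry_t *entry = *ext->module_subclasses;

        *ext->module_subclasses = entry->next;          /* the store at class.c#L100 */
        unchanged = (ext->module_subclasses == cached); /* the reload at class.c#L103 */

        if (entry->next) {
            entry->next->klass_ext->module_subclasses = ext->module_subclasses;
        }
        free(entry);
    }
    ext->module_subclasses = NULL;
    return unchanged;
}

/* Builds head -> e1 -> e2, removes e1, and checks the resulting links.
 * Returns 1 when every check passes, 0 otherwise. */
int demo_reuse_equals_reload(void)
{
    sub_entry_t *head = NULL;
    class_ext_t ext1 = {0}, ext2 = {0};
    sub_entry_t *e1 = malloc(sizeof *e1);
    sub_entry_t *e2 = malloc(sizeof *e2);
    if (!e1 || !e2) abort();

    e2->klass_ext = &ext2; e2->next = NULL;
    e1->klass_ext = &ext1; e1->next = e2;
    head = e1;
    ext1.module_subclasses = &head;       /* slot pointing at e1 */
    ext2.module_subclasses = &e1->next;   /* slot pointing at e2 */

    int unchanged = remove_from_list(&ext1);   /* frees e1 */
    int ok = unchanged
          && head == e2
          && ext2.module_subclasses == &head
          && ext1.module_subclasses == NULL;

    free(e2);
    return ok;
}
```

In this model the store writes to a list slot, never to the module_subclasses field itself, so the cached and reloaded values agree. That is consistent with the difference between the two listings being a red herring, but it only holds while the list invariants do; it does not rule out corruption elsewhere.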

Updated by xtkoba (Tee KOBAYASHI) 5 months ago

I doubt if rb_class_detach_module_subclasses in class.c is worth investigating. The reason is as follows. As mentioned in #note-6, there is a large gap between Frame 6 and Frame 8 of the crashed thread which we cannot fill in, and so we cannot rely on the stack trace. And when we look at the register dump, RIP points at 0x00007fba9179f8fb, which should come from somewhere in vm*.c and not class.c when alphabetical order is taken into account.
