Project

General

Profile

Actions

Bug #17052

closed

Ruby with LTO enabled has issues with SIGSEGV handler

Added by vo.x (Vit Ondruch) over 1 year ago. Updated 3 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [powerpc64le-linux]
[ruby-core:99351]

Description

Fedora aims to enable link time optimization (LTO) of packages in next release. The specific changes in configuration options are available here 1. Since that time, I observe following errors 2 at least on {aarch64,ppc64le} (and possibly also other architectures):

  1) Failure:
TestBugReporter#test_bug_reporter_add [/builddir/build/BUILD/ruby-2.7.1/test/-ext-/bug_reporter/test_bug_reporter.rb:22]:
pid 32395 killed by SIGSEGV (signal 11) (core dumped)
| -:1: [BUG] Segmentation fault at 0x000003e800007e8b
| ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [powerpc64le-linux]
| 
| -- Control frame information -----------------------------------------------
| c:0003 p:---- s:0012 e:000011 CFUNC  :kill
| c:0002 p:0021 s:0006 e:000005 EVAL   -:1 [FINISH]
| c:0001 p:0000 s:0003 E:000f80 (none) [FINISH]
| 
| -- Ruby level backtrace information ----------------------------------------
| -:1:in `<main>'
| -:1:in `kill'
| 
| -- C level backtrace information -------------------------------------------
.
1. [2/2] Assertion for "stderr"
   | Expected /Sample bug reporter: 12345/
   | to match
   |   "-- Control frame information -----------------------------------------------\n"+
   |   "c:0003 p:---- s:0012 e:000011 CFUNC  :kill\n"+
   |   "c:0002 p:0021 s:0006 e:000005 EVAL   -:1 [FINISH]\n"+
   |   "c:0001 p:0000 s:0003 E:000f80 (none) [FINISH]\n\n"+
   |   "-- Ruby level backtrace information ----------------------------------------\n"+
   |   "-:1:in `<main>'\n"+
   |   "-:1:in `kill'\n\n"+
   |   "-- C level backtrace information -------------------------------------------\n"
   | after 4 patterns with 120 characters.
  2) Failure:
TestRubyOptions#test_segv_loaded_features [/builddir/build/BUILD/ruby-2.7.1/test/ruby/test_rubyoptions.rb:735]:
pid 38444 killed by SIGSEGV (signal 11) (core dumped)
| -e:1: [BUG] Segmentation fault at 0x000003e80000962c
| ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [powerpc64le-linux]
| 
| -- Control frame information -----------------------------------------------
| c:0003 p:---- s:0012 e:000011 CFUNC  :kill
| c:0002 p:0016 s:0006 e:000005 BLOCK  -e:1 [FINISH]
| c:0001 p:0000 s:0003 E:002460 (none) [FINISH]
| 
| -- Ruby level backtrace information ----------------------------------------
| -e:1:in `block in <main>'
| -e:1:in `kill'
| 
| -- C level backtrace information -------------------------------------------
.
1. [2/2] Assertion for "stderr"
   | <""> expected but was
   | <"-- C level backtrace information -------------------------------------------\n">.
  3) Failure:
TestRubyOptions#test_segv_setproctitle [/builddir/build/BUILD/ruby-2.7.1/test/ruby/test_rubyoptions.rb:749]:
pid 38451 killed by SIGSEGV (signal 11) (core dumped)
| -e:1: [BUG] Segmentation fault at 0x000003e800009633
| ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [powerpc64le-linux]
| 
| -- Control frame information -----------------------------------------------
| c:0003 p:---- s:0012 e:000011 CFUNC  :kill
| c:0002 p:0029 s:0006 e:000005 EVAL   -e:1 [FINISH]
| c:0001 p:0000 s:0003 E:000480 (none) [FINISH]
| 
| -- Ruby level backtrace information ----------------------------------------
| -e:1:in `<main>'
| -e:1:in `kill'
| 
| -- C level backtrace information -------------------------------------------
.
1. [2/2] Assertion for "stderr"
   | <""> expected but was
   | <"-- C level backtrace information -------------------------------------------\n">.
  4) Failure:
TestRubyOptions#test_segv_test [/builddir/build/BUILD/ruby-2.7.1/test/ruby/test_rubyoptions.rb:729]:
pid 38460 killed by SIGSEGV (signal 11) (core dumped)
| -e:1: [BUG] Segmentation fault at 0x000003e80000963c
| ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [powerpc64le-linux]
| 
| -- Control frame information -----------------------------------------------
| c:0003 p:---- s:0012 e:000011 CFUNC  :kill
| c:0002 p:0015 s:0006 e:000005 EVAL   -e:1 [FINISH]
| c:0001 p:0000 s:0003 E:0006a0 (none) [FINISH]
| 
| -- Ruby level backtrace information ----------------------------------------
| -e:1:in `<main>'
| -e:1:in `kill'
| 
| -- C level backtrace information -------------------------------------------
.
1. [2/2] Assertion for "stderr"
   | <""> expected but was
   | <"-- C level backtrace information -------------------------------------------\n">.
Finished tests in 840.600443s, 25.0047 tests/s, 3238.9681 assertions/s.
21019 tests, 2722678 assertions, 4 failures, 0 errors, 70 skips
ruby -v: ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [powerpc64le-linux]

When I raised the issue on fedora-devel ML 3, there was suggestion that it might happen when signal handler modifies any global variable. Now I am not sure if that is the case. Can somebody confirm? Or investigate/fix this, please?


Files

ruby-addr2line-DW_FORM_ref_addr.patch (651 Bytes) ruby-addr2line-DW_FORM_ref_addr.patch Correctly interpret the size of `DW_FORM_ref_addr` xtkoba (Tee KOBAYASHI), 08/12/2021 09:56 PM
ruby-addr2line-read_abstract_origin.patch (1.37 KB) ruby-addr2line-read_abstract_origin.patch Restrict forms in `read_abstract_origin` xtkoba (Tee KOBAYASHI), 08/13/2021 01:40 PM
ruby-addr2line-abbrev_table.patch (1.49 KB) ruby-addr2line-abbrev_table.patch addr2line.c: Correct `abbrev_table` entries xtkoba (Tee KOBAYASHI), 08/17/2021 07:42 AM
ruby-addr2line-abbrev_table-r1.patch (2.81 KB) ruby-addr2line-abbrev_table-r1.patch Correct `abbrev_table` entries (revised) xtkoba (Tee KOBAYASHI), 08/17/2021 10:01 AM
ruby-addr2line-debug_info_read-debug-code-0001.patch (1.11 KB) ruby-addr2line-debug_info_read-debug-code-0001.patch Debug code for `debug_info_read` xtkoba (Tee KOBAYASHI), 08/19/2021 12:51 PM

Updated by vo.x (Vit Ondruch) over 1 year ago

  • Subject changed from Ruby with LTO enabled on {aarch64,ppc64le} architectures. to Ruby with LTO enabled has issues with SIGSEGV handler

Actually, the platform might not matter. It might fail just randomly, because now I have caught the same issue also on x86_64.

Updated by shyouhei (Shyouhei Urabe) over 1 year ago

  • Status changed from Open to Third Party's Issue

Yes I can reproduce this on my machine.

% LC_ALL=C gdb --args ./miniruby -e'Process.kill("SIGSEGV",$$)'
GNU gdb (Ubuntu 8.2-0ubuntu1~18.04) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./miniruby...done.
(gdb) run
Starting program: ./miniruby -eProcess.kill\(\"SIGSEGV\",\$\$\)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7a23237 in kill () at ../sysdeps/unix/syscall-template.S:78
78      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x00007ffff7a23237 in kill () at ../sysdeps/unix/syscall-template.S:78
#1  0x000055555571ec9b in rb_f_kill (argc=2, argv=0x7ffff7ecb048) at signal.c:480
#2  0x0000555555789c44 in vm_call_cfunc_with_frame (ec=0x555555add590, reg_cfp=0x7ffff7fcafa0, calling=<optimized out>, cd=<optimized out>) at vm_insnhelper.c:2601
#3  0x000055555578d4d5 in vm_sendish (ec=0x555555add590, reg_cfp=0x7ffff7fcafa0, cd=0x555555bfaf30, block_handler=<optimized out>, method_explorer=<optimized out>) at vm_callinfo.h:337
#4  0x000055555578ef98 in vm_exec_core (ec=0x555555add590, initial=11) at insns.def:799
#5  0x00005555557a4ee4 in rb_vm_exec (ec=0x555555add590, mjit_enable_p=1) at vm.c:1953
#6  0x00005555555ff1d0 in rb_ec_exec_node (ec=ec@entry=0x555555add590, n=n@entry=0x555555ae9958) at eval.c:296
#7  0x0000555555603533 in ruby_run_node (n=0x555555ae9958) at eval.c:354
#8  0x000055555557ee0f in main (argc=<optimized out>, argv=<optimized out>) at main.c:50
(gdb) c
Continuing.
-e:1: [BUG] Segmentation fault at 0x0000286b000022c1
ruby 2.8.0dev (2020-08-20T02:48:16Z flto 71753b7f6d) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0003 p:---- s:0012 e:000011 CFUNC  :kill
c:0002 p:0015 s:0006 e:000005 EVAL   -e:1 [FINISH]
c:0001 p:0000 s:0003 E:000a10 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
-e:1:in `<main>'
-e:1:in `kill'

-- Machine register context ------------------------------------------------
 RIP: 0x00007ffff7a23237 RBP: 0x0000000000000002 RSP: 0x00007fffffffc2b8
 RAX: 0x0000000000000000 RBX: 0x0000000000000001 RCX: 0x00007ffff7a23237
 RDX: 0x0000555555adca01 RDI: 0x00000000000022c1 RSI: 0x000000000000000b
  R8: 0x0000555555b2a610  R9: 0x0000555555bed890 R10: 0x0000555555bed890
 R11: 0x0000000000000206 R12: 0x00007ffff7ecb048 R13: 0x00000000000022c1
 R14: 0x00000000000022c1 R15: 0x000000000000000b EFL: 0x0000000000000206

-- C level backtrace information -------------------------------------------

Program received signal SIGSEGV, Segmentation fault.
0x00005555557a9a90 in uleb128 (p=0x555555bd5428) at addr2line.c:1274
1274        uint64_t abbrev_number = uleb128(&reader->p);
(gdb) bt
#0  0x00005555557a9a90 in uleb128 (p=0x555555bd5428) at addr2line.c:1274
#1  di_read_die (reader=0x555555bd53f0, die=0x555555bd5310) at addr2line.c:1274
#2  0x00005555557bb65d in read_abstract_origin (line=0x555555bd5380, abstract_origin=<optimized out>, reader=0x555555bd53f0) at addr2line.c:1486
#3  debug_info_read (traces=<optimized out>, offset=<optimized out>, lines=<optimized out>, num_traces=<optimized out>, reader=<optimized out>) at addr2line.c:1550
#4  fill_lines (num_traces=num_traces@entry=16, check_debuglink=check_debuglink@entry=1, objp=objp@entry=0x555555bd5d18, lines=lines@entry=0x555555c15480, offset=0, offset@entry=-1, traces=<optimized out>) at addr2line.c:1763
#5  0x00005555557bc069 in rb_dump_backtrace_with_lines.constprop.0 (num_traces=16, traces=<optimized out>) at addr2line.c:2150
#6  0x00005555557a8810 in rb_print_backtrace () at vm_dump.c:759
#7  0x00005555557acf21 in rb_vm_bugreport (ctx=<optimized out>) at vm_dump.c:955
#8  0x00005555555f59a4 in rb_bug_for_fatal_signal (default_sighandler=0x0, sig=11, ctx=0x555555bd6180, fmt=0x5555557ff3fa "Segmentation fault at %p") at error.c:675
#9  0x000055555571ac29 in sigsegv (sig=11, info=0x555555bd62b0, ctx=0x555555bd6180) at signal.c:959
#10 <signal handler called>
#11 0x00007ffff7a23237 in kill () at ../sysdeps/unix/syscall-template.S:78
#12 0x000055555571ec9b in rb_f_kill (argc=2, argv=0x7ffff7ecb048) at signal.c:480
#13 0x0000555555789c44 in vm_call_cfunc_with_frame (ec=0x555555add590, reg_cfp=0x7ffff7fcafa0, calling=<optimized out>, cd=<optimized out>) at vm_insnhelper.c:2601
#14 0x000055555578d4d5 in vm_sendish (ec=0x555555add590, reg_cfp=0x7ffff7fcafa0, cd=0x555555bfaf30, block_handler=<optimized out>, method_explorer=<optimized out>) at vm_callinfo.h:337
#15 0x000055555578ef98 in vm_exec_core (ec=0x555555add590, initial=11) at insns.def:799
#16 0x00005555557a4ee4 in rb_vm_exec (ec=0x555555add590, mjit_enable_p=1) at vm.c:1953
#17 0x00005555555ff1d0 in rb_ec_exec_node (ec=ec@entry=0x555555add590, n=n@entry=0x555555ae9958) at eval.c:296
#18 0x0000555555603533 in ruby_run_node (n=0x555555ae9958) at eval.c:354
#19 0x000055555557ee0f in main (argc=<optimized out>, argv=<optimized out>) at main.c:50
(gdb)

It seems the generated DWARF section is broken. For instance addr2line(1) also fails to understand it.

% nm ./miniruby | fgrep -w rb_f_kill | LC_ALL=C addr2line -e ./miniruby
addr2line: Dwarf Error: Could not find abbrev number 64.
??:?
:?

When you kill LTO option the above one liner must show "signal.c:423" or something.

vo.x (Vit Ondruch) is it possible for you to ask this to linker people instead? As addr2line(1) is also affected, it is hard for me to think we are the ones who is doing something wrong.

Updated by shyouhei (Shyouhei Urabe) over 1 year ago

Meanwhile it can not be a bad idea for us to avoid SEGV even when DWARF is broken.

Updated by vo.x (Vit Ondruch) over 1 year ago

Thank you for the investigation. There is a whole thread on Fedora devel mailing list:

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/OAD6BEN6UMVIXAEI5EDS5COUYFSPNLZN/

I have posted results of your investigations there and I think Jeff Law will be back soon with some advice/fix.

Updated by vo.x (Vit Ondruch) over 1 year ago

vo.x (Vit Ondruch) wrote in #note-4:

Thank you for the investigation. There is a whole thread on Fedora devel mailing list:

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/OAD6BEN6UMVIXAEI5EDS5COUYFSPNLZN/

I have posted results of your investigations there

This is the link:

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/NLQBH6N6GK53T4CC5CM2SWQEPYGS2CMR/

and I think Jeff Law will be back soon with some advice/fix.

Updated by vo.x (Vit Ondruch) 4 months ago

This is still unresolved :(

Updated by vo.x (Vit Ondruch) 4 months ago

And it seems that the DWARF is in better shape:

$ nm ./miniruby | grep -w rb_f_kill
00000000001ddcf0 T rb_f_kill
000000000002bd88 t rb_f_kill.cold

$ nm ./miniruby | grep -w rb_f_kill | addr2line -e ./miniruby
/builddir/build/BUILD/ruby-3.0.2/signal.c:424
/builddir/build/BUILD/ruby-3.0.2/numeric.c:2980

Updated by vo.x (Vit Ondruch) 4 months ago

  • Status changed from Third Party's Issue to Open

Changing state back to open, because this is unresolved and I am repeatedly hitting this issue in various reincarnations. So this is the debug session on PPC64LE:

$ gdb --args ./miniruby -e'Process.kill("SIGSEGV",$$)'
GNU gdb (GDB) Fedora 10.2-6.fc35
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "ppc64le-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./miniruby...
warning: File "/builddir/build/BUILD/ruby-3.0.2/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
    add-auto-load-safe-path /builddir/build/BUILD/ruby-3.0.2/.gdbinit
line to your configuration file "/builddir/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/builddir/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
(gdb) r
Starting program: /builddir/build/BUILD/ruby-3.0.2/miniruby -eProcess.kill\(\"SIGSEGV\",\$\$\)
Download failed: No route to host.  Continuing without debug info for /lib64/libz.so.1.
Download failed: No route to host.  Continuing without debug info for /lib64/libgmp.so.10.
Download failed: No route to host.  Continuing without debug info for /lib64/libcrypt.so.2.
Download failed: No route to host.  Continuing without debug info for /lib64/libm.so.6.
Download failed: No route to host.  Continuing without debug info for /lib64/libc.so.6.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7aa8810 in kill () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-1.fc35.ppc64le gmp-6.2.0-7.fc35.ppc64le libxcrypt-4.4.24-1.fc35.ppc64le zlib-1.2.11-30.fc35.ppc64le
(gdb) c
Continuing.
-e:1: [BUG] Segmentation fault at 0x590fb15c0000001f
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [powerpc64le-linux]

-- Control frame information -----------------------------------------------
c:0003 p:---- s:0012 e:000011 CFUNC  :kill
c:0002 p:0015 s:0006 e:000005 EVAL   -e:1 [FINISH]
c:0001 p:0000 s:0003 E:0013f0 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
-e:1:in `<main>'
-e:1:in `kill'

-- C level backtrace information -------------------------------------------

Program received signal SIGSEGV, Segmentation fault.
0x000000010031fa44 in uleb128 (p=0x1005986a0) at addr2line.c:200
200     unsigned char b = *(unsigned char *)(*p)++;
(gdb) bt
#0  0x000000010031fa44 in uleb128 (p=0x1005986a0) at addr2line.c:200
#1  di_read_die (reader=0x100598668, die=0x100598588) at addr2line.c:1335
#2  0x0000000100337d4c in read_abstract_origin (line=0x1005985f8, abstract_origin=760312297580551, reader=0x100598668) at addr2line.c:1604
#3  debug_info_read (traces=<optimized out>, offset=<optimized out>, lines=<optimized out>, num_traces=<optimized out>, reader=<optimized out>) at addr2line.c:1668
#4  fill_lines (num_traces=num_traces@entry=19, check_debuglink=check_debuglink@entry=1, objp=0x100599010, objp@entry=0x1005990a0, lines=lines@entry=0x1005e2c50, offset=0, offset@entry=-1, 
    traces=<optimized out>) at addr2line.c:1888
#5  0x0000000100338d1c in rb_dump_backtrace_with_lines.constprop.0 (num_traces=<optimized out>, traces=<optimized out>) at addr2line.c:2287
#6  0x000000010031dfec in rb_print_backtrace () at vm_dump.c:760
#7  0x00000001003356a8 in rb_vm_bugreport.constprop.0 (ctx=<optimized out>) at vm_dump.c:998
#8  0x00000001000cd8c4 in rb_bug_for_fatal_signal (default_sighandler=0x0, sig=<optimized out>, ctx=<optimized out>, fmt=0x100383a98 "Segmentation fault at %p") at error.c:786
#9  0x0000000100256388 in sigsegv (sig=<optimized out>, info=0x10059a330, ctx=0x1005995b0) at signal.c:960
#10 <signal handler called>
#11 0x00007ffff7aa8810 in kill () from /lib64/libc.so.6
#12 0x000000010025ad70 in rb_f_kill (argc=<optimized out>, argv=0x7ffff78e0050) at signal.c:439
#13 0x00000001001fb448 in proc_rb_f_kill (c=<optimized out>, v=<optimized out>, _=<optimized out>) at process.c:8605
#14 0x00000001002e92d8 in ractor_safe_call_cfunc_m1 (recv=<optimized out>, argc=<optimized out>, argv=<optimized out>, func=<optimized out>) at vm_insnhelper.c:2739
#15 0x00000001002f2120 in vm_call_cfunc_with_frame (ec=0x100491ac0, reg_cfp=0x7fffffffe170, calling=<optimized out>) at vm_insnhelper.c:2929
#16 0x00000001002f4f34 in vm_sendish (ec=0x100491ac0, reg_cfp=0x7ffff79dffa0, cd=0x1005c2eb0, block_handler=<optimized out>, method_explorer=<optimized out>) at vm_insnhelper.c:4530
#17 0x00000001002fa2ac in vm_exec_core (ec=0x100491ac0, initial=<optimized out>) at insns.def:789
#18 0x0000000100315200 in rb_vm_exec (ec=0x100491ac0, mjit_enable_p=<optimized out>) at vm.c:2172
#19 0x0000000100317040 in rb_iseq_eval_main (iseq=0x1004a9fb0) at vm.c:2420
#20 0x00000001000d7c8c in rb_ec_exec_node (ec=ec@entry=0x100491ac0, n=n@entry=0x1004a9fb0) at eval.c:317
#21 0x00000001000d7df4 in ruby_run_node (n=0x1004a9fb0) at eval.c:375
#22 0x000000010002afb8 in main (argc=<optimized out>, argv=<optimized out>) at ./main.c:50
(gdb) 

Updated by xtkoba (Tee KOBAYASHI) 4 months ago

I noticed that addr2line.c misinterpreted the size of DW_FORM_ref_addr as the size of an address. In fact it should be interpreted as the size of an offset (32 bits in 32-bit DWARF and 64 bits in 64-bit DWARF) [1]. A patch is attached to fix this.

[1] http://wiki.dwarfstd.org/index.php?title=DWARF_FAQ#How_big_is_a_DW_FORM_ref_addr.3F

I am not sure if this patch resolves the main issue, because it fails to reproduce on my x86_64-linux environment. A noticeable difference between LTO and non-LTO binaries on x86_64-linux is that DW_AT_abstract_origin attribute is of the form DW_FORM_ref_addr in the former and DW_FORM_ref4 in the latter.

Note that there is another size misinterpretation in addr2line.c, which is fixed in the patch for #17823. I recommend it be applied also.

Updated by xtkoba (Tee KOBAYASHI) 4 months ago

Now I can reproduce this on my qemu-user-emulated powerpc64le-linux-musl environment:

$ powerpc64le-musl-qemu -strace ./miniruby -e 'Process.kill :SEGV, $$'
(...)
24209 readlink("/proc/self/exe",0x0000004000490bd8,4096) = 75
24209 open("/var/tmp/ruby/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/miniruby",O_RDONLY|O_LARGEFILE) = 5
24209 _llseek(5,0,0,0x0000004004160a70,SEEK_END) = 0
24209 _llseek(5,0,0,0x0000004004160a70,SEEK_SET) = 0
24209 mmap(NULL,24128544,PROT_READ,MAP_SHARED,5,0) = 0x000000400566b000
24209 close(5) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=1, si_addr=NULL} ---
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault

And it turns out that the patch attached in #note-9 is not enough to resolve this. I'm going to try to debug addr2line.c further.

Updated by xtkoba (Tee KOBAYASHI) 4 months ago

It turns out that DW_FORM_ref_addr is not interpreted correctly. A patch is attached to just ignore that form (and other forms that are not supposed to be used currently in addr2line.c). This ignorance does not seem to affect the quality of C backtrace printing very much, and so I am not going to add the interpretation of currently ignored forms for now.

FYI, the forms that can be used for a value of DW_AT_abstract_origin are described in 7.5.5 of DWARF 5 specification (pp.217--218 in DWARF5.pdf).

Actions #12

Updated by xtkoba (Tee KOBAYASHI) 4 months ago

  • Status changed from Open to Closed

Applied in changeset git|72317b333b85eed483ad00bcd4f40944019a7c13.


Ignore DW_FORM_ref_addr [Bug #17052]

Ignore DW_FORM_ref_addr form and other forms that are not supposed
to be used currently.

Updated by vo.x (Vit Ondruch) 4 months ago

  • Status changed from Closed to Open

Thank you for looking into this.

Unfortunately, I have tried git|72317b333b85eed483ad00bcd4f40944019a7c13 and I see no difference.

BTW I test Ruby 3.0.2 including the following commits:

1) git|9e5105ca451d6d38eb2d03a2ffc904074f0333b9 from #17585
2) https://github.com/ruby/ruby/pull/4439

Updated by xtkoba (Tee KOBAYASHI) 4 months ago

Aren't there any subtle differences in GDB outputs? If fixed as intended, the value of abstract_origin can't get such large as 760312297580551 (0x2b380006a6807).

Updated by vo.x (Vit Ondruch) 4 months ago

This is the GDB output:

(gdb) r
Starting program: /builddir/build/BUILD/ruby-3.0.2/miniruby -e Process.kill\ :SEGV,\ \$\$
Download failed: No route to host.  Continuing without debug info for /lib64/libz.so.1.
Download failed: No route to host.  Continuing without debug info for /lib64/libgmp.so.10.
Download failed: No route to host.  Continuing without debug info for /lib64/libcrypt.so.2.
Download failed: No route to host.  Continuing without debug info for /lib64/libm.so.6.
Download failed: No route to host.  Continuing without debug info for /lib64/libc.so.6.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7aa8810 in kill () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-1.fc35.ppc64le gmp-6.2.0-7.fc35.ppc64le libxcrypt-4.4.25-1.fc35.ppc64le zlib-1.2.11-30.fc35.ppc64le
(gdb) c
Continuing.
-e:1: [BUG] Segmentation fault at 0x590fb15c0000001f
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [powerpc64le-linux]

-- Control frame information -----------------------------------------------
c:0003 p:---- s:0012 e:000011 CFUNC  :kill
c:0002 p:0015 s:0006 e:000005 EVAL   -e:1 [FINISH]
c:0001 p:0000 s:0003 E:0013f0 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
-e:1:in `<main>'
-e:1:in `kill'

-- C level backtrace information -------------------------------------------

Program received signal SIGSEGV, Segmentation fault.
0x000000010031fb00 in uleb128 (p=0x1005986c0) at addr2line.c:200
200     unsigned char b = *(unsigned char *)(*p)++;
(gdb) bt
#0  0x000000010031fb00 in uleb128 (p=0x1005986c0) at addr2line.c:200
#1  di_read_die (reader=0x100598668, die=0x100598578) at addr2line.c:1343
#2  0x00000001003379a8 in debug_info_read (traces=<optimized out>, offset=<optimized out>, lines=<optimized out>, num_traces=<optimized out>, reader=<optimized out>) at addr2line.c:1650
#3  fill_lines (num_traces=num_traces@entry=19, check_debuglink=check_debuglink@entry=1, objp=0x100599010, objp@entry=0x1005990a0, lines=lines@entry=0x1005e2c20, offset=0, offset@entry=-1, 
    traces=<optimized out>) at addr2line.c:1907
#4  0x0000000100338cfc in rb_dump_backtrace_with_lines.constprop.0 (num_traces=<optimized out>, traces=<optimized out>) at addr2line.c:2306
#5  0x000000010031df2c in rb_print_backtrace () at vm_dump.c:760
#6  0x0000000100335668 in rb_vm_bugreport.constprop.0 (ctx=<optimized out>) at vm_dump.c:998
#7  0x00000001000cd8a4 in rb_bug_for_fatal_signal (default_sighandler=0x0, sig=<optimized out>, ctx=<optimized out>, fmt=0x100383a78 "Segmentation fault at %p") at error.c:786
#8  0x00000001002562e8 in sigsegv (sig=<optimized out>, info=0x10059a330, ctx=0x1005995b0) at signal.c:960
#9  <signal handler called>
#10 0x00007ffff7aa8810 in kill () from /lib64/libc.so.6
#11 0x000000010025acd0 in rb_f_kill (argc=<optimized out>, argv=0x7ffff78e0050) at signal.c:439
#12 0x00000001001fb3a8 in proc_rb_f_kill (c=<optimized out>, v=<optimized out>, _=<optimized out>) at process.c:8605
#13 0x00000001002e9238 in ractor_safe_call_cfunc_m1 (recv=<optimized out>, argc=<optimized out>, argv=<optimized out>, func=<optimized out>) at vm_insnhelper.c:2739
#14 0x00000001002f2080 in vm_call_cfunc_with_frame (ec=0x100491ac0, reg_cfp=0x7fffffffe170, calling=<optimized out>) at vm_insnhelper.c:2929
#15 0x00000001002f4e94 in vm_sendish (ec=0x100491ac0, reg_cfp=0x7ffff79dffa0, cd=0x1005c2e80, block_handler=<optimized out>, method_explorer=<optimized out>) at vm_insnhelper.c:4530
#16 0x00000001002fa20c in vm_exec_core (ec=0x100491ac0, initial=<optimized out>) at insns.def:789
#17 0x0000000100315140 in rb_vm_exec (ec=0x100491ac0, mjit_enable_p=<optimized out>) at vm.c:2172
#18 0x0000000100316f80 in rb_iseq_eval_main (iseq=0x1004a9fd8) at vm.c:2420
#19 0x00000001000d7c6c in rb_ec_exec_node (ec=ec@entry=0x100491ac0, n=n@entry=0x1004a9fd8) at eval.c:317
#20 0x00000001000d7dd4 in ruby_run_node (n=0x1004a9fd8) at eval.c:375
#21 0x000000010002afb8 in main (argc=<optimized out>, argv=<optimized out>) at ./main.c:50
(gdb) l uleb128
file: "addr2line.c", line number: 195, symbol: "uleb128"
190 
191 /* Avoid consuming stack as this module may be used from signal handler */
192 static char binary_filename[PATH_MAX + 1];
193 
194 static unsigned long
195 uleb128(char **p)
196 {
197     unsigned long r = 0;
198     int s = 0;
199     for (;;) {
file: "addr2line.c", line number: 197, symbol: "uleb128"
192 static char binary_filename[PATH_MAX + 1];
193 
194 static unsigned long
195 uleb128(char **p)
196 {
197     unsigned long r = 0;
198     int s = 0;
199     for (;;) {
200     unsigned char b = *(unsigned char *)(*p)++;
201     if (b < 0x80) {
file: "addr2line.c", line number: 200, symbol: "uleb128"
195 uleb128(char **p)
196 {
197     unsigned long r = 0;
198     int s = 0;
199     for (;;) {
200     unsigned char b = *(unsigned char *)(*p)++;
201     if (b < 0x80) {
202         r += (unsigned long)b << s;
203         break;
204     }
file: "addr2line.c", line number: 201, symbol: "uleb128"
196 {
197     unsigned long r = 0;
198     int s = 0;
199     for (;;) {
200     unsigned char b = *(unsigned char *)(*p)++;
201     if (b < 0x80) {
202         r += (unsigned long)b << s;
203         break;
204     }
205     r += (b & 0x7f) << s;
(gdb) 
206     s += 7;
207     }
208     return r;
209 }
210 
211 static long
212 sleb128(char **p)
213 {
214     long r = 0;
215     int s = 0;
(gdb) l di_read_die
1328    }
1329    #endif
1330    
1331    static DIE *
1332    di_read_die(DebugInfoReader *reader, DIE *die)
1333    {
1334        uint64_t abbrev_number = uleb128(&reader->p);
1335        if (abbrev_number == 0) {
1336            reader->level--;
1337            return NULL;
(gdb) 
1338        }
1339    
1340        reader->q = di_find_abbrev(reader, abbrev_number);
1341    
1342        die->pos = reader->p - reader->obj->debug_info.ptr - 1;
1343        die->tag = (int)uleb128(&reader->q); /* tag */
1344        die->has_children = *reader->q++; /* has_children */
1345        if (die->has_children) {
1346            reader->level++;
1347        }
(gdb) 
1348        return die;
1349    }
1350    
1351    static DebugInfoValue *
1352    di_read_record(DebugInfoReader *reader, DebugInfoValue *vp)
1353    {
1354        uint64_t at = uleb128(&reader->q);
1355        uint64_t form = uleb128(&reader->q);
1356        if (!at || !form) return NULL;
1357        vp->at = at;
(gdb) l debug_info_read
1643    debug_info_read(DebugInfoReader *reader, int num_traces, void **traces,
1644             line_info_t *lines, int offset) {
1645        while (reader->p < reader->cu_end) {
1646            DIE die;
1647            ranges_t ranges = {};
1648            line_info_t line = {};
1649    
1650            if (!di_read_die(reader, &die)) continue;
1651            /* fprintf(stderr,"%d:%tx: <%d>\n",__LINE__,die.pos,reader->level,die.tag); */
1652    
(gdb) 
1653            if (die.tag != DW_TAG_subprogram && die.tag != DW_TAG_inlined_subroutine) {
1654              skip_die:
1655                di_skip_records(reader);
1656                continue;
1657            }
1658    
1659            /* enumerate abbrev */
1660            for (;;) {
1661                DebugInfoValue v = {{}};
1662                /* ptrdiff_t pos = reader->p - reader->p0; */
(gdb) 
1663                if (!di_read_record(reader, &v)) break;
1664                /* fprintf(stderr,"\n%d:%tx: AT:%lx FORM:%lx\n",__LINE__,pos,v.at,v.form); */
1665                /* div_inspect(&v); */
1666                switch (v.at) {
1667                  case DW_AT_name:
1668                    line.sname = get_cstr_value(&v);
1669                    break;
1670                  case DW_AT_call_file:
1671                    fill_filename((int)v.as.uint64, reader->debug_line_directories, reader->debug_line_files, &line, reader->obj);
1672                    break;
(gdb) 
1673                  case DW_AT_call_line:
1674                    line.line = (int)v.as.uint64;
1675                    break;
1676                  case DW_AT_low_pc:
1677                  case DW_AT_high_pc:
1678                  case DW_AT_ranges:
1679                    ranges_set(&ranges, &v);
1680                    break;
1681                  case DW_AT_declaration:
1682                    goto skip_die;
(gdb) 
1683                  case DW_AT_inline:
1684                    /* 1 or 3 */
1685                    break; /* goto skip_die; */
1686                  case DW_AT_abstract_origin:
1687                    read_abstract_origin(reader, v.form, v.as.uint64, &line);
1688                    break; /* goto skip_die; */
1689                }
1690            }
1691            /* ranges_inspect(reader, &ranges); */
1692            /* fprintf(stderr,"%d:%tx: %x ",__LINE__,diepos,die.tag); */
(gdb) 
1693            for (int i=offset; i < num_traces; i++) {
1694                uintptr_t addr = (uintptr_t)traces[i];
1695                uintptr_t offset = addr - reader->obj->base_addr + reader->obj->vmaddr;
1696                uintptr_t saddr = ranges_include(reader, &ranges, offset);
1697                if (saddr) {
1698                    /* fprintf(stderr, "%d:%tx: %d %lx->%lx %x %s: %s/%s %d %s %s %s\n",__LINE__,die.pos, i,addr,offset, die.tag,line.sname,line.dirname,line.filename,line.line,reader->obj->path,line.sname,lines[i].sname); */
1699                    if (lines[i].sname) {
1700                        line_info_t *lp = malloc(sizeof(line_info_t));
1701                        memcpy(lp, &lines[i], sizeof(line_info_t));
1702                        lines[i].next = lp;
(gdb) 
1703                        lp->dirname = line.dirname;
1704                        lp->filename = line.filename;
1705                        lp->line = line.line;
1706                        lp->saddr = 0;
1707                    }
1708                    lines[i].path = reader->obj->path;
1709                    lines[i].base_addr = line.base_addr;
1710                    lines[i].sname = line.sname;
1711                    lines[i].saddr = saddr + reader->obj->base_addr - reader->obj->vmaddr;
1712                }
(gdb) 
1713            }
1714        }
1715    }
1716    
1717    #ifdef USE_ELF
1718    static unsigned long
1719    uncompress_debug_section(ElfW(Shdr) *shdr, char *file, char **ptr)
1720    {
1721    #ifdef SUPPORT_COMPRESSED_DEBUG_LINE
1722        ElfW(Chdr) *chdr = (ElfW(Chdr) *)(file + shdr->sh_offset);
(gdb) 

Updated by vo.x (Vit Ondruch) 4 months ago

Ah, I can see there is missing the read_abstract_origin on the backtrace.

Updated by xtkoba (Tee KOBAYASHI) 4 months ago

It turns out that di_read_debug_abbrev_cu in addr2line.c does not work correctly when abbrev_number is not aligned in ascending order for each CU entry of .debug_abbrev. A patch is attached to fix this.

Note again that I cannot reproduce this in my environment and so this patch does not necessarily resolve the main problem. Thank you for your patience.

Updated by xtkoba (Tee KOBAYASHI) 4 months ago

A revised patch (from #note-17) is attached so that abbrev_number >= 256 is handled properly, although it might be practically not going to happen.

Updated by xtkoba (Tee KOBAYASHI) 4 months ago

FWIW, another bug is found in addr2line.c (#18081), which I believe is irrevant to the issue here, though.

Updated by vo.x (Vit Ondruch) 4 months ago

The result with the latest patch is significantly different:

$ gdb --args ./miniruby -e'Process.kill("SIGSEGV",$$)'
GNU gdb (GDB) Fedora 10.2-6.fc35
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "ppc64le-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./miniruby...
warning: File "/builddir/build/BUILD/ruby-3.0.2/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
    add-auto-load-safe-path /builddir/build/BUILD/ruby-3.0.2/.gdbinit
line to your configuration file "/builddir/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/builddir/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
(gdb) r
Starting program: /builddir/build/BUILD/ruby-3.0.2/miniruby -eProcess.kill\(\"SIGSEGV\",\$\$\)
Download failed: No route to host.  Continuing without debug info for /lib64/libz.so.1.
Download failed: No route to host.  Continuing without debug info for /lib64/libgmp.so.10.
Download failed: No route to host.  Continuing without debug info for /lib64/libcrypt.so.2.
Download failed: No route to host.  Continuing without debug info for /lib64/libm.so.6.
Download failed: No route to host.  Continuing without debug info for /lib64/libc.so.6.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7aa8810 in kill () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-1.fc35.ppc64le gmp-6.2.0-7.fc35.ppc64le libxcrypt-4.4.25-1.fc35.ppc64le zlib-1.2.11-30.fc35.ppc64le
(gdb) c
Continuing.
-e:1: [BUG] Segmentation fault at 0x590fb15c00000020
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [powerpc64le-linux]

-- Control frame information -----------------------------------------------
c:0003 p:---- s:0012 e:000011 CFUNC  :kill
c:0002 p:0015 s:0006 e:000005 EVAL   -e:1 [FINISH]
c:0001 p:0000 s:0003 E:0013f0 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
-e:1:in `<main>'
-e:1:in `kill'

-- C level backtrace information -------------------------------------------
1267: Abbrev Number 106 not found

Program received signal SIGABRT, Aborted.
0x00007ffff7b09864 in __pthread_kill_internal () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7b09864 in __pthread_kill_internal () from /lib64/libc.so.6
#1  0x00007ffff7aa848c in raise () from /lib64/libc.so.6
#2  0x00007ffff7a81404 in abort () from /lib64/libc.so.6
#3  0x000000010031fe70 in di_abbrev_table (abbrev_number=106, reader=0x100598660) at addr2line.c:1268
#4  di_find_abbrev (abbrev_number=106, reader=0x100598660) at addr2line.c:1279
#5  di_read_die (reader=0x100598660, die=0x100598570) at addr2line.c:1349
#6  0x0000000100337a3c in debug_info_read (traces=<optimized out>, offset=<optimized out>, lines=<optimized out>, num_traces=<optimized out>, reader=<optimized out>) at addr2line.c:1658
#7  fill_lines (num_traces=num_traces@entry=19, check_debuglink=check_debuglink@entry=1, objp=0x100599010, objp@entry=0x1005990a0, lines=lines@entry=0x1005e2c50, offset=0, offset@entry=-1, 
    traces=<optimized out>) at addr2line.c:1915
#8  0x0000000100338d9c in rb_dump_backtrace_with_lines.constprop.0 (num_traces=<optimized out>, traces=<optimized out>) at addr2line.c:2314
#9  0x000000010031e04c in rb_print_backtrace () at vm_dump.c:760
#10 0x00000001003353d8 in rb_vm_bugreport.constprop.0 (ctx=<optimized out>) at vm_dump.c:998
#11 0x00000001000cd884 in rb_bug_for_fatal_signal (default_sighandler=0x0, sig=<optimized out>, ctx=<optimized out>, fmt=0x100383b18 "Segmentation fault at %p") at error.c:786
#12 0x00000001002563c8 in sigsegv (sig=<optimized out>, info=0x10059a330, ctx=0x1005995b0) at signal.c:960
#13 <signal handler called>
#14 0x00007ffff7aa8810 in kill () from /lib64/libc.so.6
#15 0x000000010025adb0 in rb_f_kill (argc=<optimized out>, argv=0x7ffff78e0050) at signal.c:439
#16 0x00000001001fb488 in proc_rb_f_kill (c=<optimized out>, v=<optimized out>, _=<optimized out>) at process.c:8605
#17 0x00000001002e9358 in ractor_safe_call_cfunc_m1 (recv=<optimized out>, argc=<optimized out>, argv=<optimized out>, func=<optimized out>) at vm_insnhelper.c:2739
#18 0x00000001002f21a0 in vm_call_cfunc_with_frame (ec=0x100491ac0, reg_cfp=0x7fffffffe170, calling=<optimized out>) at vm_insnhelper.c:2929
#19 0x00000001002f4fb4 in vm_sendish (ec=0x100491ac0, reg_cfp=0x7ffff79dffa0, cd=0x1005c2eb0, block_handler=<optimized out>, method_explorer=<optimized out>) at vm_insnhelper.c:4530
#20 0x00000001002fa32c in vm_exec_core (ec=0x100491ac0, initial=<optimized out>) at insns.def:789
#21 0x0000000100315260 in rb_vm_exec (ec=0x100491ac0, mjit_enable_p=<optimized out>) at vm.c:2172
#22 0x00000001003170a0 in rb_iseq_eval_main (iseq=0x1004a9fb0) at vm.c:2420
#23 0x00000001000d7c4c in rb_ec_exec_node (ec=ec@entry=0x100491ac0, n=n@entry=0x1004a9fb0) at eval.c:317
#24 0x00000001000d7db4 in ruby_run_node (n=0x1004a9fb0) at eval.c:375
#25 0x000000010002afb8 in main (argc=<optimized out>, argv=<optimized out>) at ./main.c:50
(gdb) 

But probably still not the right output though. I'd still expect some backtrace reported.

Updated by xtkoba (Tee KOBAYASHI) 4 months ago

The message

1267: Abbrev Number 106 not found

means that addr2line.c could not find an abbrev table entry in .debug_abbrev referenced from .debug_info. Isn't it possible that either section is broken?

It might be helpful if can investigate those sections, in binary or objdump (from GNU Binutils) text output of

objdump -Wi miniruby
objdump -Wa miniruby

or llvm-dwarfdump output of

llvm-dwarfdump --debug-info miniruby
llvm-dwarfdump --debug-abbrev miniruby

Updated by vo.x (Vit Ondruch) 4 months ago

xtkoba (Tee KOBAYASHI) wrote in #note-21:

Isn't it possible that either section is broken?

Well, it should be DWARF5, which is pretty recent and on ppc64le, so there might be some surprises ...

It might be helpful if can investigate those sections, in binary or objdump (from GNU Binutils) text output of

The dumps should be available here: https://vondruch.fedorapeople.org/objdump.tar.gz

Updated by xtkoba (Tee KOBAYASHI) 4 months ago

vo.x (Vit Ondruch) wrote in #note-22:

The dumps should be available here: https://vondruch.fedorapeople.org/objdump.tar.gz

I'm afraid that this contains only .debug_info and not .debug_abbrev.

It seems that objdump can dump the .debug_info section without error, so maybe it is not broken. And of course it is probable that addr2line.c makes wrong assumptions on DWARF format.

Updated by vo.x (Vit Ondruch) 4 months ago

I have recreated and reuploaded the tarball. Same URL should work.

Updated by xtkoba (Tee KOBAYASHI) 4 months ago

Nothing strange found in .debug_abbrev, either.

A patch is attached to help ease debugging.

Updated by vo.x (Vit Ondruch) 4 months ago

(gdb) c
Continuing.
-e:1: [BUG] Segmentation fault at 0x590fb15c00000020
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [powerpc64le-linux]

-- Control frame information -----------------------------------------------
c:0003 p:---- s:0012 e:000011 CFUNC  :kill
c:0002 p:0015 s:0006 e:000005 EVAL   -e:1 [FINISH]
c:0001 p:0000 s:0003 E:0013f0 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
-e:1:in `<main>'
-e:1:in `kill'

-- C level backtrace information -------------------------------------------
1273: debug_info 0x54 Abbrev Number 106 not found

Program received signal SIGABRT, Aborted.
0x00007ffff7b09864 in __pthread_kill_internal () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7b09864 in __pthread_kill_internal () from /lib64/libc.so.6
#1  0x00007ffff7aa848c in raise () from /lib64/libc.so.6
#2  0x00007ffff7a81404 in abort () from /lib64/libc.so.6
#3  0x000000010031fe70 in di_abbrev_table (abbrev_number=106, reader=0x100598660) at addr2line.c:1274
#4  di_find_abbrev (abbrev_number=106, reader=0x100598660) at addr2line.c:1285
#5  di_read_die (reader=0x100598660, die=0x100598570) at addr2line.c:1355
#6  0x00000001003370cc in debug_info_read (traces=<optimized out>, offset=<optimized out>, lines=<optimized out>, num_traces=<optimized out>, reader=<optimized out>) at addr2line.c:1664
#7  fill_lines (num_traces=num_traces@entry=19, check_debuglink=check_debuglink@entry=1, objp=0x100599010, objp@entry=0x1005990a0, lines=lines@entry=0x1005e2c50, offset=<optimized out>, offset@entry=-1, 
    traces=<optimized out>) at addr2line.c:1929
#8  0x0000000100338dec in rb_dump_backtrace_with_lines.constprop.0 (num_traces=<optimized out>, traces=<optimized out>) at addr2line.c:2328
#9  0x000000010031e04c in rb_print_backtrace () at vm_dump.c:760
#10 0x0000000100335418 in rb_vm_bugreport.constprop.0 (ctx=<optimized out>) at vm_dump.c:998
#11 0x00000001000cd884 in rb_bug_for_fatal_signal (default_sighandler=0x0, sig=<optimized out>, ctx=<optimized out>, fmt=0x100383b58 "Segmentation fault at %p") at error.c:786
#12 0x0000000100256408 in sigsegv (sig=<optimized out>, info=0x10059a330, ctx=0x1005995b0) at signal.c:960
#13 <signal handler called>
#14 0x00007ffff7aa8810 in kill () from /lib64/libc.so.6
#15 0x000000010025adf0 in rb_f_kill (argc=<optimized out>, argv=0x7ffff78e0050) at signal.c:439
#16 0x00000001001fb4c8 in proc_rb_f_kill (c=<optimized out>, v=<optimized out>, _=<optimized out>) at process.c:8605
#17 0x00000001002e9358 in ractor_safe_call_cfunc_m1 (recv=<optimized out>, argc=<optimized out>, argv=<optimized out>, func=<optimized out>) at vm_insnhelper.c:2739
#18 0x00000001002f21a0 in vm_call_cfunc_with_frame (ec=0x100491ac0, reg_cfp=0x7fffffffe170, calling=<optimized out>) at vm_insnhelper.c:2929
#19 0x00000001002f4fb4 in vm_sendish (ec=0x100491ac0, reg_cfp=0x7ffff79dffa0, cd=0x1005c2eb0, block_handler=<optimized out>, method_explorer=<optimized out>) at vm_insnhelper.c:4530
#20 0x00000001002fa32c in vm_exec_core (ec=0x100491ac0, initial=<optimized out>) at insns.def:789
#21 0x0000000100315260 in rb_vm_exec (ec=0x100491ac0, mjit_enable_p=<optimized out>) at vm.c:2172
#22 0x00000001003170a0 in rb_iseq_eval_main (iseq=0x1004a9fb0) at vm.c:2420
#23 0x00000001000d7c4c in rb_ec_exec_node (ec=ec@entry=0x100491ac0, n=n@entry=0x1004a9fb0) at eval.c:317
#24 0x00000001000d7db4 in ruby_run_node (n=0x1004a9fb0) at eval.c:375
#25 0x000000010002afb8 in main (argc=<optimized out>, argv=<optimized out>) at ./main.c:50
(gdb) f 3
#3  0x000000010031fe70 in di_abbrev_table (abbrev_number=106, reader=0x100598660) at addr2line.c:1274
1274            abort();
(gdb) l
1269            while (a) {
1270                len++;
1271                a >>= 7;
1272            }
1273            fprintf(stderr,"%d: debug_info 0x%"PRIxPTR" Abbrev Number %"PRId64" not found\n",__LINE__, reader->p - reader->obj->debug_info.ptr - len, abbrev_number);
1274            abort();
1275        }
1276        return p;
1277    }
1278    
(gdb) 

Updated by xtkoba (Tee KOBAYASHI) 4 months ago

Byte 0x54 of the .debug_info is the 3rd byte of

    <52>   DW_AT_abstract_origin: <0x6a6c3a>

of the form DW_FORM_ref_addr which is in little endian 0x6a == 106 (and also the same value when interpreted as ULEB128). So apparently the executable misreads the .debug_info.

What feels weird to me is that my executable of miniruby has almost the same pattern as yours at the top portion of the .debug_info section, and yet does not cause any errors. Possibly addr2line.c is miscompiled in any way?

Updated by vo.x (Vit Ondruch) 4 months ago

Just FTR, these are the build options:

gcc -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mcpu=power8 -mtune=power8 -fasynchronous-unwind-tables -fstack-clash-protection -fPIC -m64 -D_FORTIFY_SOURCE=2 -fstack-protector-strong -fno-strict-overflow -fvisibility=hidden -fexcess-precision=standard -DRUBY_EXPORT -I. -I.ext/include/powerpc64le-linux -I./include -I. -I./enc/unicode/12.1.0    -o addr2line.o -c addr2line.c

And the GCC version:

$ rpm -q gcc
gcc-11.2.1-1.fc35.ppc64le

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/ppc64le-redhat-linux/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: ppc64le-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-targets=powerpcle-linux --disable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-11.2.1-20210728/obj-ppc64le-redhat-linux/isl-install --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-secureplt --with-long-double-128 --with-cpu-32=power8 --with-tune-32=power8 --with-cpu-64=power8 --with-tune-64=power8 --build=ppc64le-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.2.1 20210728 (Red Hat 11.2.1-1) (GCC) 

Updated by xtkoba (Tee KOBAYASHI) 4 months ago

The behavior seems as if the patch in #note-9 (https://bugs.ruby-lang.org/attachments/8974) were not applied. If I reverse-apply this patch, yes I get the following output:

1300: debug_info 0x54 Abbrev Number 101 not found
Aborted

Updated by vo.x (Vit Ondruch) 4 months ago

xtkoba (Tee KOBAYASHI) wrote in #note-29:

The behavior seems as if the patch in #note-9

Should it be applied? Based on the #note-10, #note-11 and #note-12, I thought that one is obsolete. That might be misunderstanding on my side, sorry. So let me add it into the mix (I have hig expectations now :D)

Updated by xtkoba (Tee KOBAYASHI) 4 months ago

I might have been misleading. The patch in #note-9 is necessary for correctly reading DW_FORM_ref_addr values in .debug_info sections, even if those values are kept uninterpreted by the patch in #note-11. Both patches have been merged in master already. The rest of the patches I posted here won't be necessary.

Updated by vo.x (Vit Ondruch) 4 months ago

  • Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN to 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: REQUIRED

xtkoba (Tee KOBAYASHI) wrote in #note-31:

I might have been misleading. The patch in #note-9 is necessary for correctly reading DW_FORM_ref_addr values in .debug_info sections, even if those values are kept uninterpreted by the patch in #note-11. Both patches have been merged in master already. The rest of the patches I posted here won't be necessary.

Ah, I am double checking what was committed on master and now I can see that the patch from #note-9 was indeed applied into master, but as part of the git|a9977ba2f9863e3fb1b2346589ebbca67d80536c which made the change invisible. Oh my :/ But this will be helpful for backporting, right? ;)

Nevertheless, I was able to build Ruby on ppc64le with LTO enabled. I'll try to polish the necessary changes tomorrow, send PR against Fedora and check if other arches still works.

Thx a lot for your hard work!

Updated by vo.x (Vit Ondruch) 3 months ago

xtkoba (Tee KOBAYASHI) wrote in #note-31:

I might have been misleading. The patch in #note-9 is necessary for correctly reading DW_FORM_ref_addr values in .debug_info sections, even if those values are kept uninterpreted by the patch in #note-11. Both patches have been merged in master already. The rest of the patches I posted here won't be necessary.

This is is the Fedora commit:

https://src.fedoraproject.org/rpms/ruby/c/1170d812fe0cd43aeddf7f15a63c2e6db3af8059

The two patches you have mentioned are enough to pass the test suite. I leave the rest to be backported eventually (BTW this was reported against Ruby 2.7 originally and it is likely still affected, but I don't think we are going to enable LTO for Ruby 2.7 at this stage, so the 2.7 backport is low priority for me).

So this is resolved from my POV unless you want to apply the ruby-addr2line-abbrev_table-r1.patch

Thx once again.

Updated by xtkoba (Tee KOBAYASHI) 3 months ago

The situation ruby-addr2line-abbrev_table-r1.patch assumes is not likely to happen in the real world. Let's leave it unapplied for now. Thank you.

Actions #35

Updated by jeremyevans0 (Jeremy Evans) 3 months ago

  • Status changed from Open to Closed

Updated by nagachika (Tomoyuki Chikanaga) 3 months ago

MEMO: Backporting 72317b333b85eed483ad00bcd4f40944019a7c13 depends on https://bugs.ruby-lang.org/issues/17585.

Actions

Also available in: Atom PDF