Feature #19541
Proposal: Generate frame unwinding info for YJIT code
Status: Closed
Description
What is being proposed?
Currently, when Ruby crashes with YJIT-generated code on the stack, rb_print_backtrace() is unable to show any frames underneath the YJIT code. For example, if you send SIGSEGV to a Ruby process running YJIT, this is what you see:
/ruby/miniruby(rb_print_backtrace+0xc) [0xaaaad0276884] /ruby/vm_dump.c:785
/ruby/miniruby(rb_vm_bugreport) /ruby/vm_dump.c:1093
/ruby/miniruby(rb_bug_for_fatal_signal+0xd0) [0xaaaad0075580] /ruby/error.c:813
/ruby/miniruby(sigsegv+0x5c) [0xaaaad01bedac] /ruby/signal.c:919
linux-vdso.so.1(__kernel_rt_sigreturn+0x0) [0xffff91a3e8bc]
/ruby/miniruby(map<(usize, yjit::backend::ir::Insn), (usize, yjit::backend::ir::Insn), yjit::backend::ir::{impl#17}::next_mapped::{closure_env#0}>+0x8c) [0xaaaad03b8b00] /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/option.rs:929
/ruby/miniruby(next_mapped+0x3c) [0xaaaad0291dc0] src/backend/ir.rs:1225
/ruby/miniruby(arm64_split+0x114) [0xaaaad0287744] src/backend/arm64/mod.rs:359
/ruby/miniruby(compile_with_regs+0x80) [0xaaaad028bf84] src/backend/arm64/mod.rs:1106
/ruby/miniruby(compile+0xc4) [0xaaaad0291ae0] src/backend/ir.rs:1158
/ruby/miniruby(gen_single_block+0xe44) [0xaaaad02b1f88] src/codegen.rs:854
/ruby/miniruby(gen_block_series_body+0x9c) [0xaaaad03b0250] src/core.rs:1698
/ruby/miniruby(gen_block_series+0x50) [0xaaaad03b0100] src/core.rs:1676
/ruby/miniruby(branch_stub_hit_body+0x80c) [0xaaaad03b1f68] src/core.rs:2021
/ruby/miniruby({closure#0}+0x28) [0xaaaad02eb86c] src/core.rs:1924
/ruby/miniruby(do_call<yjit::core::branch_stub_hit::{closure_env#0}, *const u8>+0x98) [0xaaaad035ba3c] /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:492
[0xaaaad035c9b4]
(n.b. I compiled Ruby with -fasynchronous-unwind-tables -rdynamic -g in cflags to make sure gcc generates appropriate unwind info and keeps the symbol tables.)
Likewise, if you attach gdb to a Ruby process with yjit enabled, gdb can't show thread backtraces through yjit-generated code either.
My proposal is that YJIT generate sufficient unwinding and debug information on all platforms to allow both rb_print_backtrace() and the platform's debugger (gdb/lldb/WinDbg) to show:
- Full stack traces all the way back to main. That is, it should be possible to see the frames underneath [0xaaaad035c9b4] in the backtrace above.
- Names for the dynamically generated YJIT blocks. For example, instead of [0xaaaad035c9b4], we should see something like yjit$$name_of_ruby_method, where name_of_ruby_method is the label of the iseq that the block was JIT-compiled from.
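To make the second goal more concrete, here is a minimal sketch of the kind of bookkeeping that would let a symbolizer turn a raw PC into a yjit$$label name. It assumes each generated block is recorded as a (start address, length, iseq label) triple. The JitSymbolTable type and its methods are hypothetical and are not taken from the draft PR, which does not emit symbols yet.

```rust
use std::collections::BTreeMap;

/// Hypothetical table mapping the start address of each YJIT-generated
/// block to its length in bytes and a symbol name like "yjit$$label".
struct JitSymbolTable {
    blocks: BTreeMap<u64, (u64, String)>,
}

impl JitSymbolTable {
    fn new() -> Self {
        JitSymbolTable { blocks: BTreeMap::new() }
    }

    /// Record a block of `len` bytes at `start`, compiled from an iseq with `iseq_label`.
    fn add_block(&mut self, start: u64, len: u64, iseq_label: &str) {
        // `$` is not special in Rust format strings, so this yields "yjit$$<label>".
        self.blocks.insert(start, (len, format!("yjit$${}", iseq_label)));
    }

    /// Return the symbol of the block containing `pc`, plus the offset into it.
    fn lookup(&self, pc: u64) -> Option<(&str, u64)> {
        // The only candidate is the last block that starts at or before `pc`.
        let (&start, (len, name)) = self.blocks.range(..=pc).next_back()?;
        let offset = pc - start;
        if offset < *len {
            Some((name.as_str(), offset))
        } else {
            None // `pc` lies in a gap between blocks
        }
    }
}
```

Suppose a block is registered at the made-up address 0xaaaad035c900 and spans 0x200 bytes. Looking up the PC 0xaaaad035c9b4 from the backtrace above would then return yjit$$name_of_ruby_method at offset 0xb4. Keying a BTreeMap on start address reduces the containing-block search to a single range query.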
Motivation
I have a few motivations for wanting this. Firstly, I feel this functionality is useful in its own right. When Ruby crashes, the more information we can get, the more likely we are to find the root cause. The same principle applies to debugging with gdb: you get a fuller understanding of what the process is doing if you can see the whole stack.
Secondly, I frequently attach gdb to the Ruby interpreter to help understand problems in Ruby code or C extensions, so YJIT breaking that would definitely be inconvenient for me!
Implementation
I have a draft implementation of this here: https://github.com/ruby/ruby/pull/7567. It is currently missing tests and support for other platforms (it only works on Linux aarch64). It implements unwind info generation, so unwinding works through YJIT code, but it does not yet emit symbols to give names to those YJIT frames.
My PR contains a document which explains how the Linux interfaces for registering unwind info for JIT'd code work, so I won't duplicate that information here.
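One such interface, for debuggers, is GDB's JIT compilation interface. The process exports a global __jit_debug_descriptor that heads a linked list of in-memory object files, which can carry unwind info and symbols. After each change it calls the empty function __jit_debug_register_code, on which GDB sets a breakpoint. Below is a rough Rust sketch of the registration side. The struct layout follows the GDB manual, but register_jit_object is a hypothetical helper and this is not necessarily what the PR does. The sketch also assumes Rust 1.82 or newer for the #[unsafe(no_mangle)] spelling.

```rust
use std::ptr;

/// One node in GDB's list of in-memory object files (layout per the GDB manual).
#[repr(C)]
pub struct JitCodeEntry {
    next_entry: *mut JitCodeEntry,
    prev_entry: *mut JitCodeEntry,
    symfile_addr: *const u8,
    symfile_size: u64,
}

/// The descriptor GDB reads; it locates it by symbol name.
#[repr(C)]
pub struct JitDescriptor {
    version: u32,
    action_flag: u32,
    relevant_entry: *mut JitCodeEntry,
    first_entry: *mut JitCodeEntry,
}

// action_flag value meaning "relevant_entry was just added".
const JIT_REGISTER_FN: u32 = 1;

// GDB finds both symbols by name, so they must not be mangled.
#[unsafe(no_mangle)]
#[allow(non_upper_case_globals)]
pub static mut __jit_debug_descriptor: JitDescriptor = JitDescriptor {
    version: 1,
    action_flag: 0,
    relevant_entry: ptr::null_mut(),
    first_entry: ptr::null_mut(),
};

// GDB breakpoints this function. The volatile read keeps the optimizer
// from treating calls to it as dead code.
#[unsafe(no_mangle)]
#[inline(never)]
pub extern "C" fn __jit_debug_register_code() {
    let x = 0u8;
    let _ = unsafe { ptr::read_volatile(&x) };
}

/// Hypothetical helper: push an in-memory object file (e.g. an ELF image
/// holding unwind info and symbols for a JIT'd block) onto GDB's list and
/// notify any attached debugger. Not thread-safe; callers must serialise.
pub unsafe fn register_jit_object(image: &'static [u8]) -> *mut JitCodeEntry {
    unsafe {
        let entry = Box::into_raw(Box::new(JitCodeEntry {
            next_entry: __jit_debug_descriptor.first_entry,
            prev_entry: ptr::null_mut(),
            symfile_addr: image.as_ptr(),
            symfile_size: image.len() as u64,
        }));
        // Link the new entry in front of the previous head, if any.
        if !(*entry).next_entry.is_null() {
            (*(*entry).next_entry).prev_entry = entry;
        }
        __jit_debug_descriptor.first_entry = entry;
        __jit_debug_descriptor.relevant_entry = entry;
        __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
        __jit_debug_register_code();
        entry
    }
}
```

The unwinder used by rb_print_backtrace() is a separate consumer. Registering an object with GDB this way does not by itself make in-process unwinding work.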
The biggest implementation question I had is around the use of Rust crates. I prototyped my implementation using the gimli and object crates, for generating DWARF info and ELF binaries respectively. However, the YJIT build purposefully does not use cargo and external crates for release builds. There are a few different ways we could go here:
- Don't use the gimli & object crates; instead, re-implement all debug info & object file generation code in yjit.
- Don't use the crates; instead, link against C libraries to provide this functionality and call them from Rust (perhaps some combination of libelf, libdw, libbfd, or llvm might do what we need)
- Use cargo after all for the release build & download the crates at build-time
- Use cargo for the release build, but vendor everything, so the build doesn't need to download anything
- Only make unwind info generation available in dev mode where cargo is used, and so mark the gimli/object dependencies as optional in Cargo.toml.
We'd need to decide on one of these approaches for this proposal to work. I don't really have a strong sense of the pros/cons of each.
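To give a sense of what the first option (re-implementing debug info generation in YJIT) involves at the lowest level, here is a small sketch. It hand-encodes DWARF call frame instructions of the kind that go into an FDE. The opcodes and the ULEB128 encoding come from the DWARF specification. The frame layout is purely illustrative and says nothing about YJIT's actual frames: it is an AArch64 prologue of stp x29, x30, [sp, #-16]! followed by mov x29, sp, with a CIE data alignment factor of -8. The helper names are hypothetical.

```rust
// DWARF call frame instruction opcodes (DWARF 5, section 6.4.2).
const DW_CFA_DEF_CFA: u8 = 0x0c;
const DW_CFA_OFFSET: u8 = 0x80; // high 2 bits = 0b10, low 6 bits = register

// AArch64 DWARF register numbers: x29 (frame pointer), x30 (link register).
const AARCH64_X29: u8 = 29;
const AARCH64_X30: u8 = 30;

/// Append `value` as unsigned LEB128: 7 bits per byte, high bit set = "more follows".
fn write_uleb128(out: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// DW_CFA_def_cfa: CFA = `reg` + `offset`.
fn def_cfa(out: &mut Vec<u8>, reg: u64, offset: u64) {
    out.push(DW_CFA_DEF_CFA);
    write_uleb128(out, reg);
    write_uleb128(out, offset);
}

/// DW_CFA_offset: `reg` is saved at CFA + factored_offset * data_alignment_factor.
fn saved_at(out: &mut Vec<u8>, reg: u8, factored_offset: u64) {
    assert!(reg < 64, "DW_CFA_offset can only encode registers 0..=63");
    out.push(DW_CFA_OFFSET | reg);
    write_uleb128(out, factored_offset);
}

/// Illustrative CFI for the hypothetical frame described above.
fn example_frame_cfi() -> Vec<u8> {
    let mut cfi = Vec::new();
    def_cfa(&mut cfi, AARCH64_X29 as u64, 16); // CFA = x29 + 16
    saved_at(&mut cfi, AARCH64_X30, 1); // x30 at CFA - 8  (1 * -8)
    saved_at(&mut cfi, AARCH64_X29, 2); // x29 at CFA - 16 (2 * -8)
    cfi
}
```

A complete implementation would also need the CIE/FDE headers, pointer encodings, and (for debuggers) an ELF container around them. That is the part the gimli and object crates currently handle.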
(Side note - my PR actually depends on a fork of gimli - I've been discussing adding the needed interfaces upstream here: https://github.com/gimli-rs/gimli/issues/648).
Benchmarks
I ran the yjit-bench suite on my branch and compared it to Ruby master:
- My branch: https://gist.github.com/KJTsanaktsidis/5741a9f64e5cd75cdf5fedd846091a4f
- Ruby master: https://gist.github.com/KJTsanaktsidis/592d3ebcf98f6745dfa3efbd30a25acf
This is a (very simple) comparison, where the "yjit" column is Ruby master and the "branch" column is my branch:
-------------- ------------ ------------ ---------------
bench             yjit (ms)  branch (ms) branch/yjit (%)
-------------- ------------ ------------ ---------------
activerecord           97.5         98.5         101.03%
hexapdf              2415.3       2458.2         101.78%
liquid-c               61.9         63.1         101.94%
liquid-render         135.3        135.0          99.78%
mail                  104.6        105.5         100.86%
psych-load           1887.1       1922.0         101.85%
railsbench           1544.4       1556.0         100.75%
ruby-lsp               88.4         89.5         101.24%
sequel                147.5        151.1         102.44%
binarytrees             303        305.6         100.86%
chunky_png           1075.8       1079.4         100.33%
erubi                 392.9        392.3          99.85%
erubi_rails            14.7         14.7         100.00%
etanni                792.3        791.4          99.89%
fannkuchredux        3815.9       3813.6          99.94%
lee                  1030.2       1039.2         100.87%
nbody                  49.2         49.3         100.20%
optcarrot              4142       4143.3         100.03%
ruby-json            2860.7       2874.0         100.46%
rubykon              7906.6       7904.2          99.97%
30k_ifelse            348.7        345.4          99.05%
30k_methods           828.6        831.8         100.39%
cfunc_itself           28.8         28.9         100.35%
fib                    34.4         34.5         100.29%
getivar               115.5        109.7          94.98%
keyword_args           37.7         38.0         100.80%
respond_to               26         26.1         100.38%
setivar                33.8         33.5          99.11%
setivar_object        208.7        194.3          93.10%
str_concat             52.6         52.2          99.24%
throw                  23.8         24.1         101.26%
-------------- ------------ ------------ ---------------
No benchmark is more than 2.44% slower on the branch (the worst case is sequel), so the performance impact of generating and registering the debug info seems to be marginal.
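The branch/yjit column is branch time divided by master time, as a percentage. The sketch below reproduces that calculation from the table's rows. It also computes a geometric mean of the ratios as one way to summarise the whole run in a single number; that summary is an illustrative addition and is not part of the reported results.

```rust
/// Per-benchmark ratio as a percentage (the "branch/yjit (%)" column).
fn ratio_pct(yjit_ms: f64, branch_ms: f64) -> f64 {
    branch_ms / yjit_ms * 100.0
}

/// Geometric mean of ratios, computed as exp(mean(ln r)).
fn geomean(ratios: &[f64]) -> f64 {
    let log_sum: f64 = ratios.iter().map(|r| r.ln()).sum();
    (log_sum / ratios.len() as f64).exp()
}

/// (bench, yjit ms, branch ms), copied from the table above.
const RESULTS: &[(&str, f64, f64)] = &[
    ("activerecord", 97.5, 98.5),
    ("hexapdf", 2415.3, 2458.2),
    ("liquid-c", 61.9, 63.1),
    ("liquid-render", 135.3, 135.0),
    ("mail", 104.6, 105.5),
    ("psych-load", 1887.1, 1922.0),
    ("railsbench", 1544.4, 1556.0),
    ("ruby-lsp", 88.4, 89.5),
    ("sequel", 147.5, 151.1),
    ("binarytrees", 303.0, 305.6),
    ("chunky_png", 1075.8, 1079.4),
    ("erubi", 392.9, 392.3),
    ("erubi_rails", 14.7, 14.7),
    ("etanni", 792.3, 791.4),
    ("fannkuchredux", 3815.9, 3813.6),
    ("lee", 1030.2, 1039.2),
    ("nbody", 49.2, 49.3),
    ("optcarrot", 4142.0, 4143.3),
    ("ruby-json", 2860.7, 2874.0),
    ("rubykon", 7906.6, 7904.2),
    ("30k_ifelse", 348.7, 345.4),
    ("30k_methods", 828.6, 831.8),
    ("cfunc_itself", 28.8, 28.9),
    ("fib", 34.4, 34.5),
    ("getivar", 115.5, 109.7),
    ("keyword_args", 37.7, 38.0),
    ("respond_to", 26.0, 26.1),
    ("setivar", 33.8, 33.5),
    ("setivar_object", 208.7, 194.3),
    ("str_concat", 52.6, 52.2),
    ("throw", 23.8, 24.1),
];
```

A geometric mean is the usual choice for averaging ratios because a 2x slowdown and a 2x speedup cancel out, which an arithmetic mean of percentages does not do.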