Feature #19541

Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 1 year ago

## What is being proposed?

 Currently, when Ruby crashes with yjit-generated code on the stack, `rb_print_backtrace()` is unable to show any frames underneath the yjit code. For example, if you send SIGSEGV to a Ruby process running yjit, this is what you see: 

 ``` 
 /ruby/miniruby(rb_print_backtrace+0xc) [0xaaaad0276884] /ruby/vm_dump.c:785 
 /ruby/miniruby(rb_vm_bugreport) /ruby/vm_dump.c:1093 
 /ruby/miniruby(rb_bug_for_fatal_signal+0xd0) [0xaaaad0075580] /ruby/error.c:813 
 /ruby/miniruby(sigsegv+0x5c) [0xaaaad01bedac] /ruby/signal.c:919 
 linux-vdso.so.1(__kernel_rt_sigreturn+0x0) [0xffff91a3e8bc] 
 /ruby/miniruby(map<(usize, yjit::backend::ir::Insn), (usize, yjit::backend::ir::Insn), yjit::backend::ir::{impl#17}::next_mapped::{closure_env#0}>+0x8c) [0xaaaad03b8b00] /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/option.rs:929 
 /ruby/miniruby(next_mapped+0x3c) [0xaaaad0291dc0] src/backend/ir.rs:1225 
 /ruby/miniruby(arm64_split+0x114) [0xaaaad0287744] src/backend/arm64/mod.rs:359 
 /ruby/miniruby(compile_with_regs+0x80) [0xaaaad028bf84] src/backend/arm64/mod.rs:1106 
 /ruby/miniruby(compile+0xc4) [0xaaaad0291ae0] src/backend/ir.rs:1158 
 /ruby/miniruby(gen_single_block+0xe44) [0xaaaad02b1f88] src/codegen.rs:854 
 /ruby/miniruby(gen_block_series_body+0x9c) [0xaaaad03b0250] src/core.rs:1698 
 /ruby/miniruby(gen_block_series+0x50) [0xaaaad03b0100] src/core.rs:1676 
 /ruby/miniruby(branch_stub_hit_body+0x80c) [0xaaaad03b1f68] src/core.rs:2021 
 /ruby/miniruby({closure#0}+0x28) [0xaaaad02eb86c] src/core.rs:1924 
 /ruby/miniruby(do_call<yjit::core::branch_stub_hit::{closure_env#0}, *const u8>+0x98) [0xaaaad035ba3c] /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:492 
 [0xaaaad035c9b4] 
 ``` 

 (n.b. - I compiled Ruby with `-fasynchronous-unwind-tables -rdynamic -g` in cflags to make sure gcc generates appropriate unwind info & keeps the symbol tables). 

 Likewise, if you attach gdb to a Ruby process with yjit enabled, gdb can't show thread backtraces through yjit-generated code either. 

 My proposal is that YJIT generate sufficient unwinding and debug information on all platforms to allow both `rb_print_backtrace()` and the platform's debugger (gdb/lldb/WinDbg) to show: 

 * Full stack traces all the way back to `main`. That is, it should be possible to see frames _underneath_ `[0xaaaad035c9b4]` from the backtrace above. 
 * Names for the dynamically generated yjit blocks (e.g. instead of `[0xaaaad035c9b4]`, we should see something like `yjit$$name_of_ruby_method`, where `name_of_ruby_method` is the `label` for the iseq this is JIT'd code for). 
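
To make the naming idea concrete, here is a minimal sketch of how such a symbol name could be derived from an iseq label. The helper `yjit_symbol_name` is hypothetical and is not part of the draft PR. Replacing whitespace with underscores (for labels like `block in foo`) is my own assumption, not something the proposal specifies.

```rust
/// Hypothetical helper: build a symbol name for a JIT'd block from the
/// `label` of the iseq it was compiled from.
/// Assumption: whitespace in the label is replaced with '_' so that the
/// result is a single token; the proposal itself does not define this.
fn yjit_symbol_name(iseq_label: &str) -> String {
    let sanitized: String = iseq_label
        .chars()
        .map(|c| if c.is_whitespace() { '_' } else { c })
        .collect();
    // `$` has no special meaning outside `{}` in a Rust format string.
    format!("yjit$${}", sanitized)
}

fn main() {
    println!("{}", yjit_symbol_name("name_of_ruby_method"));
    println!("{}", yjit_symbol_name("block in foo"));
}
```

With names like these, a frame that currently prints as a bare address could instead print as `yjit$$name_of_ruby_method`.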

 ## Motivation 

 I have a few motivations for wanting this. Firstly, I feel this functionality is independently useful. When Ruby crashes, the more information we can get, the more likely we are to find the root cause. Likewise, the same principle applies to debugging with gdb - you can get a fuller understanding of what the process is doing if you see the whole stack. 

 I have often found attaching gdb to the Ruby interpreter helps in understanding problems in Ruby code or C extensions and is something I do relatively frequently; yjit breaking that will definitely be inconvenient for me! 

 ## Implementation 

 I have a draft implementation here: https://github.com/ruby/ruby/pull/7567. It's currently missing tests & platform support (it only works on Linux aarch64). Also, it implements unwind info generation, so unwinding can work _through_ yjit code, but it does not currently emit symbols to give _names_ to those yjit frames. 

 My PR contains a document which explains how the Linux interfaces for registering unwind info for JIT'd code work, so I won't duplicate that information here. 

 The biggest implementation question I had is around the use of Rust crates. I prototyped my implementation using the gimli & object crates to generate DWARF info and ELF binaries. However, the yjit build purposefully does not use cargo & external crates for release builds. There are a few different ways we could go here: 

 * Don't use the gimli & object crates; instead, re-implement all debug info & object file generation code in yjit. 
 * Don't use the crates; instead, link against C libraries to provide this functionality & call them from Rust (perhaps some combination of libelf, libdw, libbfd, or llvm might do what we need) 
 * Use cargo after all for the release build & download the crates at build-time 
 * Use cargo for the release build, but vendor everything, so the build doesn't need to download anything 
 * Only make unwind info generation available in dev mode where cargo is used, and so mark the gimli/object dependencies as optional in Cargo.toml. 

 We'd need to decide on one of these approaches for this proposal to work. I don't really have a strong sense of the pros/cons of each. 
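
As an illustration of the last option (dev-mode only), here is a rough sketch of how the code could be gated behind a cargo feature. The feature name `unwind_info` and the function `register_unwind_info` are hypothetical. It assumes the gimli/object dependencies are declared `optional = true` in Cargo.toml and enabled by that feature.

```rust
// Hypothetical feature name "unwind_info"; when it is enabled, the optional
// gimli/object crates would be available to this function.
#[cfg(feature = "unwind_info")]
fn register_unwind_info(_code: &[u8]) -> bool {
    // Placeholder only: real code would build the unwind info for `_code`
    // here and register it with the platform unwinder.
    true
}

// Release builds without cargo never enable the feature, so they get this
// no-op and need no external crates.
#[cfg(not(feature = "unwind_info"))]
fn register_unwind_info(_code: &[u8]) -> bool {
    false
}

fn main() {
    let jit_code: [u8; 4] = [0; 4];
    println!("unwind info registered: {}", register_unwind_info(&jit_code));
}
```

Callers would not need their own `cfg` checks, at the cost of release builds silently having no unwind info.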

 (Side note - my PR actually depends on a _fork_ of gimli - I've been discussing adding the needed interfaces upstream here: https://github.com/gimli-rs/gimli/issues/648). 

 ## Benchmarks 

 I ran the yjit-bench suite on my branch and compared it to Ruby master: 

 * My branch: https://gist.github.com/KJTsanaktsidis/5741a9f64e5cd75cdf5fedd846091a4f 
 * Ruby master: https://gist.github.com/KJTsanaktsidis/592d3ebcf98f6745dfa3efbd30a25acf 

 This is a (very simple) comparison: 

 ``` 
 -------------- ------------ ------------ --------------- 
 bench            yjit (ms)      branch (ms)    branch/yjit (%) 
 activerecord     97.5           98.5           101.03% 
 hexapdf          2415.3         2458.2         101.78% 
 liquid-c         61.9           63.1           101.94% 
 liquid-render    135.3          135.0          99.78% 
 mail             104.6          105.5          100.86% 
 psych-load       1887.1         1922.0         101.85% 
 railsbench       1544.4         1556.0         100.75% 
 ruby-lsp         88.4           89.5           101.24% 
 sequel           147.5          151.1          102.44% 
 binarytrees      303            305.6          100.86% 
 chunky_png       1075.8         1079.4         100.33% 
 erubi            392.9          392.3          99.85% 
 erubi_rails      14.7           14.7           100.00% 
 etanni           792.3          791.4          99.89% 
 fannkuchredux    3815.9         3813.6         99.94% 
 lee              1030.2         1039.2         100.87% 
 nbody            49.2           49.3           100.20% 
 optcarrot        4142           4143.3         100.03% 
 ruby-json        2860.7         2874.0         100.46% 
 rubykon          7906.6         7904.2         99.97% 
 30k_ifelse       348.7          345.4          99.05% 
 30k_methods      828.6          831.8          100.39% 
 cfunc_itself     28.8           28.9           100.35% 
 fib              34.4           34.5           100.29% 
 getivar          115.5          109.7          94.98% 
 keyword_args     37.7           38.0           100.80% 
 respond_to       26             26.1           100.38% 
 setivar          33.8           33.5           99.11% 
 setivar_object   208.7          194.3          93.10% 
 str_concat       52.6           52.2           99.24% 
 throw            23.8           24.1           101.26% 
 -------------- ------------ ------------ --------------- 
 ``` 

 Most benchmarks are within about 2.5% of master (the largest slowdown is sequel, at 102.44%), so the performance impact of generating and registering the debug info seems marginal.
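
For reference, the last column of the table is just the branch time divided by the master time, as a percentage. A minimal sketch of that arithmetic, using three rows copied from the table (the function name is my own):

```rust
/// Branch time as a percentage of the master (yjit) time.
/// Values above 100 mean the branch was slower.
fn branch_vs_master_pct(master_ms: f64, branch_ms: f64) -> f64 {
    branch_ms / master_ms * 100.0
}

fn main() {
    // (bench, master ms, branch ms), copied from the table above.
    let rows = [
        ("activerecord", 97.5, 98.5),
        ("sequel", 147.5, 151.1),
        ("setivar_object", 208.7, 194.3),
    ];
    for &(name, master_ms, branch_ms) in rows.iter() {
        println!("{:<16} {:.2}%", name, branch_vs_master_pct(master_ms, branch_ms));
    }
}
```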
