Bug #19140
rb_vm_insn_addr2insn: invalid insn address
Status: Closed
Description
I recently upgraded my mastodon installation to 4.0.2 (and possibly also updated a minor ruby version, not sure) and since then I'm experiencing regular crashes with error messages like:
Nov 21 15:25:18 manuelbaerenz puma[1006220]: /nix/store/1m8li3dkfj33kk4xhx6bhjbqi6y9bq6x-mastodon-gems-4.0.2/lib/ruby/gems/3.0.0/gems/activesupport-6.1.7/lib/active_support/json/encoding.rb:58: [BUG] rb_vm_insn_addr2insn: invalid insn address: 0x00007f0646b56e20
An example log with long stacktrace and coredump can be found in https://github.com/mastodon/mastodon/files/10057376/mastodon.log
I previously reported this error here where I learned that this must be a Ruby bug:
https://github.com/rails/rails/issues/46540
https://github.com/mastodon/mastodon/discussions/21311
Updated by alanwu (Alan Wu) over 1 year ago
Thanks for the report. The stack trace shows that it's crashing
during GC, which means some bytecode object on the heap is corrupt:
Stack trace of thread 1006223:
#0 0x00007f0646646bc7 __pthread_kill_implementation (libc.so.6 + 0x8abc7)
#1 0x00007f06465f9b46 raise (libc.so.6 + 0x3db46)
#2 0x00007f06465e44b5 abort (libc.so.6 + 0x284b5)
#3 0x00007f064690dbaa rb_bug (libruby-3.0.4.so.3.0 + 0x39baa)
#4 0x00007f064690ea03 rb_vm_insn_addr2insn2.cold (libruby-3.0.4.so.3.0 + 0x3aa03)
#5 0x00007f06469fc9a7 rb_iseq_mark (libruby-3.0.4.so.3.0 + 0x1289a7)
#6 0x00007f06469cfb6b gc_marks_continue (libruby-3.0.4.so.3.0 + 0xfbb6b)
#7 0x00007f06469cff6b newobj_slowpath_wb_protected (libruby-3.0.4.so.3.0 + 0xfbf6b)
Unfortunately, since some time has elapsed between the corruption and the crash in the
process, the logs alone don't tell the whole story. These types of issues are hard to
debug without a way to reproduce the crash on our end. We'll keep an eye out for similar
reports, but I'm afraid there might not be a quick fix.
It looks like Mastodon might gain Ruby 3.1 support soon. I would suggest upgrading
to 3.1 when it's available to see if the issue still occurs.
Updated by turion (Manuel Bärenz) over 1 year ago
That's valuable insight! It was probably a transient memory fault, then. I have since restarted the physical machine and haven't experienced the issue again.
Updated by jeremyevans0 (Jeremy Evans) over 1 year ago
- Status changed from Open to Closed