Bug #12095: ruby_vm_at_exit can sometime cause a crash. - Ruby - Ruby Issue Tracking System

Actions

Copy link

Bug #12095

closed

ruby_vm_at_exit can sometime cause a crash.

Bug #12095: ruby_vm_at_exit can sometime cause a crash.

Added by nicolasnoble (Nicolas Noble) over 10 years ago. Updated about 10 years ago.

Status:

Closed

Assignee:

Target version:

ruby -v:

Backport:

2.1: REQUIRED, 2.2: DONE, 2.3: DONE

[ruby-core:73908]

Description

This behavior has been seen erratically, but one of our users got it to reproduce almost systematically. We didn't managed to understand what made his system special that it would get that crash to reproduce so well.

Here's one of the reports:

https://gist.github.com/blowmage/7ebe774039013bc8c990

The current workaround to that one (alongside a few other comments) is done here: https://github.com/grpc/grpc/pull/5337/files

Note that removing the call to ruby_vm_at_exit makes everything load fine. Also note that the removed comment from that pull request is wrong: this has been happening on versions of Ruby other than 2.0.

It's interesting to note from the backtrace information that this is happening during a garbage collection. The fact that a garbage collection happens at that exact moment is probably the reason that bug is so difficult to reproduce. Perhaps a modified version of ruby might help reproducing it. Or very specific garbage collector settings.

The fault address (0x88) seems to indicate that a NULL pointer into a struct was being dereferenced.

Disassembling the corresponding execution address seems to point at a crash inside obj_info, from the first line of gc_writebarrier_incremental, but this is after a very quick inspection of the code, so don't take my word from it.

This problem has been repoted to us on Ruby 2.0.0, Ruby 2.2.0, Ruby 2.2.3, Ruby 2.3.0, at least.

Files

Download all files

at_exit_fix.patch (2.08 KB) at_exit_fix.patch		evanphx (Evan Phoenix), 03/15/2016 02:47 AM
at_exit_fix.patch (2.18 KB) at_exit_fix.patch		evanphx (Evan Phoenix), 03/15/2016 04:21 PM

Updated by evanphx (Evan Phoenix) over 10 years ago Actions
Copy link
#1 [ruby-core:74312]

I'm hitting this as well, and looking over the code in question on 2.3.0, I wondering if the problem is that the at_exit pseudo-object is actually allocated within the body of rb_vm_t. It's address is taken and passed to rb_ary_push, which perform OBJ_WRITE. That's where wb_incremental is invoked from.

Because the mark bits are not located with the object header anymore, the mark bitmap is consulted but the position in the mark bitmap is calculated against the address of at_exit, which isn't located on the main ruby heap at all!

The path to the bad pointer, given X as the address of at_exit within rb_vm_t is: RVALUE_BLACK_P(X) => RVALUE_MARKED(X) => RVALUE_MARK_BITMAP(X) => GET_HEAP_MARK_BITS(X) => GET_HEAP_PAGE(X) => GET_PAGE_HEADER(X) => GET_PAGE_BODY(X) => ((struct heap_page_body *)((bits_t)(x) & ~(HEAP_ALIGN_MASK))).

The value returned by that above sequence is supposed to return a page header that can itself be dereferenced to find the mark bits. But because the at_exit is in a random place, the page header is basically random bytes, and thus the deference crashes.

Updated by evanphx (Evan Phoenix) over 10 years ago Actions
Copy link
#2 [ruby-core:74319]

File at_exit_fix.patch at_exit_fix.patch added

Attached is a patch that fixes this issue by replacing the troublesome usage of a VALUE to store the at_exit functions with a simple linked list. This patch was created against the ruby_2_3 branch. It should apply cleanly to most branches because of it's small size.

Updated by nobu (Nobuyoshi Nakada) over 10 years ago Actions
Copy link
#3 [ruby-core:74324]

Thank you for the investigation and the patch, I've missed this.

you should:

free the list in ruby_vm_run_at_exit_hooks,
use the argument vm instead of GET_VM(), and
replace the existing typedef of rb_vm_t with mere struct, as multiple typedefs are not allowed in C, IIRC.

Updated by evanphx (Evan Phoenix) over 10 years ago Actions
Copy link
#4 [ruby-core:74342]

File at_exit_fix.patch at_exit_fix.patch added

Thank you for the feedback nobu!

Attached is an updated version of the patch with the changes.

Updated by naruse (Yui NARUSE) over 10 years ago Actions
Copy link
#5 [ruby-core:74796]

Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN, 2.3: UNKNOWN to 2.1: REQUIRED, 2.2: REQUIRED, 2.3: REQUIRED

Updated by nobu (Nobuyoshi Nakada) over 10 years ago Actions
Copy link
#6 [ruby-core:74806]

at_exit functions should be called in the inverse order, LIFO.

Updated by nobu (Nobuyoshi Nakada) over 10 years ago Actions
Copy link
#7

Status changed from Open to Closed

Applied in changeset r54484.

at_exit list

vm_core.h (rb_vm_struct): make at_exit a single linked list but
not RArray, not to mark the registered functions by the write
barrier. based on the patches by Evan Phoenix.
[ruby-core:73908] [Bug #12095]

Updated by naruse (Yui NARUSE) about 10 years ago Actions
Copy link
#8 [ruby-core:74999]

Backport changed from 2.1: REQUIRED, 2.2: REQUIRED, 2.3: REQUIRED to 2.1: REQUIRED, 2.2: REQUIRED, 2.3: DONE

ruby_2_3 r54633 merged revision(s) 54484.

Updated by usa (Usaku NAKAMURA) about 10 years ago Actions
Copy link
#9 [ruby-core:75091]

Backport changed from 2.1: REQUIRED, 2.2: REQUIRED, 2.3: DONE to 2.1: REQUIRED, 2.2: DONE, 2.3: DONE

ruby_2_2 r54683 merged revision(s) 54484.

Actions

Copy link

Also available in: PDF Atom

Project

General

Profile

Ruby

Custom queries

Bug #12095

ruby_vm_at_exit can sometime cause a crash.

Updated by evanphx (Evan Phoenix) over 10 years ago Actions
Copy link
#1 [ruby-core:74312]

Updated by evanphx (Evan Phoenix) over 10 years ago Actions
Copy link
#2 [ruby-core:74319]

Updated by nobu (Nobuyoshi Nakada) over 10 years ago Actions
Copy link
#3 [ruby-core:74324]

Updated by evanphx (Evan Phoenix) over 10 years ago Actions
Copy link
#4 [ruby-core:74342]

Updated by naruse (Yui NARUSE) over 10 years ago Actions
Copy link
#5 [ruby-core:74796]

Updated by nobu (Nobuyoshi Nakada) over 10 years ago Actions
Copy link
#6 [ruby-core:74806]

Updated by nobu (Nobuyoshi Nakada) over 10 years ago Actions
Copy link
#7

Updated by naruse (Yui NARUSE) about 10 years ago Actions
Copy link
#8 [ruby-core:74999]

Updated by usa (Usaku NAKAMURA) about 10 years ago Actions
Copy link
#9 [ruby-core:75091]

Project

General

Profile

Ruby

Custom queries

Bug #12095

ruby_vm_at_exit can sometime cause a crash.

Updated by evanphx (Evan Phoenix) over 10 years ago ActionsCopy link #1 [ruby-core:74312]

Updated by evanphx (Evan Phoenix) over 10 years ago ActionsCopy link #2 [ruby-core:74319]

Updated by nobu (Nobuyoshi Nakada) over 10 years ago ActionsCopy link #3 [ruby-core:74324]

Updated by evanphx (Evan Phoenix) over 10 years ago ActionsCopy link #4 [ruby-core:74342]

Updated by naruse (Yui NARUSE) over 10 years ago ActionsCopy link #5 [ruby-core:74796]

Updated by nobu (Nobuyoshi Nakada) over 10 years ago ActionsCopy link #6 [ruby-core:74806]

Updated by nobu (Nobuyoshi Nakada) over 10 years ago ActionsCopy link #7

Updated by naruse (Yui NARUSE) about 10 years ago ActionsCopy link #8 [ruby-core:74999]

Updated by usa (Usaku NAKAMURA) about 10 years ago ActionsCopy link #9 [ruby-core:75091]

Updated by evanphx (Evan Phoenix) over 10 years ago Actions
Copy link
#1 [ruby-core:74312]

Updated by evanphx (Evan Phoenix) over 10 years ago Actions
Copy link
#2 [ruby-core:74319]

Updated by nobu (Nobuyoshi Nakada) over 10 years ago Actions
Copy link
#3 [ruby-core:74324]

Updated by evanphx (Evan Phoenix) over 10 years ago Actions
Copy link
#4 [ruby-core:74342]

Updated by naruse (Yui NARUSE) over 10 years ago Actions
Copy link
#5 [ruby-core:74796]

Updated by nobu (Nobuyoshi Nakada) over 10 years ago Actions
Copy link
#6 [ruby-core:74806]

Updated by nobu (Nobuyoshi Nakada) over 10 years ago Actions
Copy link
#7

Updated by naruse (Yui NARUSE) about 10 years ago Actions
Copy link
#8 [ruby-core:74999]

Updated by usa (Usaku NAKAMURA) about 10 years ago Actions
Copy link
#9 [ruby-core:75091]