Segfault when calling user signal handlers during VM shutdown

ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]


Howdy 👋! I work for Datadog on the ddtrace gem. I found this issue while investigating a customer crash report.


The original issue was found in a production app. A number of things need to be in play to cause it.

The ruby-odbc gem provides a way of accessing databases through the ODBC API. It wraps a database connection as a Data object, with a free function that, prior to freeing the native resources, disconnects from the database if the connection was still active.

Because disconnecting from the database is a blocking operation, the gem (reasonably, in my opinion) releases the global VM lock (GVL) before disconnecting.

The trigger for the crash is:

  1. The app in question used puma, and puma installs a Signal.trap('TERM')
  2. The database object was still connected when the app started to shut down
  3. A VM shutdown starts...
  4. Half-way through shutdown, the VM receives a SIGTERM signal, and queues it for processing
  5. The VM calls the free function on all objects
  6. The ruby-odbc gem sees there's an active database connection, and tries to release the GVL to call the blocking disconnect
  7. Before releasing the GVL, the VM checks for pending interruptions
  8. The VM tries to run the Ruby-level signal handler method half-way through VM shutdown, when you can no longer run Ruby code
  9. Segfault
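
Steps 4, 7 and 8 depend on the fact that Ruby does not run a trap block inside the OS-level signal handler: the signal is only recorded, and the block runs later, when the main thread checks for pending interrupts. Here is a minimal pure-Ruby sketch of that deferral in a healthy VM (not one that is shutting down); the constant name HANDLED is only illustrative and is not part of the test-case:

```ruby
# Records which signals had their Ruby-level handler executed.
HANDLED = []

# Install a Ruby-level handler, as puma does for TERM.
Signal.trap("TERM") { HANDLED << :term }

# Send the signal to ourselves; the VM queues it for the main thread.
Process.kill("TERM", Process.pid)

# A blocking call gives the main thread an interrupt check point,
# which is where the queued handler actually gets run.
sleep 0.1

puts "handled: #{HANDLED.inspect}"
```

In the crash scenario the same check happens inside rb_thread_call_without_gvl, except that by then the VM can no longer run Ruby code.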

How to reproduce (Ruby version & script)

I was able to reproduce it with a minimal example on Ruby 3.2.2 (ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]) and recent master (ruby 3.3.0dev (2023-08-17T07:30:01Z master d26b015e83) [x86_64-linux]).

I've put the test-case up on GitHub as well, but here are the important bits:


require 'signal_bug_testcase'

Signal.trap("TERM") { puts "Hello, world" }

# An instance must exist so that the free function runs at VM shutdown
SignalBugTestcase.new


#include <ruby.h>
#include <ruby/thread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

typedef struct { int dummy; } BugTestcase;

// Runs without the GVL; stands in for the blocking database disconnect
void *test_nogvl(void *unused) {
    fprintf(stderr, "GVL released!\n");
    return NULL;
}

// Called by the VM for every remaining object during shutdown
static void bug_testcase_free(void* ptr) {
    fprintf(stderr, "Free getting called! Sending signal...\n");
    kill(getpid(), SIGTERM);
    fprintf(stderr, "SIGTERM signal queued, trying to release GVL...\n");
    // Pending interrupts are checked here, before the GVL is released
    rb_thread_call_without_gvl(test_nogvl, NULL, NULL, NULL);
    fprintf(stderr, "After releasing GVL!\n");
    xfree(ptr);
}

const rb_data_type_t bug_testcase_data_type = {
    .wrap_struct_name = "SignalBugTestcase",
    .function = { NULL, bug_testcase_free, NULL },
};

VALUE bug_testcase_alloc(VALUE klass) {
    // TypedData_Make_Struct allocates the struct itself and stores it in obj
    BugTestcase* obj;
    return TypedData_Make_Struct(klass, BugTestcase, &bug_testcase_data_type, obj);
}

void Init_signal_bug_testcase(void) {
    VALUE cBugTestcase = rb_define_class("SignalBugTestcase", rb_cObject);

    rb_define_alloc_func(cBugTestcase, bug_testcase_alloc);
}

Expectation and result

No segfault happens.

Interestingly, on Ruby 2.7, the VM exits half-way through but doesn't segfault on every run; running it a few times reliably triggers the issue, though. On 3.2+ it crashes every time for me.

I suspect the right thing here is to no longer accept/try to run any Ruby-level signal handlers after VM shutdown starts.
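
Until the VM itself handles this, an application could try to approximate the same idea from Ruby by dropping its Ruby-level handlers once shutdown begins. The following is only a sketch under assumptions: the helper name ignore_signals_for_shutdown and the SHUTDOWN_SIGNALS list are hypothetical, and I have not verified that it avoids the crash above. It relies on at_exit blocks running before objects are finalized, and on "IGNORE" making the OS discard the signal so that nothing gets queued for the VM:

```ruby
# Hypothetical list of signals the app installed Ruby-level handlers for.
SHUTDOWN_SIGNALS = %w[TERM INT].freeze

# Replaces the handler of each signal with "IGNORE" and returns the
# previous handlers, keyed by signal name.
def ignore_signals_for_shutdown(signals = SHUTDOWN_SIGNALS)
  signals.to_h { |name| [name, Signal.trap(name, "IGNORE")] }
end

# From this point on, no Ruby-level signal handler is left to run
# while the VM finalizes objects.
at_exit { ignore_signals_for_shutdown }
```

"IGNORE" is used rather than "DEFAULT" because Ruby's default handling of TERM still goes through the VM's interrupt machinery, whereas an ignored signal never reaches it.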

Here's what I see with this test-case:

$ bundle exec ruby lib/signal-bug-testcase.rb 
Free getting called! Sending signal...
SIGTERM signal queued, trying to release GVL...
lib/signal-bug-testcase.rb:3: [BUG] Segmentation fault at 0x0000000000000007
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]

Updated by nobu (Nobuyoshi Nakada) 3 months ago

What I saw was a failure to allocate an internal array for calling the signal handler, rather than a segfault, but it can't work either way.
This is because, in the VM finalization phase, object allocation is no longer possible and even exceptions can't be raised.

> I suspect the right thing here is to no longer accept/try to run any Ruby-level signal handlers after VM shutdown starts.


Updated by ivoanjo (Ivo Anjo) 3 months ago

Thanks for the quick fix! :)


