Project

General

Profile

Actions

Bug #19172

open

`ruby_thread_has_gvl_p` is innacurate sometimes -- document or change?

Bug #19172: `ruby_thread_has_gvl_p` is innacurate sometimes -- document or change?

Added by ivoanjo (Ivo Anjo) over 3 years ago. Updated 3 days ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
(All Ruby versions)
[ruby-core:111146]

Description

Howdy 👋! I work for Datadog on the ddtrace gem and I found a... sharp edge on the internal ruby_thread_has_gvl_p API.

I am aware that ruby_thread_has_gvl_p is documented an experimental API that is exported as a symbol but not present on the VM include files.

Background

In the ddtrace profiling component, we setup a signal handler and then periodically send SIGPROF signals to try to interrupt the running Ruby thread (e.g. the thread that is holding the global VM lock or equivalent).

In the signal handler, we need to perform some API calls which are not safe to do without the GVL. So we need to check if the signal handler got called in the thread that has the GVL.

The issue

int
ruby_thread_has_gvl_p(void)
{
    rb_thread_t *th = ruby_thread_from_native();

    if (th && th->blocking_region_buffer == 0) {
        return 1;
    }
    else {
        return 0;
    }
}

In its current implementation, ruby_thread_has_gvl_p only checks if the thread has a blocking_region_buffer or not. Unfortunately, this means that when called from a thread that lost the GVL but not due to blocking (e.g. via rb_thread_schedule()), it can still claim that a thread is holding the GVL when that is not the case.

I ran into this issue in https://github.com/DataDog/dd-trace-rb/pull/2415, and needed to find a workaround.

Next steps

Since this is an internal VM API, I'm not sure you'd want to change the current behavior, so I was thinking of perhaps two options:

  • Is it worth changing ruby_thread_has_gvl_p to be accurate in the case I've listed?

  • If not, would you accept a PR to document its current limitations, so that others don't run into the same issue I did?


Related issues 1 (0 open1 closed)

Related to Ruby - Feature #20877: Introduce (public) debug assertion for holding the GVL.ClosedActions

Updated by ivoanjo (Ivo Anjo) over 3 years ago Actions #1

  • Description updated (diff)

Updated by luke-gru (Luke Gruber) over 3 years ago Actions #2 [ruby-core:112174]

In doing some reading of the code, it's my understanding that (TH_SCHED(th)->running == th) would be a better way to tell if a thread has the GVL. Maybe I'm understanding it wrong though.

Updated by ivoanjo (Ivo Anjo) over 3 years ago Actions #3 [ruby-core:112178]

Yeah, that's my understanding, and what I'm using in that PR (although with a lot more complexity since I'm still trying to support older Rubies...).

Updated by Eregon (Benoit Daloze) 3 days ago Actions #4 [ruby-core:125516]

The implementation is still the same as of today:
https://github.com/ruby/ruby/blob/4d87d43b01dbb312eb1ff5fbbc6c9f33218d91a2/thread.c#L2100-L2115

Meanwhile, ruby_thread_has_gvl_p() became a public API since #20877 (commit).

So now we have a public C API with known incorrect semantics when used in a signal handler, what should we do then?

Updated by Eregon (Benoit Daloze) 2 days ago Actions #5

  • Related to Feature #20877: Introduce (public) debug assertion for holding the GVL. added
Actions

Also available in: PDF Atom