Project

General

Profile

Actions

Bug #20197

closed

Postponed job invocations are significantly reduced in Ruby 3.3

Added by osyoyu (Daisuke Aritomo) 4 months ago. Updated about 2 months ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
[ruby-core:116347]

Description

The number of postponed job invocations has been significantly reduced in Ruby 3.3.
While my understanding is that postponed jobs provide no guarantee of how soon registered callbacks will fire, I believe the current rate is too low for practical usage, especially for profilers such as StackProf.

A git bisect led me to https://github.com/ruby/ruby/commit/1f0304218cf00e05a4a126196676ba221ebf91f6 which obviously seems to be related, but I'm not sure why.

Repro

Expected

The job fires (nearly) immediately after being triggered.

In the following example, the job is triggered every 100 ms.

% ruby bin/test.rb # runs for 3 seconds
count: 1
count: 2
(snip)
count: 29

Actual

The job gets fired only once.

% ruby bin/test.rb
count: 1

Code

require 'mycext'

time = Time.now
th = Thread.new do
  loop do
    sleep 0.01
    break if Time.now - time > 3 # run for 3 seconds
  end
end
th.join
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#include "ruby.h"
#include "ruby/debug.h"

int called_count;

void
postponed_job(void *ptr)
{
  called_count++;
  printf("count: %d\n", called_count);
}

_Noreturn void *
pthread_main(void *_)
{
  while (1) {
    rb_postponed_job_register_one(0, postponed_job, NULL);

    // Sleep for 100 ms
    struct timespec ts;
    ts.tv_sec = 0;
    ts.tv_nsec = 100 * 1000 * 1000;
    nanosleep(&ts, NULL);
  }
}

RUBY_FUNC_EXPORTED void
Init_mycext(void)
{
  called_count = 0;

  pthread_t pthread;
  pthread_create(&pthread, NULL, pthread_main, NULL);
}

Updated by kjtsanaktsidis (KJ Tsanaktsidis) 4 months ago

  • Assignee set to kjtsanaktsidis (KJ Tsanaktsidis)

I hacked at the postponed job infrastructure pretty extensively in 3.3, I’ll have a look at this bug in the next few days.

Thank you for your report!

Actions #2

Updated by osyoyu (Daisuke Aritomo) 4 months ago

  • Description updated (diff)

Updated by byroot (Jean Boussier) 4 months ago

especially for profilers such as StackProf.

Not that it makes this less of a bug, but note that recent versions of StackProf (and more mordern profilers like Vernier) avoid relying on postponed jobs, because no matter how frequent they may be triggered, that always give biased results.

Ref: https://github.com/tmm1/stackprof/blob/ebdd3af48a2c4ddf35b0a73858ea72dcdb551188/lib/stackprof.rb#L7-L18

Updated by kjtsanaktsidis (KJ Tsanaktsidis) 4 months ago

OK, I investigated this a bit. The reason your example only runs the callback once is that the "is-an-interrupt-pending?" flag is a per-thread (technically per execution context) state. Normally, the rb_postponed_job_register_one (and the new rb_postponed_job_trigger function as well) will set the interrupt-pending flag on the currently thread, which is what you want when running in a signal handler. However, when you're running from a non-ruby thread, you need to pick a ruby thread to set the flag on.

The previous implementation before https://github.com/ruby/ruby/commit/1f0304218cf00e05a4a126196676ba221ebf91f6 would pick vm->ractor.main_ractor->threads.running_ec, which is actually the previous thread to run if there is no currently running thread. Now, however, we always pick the main thread (this seems to be because the semantics of running_ec are different when using the new M:N threading model).

Your example has the main thread sleeping, so it will never check the interrupt-pending flag. Thus, now it doesn't run the callbacks (until the other thread is done).

This PR will I think fix the issue with your reproduction: https://github.com/ruby/ruby/pull/9633.

Are you able to share some more details about your use-case for postponed_job by the way? I ask, because even with this fix (and restoring the Ruby 3.2 behaviour), there are still some race conditions possible (e.g. running_ec might have just entered a long sleep). If we want to make it possible to robustly run postponed jobs "soon" from non-Ruby threads, I think we would need:

  • Add a VM-global interrupt flag
  • Have rb_thread_check_ints check the VM-global flag as well as its own EC-local flag,
  • Have rb_postponed_job_trigger see if we're running on a Ruby thread, and if not, set the VM-global interrupt flag instead of the EC-local one.

This seems like a fair bit of extra complexity to carry around though if there isn't a use-case which requires it.

Updated by osyoyu (Daisuke Aritomo) 4 months ago

Thank you for your investigation and thorough explanation! I now understand what was happening.

This PR will I think fix the issue with your reproduction: https://github.com/ruby/ruby/pull/9633.

Applying this patch worked perfectly on my environment too.

Are you able to share some more details about your use-case for postponed_job by the way?

I was actually working on my own Ruby profiler https://github.com/osyoyu/pf2 , and was using postponed_job to run rb_profile_thread_frames(), just like StackProf's original implementation.
The rb_postponed_job_trigger call could be found in TimerThreadScheduler.
https://github.com/osyoyu/pf2/blob/v0.2.0/ext/pf2/src/timer_thread_scheduler.rs#L82

(As @byroot (Jean Boussier) pointed out, I'm aware that postponed_job isn't strictly required here; I could send a signal to the target pthread, and call rb_profile_thread_frames() in the signal handler. But that's probably an another story)

This seems like a fair bit of extra complexity to carry around though if there isn't a use-case which requires it.

From the perspective of a sampling profiler developer, I think it's safe not to implement this technique. I'm triggering postponed jobs periodically (e.g. every 10 ms or so), and one trigger flag being effectively "ignored" (due to be getting registered in a thread which just entered a long sleep) is negligible, as long as subsequent triggers get fired.

Actions #6

Updated by kjtsanaktsidis (KJ Tsanaktsidis) 4 months ago

  • Status changed from Open to Closed

Applied in changeset git|ef276858d9295208add48e27208c69184dc50472.


Trigger postponed jobs on running_ec if that is available

Currently, any postponed job triggered from a non-ruby thread gets sent
to the main thread, but if the main thread is sleeping it won't be
checking ints. Instead, we should try and interrupt running_ec if that's
possible, and only fall back to the main thread if it's not.

[Bug #20197]

Updated by byroot (Jean Boussier) 4 months ago

  • Backport changed from 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN to 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED

@kjtsanaktsidis (KJ Tsanaktsidis) I'm assuming this should be backported to 3.3, but if I'm wrong you can change it to DONTNEED

Updated by kjtsanaktsidis (KJ Tsanaktsidis) 4 months ago

Ah, thank you - yes, this should be backported to 3.3 I think.

Updated by naruse (Yui NARUSE) about 2 months ago

  • Backport changed from 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED to 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONE

ruby_3_3 0793cbbfde261f4fc9bf7045594d62a21e391811 merged revision(s) ef276858d9295208add48e27208c69184dc50472.

Actions

Also available in: Atom PDF

Like0
Like1Like0Like0Like0Like0Like0Like0Like0Like0