Backport #8805

Ruby GC::Profiler returns incorrect info on Solaris (and relatives)

Added by Eric Saxby over 1 year ago. Updated over 1 year ago.

[ruby-core:56762]
Status:Closed
Priority:Normal
Assignee:Tomoyuki Chikanaga

Description

We use SmartOS as our deployment platform, and noticed when attempting to roll out Ruby 2.0.0-p247 to some SmartOS hosts that garbage collection info is broken with integrations such as New Relic. Investigating further, we found that GC::Profiler.total_time always returns 0.0.

GC::Profiler.enable
=> nil
GC.start
=> nil
GC::Profiler.total_time
=> 0.0

It may be related to this issue: https://bugs.ruby-lang.org/issues/7500 where the mechanism by which the profiler gets timestamps from the OS.

Associated revisions

Revision 42639
Added by Akira Tanaka over 1 year ago

  • gc.c (getrusage_time): Fallback clock_gettime to getrusage when clock_gettime fails. Reported by Eric Saxby. [Bug #8805]

Revision 42653
Added by Tomoyuki Chikanaga over 1 year ago

merge revision(s) 42639: [Backport #8805]

* gc.c (getrusage_time): Fallback clock_gettime to getrusage when
  clock_gettime fails.
  Reported by Eric Saxby.   [Bug #8805]

History

#1 Updated by Eric Saxby over 1 year ago

I grabbed the c test code from the other ticket, and with some changes got these results:

cat ./timing.c
#include
#include
#include
#include
#include

double getrusage_time() {
struct rusage usage;
struct timeval time;
getrusage(RUSAGE_SELF, &usage);
time = usage.ru_utime;
return time.tv_sec + time.tv_usec * 1e-6;
}

void print_clock_res(clock_id) {
int rc;
struct timespec res;

    rc = clock_getres(clock_id, &res);
    if (rc == 0)
            printf("clock_res() for %d: %ldns\n", clock_id, res.tv_nsec);

}

double clock_time(clock_id) {
struct timespec ts;

            if (clock_gettime(clock_id, &ts) == 0) {
                            return ts.tv_sec + ts.tv_nsec * 1e-9;
            }
            return 0.0;

}

void print_clock_time(clock_id) {
int n;

            printf("clock_gettime() for %d before: %f\n", clock_id, clock_time(clock_id));
            for (n=0; n<10000; n++) pow(2, 2048);
            printf("clock_gettime() for %d after: %f\n", clock_id, clock_time(clock_id));

}

int main() {
int n;

            printf("getrusage() before: %f\n", getrusage_time());
            for (n=0; n<10000; n++) pow(2, 2048);
            printf("getrusage() after: %f\n", getrusage_time());

printf("\nclock_gettime() CLOCK_HIGHRES\n");

print_clock_res(CLOCK_HIGHRES);
print_clock_time(CLOCK_HIGHRES);

printf("\nclock_gettime() CLOCK_PROCESS_CPUTIME_ID\n");

print_clock_res(CLOCK_PROCESS_CPUTIME_ID);
print_clock_time(CLOCK_PROCESS_CPUTIME_ID);

printf("\nclock_gettime() CLOCK_THREAD_CPUTIME_ID\n");

print_clock_res(CLOCK_THREAD_CPUTIME_ID);
print_clock_time(CLOCK_THREAD_CPUTIME_ID);

printf("\nclock_gettime() CLOCK_MONOTONIC\n");

print_clock_res(CLOCK_MONOTONIC);
print_clock_time(CLOCK_MONOTONIC);

}

gcc -o timing timing.c -lrt

./timing
getrusage() before: 0.000382
getrusage() after: 0.000432

clock_gettime() CLOCK_HIGHRES
clock_res() for 4: 20ns
clock_gettime() for 4 before: 3140535.103844
clock_gettime() for 4 after: 3140535.103877

clock_gettime() CLOCK_PROCESS_CPUTIME_ID
clock_gettime() for 5 before: 0.000000
clock_gettime() for 5 after: 0.000000

clock_gettime() CLOCK_THREAD_CPUTIME_ID
clock_gettime() for 2 before: 0.000000
clock_gettime() for 2 after: 0.000000

clock_gettime() CLOCK_MONOTONIC
clock_res() for 4: 20ns
clock_gettime() for 4 before: 3140535.103960
clock_gettime() for 4 after: 3140535.103990

#2 Updated by Eric Saxby over 1 year ago

After more investigation, the clock_gettime calls for CLOCK_PROCESS_CPUTIME_ID return -1, with an errno of 22. I think this maps to: EINVAL The clock_id argument does not specify a known clock.

Sooo I think for SmartOS, and probably for Solaris in general, GC::Profiler should use getrusage() instead of clock_gettime(). I'm a little unclear on the best way to approach this, however. Maybe configure should test for a 0 response to clock_gettime, and set a flag specific to CLOCK_PROCESS_CPUTIME_ID that gc.c can pick up?

#3 Updated by Akira Tanaka over 1 year ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r42639.
Eric, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • gc.c (getrusage_time): Fallback clock_gettime to getrusage when clock_gettime fails. Reported by Eric Saxby. [Bug #8805]

#4 Updated by Akira Tanaka over 1 year ago

  • Status changed from Closed to Open
  • Backport changed from 1.9.3: UNKNOWN, 2.0.0: UNKNOWN to 1.9.3: DONTNEED, 2.0.0: REQUIRED

#5 Updated by Eric Saxby over 1 year ago

Fantastic! I've rolled out this patch to some SmartOS hosts this morning and it fixes the GC::Profiler total_time reporting.

Thank you so much!

#6 Updated by Eric Saxby over 1 year ago

Hi Akira,

I have one question. It looks like this has only been applied to trunk, and not to the ruby_2_0_0 branch. I'm not yet totally familiar with the version control workflow of ruby-core, but does that mean that this will be released in 2.1? Will it also be in the next patch release of 2.0?

Thank you!

#7 Updated by Akira Tanaka over 1 year ago

2013/8/22 sax (Eric Saxby) sax@livinginthepast.org:

I have one question. It looks like this has only been applied to trunk, and not to the ruby_2_0_0 branch. I'm not yet totally familiar with the version control workflow of ruby-core, but does that mean that this will be released in 2.1? Will it also be in the next patch release of 2.0?

I think this should be backported to Ruby 2.0.0.

So I reopened this issue and updated the backport field as:

The Ruby 2.0.0 maintainer (nagachika) may backport the patch.
--
Tanaka Akira

#8 Updated by Tomoyuki Chikanaga over 1 year ago

  • Tracker changed from Bug to Backport
  • Project changed from Ruby trunk to Backport200
  • Category deleted (core)
  • Status changed from Open to Assigned
  • Assignee set to Tomoyuki Chikanaga

#9 Updated by Tomoyuki Chikanaga over 1 year ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r42653.
Eric, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


merge revision(s) 42639: [Backport #8805]

* gc.c (getrusage_time): Fallback clock_gettime to getrusage when
  clock_gettime fails.
  Reported by Eric Saxby.   [Bug #8805]

#10 Updated by Eric Saxby over 1 year ago

Thank you both!

Also available in: Atom PDF