Backport #8805

Ruby GC::Profiler returns incorrect info on Solaris (and relatives)

Added by Eric Saxby 8 months ago. Updated 8 months ago.

[ruby-core:56762]
Status:Closed
Priority:Normal
Assignee:Tomoyuki Chikanaga

Description

We use SmartOS as our deployment platform, and noticed when attempting to roll out Ruby 2.0.0-p247 to some SmartOS hosts that garbage collection info is broken with integrations such as New Relic. Investigating further, we found that GC::Profiler.total_time always returns 0.0.

GC::Profiler.enable
=> nil
GC.start
=> nil
GC::Profiler.total_time
=> 0.0

It may be related to this issue: https://bugs.ruby-lang.org/issues/7500 where the mechanism by which the profiler gets timestamps from the OS.

Associated revisions

Revision 42653
Added by Tomoyuki Chikanaga 8 months ago

merge revision(s) 42639: [Backport #8805]

* gc.c (getrusage_time): Fallback clock_gettime to getrusage when
  clock_gettime fails.
  Reported by Eric Saxby.   [Bug #8805]

History

#1 Updated by Eric Saxby 8 months ago

I grabbed the c test code from the other ticket, and with some changes got these results:

cat ./timing.c
#include
#include
#include
#include
#include

double getrusagetime() {
struct rusage usage;
struct timeval time;
getrusage(RUSAGE
SELF, &usage);
time = usage.ruutime;
return time.tv
sec + time.tv_usec * 1e-6;
}

void printclockres(clock_id) {
int rc;
struct timespec res;

    rc = clock_getres(clock_id, &res);
    if (rc == 0)
            printf("clock_res() for %d: %ldns\n", clock_id, res.tv_nsec);

}

double clocktime(clockid) {
struct timespec ts;

            if (clock_gettime(clock_id, &ts) == 0) {
                            return ts.tv_sec + ts.tv_nsec * 1e-9;
            }
            return 0.0;

}

void printclocktime(clock_id) {
int n;

            printf("clock_gettime() for %d before: %f\n", clock_id, clock_time(clock_id));
            for (n=0; n<10000; n++) pow(2, 2048);
            printf("clock_gettime() for %d after: %f\n", clock_id, clock_time(clock_id));

}

int main() {
int n;

            printf("getrusage() before: %f\n", getrusage_time());
            for (n=0; n<10000; n++) pow(2, 2048);
            printf("getrusage() after: %f\n", getrusage_time());

printf("\nclock_gettime() CLOCK_HIGHRES\n");

print_clock_res(CLOCK_HIGHRES);
print_clock_time(CLOCK_HIGHRES);

printf("\nclock_gettime() CLOCK_PROCESS_CPUTIME_ID\n");

print_clock_res(CLOCK_PROCESS_CPUTIME_ID);
print_clock_time(CLOCK_PROCESS_CPUTIME_ID);

printf("\nclock_gettime() CLOCK_THREAD_CPUTIME_ID\n");

print_clock_res(CLOCK_THREAD_CPUTIME_ID);
print_clock_time(CLOCK_THREAD_CPUTIME_ID);

printf("\nclock_gettime() CLOCK_MONOTONIC\n");

print_clock_res(CLOCK_MONOTONIC);
print_clock_time(CLOCK_MONOTONIC);

}

gcc -o timing timing.c -lrt

./timing
getrusage() before: 0.000382
getrusage() after: 0.000432

clockgettime() CLOCKHIGHRES
clockres() for 4: 20ns
clock
gettime() for 4 before: 3140535.103844
clock_gettime() for 4 after: 3140535.103877

clockgettime() CLOCKPROCESSCPUTIMEID
clockgettime() for 5 before: 0.000000
clock
gettime() for 5 after: 0.000000

clockgettime() CLOCKTHREADCPUTIMEID
clockgettime() for 2 before: 0.000000
clock
gettime() for 2 after: 0.000000

clockgettime() CLOCKMONOTONIC
clockres() for 4: 20ns
clock
gettime() for 4 before: 3140535.103960
clock_gettime() for 4 after: 3140535.103990

#2 Updated by Eric Saxby 8 months ago

After more investigation, the clockgettime calls for CLOCKPROCESSCPUTIMEID return -1, with an errno of 22. I think this maps to: EINVAL The clock_id argument does not specify a known clock.

Sooo I think for SmartOS, and probably for Solaris in general, GC::Profiler should use getrusage() instead of clockgettime(). I'm a little unclear on the best way to approach this, however. Maybe configure should test for a 0 response to clockgettime, and set a flag specific to CLOCKPROCESSCPUTIME_ID that gc.c can pick up?

#3 Updated by Akira Tanaka 8 months ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r42639.
Eric, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • gc.c (getrusagetime): Fallback clockgettime to getrusage when clock_gettime fails. Reported by Eric Saxby. [Bug #8805]

#4 Updated by Akira Tanaka 8 months ago

  • Status changed from Closed to Open
  • Backport changed from 1.9.3: UNKNOWN, 2.0.0: UNKNOWN to 1.9.3: DONTNEED, 2.0.0: REQUIRED

#5 Updated by Eric Saxby 8 months ago

Fantastic! I've rolled out this patch to some SmartOS hosts this morning and it fixes the GC::Profiler total_time reporting.

Thank you so much!

#6 Updated by Eric Saxby 8 months ago

Hi Akira,

I have one question. It looks like this has only been applied to trunk, and not to the ruby20_0 branch. I'm not yet totally familiar with the version control workflow of ruby-core, but does that mean that this will be released in 2.1? Will it also be in the next patch release of 2.0?

Thank you!

#7 Updated by Akira Tanaka 8 months ago

2013/8/22 sax (Eric Saxby) sax@livinginthepast.org:

I have one question. It looks like this has only been applied to trunk, and not to the ruby20_0 branch. I'm not yet totally familiar with the version control workflow of ruby-core, but does that mean that this will be released in 2.1? Will it also be in the next patch release of 2.0?

I think this should be backported to Ruby 2.0.0.

So I reopened this issue and updated the backport field as:

The Ruby 2.0.0 maintainer (nagachika) may backport the patch.
--
Tanaka Akira

#8 Updated by Tomoyuki Chikanaga 8 months ago

  • Tracker changed from Bug to Backport
  • Project changed from ruby-trunk to Backport200
  • Category deleted (core)
  • Status changed from Open to Assigned
  • Assignee set to Tomoyuki Chikanaga

#9 Updated by Tomoyuki Chikanaga 8 months ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r42653.
Eric, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


merge revision(s) 42639: [Backport #8805]

* gc.c (getrusage_time): Fallback clock_gettime to getrusage when
  clock_gettime fails.
  Reported by Eric Saxby.   [Bug #8805]

#10 Updated by Eric Saxby 8 months ago

Thank you both!

Also available in: Atom PDF