GC::Profiler.total_time under-reports GC time compared to dtrace GC probe measurement
I'm trying to square the numbers that I'm getting from GC::Profiler.total_time against those that I'm getting out of instrumentation with the GC dtrace probes embedded in Ruby, and having a hard time getting the two sources to agree.
I'm not sure if this is due to a legitimate bug in Ruby, or a misunderstanding on my part about what the two measurements mean.
You can reproduce this using the scripts in this gist (run standalone.rb first, it will prompt you for what to do next):
The high-level summary of what that does is:
1. Call GC::Profiler.enable
2. Save GC::Profiler.total_time
3. Instrument with a dtrace script that tracks mark and sweep start/stop and keeps a running total of GC time
4. Run some code that exercises GC
5. Calculate elapsed GC time with GC::Profiler.total_time -
6. Compare the Ruby-measured total GC time to the dtrace-measured total GC time
It seems that the measurement from GC::Profiler is consistently lower than the dtrace measurement, by a non-trivial margin (15-20% in my testing).
Looking at GC::Profiler.raw_data, the bulk of the difference seems to be in the sweep time measurement (mark times line up pretty closely between the two ways of measuring).
Any insight into whether this represents a legitimate bug, an error in my measurement technique, or a misunderstanding of these measurements would be greatly appreciated!
#2 [ruby-core:61448] Updated by Ben Weintraub about 2 years ago
Worth noting: a quick read through gc.c suggested that the missing GC sweep time might be due to me not having built with GC_PROFILE_MORE_DETAIL, but even after building with that, I still see the same discrepancy where GC::Profiler is significantly lower.
#3 [ruby-core:61453] Updated by Ben Weintraub about 2 years ago
I realized that these are actually measuring different things: dtrace's timestamps measure wall clock time, whereas GC::Profiler on Mac OS X uses getrusage, which measures user CPU time. It still seems weird that the two would be so divergent though, given that GC mark and sweep should be CPU-bound.
#4 [ruby-core:61481] Updated by Ben Weintraub about 2 years ago
I think I've figured out the discrepancy here: the dtrace probes wrap around the getrusage(2) calls that GC::Profiler bases its timings on for Mac OS X. The average lazy sweep time is quite short (single-digit microseconds per lazy sweep). Unfortunately, getrusage itself has an overhead of ~1.5 us per call on average on my Mac OS X box, which adds up to 3 us total per lazy sweep (since we call it once to start the timer and once to stop). That means dtrace sees a measurement for lazy sweeps that's on average 3 us higher than what GC::Profiler is able to measure. Because there are so many lazy sweeps, these 3 us chunks add up to a non-trivial amount of time.
I'm guessing that this is less of an issue on Linux, because clock_gettime will be used there instead of getrusage.
Feel free to close this out.
#5 [ruby-core:61498] Updated by Ben Weintraub about 2 years ago
One minor follow-up: it's actually not that getrusage takes a 'long' time (relative to the cost of each lazy sweep invocation), it's the dtrace probes themselves firing. The conclusion remains the same, though GC::Profiler seems correct.