GProf and Perf are good tools to use on Linux.

This article in Japanese by Yui Naruse is a great start.


There is no usable GProf or Perf on Mac, so you'll need to use Instruments or GPerfTools. Instruments comes with Xcode.

You'll need to configure Ruby with some nonstandard options:

For GPerfTools:

LIBS="-lprofiler" cflags="-fno-omit-frame-pointer" LDFLAGS="-Wl,-no_pie" ./configure && make

Note that the GPerfTools command line links only their profiler (-lprofiler). It doesn't configure their malloc library.

For Instruments:

cflags="-fno-omit-frame-pointer" LDFLAGS="-Wl,-no_pie" ./configure && make

You may want to turn off some optimizations to make it easier to profile or debug. The command line above keeps all optimizations the same.
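To confirm the flags took effect, you can ask the freshly built Ruby what it was compiled with. This is a minimal sketch: flag_present? is a hypothetical helper, and I'm assuming the value you passed to configure shows up under the CFLAGS or cflags key of RbConfig::CONFIG, which may vary by Ruby version.

```ruby
require "rbconfig"

# Hypothetical helper: true if `flag` appears as a whole word in a config string.
def flag_present?(config_value, flag)
  config_value.to_s.split.include?(flag)
end

flag = "-fno-omit-frame-pointer"

# Assumption: the configure-time value is recorded under one of these keys.
found = %w[CFLAGS cflags].any? { |key| flag_present?(RbConfig::CONFIG[key], flag) }
puts "#{flag}: #{found ? 'present' : 'not found'}"
```

Run it with the Ruby you just built rather than your system Ruby, since RbConfig reports the flags of whichever interpreter executes the script.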

Your Test Code

For Instruments, you'll need to give the profiler the name of the binary to profile, and it should be the actual binary, not a script like runruby.rb. This may mean setting some extra environment variables.

To run optcarrot with Instruments and an un-installed local Ruby, here is what I use:

PATH=.:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin RUBYLIB=.:.ext/common:.ext/x86_64-darwin15:lib instruments -t "Time Profiler" -D outfile.trace ./ruby ../optcarrot/bin/optcarrot --benchmark ../optcarrot/examples/Lan_Master.nes
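The .ext/x86_64-darwin15 part of RUBYLIB depends on your machine, so the command above may need adjusting. Here is a rough sketch of building the value from the arch string. build_rubylib is a hypothetical helper, and I'm assuming the arch reported by the Ruby running this script matches the arch directory of your local build, which may not hold if you run it with a different Ruby.

```ruby
require "rbconfig"

# Hypothetical helper: RUBYLIB for running an un-installed Ruby from its build directory.
# The directory list mirrors the one in the command line above.
def build_rubylib(arch)
  [".", ".ext/common", ".ext/#{arch}", "lib"].join(File::PATH_SEPARATOR)
end

# Assumption: this interpreter's arch matches the build directory's .ext/<arch>.
arch = RbConfig::CONFIG["arch"]
puts "RUBYLIB=#{build_rubylib(arch)}"
```

If the printed directory doesn't exist in your build tree, look under .ext/ to see which arch directory was actually generated.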

With GPerfTools you can mostly get away with using a script like runruby.rb, but if you have trouble with symbols not being recognized, try the ugly explicit way (the real binary plus environment variables, as above) before you give up.

GPerfTools Results

To gather results, set the CPUPROFILE environment variable. For instance:

CPUPROFILE=/tmp/prof.out ./ruby bin/optcarrot --benchmark examples/Lan_Master.nes

This will write a file starting with the prefix you give. For instance, CPUPROFILE=/tmp/prof.out will write a file like /tmp/prof.out.37525.

You can use pprof to see the results. For example, if your result file was called /tmp/prof.out.37525, run "pprof ./ruby /tmp/prof.out.37525 --text". A different file will be generated for each run.
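Since each run writes a new file, it can be handy to pick the newest one automatically. This is a small sketch, not part of GPerfTools: newest_profile is a hypothetical helper, the default prefix is just the example from above, and it assumes the prefix contains no glob special characters.

```ruby
# Hypothetical helper: the most recently modified file whose name starts
# with `prefix`, or nil if there is none.
def newest_profile(prefix)
  Dir.glob("#{prefix}*").select { |path| File.file?(path) }.max_by { |path| File.mtime(path) }
end

# Assumption: you used the same CPUPROFILE value when gathering results.
profile_prefix = ENV.fetch("CPUPROFILE", "/tmp/prof.out")
profile = newest_profile(profile_prefix)

if profile
  # Print the pprof command rather than running it, so you can check it first.
  puts "pprof ./ruby #{profile} --text"
else
  puts "No profile found with prefix #{profile_prefix}"
end
```

Printing the command instead of executing it keeps the sketch free of side effects. Paste the output into your shell from the Ruby build directory.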

Here's a blog post about how to use GPerfTools with the Ruby interpreter on a Mac.

GPerfTools Sampling Methods

GPerfTools is a sampling profiler. That means it has fairly little impact on program speed, but it doesn't see every function that gets called. It only samples a certain number of times per second, saving a program counter and/or stack trace each time, so a function that runs briefly between two samples may never show up.
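To see why a sampling profiler can miss functions, here is a toy simulation. It is not how GPerfTools is implemented. The timeline, the function names and the 10 ms sampling interval are all made up for illustration.

```ruby
# Hypothetical timeline: [function name, duration in microseconds], run back to back.
TIMELINE = [
  [:parse, 12_000],
  [:tiny_helper, 300],
  [:render, 25_000],
  [:tiny_helper, 300],
  [:parse, 8_000]
].freeze

# Return the function running at time t (microseconds), or nil past the end.
def function_at(timeline, t)
  elapsed = 0
  timeline.each do |name, duration|
    elapsed += duration
    return name if t < elapsed
  end
  nil
end

# Take one sample every interval_us microseconds and count hits per function.
def sample(timeline, interval_us)
  total = timeline.sum { |_name, duration| duration }
  counts = Hash.new(0)
  (0...total).step(interval_us) { |t| counts[function_at(timeline, t)] += 1 }
  counts
end

counts = sample(TIMELINE, 10_000)
counts.each { |name, hits| puts "#{name}: #{hits} samples" }
puts "tiny_helper seen? #{counts.key?(:tiny_helper)}"
```

In this made-up timeline tiny_helper runs twice but never happens to be running at a sample point, so it doesn't appear in the counts at all. Longer runs give short functions more chances to be caught.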

Profiling can be tricky with inlining and other optimizations. You can see the source for GPerfTools, including how it gets a stack trace on x86 and how it gets the Program Counter (PC).


Instruments can be run from the command line to gather results as well. To view them, you'll then need to start the Instruments GUI and open the results file.

Example command line:

PATH=.:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin RUBYLIB=.:.ext/common:.ext/x86_64-darwin15:lib instruments -t "Time Profiler" -D outfile.trace ./ruby ../optcarrot/bin/optcarrot --benchmark ../optcarrot/examples/Lan_Master.nes

Verifying Optimizations

We use optcarrot as an important benchmark for verifying Ruby CPU optimizations. We plan to have more benchmarks for Ruby 3x3. Remember that benchmarks can give inconsistent results. One way to be sure your optimization works is to check statistically with a dedicated script or a tool like ABProf.
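As a rough idea of what a dedicated script might do, here is a sketch that compares two sets of run times using Welch's t statistic. The timings are invented for illustration, the helper names are mine, and I'm not claiming this is what ABProf does internally.

```ruby
def mean(xs)
  xs.sum.to_f / xs.size
end

# Unbiased sample variance (divides by n - 1).
def sample_variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / (xs.size - 1)
end

# Welch's t statistic for two samples with possibly unequal variances.
def welch_t(a, b)
  standard_error = Math.sqrt(sample_variance(a) / a.size + sample_variance(b) / b.size)
  (mean(a) - mean(b)) / standard_error
end

# Hypothetical timings in seconds, for illustration only.
baseline  = [10.2, 10.4, 10.1, 10.3, 10.5]
optimized = [9.8, 9.9, 10.0, 9.7, 9.9]

puts format("mean baseline:  %.2f s", mean(baseline))
puts format("mean optimized: %.2f s", mean(optimized))
puts format("Welch's t:      %.2f", welch_t(baseline, optimized))
```

A t value far from zero suggests the difference is unlikely to be noise. Turning it into a p-value requires the t distribution, which this sketch leaves out. More runs per side will make the result more trustworthy.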