rperf

rperf logo

rperf

Know where your Ruby spends its time — accurately.
A sampling profiler that corrects safepoint bias using real time deltas.

Gem Version Ruby >= 3.4.0 Manual MIT License

Built-in flamegraph viewer  ·  CPU mode & wall mode (GVL + GC tracking)

Web site, Online manual, GitHub repository

See It in Action

$ rperf exec ruby fib.rb

 Performance stats for 'ruby fib.rb':

         2,023.3 ms   user
             4.3 ms   sys
         2,001.8 ms   real

         2,000.3 ms 100.0%  [Rperf] CPU execution
             3.0 ms         [Ruby ] GC time (4 count: 2 minor, 2 major)
          48,741            [Ruby ] allocated objects
          27,034            [Ruby ] freed objects
               1            [Ruby ] detected threads
              16 MB         [OS   ] peak memory (maxrss)
           5,784            [OS   ] page faults (5,783 minor, 1 major)
              22            [OS   ] context switches (13 voluntary, 9 involuntary)
               0 MB         [OS   ] disk I/O (0 MB read, 0 MB write)

 Flat:
         1,998.4 ms  99.9%  Object#fibonacci (fib.rb)
             1.9 ms   0.1%  Module#method_added (<C method>)

 Cumulative:
         2,000.3 ms 100.0%  <main> (fib.rb)
         1,998.4 ms  99.9%  Object#fibonacci (fib.rb)
             1.9 ms   0.1%  Module#method_added (<C method>)

  1999 samples / 1999 triggers, 0.1% profiler overhead

Quick Start

# Performance summary (wall mode, prints to stderr)
rperf stat ruby app.rb

# Record a profile to file
rperf record ruby app.rb                     # → rperf.json.gz (cpu mode, default)
rperf record -m wall ruby server.rb          # wall mode

# View results in browser
rperf report                                 # open rperf.json.gz in viewer
rperf report --top profile.json.gz           # print top functions to terminal

# Compare two profiles (requires Go)
rperf diff before.json.gz after.json.gz      # open diff in browser

On rperf report, you can see the profile result like this page: rprof viewer

Ruby API

require "rperf"

# Block form — profiles and saves to file
Rperf.start(output: "profile.json.gz", frequency: 500, mode: :cpu) do
  # code to profile
end

# Manual start/stop
Rperf.start(frequency: 1000, mode: :wall)
# ...
data = Rperf.stop
Rperf.save("profile.json.gz", data)

In-browser Viewer

# config.ru
require "rperf/viewer"
require "rperf/rack"

Rperf.start(mode: :wall, defer: true)
use Rperf::Viewer           # visit /rperf/ for flamegraph UI
use Rperf::RackMiddleware   # labels each request
run MyApp

# Snapshot every 60 minutes
Thread.new { loop { sleep 3600; Rperf::Viewer.instance&.take_snapshot! } }

Note: Rperf::Viewer has no built-in authentication. In production, restrict access with your framework’s auth mechanisms (e.g., route constraints in Rails). See the manual for examples.

Environment Variables

Profile without code changes (e.g., Rails):

RPERF_ENABLED=1 RPERF_MODE=wall ruby app.rb    # → rperf.json.gz
rperf report                                    # open in viewer

Run rperf help for full documentation, or see the online manual.

Subcommands

Inspired by Linux perf — familiar subcommand interface for profiling workflows.

Command Description
rperf record Profile a command and save to file (default: .json.gz)
rperf stat Profile a command and print summary to stderr
rperf exec Profile a command and print full report to stderr
rperf report Open viewer for .json.gz; wraps go tool pprof for .pb.gz (requires Go)
rperf diff Compare two profiles (requires Go)
rperf help Show full reference documentation

How It Works

The Challenge: Safepoint Sampling

Most Ruby profilers (e.g., stackprof) use signal handlers to capture stack traces at the exact moment the timer fires. rperf takes a different approach — it samples at safepoints (VM checkpoints), which is safer (no async-signal-safety concerns, reliable access to VM state) but means the sample timing can be delayed. Without correction, this delay would skew the results.

The Fix: Weight = Real Time

rperf uses actual elapsed time as sample weights — so delayed samples carry proportionally more weight, and the profile matches reality:

Timer (signal or thread)         VM thread (postponed job)
────────────────────────         ────────────────────────
  every 1/frequency sec:          at next safepoint:
    rb_postponed_job_trigger()  →   rperf_sample_job()
                                      time_now = read_clock()
                                      weight = time_now - prev_time
                                      record(backtrace, weight)

On Linux, the timer uses timer_create + signal delivery to a dedicated worker thread. On other platforms, a dedicated pthread with nanosleep is used.

If a safepoint is delayed, the sample carries proportionally more weight. The total weight equals the total time, accurately distributed across call stacks.

Modes

Mode Clock What it measures
cpu (default) CLOCK_THREAD_CPUTIME_ID CPU time consumed (excludes sleep/I/O)
wall CLOCK_MONOTONIC Real elapsed time (includes everything)

Use cpu to find what consumes CPU. Use wall to find what makes things slow (I/O, GVL contention, GC).

GVL and GC Labels

rperf hooks GVL and GC events to attribute non-CPU time. These are recorded as labels on samples rather than synthetic stack frames:

Label (key=value) Mode Meaning
%GVL=blocked wall only Off-GVL time (I/O, sleep, C extension releasing GVL)
%GVL=wait wall only Waiting to reacquire the GVL (contention)
%GC=mark cpu and wall Time in GC mark phase (wall time)
%GC=sweep cpu and wall Time in GC sweep phase (wall time)

Why rperf?

Limitations

Output Formats

Format Extension Viewer
JSON (default) .json.gz rperf report (built-in viewer), Rperf.load, any JSON tool
pprof .pb.gz go tool pprof (requires Go), speedscope
collapsed .collapsed FlameGraph, speedscope
text .txt any text viewer

Format is auto-detected from extension, or set explicitly with --format.

License

MIT