A safepoint-based sampling performance profiler for Ruby. Uses actual time deltas as sample weights to correct safepoint bias.
    gem install sperf

    # Performance summary (wall mode, prints to stderr)
    sperf stat ruby app.rb

    # Profile to file
    sperf record ruby app.rb                              # → sperf.data (pprof, cpu mode)
    sperf record -m wall -o profile.pb.gz ruby server.rb  # wall mode, custom output

    # View results (report/diff require Go: https://go.dev/dl/)
    sperf report                                          # open sperf.data in browser
    sperf report --top profile.pb.gz                      # print top functions to terminal

    # Compare two profiles
    sperf diff before.pb.gz after.pb.gz                   # open diff in browser
    sperf diff --top before.pb.gz after.pb.gz             # print diff to terminal

    require "sperf"

    # Block form: profiles and saves to file
    Sperf.start(output: "profile.pb.gz", frequency: 500, mode: :cpu) do
      # code to profile
    end

    # Manual start/stop
    Sperf.start(frequency: 1000, mode: :wall)
    # ...
    data = Sperf.stop
    Sperf.save("profile.pb.gz", data)

Profile without code changes (e.g., Rails):

    SPERF_ENABLED=1 SPERF_MODE=wall SPERF_OUTPUT=profile.pb.gz ruby app.rb

Run sperf help for full documentation, or see the online manual.
The CLI is inspired by Linux perf and offers a familiar subcommand interface for profiling workflows.
| Command | Description |
|---|---|
| `sperf record` | Profile a command and save to file |
| `sperf stat` | Profile a command and print summary to stderr |
| `sperf report` | Open pprof profile with `go tool pprof` (requires Go) |
| `sperf diff` | Compare two pprof profiles (requires Go) |
| `sperf help` | Show full reference documentation |
Ruby’s sampling profilers collect stack traces at safepoints, not at the exact timer tick. Traditional profilers assign equal weight to every sample, so if a safepoint is delayed 5ms, that delay is invisible.
sperf uses time deltas as sample weights:

    Timer (signal or thread)          VM thread (postponed job)
    ────────────────────────          ────────────────────────
    every 1/frequency sec:            at next safepoint:
      rb_postponed_job_trigger()  →     sperf_sample_job()
                                          time_now = read_clock()
                                          weight = time_now - prev_time
                                          record(backtrace, weight)

On Linux, the timer uses timer_create + signal delivery (no extra thread).
On other platforms, a dedicated pthread with nanosleep is used.
If a safepoint is delayed, the sample carries proportionally more weight. The total weight equals the total time, accurately distributed across call stacks.
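
To make the weighting concrete, here is a minimal pure-Ruby simulation. It does not use sperf itself: the sample data, stack names, and helper methods (`equal_weights`, `delta_weights`) are made up for illustration. It compares equal weighting with time-delta weighting when one sample is taken 5 ms late.

```ruby
# Simulated samples: [stack, clock reading in ms when the sample was actually taken].
# The timer fires every 10 ms, but the third sample reached its safepoint 5 ms late.
SIMULATED_SAMPLES = [
  ["main;parse",  10.0],
  ["main;parse",  20.0],
  ["main;render", 35.0], # delayed safepoint
  ["main;parse",  40.0],
].freeze

# Equal weighting: every sample counts as exactly one timer interval.
def equal_weights(samples, interval_ms)
  samples.each_with_object(Hash.new(0.0)) do |(stack, _time), totals|
    totals[stack] += interval_ms
  end
end

# Delta weighting: each sample is weighted by the time elapsed since the previous one.
def delta_weights(samples, start_ms = 0.0)
  prev_time = start_ms
  samples.each_with_object(Hash.new(0.0)) do |(stack, time_now), totals|
    totals[stack] += time_now - prev_time
    prev_time = time_now
  end
end

equal = equal_weights(SIMULATED_SAMPLES, 10.0)
delta = delta_weights(SIMULATED_SAMPLES)

puts "equal weights: #{equal}"
puts "delta weights: #{delta}"
puts "delta total:   #{delta.values.sum} ms"
```

With equal weights, `main;parse` gets 30 ms and `main;render` gets 10 ms. With delta weights, the delayed sample carries 15 ms and the one after it only 5 ms, so the split becomes 25 ms and 15 ms, and the total matches the 40 ms that actually elapsed.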
| Mode | Clock | What it measures |
|---|---|---|
| cpu (default) | `CLOCK_THREAD_CPUTIME_ID` | CPU time consumed (excludes sleep/I/O) |
| wall | `CLOCK_MONOTONIC` | Real elapsed time (includes everything) |
Use cpu to find what consumes CPU. Use wall to find what makes things slow (I/O, GVL contention, GC).
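
The difference between the two clocks can be seen from plain Ruby with `Process.clock_gettime`. This is only a sketch of what each clock reports, not how sperf samples internally. The `measure` helper is hypothetical, and `Process::CLOCK_THREAD_CPUTIME_ID` is assumed to be available on your platform (it is on Linux).

```ruby
# Time a block with both clocks named in the mode table.
def measure
  wall_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_start  = Process.clock_gettime(Process::CLOCK_THREAD_CPUTIME_ID)
  yield
  {
    wall: Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_start,
    cpu:  Process.clock_gettime(Process::CLOCK_THREAD_CPUTIME_ID) - cpu_start,
  }
end

# Sleeping consumes wall time but almost no CPU time.
sleeping = measure { sleep 0.05 }

# A busy loop consumes CPU time, so both clocks advance.
spinning = measure { 200_000.times { |i| Math.sqrt(i) } }

puts format("sleep: wall=%.4fs cpu=%.4fs", sleeping[:wall], sleeping[:cpu])
puts format("spin:  wall=%.4fs cpu=%.4fs", spinning[:wall], spinning[:cpu])
```

A sleep-heavy method would therefore be nearly invisible in cpu mode and prominent in wall mode.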
sperf hooks GVL and GC events to attribute non-CPU time:
| Frame | Meaning |
|---|---|
| `[GVL blocked]` | Off-GVL time (I/O, sleep, C extension releasing GVL) |
| `[GVL wait]` | Waiting to reacquire the GVL (contention) |
| `[GC marking]` | Time in GC mark phase |
| `[GC sweeping]` | Time in GC sweep phase |
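
As a way to try these frames out, the following small workload exercises each situation from the table. It is a sketch only: the method names are hypothetical, and which frames actually appear (and how large they are) depends on your Ruby version and machine. Saved as a script and run under wall mode, it would be expected to surface off-GVL time, GVL contention, and GC time.

```ruby
# Off-GVL time: sleep releases the GVL while the thread waits.
def off_gvl_work
  sleep 0.02
end

# GVL contention: CPU-bound threads take turns holding the GVL.
def contended_work(threads: 2, iterations: 200_000)
  workers = Array.new(threads) do
    Thread.new { iterations.times.reduce(0) { |sum, i| sum + i } }
  end
  workers.map(&:value)
end

# GC pressure: allocate many short-lived strings, then force a collection.
def gc_work(objects: 50_000)
  Array.new(objects) { |i| "object #{i}" }
  GC.start
  GC.count
end

off_gvl_work
sums = contended_work
gc_runs = gc_work

puts "thread sums: #{sums.inspect}"
puts "GC runs so far: #{gc_runs}"
```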
Profiles open in `go tool pprof`, speedscope, and other standard tools.
No code changes are required: profile via the CLI (`sperf stat ruby app.rb`) or environment variables (`SPERF_ENABLED=1`).
The subcommands `record`, `stat`, `report`, and `diff` are inspired by Linux perf.

| Format | Extension | Use case |
|---|---|---|
| pprof (default) | `.pb.gz` | `sperf report`, `go tool pprof`, speedscope |
| collapsed | `.collapsed` | FlameGraph (`flamegraph.pl`), speedscope |
| text | `.txt` | Human/AI-readable flat + cumulative report |
Format is auto-detected from extension, or set explicitly with --format.
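
Collapsed output is easy to post-process. The sketch below assumes the usual FlameGraph folded layout, one `frame;frame;frame weight` entry per line; the exact units and frame names sperf writes are not shown here, so the sample text and the `leaf_totals` helper are illustrative only. It sums weights per leaf frame.

```ruby
# Sum weights per leaf frame from collapsed-stack text ("a;b;c 123" per line).
def leaf_totals(collapsed_text)
  collapsed_text.each_line.with_object(Hash.new(0)) do |line, totals|
    # Split on the last space so frame names containing spaces stay intact.
    stack, _separator, weight = line.strip.rpartition(" ")
    next if stack.empty?

    totals[stack.split(";").last] += Integer(weight)
  end
end

# Made-up sample data in the assumed layout.
COLLAPSED_SAMPLE = <<~TEXT
  main;Server#handle;[GVL blocked] 700
  main;Server#handle;JSON.parse 200
  main;Server#handle;[GVL wait] 100
  main;Worker#run;JSON.parse 50
TEXT

totals = leaf_totals(COLLAPSED_SAMPLE)
totals.sort_by { |_frame, weight| -weight }.each do |frame, weight|
  puts format("%-15s %d", frame, weight)
end
```

Splitting on the last space matters because synthetic frames such as `[GVL blocked]` contain a space themselves.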
MIT