From agentops
Profiles code, runs benchmarks, detects regressions, and suggests optimizations across Go, Python, Node.js, Rust, and shell environments.
How this skill is triggered — by the user, by Claude, or both
Slash command
/agentops:perfThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
> Quick Ref: `/perf profile <target>` | `/perf bench <target>` | `/perf compare <baseline> <candidate>` | `/perf optimize <target>`
Quick Ref:
/perf profile <target>|/perf bench <target>|/perf compare <baseline> <candidate>|/perf optimize <target>
YOU MUST EXECUTE THIS WORKFLOW. Do not just describe it.
Performance profiling, benchmarking, regression detection, and optimization recommendations for any language runtime. Produces actionable metrics, not vague advice.
| Mode | Command | Purpose |
|---|---|---|
| Profile | /perf profile <target> | Profile execution, find hotspots |
| Benchmark | /perf bench <target> | Create or run benchmarks |
| Compare | /perf compare <baseline> <candidate> | Compare two runs for regression |
| Optimize | /perf optimize <target> | Analyze and apply optimizations |
If no mode is specified, default to profile.
Identify the language/runtime from file extensions, go.mod, package.json, pyproject.toml, Cargo.toml, or explicit user input. Select the profiling stack:
If the symptom may be host pressure rather than target-code performance, read references/system-pressure-triage.md before benchmarking.
For the repeatable measurement loop, profiler selection, and report metrics, read references/profiling-playbook.md.
| Language | Benchmarking | CPU Profile | Memory Profile | Comparison |
|---|---|---|---|---|
| Go | go test -bench | go tool pprof (cpu) | go tool pprof (alloc) | benchstat |
| Python | pytest-benchmark, timeit | cProfile, py-spy | memory_profiler, tracemalloc | manual diff |
| Node | benchmark.js, vitest bench | --prof, clinic.js | --heap-prof, 0x | manual diff |
| Rust | criterion, cargo bench | cargo flamegraph | heaptrack, DHAT | critcmp |
| Shell | hyperfine | time, strace | N/A | hyperfine built-in |
Check which tools are actually installed. If a preferred tool is missing, fall back to standard-library alternatives before asking the user to install anything.
Run existing benchmarks first. If none exist, create them.
# Go
grep -r "func Benchmark" --include="*_test.go" -l .
# Python
find . -name "test_*" -exec grep -l "benchmark\|@pytest.mark.benchmark" {} +
# Rust
grep -r "#\[bench\]" --include="*.rs" -l .
# Node
find . -name "*.bench.*" -o -name "*.benchmark.*"
If benchmarks exist for the target, run them and capture output. If none exist, write benchmarks covering the target function or module.
Benchmark requirements:
-benchtime=3s -count=5)Save raw baseline output to .agents/perf/baseline-YYYY-MM-DD.txt.
func BenchmarkTargetFunction(b *testing.B) {
// Setup outside the loop
input := prepareInput()
b.ResetTimer()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
TargetFunction(input)
}
}
import pytest
@pytest.mark.benchmark(group="target")
def test_target_benchmark(benchmark):
input_data = prepare_input()
result = benchmark(target_function, input_data)
assert result is not None
Find functions consuming the most CPU time.
Go:
go test -bench=BenchmarkTarget -cpuprofile=cpu.prof ./...
go tool pprof -top cpu.prof
go tool pprof -text -cum cpu.prof # cumulative view
Python:
python -m cProfile -s cumulative target_script.py
# Or for running processes:
py-spy top --pid <PID>
py-spy record -o profile.svg --pid <PID>
Find allocation hotspots and potential leaks.
Go:
go test -bench=BenchmarkTarget -memprofile=mem.prof ./...
go tool pprof -top -alloc_space mem.prof
Python:
python -m memory_profiler target_script.py
# Or with tracemalloc in code:
# tracemalloc.start(); ...; snapshot = tracemalloc.take_snapshot()
Identify blocking operations in hot paths.
After profiling, produce a ranked list:
HOTSPOTS (by cumulative CPU time):
1. pkg/engine.Process 42.3% (1.2s) — main processing loop
2. pkg/engine.parseRecord 28.1% (0.8s) — record deserialization
3. pkg/io.ReadBatch 15.7% (0.45s) — disk reads
Classify each finding by estimated impact:
| Impact | Criteria | Action |
|---|---|---|
| High | >20% of total time or >50% of allocations | Fix immediately |
| Medium | 5-20% of total time or notable allocation waste | Fix in this session |
| Low | <5% of total time, minor inefficiency | Log for later |
Check the profiled code against these known performance killers:
strings.Builder / []byte / io.StringWriter)For each finding, state:
Critical rule: ONE optimization at a time.
For high-effort optimization work, load references/optimization-proof-loop.md before changing code. It defines the proof contract for isomorphic rewrites, benchmark deltas, and keep/revert decisions.
For each optimization:
benchstat (Go) or manual diffperf(<scope>): <description> (+X% throughput) or perf(<scope>): <description> (-X% latency)benchstat, or >5% consistent change for manual comparison)Apply optimizations in this order (highest expected impact first):
Write the report to .agents/perf/YYYY-MM-DD-perf-<target>.md.
# Performance Report: <target>
Date: YYYY-MM-DD
Mode: <profile|bench|compare|optimize>
Language: <detected>
## Summary
<1-2 sentence summary of findings>
## Baseline Metrics
| Metric | Value |
|--------|-------|
| ops/sec | ... |
| ns/op | ... |
| B/op | ... |
| allocs/op | ... |
| p50 latency | ... |
| p95 latency | ... |
| p99 latency | ... |
## Hotspots
<ranked list from Step 2>
## Findings
<classified findings from Step 3>
## Optimizations Applied (if optimize mode)
| Change | Before | After | Improvement |
|--------|--------|-------|-------------|
| ... | ... | ... | +X% |
## After Metrics (if optimize mode)
<same table as baseline, with new values>
## Recommendations
<remaining opportunities not addressed in this session>
When running /perf compare <baseline> <candidate>:
benchstat baseline.txt candidate.txtcritcmp baseline candidateOutput a summary table:
COMPARISON: baseline vs candidate
| Benchmark | Baseline | Candidate | Delta | Verdict |
|-----------|----------|-----------|-------|---------|
| BenchmarkProcess | 1.2ms | 0.9ms | -25% | IMPROVEMENT |
| BenchmarkParse | 450ns | 480ns | +6.7% | REGRESSION |
| BenchmarkIO | 3.1ms | 3.0ms | -3.2% | NOISE |
/refactor first to identify hot paths, then benchmark those.GOMAXPROCS=1), close competing processes.time for wall-clock and manual instrumentation for allocation counts.hyperfine for wall-clock benchmarking across any language.references/perf.feature — Executable spec: profile hotspots with metrics, bench, compare-regression, optimize (soc-qk4b)
npx claudepluginhub boshu2/agentops --plugin agentopsOrchestrates performance profiling and optimization across languages. Diagnoses symptoms, dispatches profiling agents, and manages before/after comparisons for latency, memory, CPU, and bundle issues.