From claude-corps
Use when you need measured performance evidence by running a repeatable command on the current branch and a baseline ref.
Slash command: /claude-corps:benchmark
Run a repeatable benchmark command on the current branch and compare it with a baseline ref. This skill is for measurement, not speculation.
If the user wants architectural performance review without running code, use /multi-review with performance-oracle instead.
/benchmark "<command>" [--baseline <ref>] [--iterations N]
Defaults:
- --baseline origin/main
- --iterations 5

Reject the request if no command is provided.
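Below is a minimal Python sketch of parsing the invocation above. It assumes the arguments arrive as one shell-style string after `/benchmark`. The function name `parse_benchmark_args` is hypothetical and is not part of the skill.

```python
import argparse
import shlex

# Defaults as read from the skill text.
DEFAULT_BASELINE = "origin/main"
DEFAULT_ITERATIONS = 5


def parse_benchmark_args(raw: str) -> argparse.Namespace:
    """Parse a '"<command>" [--baseline <ref>] [--iterations N]' string (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="/benchmark")
    # nargs="?" lets us raise our own clear error when the command is missing.
    parser.add_argument("command", nargs="?")
    parser.add_argument("--baseline", default=DEFAULT_BASELINE)
    parser.add_argument("--iterations", type=int, default=DEFAULT_ITERATIONS)
    args = parser.parse_args(shlex.split(raw))
    # Reject the request if no command is provided.
    if not args.command:
        raise ValueError("no benchmark command provided")
    if args.iterations < 1:
        raise ValueError("--iterations must be at least 1")
    return args
```

For example, `parse_benchmark_args('"pytest -q" --iterations 3')` yields the command `pytest -q` with the default baseline `origin/main`.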
Record the current working-tree state with git status --short. Check out the baseline ref in a temporary git worktree under /tmp or equivalent. Never benchmark the baseline by checking it out over the user's current branch.
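This Python sketch shows one way to isolate the baseline. It assumes `git` is on `PATH`, and it uses `git worktree add --detach` so that no branch is created or moved. Removal happens in a `finally` clause, so the worktree is cleaned up even when the benchmark fails. The names `working_tree_status` and `baseline_worktree` are hypothetical.

```python
import contextlib
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Iterator


def working_tree_status(repo: str = ".") -> str:
    """Return the output of `git status --short` for the user's checkout."""
    return subprocess.run(
        ["git", "-C", repo, "status", "--short"],
        check=True, capture_output=True, text=True,
    ).stdout


@contextlib.contextmanager
def baseline_worktree(ref: str, repo: str = ".") -> Iterator[Path]:
    """Check out `ref` in a throwaway worktree under the system temp dir."""
    parent = Path(tempfile.mkdtemp(prefix="benchmark-baseline-"))
    path = parent / "worktree"
    try:
        # --detach checks out the ref without creating or moving a branch.
        subprocess.run(
            ["git", "-C", repo, "worktree", "add", "--detach", str(path), ref],
            check=True,
        )
        yield path
    finally:
        # Always clean up, even if `worktree add` or the benchmark failed.
        subprocess.run(
            ["git", "-C", repo, "worktree", "remove", "--force", str(path)],
            check=False,
        )
        shutil.rmtree(parent, ignore_errors=True)
```

The user's own checkout is never modified: the baseline runs inside `with baseline_worktree("origin/main") as wt:` using `wt` as the working directory.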
Environment setup rules:
- Read CLAUDE.md, README.md, and project manifests such as package.json, pyproject.toml, Cargo.toml, or Makefile to find out how the project is set up.
- Run the project's setup step before measuring, for example pnpm install --frozen-lockfile, uv sync, cargo fetch, or make setup.

For both the current branch and the baseline, run N measured executions of the command. Prefer a consistent timing mechanism for every run, and keep environment conditions as similar as possible across both refs.
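Here is a Python sketch of the setup and measurement steps. The manifest-to-command table is a hypothetical heuristic assembled from the examples above. In particular, mapping `pnpm-lock.yaml` to pnpm and `Makefile` to `make setup` are assumptions, and CLAUDE.md or README.md should take precedence. Timing uses `time.perf_counter` around each run, so every run uses the same clock.

```python
import subprocess
import time
from pathlib import Path
from typing import List, Optional

# Hypothetical manifest -> setup command table; project docs take precedence.
SETUP_HINTS = [
    ("pnpm-lock.yaml", ["pnpm", "install", "--frozen-lockfile"]),
    ("pyproject.toml", ["uv", "sync"]),
    ("Cargo.toml", ["cargo", "fetch"]),
    ("Makefile", ["make", "setup"]),
]


def guess_setup_command(checkout: Path) -> Optional[List[str]]:
    """Return the first matching setup command, or None if no manifest matches."""
    for manifest, cmd in SETUP_HINTS:
        if (checkout / manifest).exists():
            return cmd
    return None


def time_runs(command: str, cwd: Path, iterations: int) -> List[float]:
    """Run `command` `iterations` times in `cwd`; return wall-clock seconds per run."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        # shell=True because the user supplies the command as one shell string.
        # Output is discarded here; capture it instead if the command prints
        # its own metrics.
        subprocess.run(command, shell=True, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings
```

Call `time_runs` with the same command and iteration count for the current checkout and for the baseline worktree, so that both refs are measured identically.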
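Compute for current and baseline: the individual run results, the median, and the min/max. Then compute the percentage delta of the current median against the baseline median.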
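These statistics reduce to a few lines of Python with the standard `statistics` module. The function names are illustrative. A negative delta means the current median is lower than the baseline median, which for time-like metrics means the current branch is faster.

```python
import statistics
from typing import Dict, Sequence


def summarize(runs: Sequence[float]) -> Dict[str, float]:
    """Median, min and max of one ref's measurements."""
    return {"median": statistics.median(runs), "min": min(runs), "max": max(runs)}


def median_delta_pct(current: Sequence[float], baseline: Sequence[float]) -> float:
    """Percent change of the current median relative to the baseline median."""
    base = statistics.median(baseline)
    if base == 0:
        raise ValueError("baseline median is zero; percentage delta is undefined")
    return (statistics.median(current) - base) / base * 100.0
```

For example, `median_delta_pct([0.9, 1.0, 1.1], [1.9, 2.0, 2.1])` gives `-50.0`, because the current median 1.0 is half the baseline median 2.0.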
If run-to-run variance is high, say so. Do not overclaim a tiny difference hidden by noise.
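The skill does not prescribe a noise test. The following Python sketch uses two crude heuristics, both assumptions rather than part of the skill. The first flags a ref whose spread, (max - min) / median, exceeds a hypothetical 10% cutoff. The second treats the delta as possibly noise when the two min..max ranges overlap. Neither replaces a proper statistical test; they only produce caveats for the report.

```python
import statistics
from typing import List, Sequence

# Hypothetical cutoff for "high" run-to-run variance.
HIGH_SPREAD = 0.10


def relative_spread(runs: Sequence[float]) -> float:
    """(max - min) / median: a crude, unitless measure of run-to-run variance."""
    med = statistics.median(runs)
    return (max(runs) - min(runs)) / med if med else float("inf")


def ranges_overlap(current: Sequence[float], baseline: Sequence[float]) -> bool:
    """True when the min..max ranges of the two refs overlap."""
    return min(current) <= max(baseline) and min(baseline) <= max(current)


def variance_caveats(current: Sequence[float], baseline: Sequence[float]) -> List[str]:
    """Collect human-readable caveats for the Notes section."""
    notes = []
    for name, runs in (("current", current), ("baseline", baseline)):
        spread = relative_spread(runs)
        if spread > HIGH_SPREAD:
            notes.append(f"high run-to-run variance on {name}: spread {spread:.0%} of median")
    if ranges_overlap(current, baseline):
        notes.append("current and baseline ranges overlap; the delta may be noise")
    return notes
```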
Report the results in this format:

## Benchmark Summary
Command: ...
Baseline: ...
Iterations: ...
### Current Branch
Runs: [...]
Median: ...
Min/Max: ...
### Baseline
Runs: [...]
Median: ...
Min/Max: ...
### Delta
Current vs baseline median: ...%
### Notes
- command-native metrics if any
- variance caveats
- anything that may have skewed results
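A Python sketch of filling in the template above follows. It assumes the measurements are wall-clock seconds, which the template itself does not specify. The `render_report` name is hypothetical, and the baseline median is assumed to be nonzero.

```python
import statistics
from typing import List, Sequence


def render_report(command: str, baseline_ref: str, iterations: int,
                  current: Sequence[float], baseline: Sequence[float],
                  notes: Sequence[str] = ()) -> str:
    """Render the Benchmark Summary template (timings assumed to be seconds)."""
    def section(title: str, runs: Sequence[float]) -> List[str]:
        return [
            f"### {title}",
            f"Runs: [{', '.join(f'{r:.3f}' for r in runs)}]",
            f"Median: {statistics.median(runs):.3f}s",
            f"Min/Max: {min(runs):.3f}s / {max(runs):.3f}s",
        ]

    base_med = statistics.median(baseline)
    delta = (statistics.median(current) - base_med) / base_med * 100.0
    lines = [
        "## Benchmark Summary",
        f"Command: {command}",
        f"Baseline: {baseline_ref}",
        f"Iterations: {iterations}",
        *section("Current Branch", current),
        *section("Baseline", baseline),
        "### Delta",
        f"Current vs baseline median: {delta:+.1f}%",
        "### Notes",
        *(f"- {n}" for n in notes),
    ]
    return "\n".join(lines)
```

The `notes` argument is where command-native metrics and variance caveats would go.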
Always remove the temporary baseline worktree before exiting, even on failure.