From forge
Performance benchmarking. Runs performance tests, compares against baselines, identifies regressions, and produces a benchmark report. Use before /ship for performance-critical changes. Use before shipping perf-critical changes — triggered by 'benchmark this', 'check performance', 'run perf tests', 'is this fast enough', 'performance regression'.
How this skill is triggered — by the user, by Claude, or both
Slash command
/forge:benchmarkThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
You measure performance, compare against baselines, and flag regressions before shipping. Never modify application code during benchmarking — observation only.
You measure performance, compare against baselines, and flag regressions before shipping. Never modify application code during benchmarking — observation only.
Shared protocols apply — see
skills/shared/rules.md,skills/shared/compliance-telemetry.md,skills/shared/workflow-routing.md.
From $ARGUMENTS or auto-detect by scanning the codebase. Confirm targets with user before proceeding.
| Target Type | Signals | Key Metrics |
|---|---|---|
| API Endpoints | Route handlers, HTTP frameworks | p50/p95/p99 response time, req/sec, error rate |
| Database Queries | ORM, raw SQL, migrations | Execution time, row count, query plan |
| Algorithms / Logic | Compute-heavy functions, data processing | Wall clock across input sizes, memory, allocations |
| Build / Bundle | Build scripts, bundler configs | Build time, bundle size (total + per-chunk) |
| Page Load | SSR/CSR frameworks, static assets | Lighthouse score, LCP, FID, CLS, TTI |
Look for .forge/benchmark/baseline.json. If present, this run compares against it; otherwise this run establishes the initial baseline.
Minimum 3 iterations for timing benchmarks to reduce noise. Tools: wrk/ab/autocannon (HTTP), go test -bench/cargo bench/npm run bench (code), du -sh dist/ (bundle), lighthouse CLI or Playwright (page perf). Collect raw results per iteration for variance calculation. If a tool is missing, suggest installation but do not auto-install.
If a baseline exists, compare every metric and mark each OK, WARNING, or REGRESSION.
| Metric | Warning | Regression |
|---|---|---|
| Response time (p95) | > 10% increase | > 20% increase |
| Throughput | > 10% decrease | > 20% decrease |
| Bundle size | > 5% increase | > 10% increase |
| Build time | > 15% increase | > 30% increase |
| Memory usage | > 15% increase | > 25% increase |
| Lighthouse score | > 5 point drop | > 10 point drop |
Save to .forge/benchmark/report.md:
Write current results to .forge/benchmark/baseline.json only when user explicitly approves.
Report status (PASS/REGRESSION), target count, regression count, warnings, and report path. Regressions are warnings, not blockers — user decides acceptability.
/ship; alternative /review if not done./build to optimize then re-benchmark.Never run destructive load tests against production — local/staging only. Always compare against baseline when one exists. Report raw numbers and variance, not just averages. Record environment for reproducibility.
Compliance keys for scripts/compliance-log.sh benchmark <key> <severity>: code-modified/critical — app code modified during benchmarking; insufficient-runs/minor — fewer than 3 timing runs; baseline-saved-unapproved/major — baseline saved without approval.
npx claudepluginhub bakirp/forge --plugin forgeGuides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.