From workflow
Use after /execute, before /review, when user says "benchmark", "performance", "check speed", or before shipping performance-sensitive changes.
How this skill is triggered — by the user, by Claude, or both
Slash command
/workflow:benchmark [feature-name] or [URL][feature-name] or [URL]The summary Claude sees in its skill listing — used to decide when to auto-load this skill
**Philosophy:** Performance is a feature. Measure it, track it, and catch regressions before they ship. Bundle size is the leading performance indicator — it's cheap to measure and correlates strongly with load time. Before/after comparison on every PR prevents the slow creep that turns fast apps into slow ones.
Philosophy: Performance is a feature. Measure it, track it, and catch regressions before they ship. Bundle size is the leading performance indicator — it's cheap to measure and correlates strongly with load time. Before/after comparison on every PR prevents the slow creep that turns fast apps into slow ones.
Duration targets: BRIEF ~5-10 minutes (bundle size + basic timing), STANDARD ~10-20 minutes (CWV + bundle + comparison), COMPREHENSIVE ~20-40 minutes (full audit + historical trend).
Performance regressions accumulate silently. Each PR adds "just 50ms" until the app is 3 seconds slower than it was 6 months ago. Without measurement, nobody notices until users complain. Benchmarking on every feature prevents this by catching regressions at the source — before they compound.
Run this skill when:
/execute completes, before /reviewDo NOT use for:
At every PAUSE point in this skill, call the AskUserQuestion tool to present structured options to the user.
For pattern details and examples: ../_shared/references/stage-gates.md
Fallback: Only if
AskUserQuestionis not available as a tool (check your tool list), fall back to presenting options as markdown text and waiting for freeform response.
| Mode | When | Measurements |
|---|---|---|
| BRIEF | Quick check, small change | Bundle size comparison only |
| STANDARD | Typical feature | Bundle size + page load times + Core Web Vitals |
| COMPREHENSIVE | Performance-sensitive feature | Full audit + historical trend + resource breakdown + budget check |
Phase 1: Baseline Capture (if needed)
Phase 2: Current Measurement
Phase 3: Comparison & Regression Detection
── PAUSE 1: "Performance report. {N} regressions detected." ──
Before starting, verify:
docs/benchmarks/baseline.json or will be captured in Phase 1Step 1.1 — Check for Existing Baseline:
Look for docs/benchmarks/baseline.json or equivalent. If it exists, use it for comparison. If not, capture one from the main branch.
Step 1.2 — Capture Baseline (if needed):
If no baseline exists, measure the main branch state:
Run the project's production build command (check package.json scripts or project CLAUDE.md):
# Build production bundle using project's build command
# Examples: ng build --prod, npm run build, next build, vite build, dotnet publish
Identify the build output directory from the project's build configuration, then measure file sizes of all JS and CSS bundles. Match bundles by chunk type (main, vendor, lazy), not by filename — modern bundlers use content hashing.
Record:
Save to docs/benchmarks/baseline.json.
Step 2.1 — Build & Measure:
Build the current branch in production mode and capture the same metrics as baseline.
Step 2.2 — Core Web Vitals (STANDARD+):
If browser automation is available, measure for each target page using Lighthouse CLI (npx lighthouse {url} --output json --chrome-flags="--headless"), browser DevTools Performance API, or the project's existing performance tools. If no browser automation is available, measure bundle sizes only (fall back to BRIEF mode).
For timing measurements, take the median of 3 runs — network conditions and system load cause variance. Bundle sizes are deterministic and need only one measurement.
| Metric | Good | Needs Improvement | Poor |
|---|---|---|---|
| LCP (Largest Contentful Paint) | ≤2.5s | 2.5-4s | >4s |
| INP (Interaction to Next Paint) | ≤200ms | 200-500ms | >500ms |
| CLS (Cumulative Layout Shift) | ≤0.1 | 0.1-0.25 | >0.25 |
Step 2.3 — Resource Breakdown (COMPREHENSIVE):
Identify the slowest and largest resources:
Step 3.1 — Compare Against Baseline:
Regression thresholds: These are default thresholds. If the project defines performance budgets (in lighthouse config, bundlesize config, or CLAUDE.md), use those instead.
| Metric | WARNING | REGRESSION |
|---|---|---|
| Bundle size (total) | >10% increase | >25% increase |
| Bundle size (single chunk) | >15% increase | >30% increase |
| Page load time | >20% slower | >50% slower or >500ms absolute |
| LCP | >20% slower | >50% slower or exceeds "Good" threshold |
| INP | >20% slower | >50% slower or exceeds "Good" threshold |
| CLS | >0.05 increase | >0.1 increase or exceeds "Good" threshold |
Step 3.2 — Build Report:
Save to docs/benchmarks/benchmark-{timestamp}.md:
# Performance Benchmark: {Feature Name}
> **Date:** {date}
> **Branch:** {branch}
> **Mode:** {BRIEF/STANDARD/COMPREHENSIVE}
## Summary
| Verdict | Details |
|---------|---------|
| **Overall** | ✅ PASS / ⚠️ WARNING / ❌ REGRESSION |
| **Regressions** | {count} |
| **Warnings** | {count} |
## Bundle Size Comparison
| Bundle | Baseline | Current | Change | Status |
|--------|----------|---------|--------|--------|
| main.js | {size} | {size} | {+/-}% | ✅/⚠️/❌ |
| vendor.js | {size} | {size} | {+/-}% | ✅/⚠️/❌ |
| styles.css | {size} | {size} | {+/-}% | ✅/⚠️/❌ |
| **Total** | **{size}** | **{size}** | **{+/-}%** | **{status}** |
## Page Load Times (STANDARD+)
| Page | Baseline | Current | Change | Status |
|------|----------|---------|--------|--------|
| {route} | {time}ms | {time}ms | {+/-}% | ✅/⚠️/❌ |
## Core Web Vitals (STANDARD+)
| Metric | Baseline | Current | Change | Rating |
|--------|----------|---------|--------|--------|
| LCP | {time} | {time} | {+/-}% | Good/NI/Poor |
| INP | {time} | {time} | {+/-}% | Good/NI/Poor |
| CLS | {score} | {score} | {+/-} | Good/NI/Poor |
## Resource Breakdown (COMPREHENSIVE)
### Largest Bundles
| Bundle | Size | % of Total |
|--------|------|-----------|
| {name} | {size} | {%} |
### Slowest Resources
| Resource | Time | Type |
|----------|------|------|
| {url} | {time}ms | {js/css/img/api} |
PAUSE 1: Present the summary (verdict, regression count, key metrics) as formatted markdown, then:
AskUserQuestion:
question: "Performance benchmark complete. {verdict}. How to proceed?"
header: "Benchmark"
multiSelect: false
options:
- label: "Accept (Recommended)"
description: "Performance is acceptable. Proceed."
- label: "Investigate regressions"
description: "Dig into the specific regressions to find the cause."
- label: "Optimise"
description: "Address performance issues before proceeding."
- label: "Update baseline"
description: "Accept current measurements as the new baseline."
Measuring in Development Mode — Development builds include source maps, unminified code, and hot-reload overhead. Always benchmark production builds for meaningful comparisons.
Ignoring Bundle Size — "It's just 50KB more" — every 50KB compounds. Bundle size is the single most predictable performance metric and the easiest to track. Treat it as seriously as test coverage.
Benchmarking Without Baseline — A number without context is meaningless. "Page loads in 1.2s" — is that good or bad? Without a baseline, you can't detect regressions. Always capture before measuring.
Single-Run Measurements — Network conditions and system load vary. For timing measurements, take the median of 3 runs minimum. Bundle sizes are deterministic and need only one measurement.
Moving the Baseline to Hide Regressions — Updating the baseline should be a conscious decision, not an automatic step. The baseline represents the performance contract with users. Update it when you intentionally change performance characteristics, not to make warnings disappear.
| Signal | Meaning | Next Action |
|---|---|---|
| PASS | No regressions | Proceed to /review |
| WARNING | Minor regressions | Note in review, proceed |
| REGRESSION | Significant regressions | Investigate and fix before /review |
When complete: "Benchmark saved to docs/benchmarks/benchmark-{timestamp}.md. {verdict}."
Skill Version: 1.0 v1.0: Initial release. Bundle size comparison, Core Web Vitals measurement, regression thresholds, before/after comparison, resource breakdown, baseline management. Inspired by gstack's /benchmark performance engineering patterns.
npx claudepluginhub dirkkok101/skillsGuides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.