From grafana-app-sdk
End-to-end performance, load, and stress testing of public websites with k6. Produces hybrid protocol+browser test suites, SLO-backed thresholds, and monitoring.
Slash command
/grafana-app-sdk:k6-perf-test-website
assets/README.md
assets/package.json
assets/recordings/README.md
assets/recordings/scripts/recorder.template.js
assets/tests/run-all.sh
assets/tests/workflow.template/average.js
assets/tests/workflow.template/breakpoint.js
assets/tests/workflow.template/browser.js
assets/tests/workflow.template/from-har.js
assets/tests/workflow.template/protocol.js
assets/tests/workflow.template/smoke.js
assets/tests/workflow.template/soak.js
assets/tests/workflow.template/spike.js
assets/tests/workflow.template/stress.js
assets/tools/lg-monitor.sh
assets/tools/run-with-monitor.sh
references/functional-tests.md
references/gotchas.md
references/grafana-investigation.md
references/hybrid-load-design.md

k6-perf-test-website

An end-to-end, opinionated workflow for performance-testing any public website with k6. The skill produces:
This skill enforces a few opinions you should not silently override:
- tests/lib/. Iteration-body duplication is preferred; each script reads cleanly on its own during incident review.
- k6 version): required for stable k6/browser, expect(), async/await iteration functions, and per-request tags.
- npx playwright install chromium).
- har-to-k6 (npm i -D har-to-k6).

Tools the skill prefers when installed:
- mcp-k6: script creation, Playwright→k6/browser migration, API lookup. Prefer over hand-writing k6 boilerplate.
- mcp-grafana: in-session Prometheus/Loki/Tempo/Pyroscope queries during §9 backend investigation.
- gcx: Grafana Cloud CLI for shell-friendly queries, datasource discovery, and Grafana Cloud k6 cloud-run dispatch.
- k6 binary: local validation runs and breakpoint hunting.
- k6 x docs (xk6-docs): look up k6 API surface when writing or editing scripts without mcp-k6 available.

If these tools are not configured, the skill falls back to plain CLI tools (k6, npx, curl) and hand-written scripts. The skill does not own toolchain setup; defer to the user's existing setup process.
Explicit non-goals:
Tick these off in order. Each step has a section below.
- assets/. §2
- tests/run-all.sh until green. §4

The single most important step. Without explicit workflows, every later step is guesswork.
Ask the user the questions in references/workflow-elicitation.md
and record answers in a runbook.md alongside the scaffolded project.
You must capture: 2-4 named workflows, credentials, read vs write, destructive actions to avoid during soak, worry list, existing SLOs, backend ownership and Grafana access, and per test type whether each runs locally or in Grafana Cloud k6.
If the user can't name at least one workflow, stop and clarify; do not proceed.
Copy the assets/ tree from this skill into the user's chosen
directory. The skill's assets/ directory is at <SKILL_DIR>/assets/,
where <SKILL_DIR> is the absolute path to this skill's directory —
your harness exposes this (e.g. opencode prefixes skill metadata with
a Base directory for this skill: line). If you can't determine
<SKILL_DIR> from context, ask the user.
cp -R "<SKILL_DIR>/assets/." "<target-dir>/"
If cp -R is blocked by sandbox permissions, copy files individually
via your agent's file-write tool.
The scaffolded layout:
<target-dir>/
├── package.json
├── .gitignore
├── README.md
├── runbook.md                       # you create from §1 answers
├── recordings/
│   ├── README.md
│   └── scripts/
│       └── recorder.template.js     # copy per workflow → wN-<short-name>.js, …
├── tests/
│   ├── run-all.sh
│   └── workflow.template/           # copy per workflow → wN-<short-name>/, …
│       ├── from-har.js
│       ├── protocol.js
│       ├── browser.js
│       ├── smoke.js
│       ├── average.js
│       ├── stress.js
│       ├── spike.js
│       ├── soak.js
│       └── breakpoint.js
└── tools/
    ├── lg-monitor.sh
    └── run-with-monitor.sh
For each workflow: copy recorder.template.js → recordings/scripts/wN-<short-name>.js,
copy tests/workflow.template/ → tests/wN-<short-name>/, and
replace <WORKFLOW_PLACEHOLDER> markers with the workflow's short name.
Then install:
cd <target-dir> && npm install && npx playwright install chromium
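The per-workflow copy step above can be sketched as a small plan builder. This is only an illustration: the helper names (workflowId, scaffoldPlan, fillPlaceholder) and the example workflow "checkout" are hypothetical and not part of the skill's assets.

```javascript
// Hypothetical helpers; none of these names ship with the skill.

// Build the wN-<short-name> identifier used for files and directories.
function workflowId(index, shortName) {
  return `w${index}-${shortName}`;
}

// List the two copies described above for one workflow.
function scaffoldPlan(index, shortName) {
  const id = workflowId(index, shortName);
  return [
    { from: 'recordings/scripts/recorder.template.js', to: `recordings/scripts/${id}.js` },
    { from: 'tests/workflow.template/', to: `tests/${id}/` },
  ];
}

// Replace every <WORKFLOW_PLACEHOLDER> marker with the workflow's short name.
function fillPlaceholder(text, shortName) {
  return text.split('<WORKFLOW_PLACEHOLDER>').join(shortName);
}

const plan = scaffoldPlan(1, 'checkout'); // "checkout" is an example name
console.log(plan.map((step) => `${step.from} -> ${step.to}`).join('\n'));
console.log(fillPlaceholder("group('<WORKFLOW_PLACEHOLDER>')", 'checkout'));
```

The actual copying can be done with cp -R or your agent's file-write tool; the sketch only shows which paths are involved.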
Per workflow:
- recordings/scripts/wN-<short-name>.js: user-action sequence, recordHar.urlFilter regex (allow-list the target host; block third-party RUM/ads; see references/recording-with-playwright.md), and a real Chrome userAgent (the default HeadlessChrome UA triggers bot-blocking on many sites).
- node recordings/scripts/wN-<short-name>.js → writes recordings/har/wN-<short-name>.har
- npx har-to-k6 recordings/har/wN-<short-name>.har -o tests/wN-<short-name>/from-har.js
- from-har.js (audit trail for bundle-path changes).

If the recorder fails or produces an unusable HAR (bot-blocking, missing hydration, third-party noise), see the Recording section of references/gotchas.md and references/recording-with-playwright.md.
Prefer mcp-k6 recording and migration tools if available.
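As a rough illustration of the recorder settings described above, the sketch below builds an options object of the shape Playwright's browser.newContext() accepts (recordHar.path, recordHar.urlFilter, userAgent). The helper name buildRecorderContextOptions, the host shop.example.com, the HAR file name, and the user-agent string are all assumptions for illustration; the shipped recorder.template.js may be organised differently.

```javascript
// Hypothetical helper; not taken from recorder.template.js.

// Escape regex metacharacters so the host is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build context options that record a HAR limited to a single host.
function buildRecorderContextOptions(targetHost, harPath) {
  return {
    recordHar: {
      path: harPath,
      // Allow-list: only requests to targetHost end up in the HAR,
      // which leaves third-party RUM/ads traffic out.
      urlFilter: new RegExp(`^https?://${escapeRegExp(targetHost)}(?::\\d+)?(?:/|$)`),
    },
    // Illustrative desktop Chrome UA string; substitute a current one.
    userAgent:
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 ' +
      '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
  };
}

const recorderOptions = buildRecorderContextOptions(
  'shop.example.com', // example host
  'recordings/har/w1-checkout.har', // example workflow name
);
console.log(recorderOptions.recordHar.urlFilter.test('https://shop.example.com/cart')); // true
console.log(recorderOptions.recordHar.urlFilter.test('https://ads.example.net/pixel')); // false
```

In a recorder script the object would be passed to browser.newContext(recorderOptions); Playwright writes the HAR when the context is closed, so the script needs to close the context before exiting.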
Per workflow:
- from-har.js into protocol.js: drop per-request UA headers, rename groups, parameterise BASE_URL, replace session tokens, drop sleep(1), add expect() on every load-bearing response. Full procedure in references/functional-tests.md.
- browser.js from the Playwright recorder using the 5-step procedure in references/functional-tests.md.
- ./tests/run-all.sh. Do not proceed to §5 until it exits 0.

Prefer mcp-k6 migration tools for Playwright→k6/browser conversion.
Adjust the opinionated defaults in assets/tests/workflow.template/
to the user's stated SLOs from §1. Four layers:
Default globals:
http_req_failed: ['rate<0.01'],
http_req_duration: ['p(95)<500'],
checks: ['rate>0.99'],
Per-endpoint tagging:
http.get(`${BASE_URL}/api/things`, { tags: { name: 'GetThings' } });
'http_req_duration{name:GetThings}': ['p(95)<400', 'p(99)<800'],
Web Vitals:
browser_web_vital_lcp: ['p(95)<2500'],
browser_web_vital_inp: ['p(95)<200'],
browser_web_vital_cls: ['p(95)<0.1'],
See references/slo-design.md for per-iteration tuning, the
performance.mark custom-Trend pattern, iteration_completed Rate,
breakpoint abort-on-fail thresholds, and loosening rules.
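The three threshold layers shown above could sit together in one thresholds map, roughly as sketched here. The values are copied from the defaults above. The object is declared with const rather than export const only so the snippet runs in plain Node; that choice is an assumption of this sketch, and a k6 script would export it as options.

```javascript
// In a k6 script this would be: export const options = { ... }
const options = {
  thresholds: {
    // Default globals
    http_req_failed: ['rate<0.01'],
    http_req_duration: ['p(95)<500'],
    checks: ['rate>0.99'],

    // Per-endpoint: applies only to requests tagged { name: 'GetThings' }
    'http_req_duration{name:GetThings}': ['p(95)<400', 'p(99)<800'],

    // Web Vitals, reported by the browser scenario
    browser_web_vital_lcp: ['p(95)<2500'],
    browser_web_vital_inp: ['p(95)<200'],
    browser_web_vital_cls: ['p(95)<0.1'],
  },
};

console.log(Object.keys(options.thresholds).length); // 7
```

The per-endpoint entry only has data to evaluate if the matching request carries the same name tag, as in the http.get call above.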
Per workflow, one file per test type. Each file has a protocol scenario (drives load) plus a single browser VU (measures Web Vitals under load). Breakpoint is protocol-only — a browser VU adds noise to the signal.
| Type | Executor | Defaults |
|---|---|---|
| smoke | constant-vus | 3 VUs × 1m |
| average | ramping-vus | 0→20→0 over 14m |
| stress | ramping-vus | 0→50→0 over 20m |
| spike | ramping-vus | 0→100→0 over 2m |
| soak | ramping-vus | 0→10→0 over 70m |
| breakpoint | ramping-arrival-rate | 5/s→500/s over 20m, abortOnFail |
Tune per workflow once you've seen smoke results. See
references/test-types.md for rationale and references/hybrid-load-design.md
for why one file per type and why duplication between files is acceptable.
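To make the hybrid shape concrete, here is a minimal sketch of two scenario configurations using the table's defaults: a smoke run with a protocol scenario plus one browser VU, and a protocol-only breakpoint run. The exec function names (protocolFlow, browserFlow), the reading of "3 VUs" as protocol VUs, the preAllocatedVUs and maxVUs values, and the breakpoint threshold value are assumptions, not taken from the shipped templates.

```javascript
// Smoke: the protocol scenario drives load, one browser VU measures Web Vitals.
const smokeScenarios = {
  protocol: {
    executor: 'constant-vus',
    vus: 3, // assumed to be protocol VUs
    duration: '1m',
    exec: 'protocolFlow', // hypothetical exported function name
  },
  browser: {
    executor: 'constant-vus',
    vus: 1,
    duration: '1m',
    exec: 'browserFlow', // hypothetical exported function name
    options: { browser: { type: 'chromium' } },
  },
};

// Breakpoint: protocol-only, ramping 5/s → 500/s over 20m.
const breakpointOptions = {
  scenarios: {
    breakpoint: {
      executor: 'ramping-arrival-rate',
      startRate: 5,
      timeUnit: '1s',
      preAllocatedVUs: 50, // assumed; size to the workflow
      maxVUs: 1000, // assumed
      stages: [{ target: 500, duration: '20m' }],
      exec: 'protocolFlow',
    },
  },
  thresholds: {
    // abortOnFail stops the run as soon as the threshold is crossed.
    http_req_failed: [{ threshold: 'rate<0.01', abortOnFail: true }],
  },
};

console.log(Object.keys(smokeScenarios).join(', '));
console.log(Object.keys(breakpointOptions.scenarios).join(', '));
```

In a real test file, smokeScenarios would be assigned to options.scenarios and the two exec functions exported from the same script.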
./tools/run-with-monitor.sh tests/wN-<short-name>/smoke.js
Starts lg-monitor.sh in the background, runs k6, then prints a
summary verdict: OK (≥30% idle), NOTE (10–30%), or WARNING
(<10%). If WARNING, the laptop is the bottleneck — reduce VUs, switch
to cloud, or split across multiple LGs. See references/lg-monitoring.md.
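The verdict bands above amount to a simple classification of the load generator's idle percentage. A minimal sketch, assuming a single idle figure as input; the function name lgVerdict is hypothetical, and how lg-monitor.sh actually samples and aggregates idle time is not shown here.

```javascript
// Hypothetical classifier mirroring the verdict bands described above.
function lgVerdict(idlePercent) {
  if (idlePercent >= 30) return 'OK'; // ≥30% idle
  if (idlePercent >= 10) return 'NOTE'; // 10–30% idle
  return 'WARNING'; // <10% idle: the load generator is the bottleneck
}

console.log(lgVerdict(45)); // OK
console.log(lgVerdict(18)); // NOTE
console.log(lgVerdict(4)); // WARNING
```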
For each test type assigned to cloud in the §1 runbook:
- k6 cloud login works (the skill does not own auth setup).
- k6 cloud run tests/wN-<short-name>/<type>.js

Cost reminder: browser VU-hours are billed 10× protocol VU-hours.
Soak and breakpoint are the most expensive. Check limits before long runs.
See references/local-vs-cloud.md.
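The 10× weighting in the cost reminder can be turned into a quick back-of-the-envelope estimate before a long run. This is a sketch under stated assumptions: it treats peak VU counts as constant for the whole duration (an upper bound for ramping executors), and the function name weightedVuHours is hypothetical. It is not the actual billing formula; check your plan's limits for that.

```javascript
const BROWSER_MULTIPLIER = 10; // from the cost reminder above

// Estimate protocol-equivalent VU-hours, weighting browser VUs 10×.
function weightedVuHours({ protocolVus, browserVus, minutes }) {
  const hours = minutes / 60;
  return (protocolVus + BROWSER_MULTIPLIER * browserVus) * hours;
}

// 10 protocol VUs plus 1 browser VU for one hour.
console.log(weightedVuHours({ protocolVus: 10, browserVus: 1, minutes: 60 })); // 20
```

In this example the single browser VU accounts for as much of the estimate as all ten protocol VUs.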
Only if the user owns the backend and has Grafana access.
- mcp-grafana or gcx datasources list.
- from/to for the run window).

See references/grafana-investigation.md for the full flow including how to verify absence before reporting it.
Fill in the report template from references/reporting.md:
Always be specific. "Latency is high" is not a finding. "GetPizza p(95) hit 1.4s at iteration ~200; correlated with sustained 100% CPU on the recommender service per Grafana panel link" is.
- references/workflow-elicitation.md: verbatim question script for §1.
- references/recording-with-playwright.md: HAR capture, third-party filter regex, hydration signals.
- references/functional-tests.md: 5-step Playwright→k6/browser conversion procedure.
- references/hybrid-load-design.md: protocol + 1 browser VU rationale, duplication argument.
- references/slo-design.md: full threshold rationale, async vs sync metric capture.
- references/test-types.md: definitions and defaults for all six test types.
- references/lg-monitoring.md: why the sidecar exists, how to read its output.
- references/local-vs-cloud.md: framing, cost model, per-test-type tradeoffs.
- references/grafana-investigation.md: generic backend investigation flow.
- references/gotchas.md: generic pitfalls.
- references/reporting.md: final report template.

npx claudepluginhub grafana/skills --plugin grafana-app-sdk