mcp-grafana
Strongly-typed Grafana asset builders (dashboards, panels, alerts, contact
points, …) with an MCP surface for LLM clients. Targets Grafana 12.x.
Status: pre-1.0 (0.3.1). The library is usable for a small but
growing set of Grafana assets and exposes them through an MCP server.
The API may change as the surface grows. See AGENTS.md and
research.md for the design and the open decisions.
Install

# As a library or CLI
pnpm add @jburgess-js/mcp-grafana
# or: npm i @jburgess-js/mcp-grafana
# or: yarn add @jburgess-js/mcp-grafana
# As an MCP server, no install needed — npx fetches on demand
npx -y @jburgess-js/mcp-grafana
Why this exists
Grafana dashboards-as-code in TypeScript, with three layers:
- Typed builders over the official Apache-2.0
[@grafana/grafana-foundation-sdk][foundation-sdk]: schema-valid
Grafana JSON, deterministic output, narrow composable functions.
- Deterministic primitives for the things an LLM can't reliably do
itself: parsing Prometheus exposition format, validating dashboard
shape, walking and patching existing dashboards. Opinion (RED /
USE / golden-signals patterns, panel style conventions) lives in
markdown instead: docs/guidance/ holds project-authored guidance and
skills/ holds user-installable, shareable opinions. The model can
read and reason about that prose directly, so the TypeScript does not
encode heuristic rules that duplicate its training (see
AGENTS.md §1.8 and research.md Entry 011).
- An MCP server that exposes the builders and primitives as tools,
serves the markdown guidance + skills as resources, and surfaces the
flagship workflows (scaffold / audit / review) as prompts, so LLM
clients can compose Grafana assets and commit them as code.
Grafana's own [Metrics Drilldown][drilldown] already covers interactive,
automatic exploration of metrics at runtime. This project is for the
committable, versioned, asset-as-code half of the problem.
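To make the "deterministic primitives" layer concrete, here is a minimal sketch of one such primitive: extracting metric families (name, type, help text, label names) from Prometheus text exposition. It is an illustration under stated assumptions, not this library's API: `parseExposition`, `MetricFamily`, and the other names are hypothetical, and the sketch ignores sample values, timestamps, and strict validation.

```typescript
// Hypothetical shapes for illustration; not the library's actual API.
type MetricType = "counter" | "gauge" | "histogram" | "summary" | "untyped";

interface MetricFamily {
  name: string;
  type: MetricType;
  help: string;
  labelNames: string[]; // sorted union of label names seen on samples
}

const METRIC_TYPES: MetricType[] = ["counter", "gauge", "histogram", "summary", "untyped"];
const SAMPLE_SUFFIXES = ["_bucket", "_sum", "_count"];

function parseExposition(text: string): MetricFamily[] {
  const families = new Map<string, MetricFamily>();
  const labels = new Map<string, Set<string>>();

  const ensure = (name: string): MetricFamily => {
    let family = families.get(name);
    if (!family) {
      family = { name, type: "untyped", help: "", labelNames: [] };
      families.set(name, family);
      labels.set(name, new Set<string>());
    }
    return family;
  };

  // Map histogram/summary sample names (foo_bucket, foo_sum, foo_count) back to foo.
  const familyNameOf = (sampleName: string): string => {
    for (const suffix of SAMPLE_SUFFIXES) {
      if (!sampleName.endsWith(suffix)) continue;
      const base = families.get(sampleName.slice(0, -suffix.length));
      if (base && (base.type === "histogram" || base.type === "summary")) return base.name;
    }
    return sampleName;
  };

  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue;

    const meta = /^#\s+(HELP|TYPE)\s+([a-zA-Z_:][a-zA-Z0-9_:]*)\s*(.*)$/.exec(line);
    if (meta) {
      const [, keyword = "", name = "", rest = ""] = meta;
      const family = ensure(name);
      if (keyword === "HELP") {
        family.help = rest;
      } else {
        const declared = METRIC_TYPES.filter((t) => t === rest)[0];
        if (declared) family.type = declared; // unknown type strings stay "untyped"
      }
      continue;
    }
    if (line.startsWith("#")) continue; // ordinary comment

    const sample = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+\S+/.exec(line);
    if (!sample) continue; // this sketch skips malformed lines instead of failing
    const [, sampleName = "", labelText = ""] = sample;
    const name = familyNameOf(sampleName);
    ensure(name);
    const seen = labels.get(name);
    // Label values may contain escaped quotes, so match whole name="value" pairs.
    const pair = /([a-zA-Z_][a-zA-Z0-9_]*)="(?:[^"\\]|\\.)*"/g;
    let match = pair.exec(labelText);
    while (seen && match) {
      seen.add(match[1] ?? "");
      match = pair.exec(labelText);
    }
  }

  // Families come out in first-seen order, so the same input yields the same output.
  const result: MetricFamily[] = [];
  families.forEach((family, name) => {
    const labelNames: string[] = [];
    const seen = labels.get(name);
    if (seen) seen.forEach((label) => labelNames.push(label));
    result.push({ ...family, labelNames: labelNames.sort() });
  });
  return result;
}
```

The output is plain facts only (for example, that a counter carries a `status` label). Deciding what those facts imply for panel design is left to the model and the markdown guidance, which is the split the layers above describe.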
Where the "intelligence" comes from
When a workflow like scaffolding a dashboard from /metrics
turns raw metrics into a complete dashboard, it's fair to ask: what
decides which panels to build? Deliberately, not a hardcoded
generator. There is no scaffold_dashboard() function that
embeds "a counter with a status label means an errors panel". That
kind of judgement would duplicate what an LLM already knows and rot into
brittle taste-in-code, which is why it's excluded (AGENTS.md §1.8,
research.md Entry 011). The intelligence is the LLM, reading two
things this project ships:
- Curated, source-backed conventions in
skills/grafana-style-guide/SKILL.md.
This is where the best practices live — RED / USE / golden-signals,
row sequencing (categorical "fold" first), unit conventions, legend
cardinality, repeating-panel caps. They aren't invented; the skill's
References section cites the kubernetes-mixin / monitoring-mixins
corpus, Grafana Labs' own Mimir / Loki / Tempo reference
dashboards, Shneiderman (1996), Tufte, the Google SRE Workbook, and
the RED / USE method papers.
- An operational recipe,
docs/guidance/scaffold-from-metrics.md,
that connects the parsed facts to those conventions and then to the builders.
What the project itself guarantees (vs. what the model is merely
guided toward) splits in two:
- Machine-enforced by grafana_dashboard_lint (the conventions that
are structural and deterministic): units allow/deny, descriptions
required, timeseries-legend rules, stat.requiresComparison /
handlesUnknown, gauge.requiresBounds, targets.promqlValid /
promqlSemantic, datasourceDeclared,
duplicateTitles, maxRepeat, orphanRow,
layout.firstRowCategorical, layout.panelOverlap, the
variable-hygiene rules
(hiddenButReferenced, emptyDefault, unreferenced), and
links.preservesVariables. The full rule set is documented in
skills/grafana-style-guide/SKILL.md.
- Prose-guided only (taste a linter can't mechanically check): the
deeper signal-first hierarchy — system-wide RED on row 2,
pipeline-ordered per-component rows, multi-timescale strips.
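The difference between the two groups is easier to see with a sketch of what "structural and deterministic" means in practice. The example below implements three of the rule ids listed above (duplicateTitles, gauge.requiresBounds, layout.panelOverlap) as pure functions over dashboard JSON. Everything else is an assumption for illustration: the `Finding` shape, the `lint` function, and the exact semantics of each rule are hypothetical and may differ from what grafana_dashboard_lint actually does. Only top-level panels are checked.

```typescript
// Minimal slice of the dashboard JSON model needed by this sketch.
interface GridPos { x: number; y: number; w: number; h: number }
interface Panel {
  id: number;
  title: string;
  type: string;
  gridPos: GridPos;
  fieldConfig?: { defaults?: { min?: number; max?: number } };
}
interface Dashboard { title: string; panels: Panel[] }

// Hypothetical finding shape; the real linter's output format is an assumption here.
interface Finding { rule: string; panelId: number; message: string }
type Rule = (dashboard: Dashboard) => Finding[];

// duplicateTitles: every non-row panel title should be unique within the dashboard.
const duplicateTitles: Rule = (dashboard) => {
  const firstSeen = new Map<string, number>(); // title -> id of first panel using it
  const findings: Finding[] = [];
  for (const panel of dashboard.panels) {
    if (panel.type === "row") continue;
    const first = firstSeen.get(panel.title);
    if (first === undefined) {
      firstSeen.set(panel.title, panel.id);
    } else {
      findings.push({
        rule: "duplicateTitles",
        panelId: panel.id,
        message: `title "${panel.title}" already used by panel ${first}`,
      });
    }
  }
  return findings;
};

// gauge.requiresBounds: a gauge without both min and max has no meaningful scale.
const gaugeRequiresBounds: Rule = (dashboard) =>
  dashboard.panels
    .filter((p) => p.type === "gauge")
    .filter((p) => p.fieldConfig?.defaults?.min === undefined ||
                   p.fieldConfig?.defaults?.max === undefined)
    .map((p) => ({
      rule: "gauge.requiresBounds",
      panelId: p.id,
      message: "gauge needs both min and max",
    }));

// Two grid rectangles overlap when they intersect on both axes.
const overlaps = (a: GridPos, b: GridPos): boolean =>
  a.x < b.x + b.w && b.x < a.x + a.w && a.y < b.y + b.h && b.y < a.y + a.h;

// layout.panelOverlap: report the later panel of every overlapping pair.
const panelOverlap: Rule = (dashboard) => {
  const findings: Finding[] = [];
  dashboard.panels.forEach((later, i) => {
    for (const earlier of dashboard.panels.slice(0, i)) {
      if (overlaps(earlier.gridPos, later.gridPos)) {
        findings.push({
          rule: "layout.panelOverlap",
          panelId: later.id,
          message: `overlaps panel ${earlier.id}`,
        });
      }
    }
  });
  return findings;
};

const RULES: Rule[] = [duplicateTitles, gaugeRequiresBounds, panelOverlap];

// Run every rule and sort by rule id, then panel id, so output order is stable.
function lint(dashboard: Dashboard): Finding[] {
  const findings: Finding[] = [];
  for (const rule of RULES) findings.push(...rule(dashboard));
  return findings.sort((a, b) =>
    a.rule === b.rule ? a.panelId - b.panelId : a.rule < b.rule ? -1 : 1);
}
```

Each rule here is a yes/no question about the JSON, which is why it can be enforced by a machine. Nothing comparable can be written for "is the signal-first hierarchy right for this service", so that judgement stays in prose.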