From mcp-grafana
Use before building, generating, or reviewing a Grafana panel or dashboard. Starter style guide covering panel-level rules (units, legends, thresholds, titles, descriptions — timeseries-focused) and dashboard-level rules (row sequencing, drill-down chaining, repeating-panel caps, named anti-patterns). Modeled on the kubernetes-mixin / monitoring-mixins corpus, Grafana Labs' Mimir / Loki / Tempo reference dashboards, and the canonical observability dashboard literature (Shneiderman 1996, Stephen Few, the SRE Workbook). Copy into your skills / rules directory and edit for your team; mcp-grafana ships it as a starter, not a managed default.
How this skill is triggered — by the user, by Claude, or both
Slash command
/mcp-grafana:grafana-style-guideThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
A starter style guide for Grafana, modeled on the kubernetes-mixin /
A starter style guide for Grafana, modeled on the kubernetes-mixin /
monitoring-mixins corpus. The prose explains why each convention
exists; the JSON block at the end is a GrafanaStyleGuide instance
suitable for passing to the lintPanel library function or the
grafana_panel_lint MCP tool.
This skill is a copyable artifact. Fork it, edit it, version it in your own dotfiles — mcp-grafana does not auto-update or otherwise manage the copy you install. If your team disagrees with any rule below, the right move is to change it in your local copy.
Design top-down, signal-first. Grafana dashboards follow Ben Shneiderman's "overview first, zoom and filter, details on demand" (Visual Information-Seeking Mantra, 1996) — also called the inverted pyramid in BI writing, and consistent with Stephen Few's at-a-glance monitoring (upper-left wins, because location is the dominant emphasis channel).
The first row is the fold — it loads first, it is the only thing
some viewers will read, and it must answer is anything on fire? in
under a second. Rows below add detail in widening scope; linked
dashboards carry context further via preserved templating
variables ($cluster → $namespace → $instance). The reader's
eye learns one row shape and scans down; one click and the next
dashboard arrives pre-filtered.
The pattern has two known failure modes worth designing around:
The rules below implement this philosophy. ## Dashboards carries
the concrete row-by-row prescription; the panel sections carry the
within-panel rules.
This v0.2 of the guide covers Grafana panels (units, legends, thresholds, titles, descriptions — timeseries-focused; stat / table / heatmap follow the same unit and description rules) and dashboard- level conventions (row sequencing, drill-down chaining, repeating- panel discipline, anti-patterns).
Areas not yet covered, planned for subsequent revisions:
level:metric:operations naming
(per Prometheus's recommended pattern), scrape-interval alignment.Forks may add or remove sections freely. The skill is a starter, not a specification.
Grafana display units control how raw numbers render in tooltips, axes, and legends. The choice carries a small amount of meaning every viewer absorbs; inconsistency across panels in one dashboard is a readability tax.
*_total directly is
a panel that goes up forever. Wrap in rate() / irate() /
increase() before display and pick a per-second unit:
reqpsopswpsbytes (binary IEC) for almost everything; use decbytes (SI)
only when the underlying metric is documented in SI (rare in
Prometheus exposition; common in cloud-provider APIs).s (Grafana auto-scales to ms / µs).dtdurations for the
"3d 7h" rendering.percentunit. Set min: 0, max: 1 so the y-axis
doesn't autoscale and visually exaggerate small movements.percent. Same advice on min/max.short.locale. It renders with thousands separators that differ
between viewers' locales — breaks screenshot diffs, breaks copy-paste
into incident docs.none. If the metric truly is unitless, say short. none
reads as "I forgot."Legend choice should follow the cardinality of the series the panel produces, which is a property of the PromQL, not of the panel type.
mean, lastNotNull, max calcs. Readable at a glance, and the
three calcs cover "where is it now", "where does it usually sit", and
"how bad does it get". Examples that qualify as bounded:
2xx | 3xx | 4xx | 5xx)up{job="..."} for a small fixed set of jobsp50 | p90 | p99)legendFormat strings that include the metric name. The panel title
already names the metric; the legend should name the series within it
(a label value, not the metric).0.995 because that's the SLO is more useful than a red line at
0.99 because it's round.grafana_dashboard_inspect's summary view; treat that count
as a quality metric for your dashboard.This guide's prose is deepest on timeseries panels, but builders now
exist for all the common types (timeseries, stat, table, state-timeline,
heatmap, gauge, plus row) and the lint primitive already enforces
stat-specific rules (stat.requiresComparison, stat.handlesUnknown)
and a layout rule that keys on panel type (layout.firstRowCategorical).
Per-type style conventions beyond those continue to grow; the unit and
description rules apply uniformly across panel types in the meantime.
instant queries rarely render well as tables.min and max
(the panels.gauge.requiresBounds lint rule enforces this) — without
them Grafana auto-scales the arc and the needle position is
meaningless. The kubernetes-mixin / Mimir / Loki / Tempo corpus
deliberately avoids gauges and uses stats with color thresholds even
for percent-style metrics; gauges are a community-dashboard idiom
worth questioning before adopting.The single most important dashboard-level decision is what the operator sees before they scroll. Everything else is downstream.
Top to bottom, every general-purpose service dashboard rhymes:
overview.libsonnet dashboard puts a 6-unit state-timeline ("are
writes / reads / rules / alerting / storage passing?") next to a
3-unit firing-alerts list; numeric tiles are deferred to row 2.
The instinct to lead with big numbers ("requests/sec: 12,345!") is
wrong — the operator's first question is is anything red?, not
what's the value?.QPS | Latency (p50/p99) | Per-instance p99 —
so a reader learns one row shape and scans down. This is the
literal convention in the Mimir / Loki / Tempo writes and reads
dashboards; the per-instance third panel is what makes outliers
visible at incident speed.Dashboards that are not general-purpose (per-node, per-pod, per-component deep-dives) deliberately skip the fold row and open with timeseries — the audience knows what they're looking at and will scroll. Don't blindly apply the row-1 fold convention.
Because that distinction is intent, not structure, the machine-checked
form of this rule (layout.firstRowCategorical, below) is opt-in by
tag: tag your overview dashboards overview and the linter checks
the fold only on those, leaving drill-downs alone. Adopt the tag as a
team convention and the "lead with categorical health" rule enforces
itself on exactly the dashboards it should.
A green SLO tile with no comparison context is what Tufte calls a
"service-engine-soon" light: it tells the operator nothing about
why. Every top-row aggregate must carry a comparison signal — a
trailing sparkline, a previous-period delta (+12% vs 7d), or a
small-multiple grid alongside — so "compared to what?" is answerable
without leaving the row.
repeat by $variable is the rule of thumb up to ~10 instances
(Grafana imposes no hard cap; ~10 is the operator-attention budget,
not a technical limit). Beyond that, it becomes the
per-device-page anti-pattern from the Cacti / Observium era —
pages enumerating every interface, every sensor, every CPU core as
its own panel, scaling by adding pixels rather than ranking. Above
~10, replace with one of:
grafana_table_panel_build; rank descending,
cap at 20-50)grafana_state_timeline_panel_build; rows =
entities, columns = time, color = state)grafana_heatmap_panel_build; rows = entities, color =
current value)All three scale to hundreds of entities without exhausting pixels or operator attention.
KPIs with daily or weekly cycles deserve a single row showing the
same metric at four resolutions side-by-side (1h / 24h / 7d / 30d
via per-panel timeFrom overrides). The MRTG tradition (Oetiker,
1995) got this right; modern Grafana dashboards mostly dropped it
because the time picker became dogma and TSDB-backed multi-range
queries were expensive. With recording rules and downsampled tiers
the cost is now negligible. Add when seasonality matters
(diurnal traffic, weekly batch loads, monthly billing cycles).
Scroll for related detail at the same scope; click for deeper
detail at a different scope. The crucial rule: linked dashboards
must preserve template variables. $cluster → $namespace →
$instance → $pod should chain so an operator landing on a
per-pod dashboard arrives pre-filtered. If a click forces re-picking
variables, the dashboard set has a hole.
Grafana stats default to two thresholds (green / red). The NMS tradition encodes more states; the canonical mapping worth adopting:
1 → up / green0 → down / rednull / NaN → grey (telemetry absent is not healthy — a
missing series must not read as a green tile). Grafana 12's
default threshold logic colors null using the lowest threshold
band, which silently produces green-or-red depending on your
bottom band. To get grey-for-null reliably, set an explicit value
mapping (Null / NaN → grey) on the stat panel or configure
noValue text + a neutral background. Don't rely on threshold
inheritance.The acknowledged dimension is orthogonal to the value: silenced alerts should still show, with reduced visual weight, so the operator knows the system is still in the bad state.
Encode variance, central tendency, and reliability in one panel rather than three. The SmokePing tradition is the canonical illustration — median RTT as a colored line, distribution as graded "smoke" behind it, packet loss as a color shift in the line itself. The Grafana-native equivalents are under-used:
Three panels for "latency p50", "latency p99", "error rate" can often collapse to one panel that says the same thing with less scanning.
Grafana's own data: ~60% of hand-built dashboards go unused
(Grafana 7.4 release post, 2021).
For any dashboard set bigger than ~3 services, generate dashboards
from a single template rather than hand-building each — the
monitoring-mixins pattern
(dashboards + alerts + recording rules as a generated bundle).
Every service inherits the same row sequence, the same panel widths,
the same drill-down chain. Hand-tweaking individual dashboards is
what produces sprawl. For Grafana-12-era teams not on jsonnet, this
library's TS builders (buildTimeseriesPanel, buildRowPanel,
buildStatPanel, buildTablePanel, buildStateTimelinePanel,
buildHeatmapPanel, buildGaugePanel today) give the same generative
path on a different substrate.
repeat by $host over a fleet
of 50+ instances, producing a 200-panel scroll. See "Repeating
panels" above.This skill carries the opinion — what good looks like. The project-authored guidance documents carry the workflow — how to apply opinion to an existing dashboard (find → loop update → validate):
mcp://grafana/docs/guidance/units.md — audit and fix panel units
across a dashboard.mcp://grafana/docs/guidance/descriptions.md — audit and fill
missing descriptions.mcp://grafana/docs/guidance/thresholds.md — when to set a
threshold vs refuse.mcp://grafana/docs/guidance/bulk-panel-updates.md — the
panel_find → loop panel_update → validateDashboard pattern
the audit workflows lean on.mcp://grafana/docs/guidance/session-resource-registry.md — keep
large dashboard JSON out of LLM context: dashboard_load once,
pass the URI to every read/write tool, dashboard_export only when
you actually need the JSON back. Reach for it on dashboards over a
few hundred lines, or when chaining multiple write operations.mcp://grafana/docs/guidance/scaffold-from-metrics.md — turn a
Prometheus /metrics scrape into a committable, lint-clean dashboard
(the build leg).mcp://grafana/docs/guidance/audit-review.md — load → inspect →
lint → prioritise → fix → verify an existing dashboard (the audit leg).mcp://grafana/docs/guidance/pr-review.md — dashboard_diff two
dashboards and turn the semantic deltas into a risk-ordered changelist
(the review leg).Each guidance doc defers opinion to this skill; this skill defers workflow to those docs. Reach for both.
The lint primitive (lintPanel / lintDashboard) currently checks
the structural rules in the JSON block below: dashboard-level
(duplicateTitles, maxRepeat, hiddenButReferenced,
emptyDefault, preservesVariables, datasourceDeclared,
orphanRow, unreferenced, layout.firstRowCategorical,
layout.panelOverlap), and
panel-level (stat.requiresComparison covers the sparkline-on-
aggregate rule per #53; stat.handlesUnknown covers the explicit
null/NaN handling rule per #56; gauge.requiresBounds covers the
fixed-range rule per #93; targets.promqlValid runs PromQL
syntactic validation against the same Lezer grammar Grafana's PromQL
editor uses, with Grafana templating variables pre-substituted so
$__rate_interval etc. don't trigger false positives;
targets.promqlSemantic adds the offline AST semantic check per #97 —
v1 catches the range-vector requirement, rate(foo) with no [5m]).
The
overview-first fold composition rule (issue #54) is now
machine-checked via layout.firstRowCategorical — but only on
dashboards explicitly tagged as overviews (see that rule below); the
unscoped row-sequence judgement (row 2 = RED/USE, rows 3..N =
pipeline-ordered decomposition) and multi-timescale strips remain
review-checklist items. Treat those as review items until lint
catches up.
GrafanaStyleGuide JSONThe block below is the input the lintPanel library function /
grafana_panel_lint MCP tool consumes. It's a GrafanaStyleGuide
umbrella; cross-type rules (units, descriptions) live nested under
panels so the panel slice (PanelStyleGuide) is everything needed
to lint one panel.
{
"panels": {
"timeseries": {
"legend": {
"placement": "right",
"displayMode": "table",
"calcs": ["mean", "lastNotNull", "max"]
}
},
"stat": {
"requiresComparison": true,
"handlesUnknown": true
},
"gauge": {
"requiresBounds": true
},
"targets": {
"promqlValid": true,
"promqlSemantic": true
},
"units": {
"allowList": [
"short", "percent", "percentunit",
"reqps", "ops", "wps", "rps",
"bytes", "decbytes",
"s", "ms", "ns", "dtdurations"
],
"deny": ["locale", "none"]
},
"descriptions": {
"required": true
}
},
"dashboards": {
"panels": {
"duplicateTitles": true,
"maxRepeat": 10,
"datasourceDeclared": true,
"orphanRow": true
},
"variables": {
"hiddenButReferenced": true,
"emptyDefault": true,
"unreferenced": true
},
"links": {
"preservesVariables": true
},
"layout": {
"firstRowCategorical": { "overviewTag": "overview" },
"panelOverlap": true
}
}
}
The dashboards block configures rules that span the whole dashboard
(rather than checking one panel). duplicateTitles flags non-row
panels that share a title (rows and repeat-using panels are
excluded — both legitimately share titles). Accepts true / false
for the simple case, or { "except": ["Title 1", "Title 2"] } to
exempt intentional duplicates (e.g. a KPI stat panel paired with its
timeseries trend that share a title by convention):
"duplicateTitles": { "except": ["Requests", "Errors"] }
hiddenButReferenced
flags templating variables with hide: 2 (both label and value
hidden in the UI) interpolated in a panel or row title — viewer sees
the value without context. emptyDefault flags query /
datasource / interval variables with no current.value
(custom, constant, textbox, adhoc are exempt because empty
is legitimate for those). unreferenced flags a templating variable
that is never interpolated anywhere — dead config. Detection searches
the whole dashboard for $v / ${v} / [[v]] plus bare-name
repeat fields, so a variable used indirectly (in another variable's
query, an annotation, a link, a transformation, or a repeat) is not
flagged; adhoc variables are exempt (they apply filters implicitly,
never by name).
maxRepeat caps the cardinality of repeat by $variable panels.
The default suggested above (10) is a rough budget — a row of 10
panels is dense but readable; 50+ is the Cacti-era per-device-page
anti-pattern (a wall of identical charts that nobody reads). Above
the cap, prefer a Top-N table, a state-timeline matrix, or a
heatmap instead. Accepts number (shown above) or { "max": N }.
preservesVariables flags internal dashboard-to-dashboard links
(URL path /d/ or /dashboard/) that drop every referenced
templating variable. The viewer lands with empty selectors and has
to re-pick everything they already had. Partial drops are
intentional — a per-pod → per-cluster drill-up legitimately drops
$pod — and not flagged. External URLs (runbooks, GitHub, etc.)
are always ignored.
datasourceDeclared flags any non-row panel without a usable
datasource ref — either the field is missing or it's an empty
{} (no uid and no type). Without it Grafana falls back to the
instance-wide default datasource; if no default is set the panel
queries nothing and renders blank. That's the "silent broken
dashboard" failure mode: layout, titles, and even thresholds look
fine on screen, but no data flows. Templating-variable refs
({ uid: "$datasource", type: "prometheus" }) pass — they are the
standard multi-environment pattern, resolved at render time. Row
panels are excluded (rows don't query). Severity is warn rather
than error because the instance default might cover the panel —
but relying on it is fragile (different envs, missing default,
dashboard imported to a Grafana where the default datasource is
different). Set explicitly.
orphanRow flags a row panel with no panels under it — a dead
section header that renders as a blank band. A row counts as orphan
when its nested panels[] is empty and it is the last panel or is
immediately followed by another row (so an expanded row whose
children are flat siblings after it, the modern layout, is correctly
not flagged). Remove the row or move panels into it. Severity info.
layout.firstRowCategorical flags an overview dashboard whose
first row (the fold) is a wall of numbers/graphs — stat, gauge,
timeseries, barchart, bargauge — with no categorical-health
panel (state-timeline, alertlist) among them. The operator's
first question is is anything red?, not what's the value? (see
## Dashboards → "Row sequence"). Because there is no structural
"this is an overview dashboard" signal in Grafana JSON — and
drill-down / per-service / per-pod dashboards legitimately open
with timeseries — the rule must be scoped explicitly. The
recommended convention is to tag overview dashboards overview
and configure { "overviewTag": "overview" }; the rule then fires
only on dashboards carrying that tag. (true fires on every
dashboard — use it only for a style-guide copy that governs a folder
of nothing but overview dashboards.) Detection reads the top band of
positioned top-level panels; a fold that already carries a
state-timeline or alertlist passes, and a text-only header fold never
fires. The fix: lead row 1 with a state-timeline
(grafana_state_timeline_panel_build) plus an alertlist, and defer
numeric tiles to row 2.
layout.panelOverlap flags two top-level panels whose gridPos
rectangles intersect — the panels share grid cells and one renders on
top of the other, hiding its data. Pure geometry (no taste): rectangles
overlap when a.x < b.x+b.w && b.x < a.x+a.w && a.y < b.y+b.h && b.y < a.y+a.h (strict, so panels placed edge-to-edge are fine). Rows and
collapsed-row children are excluded. The build path never produces
overlaps; this guards hand-edited and imported dashboards. Severity
warn — it actively hides data. Fix: reposition so the rectangles
don't intersect.
stat.requiresComparison flags stat panels with
options.graphMode === "none" (or absent — provisioned dashboards
routinely omit the field). The sparkline gives the viewer a
comparison signal alongside the current value; without it, a stat
panel shows just a number, and a number without trend context is the
"aggregate ≠ summary" failure mode this skill's ## Dashboards
section calls out. Set graphMode: "area" or "line" on every stat
panel to opt in to the comparison.
stat.handlesUnknown flags stat panels with no explicit signal for
what to show when the value is null or NaN. Grafana's default
behaviour is to inherit the lowest threshold band's colour for
null — silently green (or red, on a reverse-coloured panel) rather
than the "no data" the operator expects. Fix either way:
fieldConfig.defaults.mappings[] entry
with type: "special" and options.match: "null" (or "nan" /
"null+nan"). The dominant pattern in the wild (e.g. node-exporter
dashboards) sets result.text: "N/A" and omits the color field
entirely. That's deliberate: text-only mappings communicate "no
data" without overriding the panel's threshold palette.noValue — set fieldConfig.defaults.noValue to a non-empty
string (e.g. "N/A", "–"). Simpler when you don't need per-shape
distinction between null, NaN, and empty.The rule checks presence of either escape hatch, not the colour of
the mapping result. Earlier triage (#56) considered a stricter
unknownIsGrey shape with a colour-tolerance policy; fixture evidence
showed real null-mapping JSON omits colours entirely, so the colour
check would have overfit. If a real bug surfaces (operator tripped by
an explicitly mis-coloured null), a sharpened sub-rule lands then.
targets.promqlValid flags any panel target whose expr field fails
to parse against the PromQL grammar — same Lezer grammar Grafana's
own PromQL editor, Mimir's editor, and the Prometheus UI all build on
(@prometheus-io/lezer-promql). Severity warn: the dashboard
imports fine and the rest of the panel renders; the broken target
just produces "no data" at query time. Catches typos like unclosed
brackets (rate(foo[5m), malformed durations ([5xyz]), missing
operands, broken operator chains. Grafana templating variables
($__rate_interval, ${env}, [[env]]) are pre-substituted with
grammar-safe placeholders before parsing, so a real-world stored
expression like rate(http_requests_total[$__rate_interval]) doesn't
trigger a false positive. Only target.expr is checked; non-
Prometheus target fields (query for Loki, rawQuery for SQL) have
different syntax and are intentionally skipped. For the standalone
tool form (validate one expression at a time, mid-composition), call
grafana_promql_validate instead of running the dashboard-level rule.
targets.promqlSemantic is the companion semantic check —
expressions that parse cleanly but fail at query time. v1 catches the
range-vector requirement: a range-vector function (rate, irate,
increase, delta, deriv, the *_over_time family, …) applied to a
bare instant vector — rate(http_requests_total) with no [5m], the
single most common PromQL mistake (Prometheus errors with "expected
range vector, got instant vector"). It analyses the same Lezer AST
offline — no metric metadata, no network. It fires only on the exact
bare-VectorSelector-argument shape (near-zero false positives), runs
only on syntactically-valid expressions, and skips arguments carrying a
Grafana variable (whose expansion it can't see). Type-aware checks
(rate() on a gauge) would need live metric metadata and stay out of
scope. Severity warn. Future additive extensions: function arity,
the quantile_over_time second-argument form.
legend.calcs accepts two shapes. A bare string[] (shown above) is
set-equal — order of the calcs in the array is ignored; the panel
matches as long as it carries the same multiset. This is the common
case: "every legend should carry the same aggregations regardless of
which column ended up first." To opt into order-sensitivity, use the
explicit form { "expected": [...], "match": "exact" }; the "set"
match mode is equivalent to the bare-array default. Subset / superset
modes are deliberately not supported — the cross-set "which extras
are OK?" question is taste-laden and belongs in this skill, not in
code. If different dashboard families need different calc sets, fork
the skill copy and carry both.
Rule identifiers in lintPanel / grafana_panel_lint use JSONPath-style
dotted paths into this umbrella shape — e.g.
panels.timeseries.legend.placement references the placement field
nested under panels.timeseries.legend, and panels.units.allowList
references the cross-type unit allow list nested under panels.units.
Not literal keys with dots in them.
The MCP tool accepts either the full umbrella above or the
PanelStyleGuide slice ({ timeseries?, units?, descriptions? })
directly; it unwraps the umbrella by extracting .panels when that
key is present at the top level.
grafana/mimir/operations/mimir-mixin, grafana/loki/production/loki-mixin, grafana/tempo/operations/tempo-mixin — production patterns from the team that ships Grafana. The Mimir overview dashboard is the strongest single example of the state-timeline + alert-list "fold."unit strings.repeat by $variable.Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub jburgess/mcp-grafana --plugin mcp-grafana