From grafana-app-sdk
Monitors Grafana Cloud costs, sets usage alerts, attributes spending by label, and reduces cardinality with Adaptive Metrics/Logs. Use for observability budget analysis and optimization.
Slash command: /grafana-app-sdk:cost-management
> **Docs**: https://grafana.com/docs/grafana-cloud/cost-management-and-billing/
Access: My Account → Cost Management (or within your Grafana Cloud stack)
FOCUS-compliant (FinOps Open Cost and Usage Specification) billing dashboards showing:
Tag your telemetry at ingestion to enable per-team cost reporting:

    // Add cost attribution labels in Alloy
    prometheus.remote_write "cloud" {
      endpoint {
        url = sys.env("PROMETHEUS_URL")

        basic_auth {
          username = sys.env("PROM_USER")
          password = sys.env("GRAFANA_CLOUD_API_KEY")
        }
      }

      external_labels = {
        team    = "platform",
        project = "checkout-service",
        env     = "production",
      }
    }

    loki.write "cloud" {
      endpoint {
        url = sys.env("LOKI_URL")

        basic_auth {
          username = sys.env("LOKI_USER")
          password = sys.env("GRAFANA_CLOUD_API_KEY")
        }
      }

      external_labels = {
        team    = "platform",
        project = "checkout-service",
      }
    }
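To illustrate what these labels make possible, here is a minimal Python sketch that groups usage by the `team` label and computes each team's share of the total. The function name `usage_share_by_label` and the sample series counts are hypothetical, chosen only for illustration; they are not real billing data or part of any Grafana API.

```python
from collections import defaultdict


def usage_share_by_label(samples, label):
    """Return {label_value: fraction_of_total_usage} for one attribution label.

    `samples` is a list of (labels_dict, usage) pairs. Entries that lack the
    label are grouped under "unattributed".
    """
    totals = defaultdict(float)
    for labels, usage in samples:
        totals[labels.get(label, "unattributed")] += usage
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {value: amount / grand_total for value, amount in totals.items()}


# Hypothetical active-series counts per label set (not real billing data).
samples = [
    ({"team": "platform", "project": "checkout-service"}, 6000),
    ({"team": "platform", "project": "search"}, 2000),
    ({"team": "payments", "project": "ledger"}, 1500),
    ({"project": "legacy-batch"}, 500),  # missing the team label
]

shares = usage_share_by_label(samples, "team")
for team, share in sorted(shares.items()):
    print(f"{team}: {share:.0%}")
```

Telemetry that arrives without the label ends up in the "unattributed" bucket, which is why the labels need to be set on every pipeline that writes to the stack.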
Set alerts before you hit quota or budget thresholds:

    # Alert when approaching metrics quota
    groups:
      - name: grafana-cloud-usage
        rules:
          - alert: MetricsUsageHigh
            expr: grafana_cloud_metrics_active_series / grafana_cloud_metrics_limit > 0.8
            for: 1h
            labels:
              severity: warning
            annotations:
              summary: "Grafana Cloud metrics usage >80% of quota"
          - alert: LogsIngestionHigh
            expr: increase(grafana_cloud_logs_bytes_ingested_total[24h]) > 50e9  # 50 GB/day
            labels:
              severity: warning
            annotations:
              summary: "Grafana Cloud log ingestion >50 GB in the last 24h"
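The arithmetic behind the two expressions can be checked with a small Python sketch. The function names and the sample readings are hypothetical; the thresholds (0.8 of the series limit, 50e9 bytes over 24 hours) are the ones used in the rules above.

```python
def metrics_usage_high(active_series, series_limit, threshold=0.8):
    """Mirror of the MetricsUsageHigh expression: active / limit > threshold."""
    if series_limit <= 0:
        raise ValueError("series_limit must be positive")
    return active_series / series_limit > threshold


def logs_ingestion_high(bytes_last_24h, limit_bytes=50e9):
    """Mirror of the LogsIngestionHigh expression: 24h increase > 50e9 bytes."""
    return bytes_last_24h > limit_bytes


# Hypothetical readings, chosen only to exercise both sides of each threshold.
print(metrics_usage_high(8_500, 10_000))  # ratio 0.85, above the threshold
print(metrics_usage_high(8_000, 10_000))  # ratio 0.80, not strictly above it
print(logs_ingestion_high(62e9))
print(logs_ingestion_high(50e9))
```

Both comparisons are strict, so a reading that sits exactly on the threshold does not fire the alert.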
Adaptive Metrics automatically identifies unused or high-cardinality metrics and generates aggregation rules.

    # View recommendations
    curl https://yourstack.grafana.net/api/plugins/grafana-adaptive-metrics-app/resources/v1/recommendations \
      -H "Authorization: Bearer <token>"

    # Apply aggregation rule: keeps only the listed labels on a metric
    - match: "^http_request_duration_seconds.*"
      action: keep
      match_labels:
        - method
        - status_code
        - service
    # Drops: pod, container, instance, node (reduces series from 10k → 50)
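Why dropping labels shrinks the bill can be shown with a short Python sketch that counts unique label combinations before and after keeping only `method`, `status_code`, and `service`. The helper `count_series` and the generated label sets are hypothetical; the sketch only models series counting, not how Adaptive Metrics aggregates sample values.

```python
def count_series(samples, keep_labels=None):
    """Count unique label combinations, optionally keeping only some labels."""
    unique = set()
    for labels in samples:
        if keep_labels is not None:
            labels = {k: v for k, v in labels.items() if k in keep_labels}
        unique.add(frozenset(labels.items()))
    return len(unique)


# Hypothetical label sets for one metric:
# 2 methods x 2 status codes x 1 service, each reported by 50 pods.
samples = [
    {"method": m, "status_code": s, "service": "checkout", "pod": f"pod-{p}"}
    for m in ("GET", "POST")
    for s in ("200", "500")
    for p in range(50)
]

before = count_series(samples)
after = count_series(samples, keep_labels={"method", "status_code", "service"})
print(f"series before: {before}, after: {after}")
```

The series count is the product of the distinct values of each label that varies independently, so removing one high-cardinality label such as `pod` divides the count by the number of pods.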
Workflow:
Drop or sample log lines before ingestion using Loki's pipeline stages in Alloy:

    loki.process "filter_logs" {
      forward_to = [loki.write.cloud.receiver]

      // Drop health check logs (high volume, low value)
      stage.drop {
        expression = ".*GET /health.*"
      }

      // Drop debug logs in production
      stage.drop {
        source     = "level"
        expression = "debug"
      }

      // Sample verbose info logs (keep 10%).
      // stage.sampling only takes a rate, so it is scoped with stage.match;
      // this assumes "level" is available as a stream label.
      stage.match {
        selector = "{level=\"info\"}"

        stage.sampling {
          rate = 0.1
        }
      }
    }
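The effect of these three stages on volume can be approximated with a Python sketch. It is a simplified model, not Alloy's implementation: `keep_line` is a hypothetical helper, the level check uses plain equality rather than a regex, and the log lines are invented.

```python
import random
import re

HEALTH_CHECK = re.compile(r".*GET /health.*")


def keep_line(line, level, rng, info_rate=0.1):
    """Return True if the line should be forwarded, mirroring the three stages."""
    if HEALTH_CHECK.search(line):
        return False  # drop health checks
    if level == "debug":
        return False  # drop debug logs
    if level == "info":
        return rng.random() < info_rate  # keep roughly 10% of info logs
    return True  # warnings, errors, and other levels pass through


rng = random.Random(42)  # seeded so the sketch is repeatable
lines = (
    [("GET /health 200", "info")] * 1000
    + [("user logged in", "info")] * 1000
    + [("cache miss", "debug")] * 1000
    + [("payment failed", "error")] * 100
)
kept = [(line, level) for line, level in lines if keep_line(line, level, rng)]
print(f"kept {len(kept)} of {len(lines)} lines")
```

Order matters: the health check drop runs first, so those lines never reach the sampling step even though they are logged at info level.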
Use Alloy tail-based sampling to keep only important traces:

    otelcol.processor.tail_sampling "cost_control" {
      decision_wait = "10s"

      policy {
        name = "keep-errors"
        type = "status_code"

        status_code {
          status_codes = ["ERROR"]
        }
      }

      policy {
        name = "keep-slow"
        type = "latency"

        latency {
          threshold_ms = 1000
        }
      }

      policy {
        name = "sample-rest"
        type = "probabilistic"

        probabilistic {
          sampling_percentage = 5
        }
      }

      output {
        traces = [otelcol.exporter.otlp.cloud.input]
      }
    }
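The decision logic of this configuration can be sketched in Python as follows. This is a simplified model under stated assumptions: `keep_trace` and the span dictionaries are hypothetical, the random draw stands in for however the processor actually makes its probabilistic choice, and the exact comparison used by the latency policy may differ.

```python
import random


def keep_trace(spans, rng, latency_threshold_ms=1000, sampling_percentage=5):
    """Decide on a whole trace once all of its spans are known.

    The trace is kept if any policy matches: it contains an error span, its
    total duration exceeds the latency threshold, or it wins the random draw.
    """
    if any(span["status"] == "ERROR" for span in spans):
        return True
    start = min(span["start_ms"] for span in spans)
    end = max(span["end_ms"] for span in spans)
    if end - start > latency_threshold_ms:
        return True
    return rng.random() * 100 < sampling_percentage


rng = random.Random(7)  # seeded so the sketch is repeatable

error_trace = [
    {"status": "OK", "start_ms": 0, "end_ms": 40},
    {"status": "ERROR", "start_ms": 10, "end_ms": 30},
]
slow_trace = [{"status": "OK", "start_ms": 0, "end_ms": 1500}]
print(keep_trace(error_trace, rng), keep_trace(slow_trace, rng))

fast_ok = [{"status": "OK", "start_ms": 0, "end_ms": 20}]
kept = sum(keep_trace(fast_ok, rng) for _ in range(10_000))
print(f"kept {kept} of 10000 fast, successful traces")
```

Because the decision needs the complete trace, the processor has to buffer spans for the `decision_wait` period before it can apply the policies.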
PromQL queries for tracking usage:

    # Active metric series (billed unit for metrics)
    grafana_cloud_metrics_active_series

    # Series count per metric name (find high-cardinality sources)
    topk(20, count by (__name__) ({__name__=~".+"}))

    # Log bytes ingested per stream
    sum(increase(loki_ingester_chunk_size_bytes_sum[24h])) by (namespace, app)

    # Trace spans ingested
    rate(tempo_distributor_spans_received_total[5m])
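Find the highest-cardinality metrics with `topk(20, count by (__name__) ({__name__=~".+"}))`, and add the attribution labels (`team`, `project`) to all Alloy configs.

| Signal | Billing Unit |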
|---|---|
| Metrics | Active series (unique label combinations) |
| Logs | Bytes ingested |
| Traces | Spans ingested |
| Profiles | Bytes ingested |
| Synthetic Monitoring | Check executions |
| k6 | VUh (Virtual User hours) |