From vanguard-frontier-agentic
Reviews Prometheus and AlertManager configuration for cardinality explosion risks, alert correctness, scrape security, remote_write safety, and retention adequacy.
How this skill is triggered — by the user, by Claude, or both
Slash command
/vanguard-frontier-agentic:prometheus-alerting-cardinality-reviewThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
This skill reviews Prometheus and AlertManager configuration for cardinality explosion risks, recording rule adequacy, alert expression correctness, routing tree safety, scrape configuration security, and retention posture. Cardinality explosion is the leading cause of Prometheus OOM crashes in production, and flapping alerts from missing `for:` durations erode on-call trust faster than any oth...
This skill reviews Prometheus and AlertManager configuration for cardinality explosion risks, recording rule adequacy, alert expression correctness, routing tree safety, scrape configuration security, and retention posture. Cardinality explosion is the leading cause of Prometheus OOM crashes in production, and flapping alerts from missing for: durations erode on-call trust faster than any other alerting defect.
user_id, request_id, session_id, url_path, pod_hash) — these cause cardinality explosion and must be moved off the label set or aggregated away.prometheus_tsdb_head_series exceeding 5 million as a cardinality warning threshold; note it if the user reports series counts or if the config makes it likely.for: 0m, for: 0s, or no for: field as HIGH — bare threshold alerts flap on every scrape jitter.honor_labels: true on any scrape target that is not a trusted federation endpoint as HIGH — it allows the scraped workload to override job and instance labels.http://external-host) as a potential SSRF candidate and flag it.remote_write configs where write_relabel_configs drop non-__ metric labels — data loss is silent.remote_write or Thanos/Cortex integration as MEDIUM compliance risk.Load these only when needed:
Return, at minimum:
npx claudepluginhub raishin/vanguard-frontier-agentic --plugin vanguard-frontier-agenticGuides Prometheus setup, metric scraping, recording rules, alerting, and service discovery for infrastructure and application monitoring.
Configures Prometheus for time-series metrics collection, scrape targets, service discovery, recording rules, and federation patterns for multi-cluster deployments. Useful when setting up centralized monitoring for microservices or migrating to a modern observability stack.