From obol
Read-only queries against metrics and logs from a deployed Obol DVpod on Kubernetes. Use when investigating cluster health, charon errors, peer connectivity, validator duty performance, or beacon node behavior on a running DVpod. For deploying, configuring, or modifying a DVpod (including enabling Prometheus or Loki shipping), use the `dvpod` skill. For Obol's hosted Grafana with cross-cluster fleet view, use the `obol-monitoring` skill.
How this skill is triggered — by the user, by Claude, or both
Slash command
/obol:dvpod-monitoring [query type] [release] — e.g. health my-dv-pod, errors, peers, duties, logs[query type] [release] — e.g. health my-dv-pod, errors, peers, duties, logsThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
You investigate the health and behavior of a running DVpod by querying its metrics and logs. **You never modify state** — no `helm upgrade`, no `kubectl apply`, no secret writes. If the user wants to enable monitoring, change values, or deploy anything, hand off to the `dvpod` skill.
You investigate the health and behavior of a running DVpod by querying its metrics and logs. You never modify state — no helm upgrade, no kubectl apply, no secret writes. If the user wants to enable monitoring, change values, or deploy anything, hand off to the dvpod skill.
dvpod skill: deploy, configure (including enabling Prometheus and configuring Loki shipping), upgrade, troubleshoot, destroy.obol-monitoring skill: Obol's hosted Grafana, fleet-wide view across clusters.For namespace, release, pod, and current-values discovery, read skills/dvpod/discovery.md.
Before any query, decide which path applies for metrics and logs by reading the release's current values:
helm get values <release> -n <namespace> -o json
centralMonitoring.enabled=true. There will be a prometheus Service on port 9090 in the namespace:
kubectl get svc prometheus -n <namespace>
Query path: port-forward svc/prometheus 9090 and hit the standard Prometheus HTTP API.serviceMonitor.enabled=true. Auto-discover a Prometheus the operator manages:
kubectl get prometheus -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}' 2>/dev/null
kubectl get svc -A -l app.kubernetes.io/name=prometheus 2>/dev/null
kubectl get svc -A -l app=kube-prometheus-stack-prometheus 2>/dev/null
Port-forward whichever service is found. If multiple, ask the user which./metrics — always available, but single-instance and text-format only:
kubectl port-forward -n <namespace> <pod> 3620:3620
Use for spot checks; PromQL aggregation is not available against this endpoint.centralMonitoring via the dvpod skill.kubectl logs is always available and is the default. Works even on a stock deployment with no monitoring enabled.charon.lokiAddresses to a real Loki and can give you a queryable Loki HTTP endpoint. The chart configures push only — query routing is the user's responsibility.Use AskUserQuestion to clarify, but skip if the request is already specific.
health action)For the standard health check (readyz, active validators, peer connectivity, beacon node visibility, recent error/warn summary), run the bundled script:
bash skills/dvpod-monitoring/health.sh [release] [namespace]
helm list -A.centralMonitoring.enabled=true (bundled Prometheus). Errors out with guidance if not.Use this for all routine /dvpod-monitoring health invocations. Fall through to the per-query path below only when the user asks for something the snapshot doesn't cover (custom PromQL, deep log search, specific incident windows).
# Run port-forward in the background, then query.
kubectl port-forward -n <namespace> svc/prometheus 9090:9090 >/dev/null 2>&1 &
PF_PID=$!
trap "kill $PF_PID 2>/dev/null" EXIT
PROM=http://localhost:9090
# Instant query
curl -sG "$PROM/api/v1/query" --data-urlencode 'query=<PROMQL>'
# Range query (last 15m, 30s step)
curl -sG "$PROM/api/v1/query_range" \
--data-urlencode 'query=<PROMQL>' \
--data-urlencode "start=$(date -u -d '15 minutes ago' +%s)" \
--data-urlencode "end=$(date -u +%s)" \
--data-urlencode 'step=30s'
/metrics (no Prometheus)kubectl port-forward -n <namespace> <pod> 3620:3620 >/dev/null 2>&1 &
PF_PID=$!
trap "kill $PF_PID 2>/dev/null" EXIT
# Pull the whole metrics dump and grep for what you need
curl -s localhost:3620/metrics | grep -E '^(app_monitoring_readyz|core_scheduler_validators_active|p2p_ping_success)'
kubectl logs# Recent charon log lines
kubectl logs -n <namespace> <pod> -c charon --tail=200
# Errors in last 15 minutes
kubectl logs -n <namespace> <pod> -c charon --since=15m | grep -iE 'error|warn'
# Validator client logs
kubectl logs -n <namespace> <pod> -c validator-client --since=15m
# Across replicas in a release (if the StatefulSet has more than one pod)
kubectl logs -n <namespace> -l app.kubernetes.io/instance=<release> -c charon --tail=100 --prefix
LOKI=<user-provided-loki-base>
curl -sG "$LOKI/loki/api/v1/query_range" \
--data-urlencode 'query={service_name="charon"} |= "error"' \
--data-urlencode "start=$(date -u -d '15 minutes ago' +%s)000000000" \
--data-urlencode "end=$(date -u +%s)000000000" \
--data-urlencode 'limit=200'
For a query cookbook, see queries.md.
cluster_peer if present; show error/warn lines verbatim with timestamps; suppress repetitive noise."status":"error", surface the error/errorType and stop — do not invent results.When showing results, watch for and explicitly call out:
app_monitoring_readyz != 1 — node is not ready (BN unreachable, CL not synced, peer threshold not met).p2p_ping_latency_secs p90 — peer network is slow; check p2p_peer_connection_types for relayed vs direct.p2p_ping_success == 0 for a peer — that operator is unreachable.core_scheduler_validators_active < cluster_validators — some validators not active yet (DKG just completed and deposit unconfirmed, or validator exited).topic / component to identify the failing subsystem.skills/dvpod/troubleshooting.md for the operator-address mismatch case.dvpod skill./metrics works without Prometheus but supports only point-in-time scalar lookups.Parse the first word of $ARGUMENTS as the query type:
/dvpod-monitoring health [release] — readyz, peers, active validators/dvpod-monitoring errors [release] — recent charon errors/dvpod-monitoring peers [release] — peer connectivity / ping latency/dvpod-monitoring duties [release] — duty success/failure breakdown/dvpod-monitoring beacon [release] — beacon node latency / errors/dvpod-monitoring query <PROMQL> — raw PromQL passthrough/dvpod-monitoring logs [release] [pattern] — search logs (kubectl or Loki depending on what is available)/dvpod-monitoring (no args) — auto-detect a release, run pre-flight, and offer the menuCreates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub obolnetwork/skills --plugin obol