From coreweave-pack
Guides triage and resolution of CoreWeave GPU incidents for inference services using kubectl checks, common fixes, and rollback procedures.
Slash command: /coreweave-pack:coreweave-incident-runbook
```bash
# 1. Check pod status
kubectl get pods -l app=inference -o wide
# 2. Check recent events
kubectl get events --sort-by=.lastTimestamp | tail -20
# 3. Check node status
kubectl get nodes -l gpu.nvidia.com/class -o wide
# 4. Check GPU health
kubectl exec -it $(kubectl get pod -l app=inference -o name | head -1) -- nvidia-smi
# Rollback: revert the deployment to its previous revision
kubectl rollout undo deployment/inference
```
For data handling, see coreweave-data-handling.
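
A rollback is only useful if it actually settles, so it can help to pair `kubectl rollout undo` with a check that the previous revision became ready. The following is a minimal sketch, not part of the original runbook: the function name `rollback_and_verify` and the default timeout of `120s` are illustrative assumptions, and the deployment name `inference` is taken from the commands above.

```shell
#!/usr/bin/env bash
# Sketch only: roll back a deployment and wait for the rollout to settle.
# rollback_and_verify and the 120s default timeout are hypothetical choices.

rollback_and_verify() {
  local deployment="${1:-inference}"
  local timeout="${2:-120s}"

  # Print the revision history so the incident log shows what is being reverted.
  kubectl rollout history "deployment/${deployment}" || return 1

  # Revert to the previous revision.
  kubectl rollout undo "deployment/${deployment}" || return 1

  # Block until the rollback finishes or the timeout expires.
  if kubectl rollout status "deployment/${deployment}" --timeout="${timeout}"; then
    echo "rollback of ${deployment} completed"
  else
    echo "rollback of ${deployment} did not become ready within ${timeout}" >&2
    return 1
  fi
}
```

Calling `rollback_and_verify inference 60s` would run the three steps in order and return non-zero if any of them fails, which makes it easy to chain into a larger incident script.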
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin coreweave-pack