From kube-ai-devkit
Diagnose why a Kubernetes service is unhealthy using Tilt resource data, logs, and pod events
Slash command: /kube-ai-devkit:k8s-debug
Systematically diagnose why a Kubernetes service is unhealthy in the local Tilt environment.
{service}: The Tilt resource name to diagnose (required)
1.1 Resource Status
Call tilt_get_resource for the service. Record:
- runtimeStatus and updateStatus
1.2 Logs
Call tilt_logs with lines: 200. Search for:
- ERROR, Exception, Fatal, panic
- Connection refused, ECONNREFUSED
- OOMKilled, signal: killed
- migration, database, table
1.3 Full Description
Call tilt_describe for K8s-level events and pod spec details.
1.4 Code Changes (if applicable)
```shell
# Check recent changes to the service source
git log --oneline -10 -- src/{ServiceName}/ 2>/dev/null || true
```
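The log search in step 1.2 can also be approximated by hand when the raw log text is available. The sketch below is only an illustration: `scan_logs` is a hypothetical helper name, it reads log text from stdin, and the way the logs reach it (for example from `kubectl logs`) is an assumption rather than part of the skill.

```shell
#!/usr/bin/env bash
# Sketch: filter log lines for the patterns listed in step 1.2.
# scan_logs is a hypothetical helper; it reads log text on stdin and
# prints every matching line prefixed with its line number.
scan_logs() {
  # -E: extended regex, -n: prefix each match with its line number.
  grep -En \
    -e 'ERROR|Exception|Fatal|panic' \
    -e 'Connection refused|ECONNREFUSED' \
    -e 'OOMKilled|signal: killed' \
    -e 'migration|database|table' \
    || true   # grep exits 1 when nothing matches; treat that as "no findings"
}

# Example (assumption: logs come from kubectl; any command that writes
# log text to stdout would work the same way):
#   kubectl logs --tail=200 "$POD" | scan_logs
```

The patterns are deliberately broad, so words such as `table` will also match harmless lines. Treat the output as a shortlist to read, not as a verdict.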
Check each common failure mode:
| Failure Mode | Indicators | Root Cause |
|---|---|---|
| Build error | updateStatus: error | Compilation failure — check build error message |
| CrashLoopBackOff | High restart count, runtimeStatus: error | Startup crash — check logs for exception |
| ImagePullBackOff | Pod status contains ImagePull | Image not found — check image name/tag/registry |
| OOMKilled | Logs show signal: killed or K8s event OOMKilled | Memory limit too low |
| Dependency down | Logs show Connection refused to postgres/nats/redis | Upstream resource not ready |
| Migration failure | Logs show migration errors | Database schema issue |
| Config error | Logs show missing env var or config key | Missing ConfigMap/Secret |
| Pending | runtimeStatus: pending, no pod | Insufficient cluster resources |
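The table above can be read as a small decision procedure. The following is a rough sketch of that idea, not the skill's actual logic: `classify_failure` is a hypothetical function, the order of the checks and the restart threshold of 3 are assumptions, and the "Config error" row is left out because the table does not give a concrete log pattern for it.

```shell
#!/usr/bin/env bash
# Sketch: map the indicators from the failure-mode table to a label.
# Usage: classify_failure UPDATE_STATUS RUNTIME_STATUS RESTARTS < logs
# (classify_failure, the check order, and the threshold are assumptions.)
classify_failure() {
  local update_status="$1" runtime_status="$2" restarts="${3:-0}"
  local logs
  logs="$(cat)"   # log text plus any pod events, read from stdin

  # Helper: succeed if the captured text matches the extended regex in $1
  # (case-insensitive).
  has() { printf '%s\n' "$logs" | grep -Eiq -- "$1"; }

  if [ "$update_status" = "error" ]; then echo "Build error"
  elif has 'ImagePull'; then echo "ImagePullBackOff"
  elif has 'OOMKilled|signal: killed'; then echo "OOMKilled"
  elif has 'connection refused|ECONNREFUSED'; then echo "Dependency down"
  elif has 'migration'; then echo "Migration failure"
  elif [ "$runtime_status" = "pending" ]; then echo "Pending"
  elif [ "$runtime_status" = "error" ] && [ "$restarts" -ge 3 ]; then
    echo "CrashLoopBackOff"
  else
    echo "Unknown"
  fi
}

# Example with made-up values:
#   printf 'dial tcp: connection refused\n' | classify_failure ok error 1
```

Specific indicators are checked before the generic crash-loop rule, since a crash loop is often the symptom of one of the other rows. A result of `Unknown` means the evidence needs to be read manually.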
Apply 5-Whys to the identified failure:
## Diagnosis: {service}
**Status:** runtime={status}, update={status}
**Pod:** {podName} (restarts: {count})
### Root Cause
{One sentence description}
### Evidence
- {Key log line or event}
- {Build error or condition}
- {Recent code change if relevant}
### Fix
{Specific action to take — command, code change, or config update}
### Prevention
{What to add/change to prevent recurrence}
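To show how the placeholders in the report template get filled, here is a minimal sketch that renders the header and root-cause section from shell arguments. `render_diagnosis` is a hypothetical helper and all values in the usage example are invented for illustration.

```shell
#!/usr/bin/env bash
# Sketch: fill the first part of the diagnosis template.
# render_diagnosis is a hypothetical helper, not part of the skill.
# Usage: render_diagnosis SERVICE RUNTIME UPDATE POD RESTARTS ROOT_CAUSE
render_diagnosis() {
  local service="$1" runtime="$2" update="$3" pod="$4" restarts="$5" cause="$6"
  cat <<EOF
## Diagnosis: ${service}
**Status:** runtime=${runtime}, update=${update}
**Pod:** ${pod} (restarts: ${restarts})
### Root Cause
${cause}
EOF
}

# Example with invented values:
#   render_diagnosis api error ok api-0 6 "Postgres was not ready at startup."
```

The remaining sections (Evidence, Fix, Prevention) follow the same pattern and are omitted to keep the sketch short.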
For common issues, suggest the fix directly:
- tilt_trigger {dependency} or check its logs
npx claudepluginhub makigjuro/kube-ai-devkit --plugin kube-ai-devkit