From milo-activity
Investigate a cluster incident by systematically checking recent activity, failed operations, and resource changes. Use when the user reports something is broken, asks "what happened", mentions an outage, incident, or error, or wants to understand recent cluster changes.
Slash command
/milo-activity:investigate-incident
Investigate the following incident:
$ARGUMENTS
Start by establishing the time window and scope from the description above. If no time window is specified, default to the last 1 hour. If no namespace or resource is specified, search across all namespaces.
Follow your investigation methodology and return a structured report with timeline, key actors, failures, root cause hypothesis, and next steps.
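The scoping defaults described above (last 1 hour when no time window is given, all namespaces when none is named) can be sketched as a small helper. This is only an illustration of the defaulting rule, not part of the skill itself: `InvestigationScope` and `resolve_scope` are hypothetical names, and the actual plugin may resolve scope differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Default from the prompt: "default to the last 1 hour".
DEFAULT_WINDOW = timedelta(hours=1)


@dataclass
class InvestigationScope:
    """Hypothetical container for the resolved investigation scope."""
    start: datetime
    end: datetime
    namespace: Optional[str]  # None means "search across all namespaces"
    resource: Optional[str]   # None means "no specific resource named"


def resolve_scope(
    window: Optional[timedelta] = None,
    namespace: Optional[str] = None,
    resource: Optional[str] = None,
    now: Optional[datetime] = None,
) -> InvestigationScope:
    """Apply the prompt's defaults to whatever the incident description gave."""
    end = now if now is not None else datetime.now(timezone.utc)
    effective_window = window if window is not None else DEFAULT_WINDOW
    return InvestigationScope(
        start=end - effective_window,
        end=end,
        namespace=namespace,
        resource=resource,
    )
```

Calling `resolve_scope()` with no arguments yields a one-hour window ending now and a `namespace` of `None`, which stands for a search across all namespaces; passing `namespace="prod"` (an example value) narrows the search accordingly.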
npx claudepluginhub datum-cloud/claude-code-plugins --plugin milo-activity