From engineering
Triage and manage production incidents. Trigger with "we have an incident", "production is down", "something is broken", "there's an outage", "SEV1", or when the user describes a production issue needing immediate response.
Slash command: /engineering:incident-response
Guide incident response from detection through resolution and postmortem.
| Level | Criteria | Response Time |
|---|---|---|
| SEV1 | Service down, all users affected | Immediate, all-hands |
| SEV2 | Major feature degraded, many users affected | Within 15 min |
| SEV3 | Minor feature issue, some users affected | Within 1 hour |
| SEV4 | Cosmetic or low-impact issue | Next business day |
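The severity table can be sketched as a small lookup in Python. This is an illustrative sketch only: the `classify` helper, its parameters, and the `"all"`/`"many"`/`"some"` impact labels are assumptions made for the example, not part of the skill, and real triage usually needs human judgment beyond two inputs.

```python
from enum import Enum


class Severity(Enum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


# Response expectations copied from the severity table.
RESPONSE_TIME = {
    Severity.SEV1: "Immediate, all-hands",
    Severity.SEV2: "Within 15 min",
    Severity.SEV3: "Within 1 hour",
    Severity.SEV4: "Next business day",
}


def classify(service_down: bool, user_impact: str) -> Severity:
    """Hypothetical triage helper.

    user_impact is one of "all", "many", "some", "none" (assumed labels).
    Treating "all users affected but service still up" as SEV2 is also
    an assumption of this sketch.
    """
    if service_down and user_impact == "all":
        return Severity.SEV1
    if user_impact in ("all", "many"):
        return Severity.SEV2
    if user_impact == "some":
        return Severity.SEV3
    return Severity.SEV4
```

For instance, `RESPONSE_TIME[classify(False, "many")]` looks up the response expectation for a degraded feature affecting many users.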
Provide clear, factual updates at a regular cadence. Each update should state what's happening, who's affected, what we're doing, and when the next update is due.
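One way to keep updates consistent is a small template that always carries the four items listed above. The sketch below is hypothetical: the `StatusUpdate` class, its field names, and the choice of a fixed `cadence` interval are assumptions for illustration, and it assumes `sent_at` is already in UTC.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class StatusUpdate:
    whats_happening: str
    whos_affected: str
    what_were_doing: str
    sent_at: datetime      # assumed to be in UTC
    cadence: timedelta     # assumed: interval the team has committed to

    def render(self) -> str:
        """Format the update, including when the next one is due."""
        next_update = self.sent_at + self.cadence
        return "\n".join([
            f"What's happening: {self.whats_happening}",
            f"Who's affected: {self.whos_affected}",
            f"What we're doing: {self.what_were_doing}",
            f"Next update: {next_update.strftime('%H:%M UTC')}",
        ])


def example_update() -> str:
    """Build a sample update with made-up incident details."""
    update = StatusUpdate(
        whats_happening="Checkout requests are failing",
        whos_affected="All users attempting to pay",
        what_were_doing="Rolling back the latest deploy",
        sent_at=datetime(2024, 1, 1, 14, 0, tzinfo=timezone.utc),
        cadence=timedelta(minutes=30),
    )
    return update.render()
```

Computing the next update time from the cadence, rather than typing it by hand, makes it harder to send an update that omits it.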
Keep the postmortem blameless: focus on systems and processes, not individuals. Include a timeline, root cause analysis (5 whys), what went well, what went poorly, and action items with owners and due dates.
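The postmortem contents listed above can be sketched as a simple data structure with a completeness check. The class and field names here (`Postmortem`, `ActionItem`, `missing_sections`) are hypothetical and chosen only to mirror the listed sections. Making `owner` and `due` required fields is one way to ensure that every action item has both.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ActionItem:
    description: str
    owner: str   # required: every action item needs an owner
    due: date    # required: every action item needs a due date


@dataclass
class Postmortem:
    timeline: list[str] = field(default_factory=list)
    five_whys: list[str] = field(default_factory=list)
    went_well: list[str] = field(default_factory=list)
    went_poorly: list[str] = field(default_factory=list)
    action_items: list[ActionItem] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [name for name, value in vars(self).items() if not value]
```

A draft can then be checked before review: `Postmortem(timeline=["14:00 alert fired"]).missing_sections()` lists the sections that still need content.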
npx claudepluginhub 8gg-git/knowledge-work-plugins --plugin engineering