From draft
Manages incident lifecycle with three modes: new (triage, communicate, mitigate), update (status update), and postmortem (blameless RCA report).
Slash command: `/draft:incident-response`
You are managing an incident through its full lifecycle using structured incident management practices.
Communicate first. Fix second. Learn always.
ls draft/ 2>/dev/null
This skill works standalone — incidents don't wait for project setup.
core/shared/draft-context-loading.md.
- `/draft:incident-response new <description>`: Start new incident
- `/draft:incident-response update <status>`: Post status update
- `/draft:incident-response postmortem`: Generate postmortem report
- `/draft:incident-response` (no args): Interactive, ask which mode
Classify severity:
| Level | Response Time | Who | Examples |
|---|---|---|---|
| SEV1 | Immediate, all-hands | Entire team | Data loss, complete outage, security breach |
| SEV2 | 15 minutes | On-call + team lead | Major feature broken, significant degradation |
| SEV3 | 1 hour | On-call | Minor feature broken, workaround exists |
| SEV4 | Next business day | Assigned engineer | Cosmetic issue, minor inconvenience |
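The severity table above can be encoded as a small lookup so that triage tooling applies the same response expectations every time. This is a minimal sketch, not part of the skill itself: the names `SeverityPolicy`, `SEVERITY_POLICIES`, and `policy_for` are hypothetical, and the values are copied from the table.

```python
from typing import NamedTuple


class SeverityPolicy(NamedTuple):
    response_time: str
    responders: str


# Values copied from the severity table; the structure itself is an assumption.
SEVERITY_POLICIES = {
    1: SeverityPolicy("Immediate, all-hands", "Entire team"),
    2: SeverityPolicy("15 minutes", "On-call + team lead"),
    3: SeverityPolicy("1 hour", "On-call"),
    4: SeverityPolicy("Next business day", "Assigned engineer"),
}


def policy_for(level: int) -> SeverityPolicy:
    """Return the response policy for a SEV level, rejecting unknown levels."""
    try:
        return SEVERITY_POLICIES[level]
    except KeyError:
        raise ValueError(f"Unknown severity level: SEV{level}") from None
```

Rejecting unknown levels explicitly keeps a mistyped severity from silently falling through to a default.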
Assess:
- draft/product.md user types if available)
- draft/.ai-context.md service topology if available)
Generate initial status update:
INCIDENT: {description}
Severity: SEV{1-4}
Impact: {who/what is affected}
Status: Investigating
Commander: {name or "unassigned"}
Next update: {time — SEV1: 15min, SEV2: 30min, SEV3: 1hr}
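The status template above could be filled in programmatically along the following lines. This is a sketch under assumptions: `render_status_update` and `UPDATE_INTERVAL_MINUTES` are hypothetical names, the `HH:MM` time format is a choice made here, and because the template gives no cadence for SEV4, that case is rendered as "not scheduled".

```python
from datetime import datetime, timedelta
from typing import Optional

# Update cadence in minutes, taken from the template's "Next update" line.
# SEV4 has no stated cadence in the source, so it is left as None (assumption).
UPDATE_INTERVAL_MINUTES = {1: 15, 2: 30, 3: 60, 4: None}


def render_status_update(
    description: str,
    severity: int,
    impact: str,
    now: datetime,
    commander: Optional[str] = None,
    status: str = "Investigating",
) -> str:
    """Fill in the initial status update template for a new incident."""
    interval = UPDATE_INTERVAL_MINUTES[severity]
    if interval is None:
        next_update = "not scheduled"
    else:
        next_update = (now + timedelta(minutes=interval)).strftime("%H:%M")
    return "\n".join(
        [
            f"INCIDENT: {description}",
            f"Severity: SEV{severity}",
            f"Impact: {impact}",
            f"Status: {status}",
            f"Commander: {commander or 'unassigned'}",
            f"Next update: {next_update}",
        ]
    )
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the output reproducible.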
- get_issue, get_issue_description, get_issue_comments)
- curl/wget to fetch dashboards or error pages mentioned
- ssh to access remote log paths if mentioned
- last 24h)
Following core/agents/ops.md production-safety mindset:
Document all actions taken with timestamps.
Save to: draft/incidents/incident-<timestamp>.md or draft/tracks/<id>/incident.md
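One way to choose between the two save locations is sketched below. The source does not specify the `<timestamp>` format, so the compact `YYYYMMDDTHHMMSS` form used here is an assumption, as is the helper name `incident_path`.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional


def incident_path(started: datetime, track_id: Optional[str] = None) -> PurePosixPath:
    """Return the track-scoped path when a track id is given, else a timestamped one."""
    if track_id:
        return PurePosixPath("draft/tracks") / track_id / "incident.md"
    stamp = started.strftime("%Y%m%dT%H%M%S")  # assumed timestamp format
    return PurePosixPath("draft/incidents") / f"incident-{stamp}.md"
```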
# Incident: {description}
| Field | Value |
|-------|-------|
| **Severity** | SEV{N} |
| **Status** | {Investigating/Mitigating/Resolved} |
| **Started** | {timestamp} |
| **Commander** | {name} |
## Timeline
| Time | Action |
|------|--------|
| {time} | Incident detected |
| {time} | Triage: classified as SEV{N} |
| {time} | {mitigation action} |
## Evidence
| Source | Finding |
|--------|---------|
| {source} | {finding} |
## Status Updates
{chronological updates}
- git log for related commits during incident window
Reference core/agents/rca.md methodology:
5 Whys Analysis:
Root Cause Classification:
Detection Lag: When was the bug introduced vs when was it detected?
SLO Impact: Which SLOs were affected and by how much?
HLD Claims vs Reality: If the affected service has an hld.md (search draft/tracks/*/hld.md for §Detailed Design components matching the failing module), compare incident behavior against HLD claims:
draft/tracks/<id>/hld.md §Resiliency) for each gap. Avoid markdown anchor slugs, since renderers (GitHub, mkdocs, Hugo) generate different slugs for nested headings. These citations feed the §Action Items as "amend HLD §X: claim was {claim} but reality showed {reality}."
MANDATORY: Include YAML frontmatter with git metadata. Follow core/shared/git-report-metadata.md.
Save to: draft/incidents/postmortem-<timestamp>.md with symlink postmortem-latest.md
Or track-scoped: draft/tracks/<id>/postmortem.md
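Keeping `postmortem-latest.md` pointed at the newest report might look like the sketch below. It assumes a POSIX filesystem where symlinks are available; `point_latest_at` is a hypothetical helper name, and the temporary-link-then-rename approach is a design choice made here, not something the source prescribes.

```python
import os
from pathlib import Path


def point_latest_at(report: Path, link_name: str = "postmortem-latest.md") -> Path:
    """Point <dir>/postmortem-latest.md at `report`, replacing any older link."""
    link = report.parent / link_name
    tmp = link.with_name(link.name + ".tmp")
    tmp.unlink(missing_ok=True)   # clear a stale temporary link, if any
    tmp.symlink_to(report.name)   # relative target, so the directory stays movable
    os.replace(tmp, link)         # rename over the old link in one step
    return link
```

Creating the link under a temporary name and renaming it means readers never see a moment where `postmortem-latest.md` is missing.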
# Postmortem: {incident title}
## Summary
{2-3 sentences: what happened, impact, duration}
## Impact
- **Duration:** {start} to {end} ({total time})
- **Users affected:** {count or percentage}
- **SLO impact:** {which SLOs, by how much}
- **Data impact:** {any data loss or corruption}
## Timeline
| Time | Event |
|------|-------|
| {time} | {event} |
## Root Cause
{1-2 sentence root cause statement}
### 5 Whys
1. Why? → {answer}
2. Why? → {answer}
...
### Classification
- **Type:** {classification}
- **Detection Lag:** {introduced} → {detected} = {gap}
## What Went Well
- {positive observations}
## What Went Wrong
- {things that made the incident worse}
## Design Claims vs Reality
{populated when an HLD was available — list each HLD claim that did not hold, citing the specific §section}
| HLD Section | Claim | Reality During Incident | Recommended HLD Amendment |
|-------------|-------|-------------------------|---------------------------|
| §Resiliency | {what was claimed} | {what actually happened} | {how to update HLD} |
## Action Items
| # | Action | Owner | Deadline | Status |
|---|--------|-------|----------|--------|
| 1 | {detection improvement} | {name} | {date} | [ ] |
| 2 | {process improvement} | {name} | {date} | [ ] |
| 3 | {code improvement} | {name} | {date} | [ ] |
| 4 | Amend `draft/tracks/<id>/hld.md` §{section} (if claim drift identified) | {design owner} | {date} | [ ] |
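The Detection Lag field in the template above ({introduced} → {detected} = {gap}) is a simple time difference. A minimal sketch follows, assuming both timestamps are known and in the same timezone; the function names and the `Nd Nh Nm` output format are illustrative choices.

```python
from datetime import datetime, timedelta


def detection_lag(introduced: datetime, detected: datetime) -> timedelta:
    """Return how long the defect existed before it was detected."""
    if detected < introduced:
        raise ValueError("detected time precedes introduced time")
    return detected - introduced


def format_lag(lag: timedelta) -> str:
    """Format a lag as days, hours, and minutes, dropping leftover seconds."""
    total_minutes = int(lag.total_seconds() // 60)
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return f"{days}d {hours}h {minutes}m"
```

In practice the introduced time would typically come from the commit or deploy identified during the git log review, and the detected time from the first timeline entry.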
Follow core/shared/jira-sync.md:
⚠️ Test Writing Guardrail: If postmortem identifies missing tests, ASK: "Want me to create regression test tasks? [Y/n]"
- /draft:new-track when incident keywords detected in description
- /draft:regression (find the breaking commit), /draft:learn (update guardrails)
- /draft:new-track for the fix
If no incident file found (update/postmortem mode): List available incidents, ask which one.
If no Jira ticket: Proceed without sync, note: "Link a Jira ticket for automatic sync"
npx claudepluginhub drafthq/draft --plugin draft