Skill

incident

IT incident response workflow. This skill should be used when the user asks to "log an incident", "handle an incident", "create incident report", "document an outage", "incident response", "service is down", "major incident", or describes a service disruption requiring structured response. Also use when the user mentions SRE concepts like error budget impact, blameless review, or AIOps-detected incidents. Guides through the full ITIL v5 incident lifecycle: detect, classify, investigate, resolve, communicate, and close.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/sysadmin-skills:incident <description of incident or symptoms>

User invocable

Model invocable

Inline context

Default effort

Argument hint: <description of incident or symptoms>

Tool Access

This skill is limited to the following tools:

Read, Write, Edit, Glob, Grep, Bash, WebSearch, WebFetch, AskUserQuestion

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Structured incident response following ITIL 4 Incident Management. Produces a complete incident report as a local markdown file.

SKILL.md

181 lines · ~1.8k tokens

Stats

Language: JavaScript

Parent stars: 1

Maintenance: Fair

Last Commit: Apr 9, 2026

Incident Response Workflow

Structured incident response following ITIL 4 Incident Management. Produces a complete incident report as a local markdown file.

Related Skills (auto-loaded)

- itil: priority classification (P1-P4), incident management process, escalation matrix
- comms: stakeholder notification templates, severity-based tone guidance
- done-ops: verification gate before closing the incident

Output

File: {OUTPUT_DIR}/incident-{YYYY-MM-DD}-{slug}.md

- slug: lowercase kebab-case summary, e.g. email-outbound-failure, vpn-india-outage
- Incident ID (INC-{YYYYMMDD}-{NNN}): {NNN} is a daily sequence number starting at 001. If the organization uses a ticketing system (ServiceNow, Jira SM), use that system's ID instead.
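The naming rules above can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: the helper names (`make_slug`, `make_incident_id`, `report_filename`) are hypothetical, and the slug rule assumes that any run of non-alphanumeric characters collapses to a single hyphen.

```python
import re
from datetime import date


def make_slug(summary: str) -> str:
    """Lowercase kebab-case: each run of non-alphanumerics becomes one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")


def make_incident_id(day: date, sequence: int) -> str:
    """Build INC-{YYYYMMDD}-{NNN}; NNN is zero-padded to three digits."""
    if sequence < 1:
        raise ValueError("the daily sequence starts at 001")
    return f"INC-{day:%Y%m%d}-{sequence:03d}"


def report_filename(day: date, summary: str) -> str:
    """Build incident-{YYYY-MM-DD}-{slug}.md for the report file."""
    return f"incident-{day.isoformat()}-{make_slug(summary)}.md"
```

When a ticketing system supplies the ID, `make_incident_id` would simply not be called.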

Default output directory: ./incidents/. If a docs/incidents/ directory exists in the project, use that instead. Ask the user if neither exists. A user-specified path takes precedence over all defaults.
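One possible reading of that precedence, sketched in Python. The function name `resolve_output_dir` is hypothetical, and the sketch assumes that "neither exists" refers to `./incidents/` and `docs/incidents/` under the project root; returning `None` stands in for "ask the user".

```python
from pathlib import Path
from typing import Optional


def resolve_output_dir(project_root: Path,
                       user_path: Optional[Path] = None) -> Optional[Path]:
    """Return the directory to write to, or None when the user must be asked."""
    if user_path is not None:           # an explicit user choice always wins
        return user_path
    docs_dir = project_root / "docs" / "incidents"
    if docs_dir.is_dir():               # project convention beats the default
        return docs_dir
    default_dir = project_root / "incidents"
    if default_dir.is_dir():
        return default_dir
    return None                         # neither exists: ask the user
```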

All incident documents are written in English. Before writing to file, preview the content for the user in their preferred language.

Workflow

Step 1: Gather Information

Collect the following from the user (use AskUserQuestion for missing items):

| Field | Required | Example |
|---|---|---|
| What is happening? | Yes | "Email service is not sending outbound messages" |
| When did it start? | Yes | "Noticed at 09:15 today" / "Alerts fired at 14:30" |
| Who/what is affected? | Yes | "All users in HQ" / "Customer-facing API" |
| Severity indicators | Yes | Is there a workaround? How many users? Revenue impact? |
| What has been tried? | No | "Restarted the mail service, no change" |
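A small sketch of how the required fields in this table could be checked before moving on. The dictionary keys (`what`, `when`, `affected`, `severity_indicators`) are hypothetical short names for the table rows, not identifiers defined by the skill.

```python
# Hypothetical keys for the four rows marked "Yes" in the table above.
REQUIRED_FIELDS = ("what", "when", "affected", "severity_indicators")


def missing_required(intake: dict) -> list:
    """Return the required keys that are absent or blank, in table order."""
    return [key for key in REQUIRED_FIELDS
            if not str(intake.get(key) or "").strip()]
```

Each key returned would correspond to one follow-up question for the user.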

Step 2: Classify (SEV then P)

First — assign Severity (objective impact scope):

| Level | Scope | Definition |
|---|---|---|
| SEV-1 | All users / all regions | Core service completely down, no workaround |
| SEV-2 | Many users or major function | Significant degradation, partial workaround |
| SEV-3 | Subset of users, non-critical | Limited impact, workaround available |
| SEV-4 | Single user or negligible | Cosmetic, proactive, informational |

Then — assign Priority (handling order, Impact × Urgency):

| | High Urgency | Medium Urgency | Low Urgency |
|---|---|---|---|
| High Impact | P1 Critical | P2 High | P3 Medium |
| Medium Impact | P2 High | P3 Medium | P4 Low |
| Low Impact | P3 Medium | P4 Low | P4 Low |
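The matrix above is a plain lookup, which can be sketched as a dictionary keyed by (impact, urgency). The names `PRIORITY_MATRIX` and `priority_for` are illustrative only.

```python
# (impact, urgency) -> priority, transcribed cell by cell from the matrix.
PRIORITY_MATRIX = {
    ("high", "high"): "P1",   ("high", "medium"): "P2",   ("high", "low"): "P3",
    ("medium", "high"): "P2", ("medium", "medium"): "P3", ("medium", "low"): "P4",
    ("low", "high"): "P3",    ("low", "medium"): "P4",    ("low", "low"): "P4",
}


def priority_for(impact: str, urgency: str) -> str:
    """Look up the priority for an impact/urgency pair (case-insensitive)."""
    key = (impact.strip().lower(), urgency.strip().lower())
    if key not in PRIORITY_MATRIX:
        raise ValueError(f"impact and urgency must be high, medium, or low: {key}")
    return PRIORITY_MATRIX[key]
```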

Present both SEV and P to the user for confirmation. MUST use the exact format: SEV-1, SEV-2, P1, P2 (not "Critical" or "High" alone).

If SEV-1 or P1/P2: Major Incident procedures apply (see itil skill reference references/incident-management.md).
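As a rough sketch, the label format and the Major Incident condition could be checked together. The function name `is_major_incident` is hypothetical; the condition itself is taken directly from the rule above.

```python
import re

SEV_RE = re.compile(r"SEV-[1-4]")
PRIORITY_RE = re.compile(r"P[1-4]")


def is_major_incident(sev: str, priority: str) -> bool:
    """True when SEV-1 or P1/P2; rejects labels not in the exact format."""
    if not SEV_RE.fullmatch(sev):
        raise ValueError(f"severity must look like SEV-1..SEV-4, got {sev!r}")
    if not PRIORITY_RE.fullmatch(priority):
        raise ValueError(f"priority must look like P1..P4, got {priority!r}")
    return sev == "SEV-1" or priority in ("P1", "P2")
```

Rejecting labels such as "Critical" up front keeps the report header consistent with the required format.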

Step 3: Document Timeline

Create the incident report file with the initial entry. MUST follow this exact structure — do not rearrange or rename sections:

# Incident Report: {TITLE}

| Field | Value |
|---|---|
| Incident ID | INC-{YYYYMMDD}-{NNN} |
| Severity | SEV-{1/2/3/4} |
| Priority | P{1/2/3/4} |
| Status | Investigating |
| Reported by | {NAME/SYSTEM} |
| Start time | {YYYY-MM-DD HH:MM TZ} |
| Affected services | {LIST} |
| Affected users | {SCOPE} |

## Impact Summary

{One paragraph describing what users are experiencing.}

## Timeline

| Time | Event |
|---|---|
| {HH:MM} | {Event description} |

## Investigation

{Findings so far, diagnostics performed, hypotheses.}

## Resolution

{To be completed upon resolution.}

## Follow-up Actions

- [ ] {Action items identified during the incident}
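The Timeline table in the structure above grows by one row per event. A minimal sketch of appending a row without disturbing the other sections, assuming the report text uses exactly the `## Timeline` heading shown and that the event text contains no `|` characters (the helper name is hypothetical):

```python
def append_timeline_event(report: str, time: str, event: str) -> str:
    """Insert '| time | event |' after the last row of the Timeline table."""
    lines = report.splitlines()
    if "## Timeline" not in lines:
        raise ValueError("report has no '## Timeline' section")
    start = lines.index("## Timeline")
    last_row = None
    for i in range(start + 1, len(lines)):
        if lines[i].startswith("## "):   # reached the next section
            break
        if lines[i].startswith("|"):     # still inside the table
            last_row = i
    if last_row is None:
        raise ValueError("Timeline section has no table")
    lines.insert(last_row + 1, f"| {time} | {event} |")
    return "\n".join(lines) + "\n"
```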

Step 4: Draft Communications

Using the comms skill, draft the appropriate notifications:

- Initial notification, based on priority level and audience
- Present the draft to the user for review before including in the report

For P1/P2: remind the user about the update cadence (P1: every 15-30 min, P2: every 30-60 min).
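The cadence reminder can be expressed as a time window for the next update. This sketch assumes "every 15-30 min" means the next update is due between 15 and 30 minutes after the previous one; the names are illustrative.

```python
from datetime import datetime, timedelta

# (earliest, latest) minutes between updates, from the cadence stated above.
UPDATE_CADENCE_MIN = {"P1": (15, 30), "P2": (30, 60)}


def next_update_window(priority: str, last_update: datetime):
    """Return (earliest, latest) for the next update, or None if no cadence applies."""
    cadence = UPDATE_CADENCE_MIN.get(priority)
    if cadence is None:                  # P3/P4: no cadence is stated
        return None
    earliest, latest = cadence
    return (last_update + timedelta(minutes=earliest),
            last_update + timedelta(minutes=latest))
```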

If a status page update is also needed, coordinate with the /statuspage skill. The incident report contains full technical detail; the status page update contains only impact-focused, user-safe information.

Step 5: Track Progress

As the user provides updates:

- Append to the Timeline section
- Update the Status field (Investigating → Identified → Monitoring → Resolved)
- Draft follow-up communications as needed
- Add to Investigation section with findings
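The status progression above can be sketched as an ordered tuple plus a helper that rewrites the Status row of the report header. Both function names are hypothetical, and the sketch models only the forward order given here; whether a status may move backwards (for example when an issue recurs) is left to the user.

```python
import re

STATUS_FLOW = ("Investigating", "Identified", "Monitoring", "Resolved")


def next_status(current: str):
    """Return the status that follows `current`, or None once Resolved."""
    i = STATUS_FLOW.index(current)       # raises ValueError for an unknown status
    return STATUS_FLOW[i + 1] if i + 1 < len(STATUS_FLOW) else None


def set_status(report: str, new_status: str) -> str:
    """Rewrite the first '| Status | ... |' row of the report header table."""
    if new_status not in STATUS_FLOW:
        raise ValueError(f"unknown status: {new_status!r}")
    return re.sub(r"^\| Status \| .* \|$",
                  lambda _m: f"| Status | {new_status} |",
                  report, count=1, flags=re.MULTILINE)
```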

Step 6: Resolution

When the user indicates the issue is resolved:

- Load the done-ops verification gate
- Walk through the incident closure checklist:
  - Service restored and verified?
  - User/reporter confirmation obtained?
  - Monitoring stable (no recurring alerts)?
  - Documentation complete?
  - Problem record needed? (recurring incident or unknown root cause)
- Update the incident report:
  - Set Status to Resolved
  - Record resolution time
  - Fill in the Resolution section
  - Calculate total duration
- Draft the resolution notification using comms templates
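For the "calculate total duration" item, a minimal sketch that measures from detection to confirmed restoration. The function name and the `Nh MMm` output format are assumptions; both timestamps should be either timezone-aware or both naive, since Python refuses to subtract a mix of the two.

```python
from datetime import datetime


def incident_duration(detected_at: datetime, restored_at: datetime) -> str:
    """Format the time from detection to confirmed restoration as 'Nh MMm'."""
    delta = restored_at - detected_at
    if delta.total_seconds() < 0:
        raise ValueError("restoration time is earlier than detection time")
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"
```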

Step 7: Close and Follow-up

Complete the incident report:

- Record all follow-up actions with owners and due dates
- Note if a Post-Incident Review is needed (mandatory for P1/P2, recommended for P3)
- Recommend creating a Problem record if root cause is unknown
- Present the final report to the user for review

SRE Postmortem (for SEV-1 and SEV-2): If this was a SEV-1 or SEV-2 incident, a traditional PIR may not be sufficient. Prompt the user:

"This was a {SEV-1/SEV-2} incident. Consider running a blameless postmortem using /postmortem — it produces a more systematic analysis with contributing factors, what went well, and SMART action items. Use /problem if root cause is unknown and the incident is likely to recur."
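The review rules in this step (PIR by priority, postmortem prompt by severity) can be combined into one small decision helper. This is a sketch with a hypothetical function name; P4 is not mentioned in the rules above, so it produces no PIR entry here.

```python
def follow_up_reviews(sev: str, priority: str) -> list:
    """List the reviews this step calls for, given the SEV and P labels."""
    reviews = []
    if priority in ("P1", "P2"):
        reviews.append("Post-Incident Review: mandatory")
    elif priority == "P3":
        reviews.append("Post-Incident Review: recommended")
    if sev in ("SEV-1", "SEV-2"):
        reviews.append("Blameless postmortem: suggest /postmortem")
    return reviews
```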

Error budget: If the organization tracks SLOs, note the estimated error budget impact in the follow-up section.
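A rough estimate of error budget impact, assuming a time-based availability SLO and treating the incident as a full outage for its whole duration. Both assumptions, and the function name, are illustrative; request-based SLOs or partial degradation need a different calculation.

```python
def error_budget_consumed(slo_target: float, window_days: int,
                          downtime_minutes: float) -> float:
    """Fraction of the window's error budget used by this incident.

    slo_target is a fraction, e.g. 0.999 for a 99.9% availability SLO.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    # Total allowed downtime in the window, in minutes.
    budget_minutes = (1 - slo_target) * window_days * 24 * 60
    return downtime_minutes / budget_minutes
```

For a 99.9% SLO over 30 days the budget is about 43.2 minutes, so a 21.6-minute outage consumes roughly half of it.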

Gotchas

- Always use the user-facing service name in communications, not internal system names
- Never include IP addresses, hostnames, or credentials in the incident report
- For P1 incidents, do not wait for complete information before drafting the initial notification; speed matters
- Resolution time = time from detection to confirmed restoration, not when the ticket is closed
- If the incident was caused by a recent change, link the change request ID in the report
- Ask the user whether this incident needs to be logged in their organization's ticketing system (ServiceNow, Jira Service Management, etc.) in addition to the local markdown report
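The rule about keeping IP addresses out of the report can be partly checked before writing the file. The sketch below finds IPv4 addresses only; it does not detect IPv6 addresses, hostnames, or credentials, which still need manual review. The function name is hypothetical.

```python
import ipaddress
import re

# Dotted-quad candidates; each one is validated with the ipaddress module.
_IPV4_CANDIDATE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def find_ipv4_addresses(text: str) -> list:
    """Return every valid IPv4 address found in the text, in order."""
    found = []
    for match in _IPV4_CANDIDATE.finditer(text):
        try:
            ipaddress.IPv4Address(match.group())
        except ValueError:               # e.g. an octet above 255
            continue
        found.append(match.group())
    return found
```

A non-empty result would be a reason to stop and redact before the report is saved.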
