From sysadmin-skills
IT incident response workflow. This skill should be used when the user asks to "log an incident", "handle an incident", "create incident report", "document an outage", "incident response", "service is down", "major incident", or describes a service disruption requiring structured response. Also use when the user mentions SRE concepts like error budget impact, blameless review, or AIOps-detected incidents. Guides through the full ITIL v5 incident lifecycle: detect, classify, investigate, resolve, communicate, and close.
Slash command:
/sysadmin-skills:incident <description of incident or symptoms>
Structured incident response following ITIL 4 Incident Management. Produces a complete incident report as a local markdown file.
File: {OUTPUT_DIR}/incident-{YYYY-MM-DD}-{slug}.md
- Example {slug} values: email-outbound-failure, vpn-india-outage
- Incident ID (INC-{YYYYMMDD}-{NNN}): {NNN} is a daily sequence number starting at 001. If the organization uses a ticketing system (ServiceNow, Jira SM), use that system's ID instead.
- Default output directory: ./incidents/. If a docs/incidents/ directory exists in the project, use that instead. Ask the user if neither exists. A user-specified path takes precedence over all defaults.
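The naming rules above can be sketched in Python. This is a minimal illustration, not part of the skill itself: the function names are hypothetical, and the slug rule (lowercase, hyphen-separated alphanumerics) is an assumption inferred from the example slugs.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional


def slugify(title: str) -> str:
    """Assumed slug rule: lowercase alphanumeric runs joined by hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def incident_id(day: date, sequence: int) -> str:
    """Build INC-{YYYYMMDD}-{NNN}; sequence is the daily counter starting at 1."""
    return f"INC-{day:%Y%m%d}-{sequence:03d}"


def report_path(output_dir: Path, day: date, slug: str) -> Path:
    """Build {OUTPUT_DIR}/incident-{YYYY-MM-DD}-{slug}.md."""
    return output_dir / f"incident-{day.isoformat()}-{slug}.md"


def resolve_output_dir(project_root: Path, user_path: Optional[Path] = None) -> Optional[Path]:
    """Apply the precedence: user path, then docs/incidents/, then ./incidents/.

    Returns None when neither default directory exists, which means
    the user should be asked where to write the report.
    """
    if user_path is not None:
        return user_path
    for candidate in (project_root / "docs" / "incidents", project_root / "incidents"):
        if candidate.is_dir():
            return candidate
    return None
```

For example, `incident_id(date(2024, 3, 5), 1)` yields `INC-20240305-001`. When the organization's ticketing system supplies an ID, that ID replaces the generated one.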
All incident documents are written in English. Before writing to file, preview the content for the user in their preferred language.
Collect the following from the user (use AskUserQuestion for missing items):
| Field | Required | Example |
|---|---|---|
| What is happening? | Yes | "Email service is not sending outbound messages" |
| When did it start? | Yes | "Noticed at 09:15 today" / "Alerts fired at 14:30" |
| Who/what is affected? | Yes | "All users in HQ" / "Customer-facing API" |
| Severity indicators | Yes | Is there a workaround? How many users? Revenue impact? |
| What has been tried? | No | "Restarted the mail service, no change" |
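As a rough sketch of the intake check, the required rows of the table can be treated as a checklist and anything still empty turned into a follow-up question. The dictionary keys and the function name are hypothetical; only the question wording comes from the table.

```python
# Required intake fields from the table above; the dict keys are illustrative names.
REQUIRED_FIELDS = {
    "what": "What is happening?",
    "when": "When did it start?",
    "affected": "Who/what is affected?",
    "severity_indicators": "Severity indicators",
}


def missing_required(intake: dict) -> list:
    """Return the questions whose answers are absent or blank, in table order."""
    return [
        question
        for key, question in REQUIRED_FIELDS.items()
        if not str(intake.get(key, "")).strip()
    ]
```

Each returned question is a candidate for an AskUserQuestion prompt. "What has been tried?" is optional, so it is never reported as missing.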
First — assign Severity (objective impact scope):
| Level | Scope | Definition |
|---|---|---|
| SEV-1 | All users / all regions | Core service completely down, no workaround |
| SEV-2 | Many users or major function | Significant degradation, partial workaround |
| SEV-3 | Subset of users, non-critical | Limited impact, workaround available |
| SEV-4 | Single user or negligible | Cosmetic, proactive, informational |
Then — assign Priority (handling order, Impact × Urgency):
| | High Urgency | Medium Urgency | Low Urgency |
|---|---|---|---|
| High Impact | P1 Critical | P2 High | P3 Medium |
| Medium Impact | P2 High | P3 Medium | P4 Low |
| Low Impact | P3 Medium | P4 Low | P4 Low |
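The Impact × Urgency matrix can be expressed as a plain lookup table. The sketch below mirrors the table cell for cell; the function name and the lowercase level strings are assumptions made for illustration.

```python
# Keys are (impact, urgency); values are the priority from the matrix above.
PRIORITY_MATRIX = {
    ("high", "high"): "P1",
    ("high", "medium"): "P2",
    ("high", "low"): "P3",
    ("medium", "high"): "P2",
    ("medium", "medium"): "P3",
    ("medium", "low"): "P4",
    ("low", "high"): "P3",
    ("low", "medium"): "P4",
    ("low", "low"): "P4",
}


def assign_priority(impact: str, urgency: str) -> str:
    """Look up the priority for an impact/urgency pair (case-insensitive)."""
    key = (impact.strip().lower(), urgency.strip().lower())
    if key not in PRIORITY_MATRIX:
        raise ValueError(f"impact and urgency must be high, medium, or low: {key}")
    return PRIORITY_MATRIX[key]
```

A high-impact, medium-urgency incident maps to `P2`. The result is still a proposal that the user confirms, not a final decision.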
Present both SEV and P to the user for confirmation. MUST use the exact format: SEV-1, SEV-2, P1, P2 (not "Critical" or "High" alone).
If SEV-1 or P1/P2: Major Incident procedures apply (see the itil skill reference at references/incident-management.md).
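The label format and the Major Incident rule can be checked mechanically. The following is a small sketch under the assumption that labels arrive as strings; the function name is hypothetical.

```python
import re

# Exact label formats required above: SEV-1..SEV-4 and P1..P4.
SEV_PATTERN = re.compile(r"SEV-[1-4]")
PRIORITY_PATTERN = re.compile(r"P[1-4]")


def is_major_incident(severity: str, priority: str) -> bool:
    """Validate both labels, then apply the rule: SEV-1, or P1/P2."""
    if not SEV_PATTERN.fullmatch(severity):
        raise ValueError(f"severity must look like SEV-1..SEV-4, got {severity!r}")
    if not PRIORITY_PATTERN.fullmatch(priority):
        raise ValueError(f"priority must look like P1..P4, got {priority!r}")
    return severity == "SEV-1" or priority in {"P1", "P2"}
```

Rejecting labels such as "Critical" or "High" early keeps the report consistent with the required SEV-n and Pn notation.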
Create the incident report file with the initial entry. MUST follow this exact structure — do not rearrange or rename sections:
# Incident Report: {TITLE}
| Field | Value |
|---|---|
| Incident ID | INC-{YYYYMMDD}-{NNN} |
| Severity | SEV-{1/2/3/4} |
| Priority | P{1/2/3/4} |
| Status | Investigating |
| Reported by | {NAME/SYSTEM} |
| Start time | {YYYY-MM-DD HH:MM TZ} |
| Affected services | {LIST} |
| Affected users | {SCOPE} |
## Impact Summary
{One paragraph describing what users are experiencing.}
## Timeline
| Time | Event |
|---|---|
| {HH:MM} | {Event description} |
## Investigation
{Findings so far, diagnostics performed, hypotheses.}
## Resolution
{To be completed upon resolution.}
## Follow-up Actions
- [ ] {Action items identified during the incident}
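Because the structure is fixed, later updates can be added without disturbing it. As an illustrative sketch (the helper names are hypothetical, and the report is assumed to be held in memory as a single string), a new row can be appended to the Timeline table like this:

```python
from datetime import datetime


def timeline_row(when: datetime, event: str) -> str:
    """Format one '| HH:MM | event |' row, escaping pipes so the table stays intact."""
    safe_event = event.strip().replace("|", "\\|")
    return f"| {when:%H:%M} | {safe_event} |"


def append_timeline_row(report: str, row: str) -> str:
    """Insert row after the last table row under the '## Timeline' heading."""
    lines = report.splitlines()
    start = lines.index("## Timeline")  # raises ValueError if the section is missing
    end = start + 1
    # Skip any blank lines between the heading and the table.
    while end < len(lines) and not lines[end].strip():
        end += 1
    # Walk to the line just past the last table row.
    while end < len(lines) and lines[end].lstrip().startswith("|"):
        end += 1
    lines.insert(end, row)
    return "\n".join(lines)
```

Section headings and their order are left untouched, which is the point of the "do not rearrange or rename sections" rule.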
Using the comms skill, draft the appropriate notifications:
For P1/P2: remind the user about the update cadence (P1: every 15-30 min, P2: every 30-60 min).
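The cadence reminder amounts to a small time calculation. This sketch assumes the time of the last update is known; the names are hypothetical, and only the minute ranges come from the text above.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Update cadence in minutes (earliest, latest), as stated above for P1 and P2.
UPDATE_CADENCE_MINUTES = {"P1": (15, 30), "P2": (30, 60)}


def next_update_window(priority: str, last_update: datetime) -> Optional[Tuple[datetime, datetime]]:
    """Return the (earliest, latest) time for the next update, or None if no cadence applies."""
    cadence = UPDATE_CADENCE_MINUTES.get(priority)
    if cadence is None:
        return None
    earliest, latest = cadence
    return (
        last_update + timedelta(minutes=earliest),
        last_update + timedelta(minutes=latest),
    )
```

For a P1 incident last updated at 09:15, the next update falls between 09:30 and 09:45. P3 and P4 return `None` because no cadence is specified for them here.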
If a status page update is also needed, coordinate with the /statuspage skill. The incident report contains full technical detail; the status page update contains only impact-focused, user-safe information.
As the user provides updates:
When the user indicates the issue is resolved:
Complete the incident report:
SRE Postmortem (for SEV-1 and SEV-2): For these incidents, a traditional post-incident review (PIR) may not be sufficient. Prompt the user:
"This was a {SEV-1/SEV-2} incident. Consider running a blameless postmortem using /postmortem. It produces a more systematic analysis with contributing factors, what went well, and SMART action items. Use /problem if root cause is unknown and the incident is likely to recur."
Error budget: If the organization tracks SLOs, note the estimated error budget impact in the follow-up section.
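One way to estimate error budget impact is shown below. The sketch assumes a time-based availability SLO, a 30-day window, and a full outage for the whole incident duration. These are illustrative assumptions, not rules from this skill; partial degradation would need request-based weighting instead.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Total allowed downtime in the window, e.g. 99.9% over 30 days is about 43.2 minutes."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1, e.g. 0.999")
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_consumed_fraction(downtime_minutes: float, slo_target: float, window_days: int = 30) -> float:
    """Fraction of the window's error budget used by one incident (can exceed 1.0)."""
    return downtime_minutes / error_budget_minutes(slo_target, window_days)
```

Under these assumptions, a 20-minute full outage against a 99.9% SLO uses roughly 46% of the 30-day budget, which is the kind of figure to record in the follow-up section.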
npx claudepluginhub bouob/claude-plugins --plugin sysadmin-skills