Draft a blameless postmortem from incident evidence - logs, timeline, Slack/chat transcript, and deploy history.
Slash command
/infra:draft-postmortem <incident-id or path to evidence directory>
Build a blameless postmortem draft from incident evidence - logs, chat transcripts, deploy events, and a user-supplied timeline - into a markdown document ready to paste into the org's incident doc store. This skill is **read-only** — it never posts to chat, mutates incident tracking tools, or publishes the draft anywhere.
The user runs /draft-postmortem <incident-id or path>. The argument is either:
- an incident ID (e.g. INC-1234) used to label the document and search nearby evidence directories, or
- a path to an evidence directory.
If the argument is omitted, prompt once with AskUserQuestion for the incident ID and an evidence path before proceeding.
Use AskUserQuestion to collect what the evidence files alone cannot tell you:
Question: "I need a few framing details before drafting."
Options:
1. "Incident start (UTC)"
2. "Incident end (UTC)"
3. "Impacted services"
4. "Severity (SEV1..SEV4)"
5. "Links or paths to logs, chat, deploys, dashboards"
Collect answers field by field. If the user declines to provide a field, record [needs input] and continue — never fabricate a value. Then enumerate the evidence directory:
- *.log, *.json, *.ndjson: structured log exports.
- slack-*.json, chat-*.txt, transcript*.md: chat transcripts.
- deploys*.csv, deploys*.json, *.release.yaml: deploy history.
- dashboard-*.png, grafana-*.png: attached screenshots (reference by filename, do not attempt OCR).
Correlate evidence into a unified timeline. Every row must use the exact format:
HH:MM UTC - <actor or system> - <event>
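As a rough illustration of the enumeration and row format above, the sketch below sorts evidence filenames into the listed categories with glob patterns and renders timeline rows in the required shape. The function names (classify_evidence, format_row), the category keys, and the order in which overlapping patterns are checked are assumptions made for this example; the skill text does not prescribe an implementation.

```python
from datetime import datetime, timezone
from fnmatch import fnmatchcase
from typing import Optional

# Glob patterns taken from the evidence list. More specific patterns are
# checked first so slack-*.json or deploys*.json is not swallowed by *.json
# (this ordering is an assumption, not something the skill specifies).
EVIDENCE_PATTERNS = [
    ("chat", ["slack-*.json", "chat-*.txt", "transcript*.md"]),
    ("deploys", ["deploys*.csv", "deploys*.json", "*.release.yaml"]),
    ("screenshots", ["dashboard-*.png", "grafana-*.png"]),
    ("logs", ["*.log", "*.json", "*.ndjson"]),
]


def classify_evidence(filename: str) -> str:
    """Return the first category whose pattern matches, else 'unknown'."""
    for category, patterns in EVIDENCE_PATTERNS:
        if any(fnmatchcase(filename, pattern) for pattern in patterns):
            return category
    return "unknown"


def format_row(when: Optional[datetime], actor: str, event: str) -> str:
    """Render one row as 'HH:MM UTC - <actor or system> - <event>'."""
    if when is None:
        # Unknown time: keep the placeholder rather than inventing a value.
        stamp = "HH:MM UTC [needs input]"
    else:
        # Expects a timezone-aware datetime; naive values would be
        # interpreted as local time by astimezone().
        stamp = when.astimezone(timezone.utc).strftime("%H:%M UTC")
    return f"{stamp} - {actor} - {event}"
```

Checking the specific patterns before the generic *.json keeps chat and deploy exports out of the log bucket; a file that matches nothing is reported as unknown rather than guessed at.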
Rules for timeline construction:
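- When sources disagree, cite each one (per <source>) and mark the row (conflict).
- When the time of an event is unknown, use HH:MM UTC [needs input] for the time.
Apply 5-whys to the proximate cause, then separate findings into three labeled layers: proximate cause, contributing factors, and latent conditions.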
Do not assign blame to individuals in any layer.
After producing the three layers, dispatch to the principal-sre subagent to critique the 5-whys analysis. Ask it to look for: short-circuited "why" chains, layer mis-assignment (calling a latent condition a proximate cause), and missing system conditions (rollback path, alert routing, change gating). Integrate its findings as additional rows in the appropriate layers; do not let it rewrite the postmortem voice or replace this skill's verdict labels.
Agent({
  subagent_type: "principal-sre",
  description: "Critique the 5-whys",
  prompt: "Review this incident's contributing-factor analysis: <proximate cause + contributing factors + latent conditions you produced>. Stress-test the 5-whys: where does the chain stop short? What system conditions made this failure possible that aren't yet listed? Return top findings in severity order."
})
The skill remains owner of the postmortem document and blameless framing. The agent only sharpens the contributing-factor analysis.
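One way to picture "integrate as additional rows" is an append-only merge into the three layers. The sketch below is a minimal illustration under stated assumptions: the RootCauseAnalysis class, the integrate_findings function, and the (layer, text) shape of the critique findings are all hypothetical, since the skill does not define how the subagent's output is structured.

```python
from dataclasses import dataclass, field

LAYERS = ("proximate_cause", "contributing_factors", "latent_conditions")


@dataclass
class RootCauseAnalysis:
    """The three labeled layers, each a list of finding rows."""
    proximate_cause: list = field(default_factory=list)
    contributing_factors: list = field(default_factory=list)
    latent_conditions: list = field(default_factory=list)


def integrate_findings(analysis, findings):
    """Append critique findings as extra rows; never replace existing rows.

    `findings` is a list of (layer, text) pairs (an assumed shape).
    Findings that name an unknown layer are returned to the caller
    instead of being placed by guesswork.
    """
    unplaced = []
    for layer, text in findings:
        if layer not in LAYERS:
            unplaced.append((layer, text))
            continue
        rows = getattr(analysis, layer)
        if text not in rows:  # skip exact duplicates
            rows.append(text)
    return unplaced
```

Because the merge only appends, the critique can add to the analysis but cannot overwrite the rows or wording the skill already produced.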
Assemble the document with these sections in order:
1. Summary: 3-5 sentences covering what broke, for whom, for how long.
2. Impact: user-visible effect, revenue impact, SLO burn. Use [needs input] when data is missing; never estimate revenue.
3. Timeline: the table from step 2.
4. Detection: how the incident was first detected (alert, customer report, operator); time-to-detect.
5. Response: who paged, mitigations attempted in order, what worked.
6. Root-cause analysis: the three-layer output from step 3.
7. What went well: at least one item sourced from evidence; if nothing is evident, write [needs input].
8. What didn't go well: sourced from evidence or chat transcript; no speculation.
9. Action items: table with columns Item | Owner | Target date | Type. Type is one of prevent | detect | mitigate | document. When owner or date is unknown, write TBD.
Scan the completed draft and rewrite any sentence where a named person is the grammatical subject of a verb tied to the failure.
Keep names only in neutral context (e.g. "IC: Alice; Comms: Bob" in the Response section). Flag remaining name-subject-verb patterns in a final ### Blameless review notes section if any survived rewriting.
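The scan described above could be approximated with a simple pattern match before any manual rewrite. This is a heuristic sketch only: the function name flag_blame_sentences, the FAILURE_VERBS list, and the idea of passing in a list of known names are assumptions for illustration, and the pattern only catches a name immediately followed by a listed verb.

```python
import re

# Hypothetical verb list; a real one would be tuned to the team's writing.
FAILURE_VERBS = ("broke", "caused", "forgot", "failed", "missed", "deployed")


def flag_blame_sentences(text, names, verbs=FAILURE_VERBS):
    """Return sentences where a known name is directly followed by a failure verb."""
    if not names or not verbs:
        return []
    name_alt = "|".join(re.escape(n) for n in names)
    verb_alt = "|".join(re.escape(v) for v in verbs)
    pattern = re.compile(rf"\b(?:{name_alt})\s+(?:{verb_alt})\b")
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if pattern.search(s)]
```

Names in neutral context, such as a role listing in the Response section, pass through untouched because no failure verb follows the name. Phrasings with an intervening word between name and verb slip past this pattern, so the output is a starting list for review rather than a complete check.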
# Postmortem - <incident-id>
**Severity:** <SEV?> **Status:** Draft
**Window (UTC):** <start> -> <end> **Duration:** <HH:MM>
**Impacted services:** <list>
## Summary
...
## Impact
- Users: ...
- Revenue: [needs input]
- SLO burn: ...
## Timeline
| Time (UTC) | Actor / system | Event |
|------------|----------------|-------|
| 15:42 | deploy-bot | Release v1.42.0 rolled to prod |
| 15:48 | prometheus | HighErrorRate alert fired on api-svc |
...
## Detection
...
## Response
...
## Root-cause analysis
**Proximate cause:** ...
**Contributing factors:** ...
**Latent conditions:** ...
## What went well
...
## What didn't go well
...
## Action items
| Item | Owner | Target date | Type |
|------|-------|-------------|------|
| Add canary coverage for /healthz | TBD | TBD | prevent |
| Fix runbook URL in alert definition | TBD | TBD | detect |
### Blameless review notes
- <any remaining name-subject-verb sentences to revisit>
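A draft in the shape above can be checked mechanically before it is handed over. The sketch below counts [needs input] markers, counts TBD owner and target-date cells in the action-items table, and collects any Type value outside prevent | detect | mitigate | document. The function name draft_stats and the returned keys are hypothetical, and the sketch assumes the action-items table is the only four-column table in the draft.

```python
VALID_TYPES = {"prevent", "detect", "mitigate", "document"}


def draft_stats(markdown: str) -> dict:
    """Summarize open gaps in a postmortem draft."""
    tbd_cells = 0
    invalid_types = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 4:
            continue  # e.g. the three-column timeline table
        if cells[0] == "Item" or set(cells[0]) <= {"-"}:
            continue  # header row or |----| separator row
        # Owner and Target date are the second and third columns.
        tbd_cells += sum(1 for c in cells[1:3] if c == "TBD")
        if cells[3] not in VALID_TYPES:
            invalid_types.append(cells[3])
    return {
        "needs_input": markdown.count("[needs input]"),
        "tbd_cells": tbd_cells,
        "invalid_types": invalid_types,
    }
```

The counts are inputs to a human judgment about readiness; the sketch deliberately stops short of choosing a label.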
This skill produces a document, not a pass/fail gate. Close the report with one of the following readiness labels:
- [needs input] markers, zero rows in Blameless review notes.
- [needs input] marker or TBD owner, no blameless violations. Safe to circulate internally.
- [needs input]. Do not circulate until resolved.
- [needs input] and continue.
- [needs input] or TBD.
$ARGUMENTS
npx claudepluginhub brenthaertlein/universal-skills --plugin infra