From pm-engineering
Generates a blameless incident postmortem with timeline, root cause analysis, impact summary, and closeable action items from rough notes.
How this skill is triggered — by the user, by Claude, or both
Slash command
/pm-engineering:incident-postmortemThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
This skill produces a complete, blameless incident postmortem document following industry-standard format. Output enforces blameless framing throughout — system gaps over individual failures — and drives toward specific, closeable action items rather than vague process commitments.
This skill produces a complete, blameless incident postmortem document following industry-standard format. Output enforces blameless framing throughout — system gaps over individual failures — and drives toward specific, closeable action items rather than vague process commitments.
Ask the user for these if not provided:
Incident ID: [ID] Severity: [P1/P2/P3] Date: [Date] Duration: [Start time → Resolution time — total duration] Status: [Resolved / Monitoring / Ongoing] Author: [Leave blank for user to fill] Last updated: [Date]
[3–5 sentences. Describe what happened, who was affected, and what was done to resolve it. Written for a non-technical stakeholder. No jargon. No blame.]
| Dimension | Details |
|---|---|
| Users affected | [Number or percentage] |
| Services degraded | [List affected services] |
| Business impact | [Revenue, SLA breach, support tickets, etc. if known] |
| Duration | [Total time from first detection to full resolution] |
List events in chronological order. Each entry: [HH:MM UTC] — [What happened. Who did what. What changed.]
Rules for timeline entries:
Primary root cause: [One clear sentence. Technical but plain. "A misconfigured deployment config caused..."]
Contributing factors:
Why did our existing safeguards not prevent this? [Honest paragraph explaining why monitoring, tests, or processes didn't catch this earlier. This is where blameless analysis matters most — focus on system gaps, not individual failures.]
What fixed it? [Clear description of the actual fix — one paragraph] Why did this work? [Brief technical explanation] Was there a temporary mitigation before full resolution? [Yes/No — describe if yes]
| # | Action | Owner | Due Date | Priority |
|---|---|---|---|---|
| 1 | [Specific, testable action] | [Team or person] | [Date] | P1/P2/P3 |
Rules for action items:
[3–5 honest observations about the response. Include: fast collaboration, good runbooks used, effective escalation, clear communication. This section builds team confidence and reinforces good habits.]
[3–5 key insights from this incident that are worth sharing beyond this team. Write these as transferable lessons — e.g. "Our runbook for database failover didn't account for read-replica lag. All runbooks involving database failover should be reviewed."]
[Optional — list external communications sent: status page updates, customer emails, support responses. Include timestamps.]
npx claudepluginhub mohitagw15856/pm-claude-skills --plugin pm-engineeringDocuments incidents, outages, or production failures with blameless post-mortems. Includes timeline, root cause analysis, and action items.
Guides writing blameless postmortems for SEV1/SEV2 incidents using templates, timelines, root cause analysis, and action items to foster learning.
Guides writing blameless postmortems for incident reviews, root cause analysis, and organizational learning.