From ywc-agent-toolkit
Structures incident postmortems with timeline, root cause (5 Whys), impact assessment, and mitigation action items. Supports multi-language prompts and client-facing reports.
Slash command: `/ywc-agent-toolkit:ywc-incident-postmortem`
**Announce at start:** "I'm using the ywc-incident-postmortem skill to write a structured incident postmortem."
| Excuse | Reality |
|---|---|
| "This is a minor incident — no postmortem needed" | Even 5-minute outages teach systemic lessons. A short postmortem beats none. |
| "I remember what happened — no need to write it down" | Memory degrades within 48 hours. Root cause narratives shift without documentation. |
| "Solo developers don't need postmortems" | Client accountability and personal learning both require structured records. |
| "I'll write it later when things calm down" | Postmortem accuracy drops sharply after 24-48 hours. Write while logs are fresh. |
| "The root cause is obvious — fix and move on" | Obvious causes miss systemic contributors. 5 Whys reveals what quick fixes hide. |
| "I already fixed it — action items are unnecessary" | Fixes without follow-up items repeat. Action items create accountability. |
| "This happened before — no need to document again" | Repeat incidents signal systemic failure. Each recurrence needs its own record to track patterns. |
| Flag | Description |
|---|---|
| `--draft` | Interactive mode: asks questions and builds the postmortem step by step (default) |
| `--template` | Outputs a blank postmortem template without asking questions |
| `--client` | Appends a sanitized client-facing incident summary (no internal details) |
| `--format <markdown\|html>` | Output format (default: `markdown`). With `html`, writes a self-contained HTML report to `claudedocs/`; see html-output.md |
Before starting, gather incident evidence:

```bash
# Recent deployments to correlate with the incident timeline
git log --oneline --since="3 days ago"

# Most recent tag or release
git describe --tags --abbrev=0 2>/dev/null || echo "(no tags)"
```
**Step 1: Gather basics.** Ask for the service name, incident start and end times, how the incident was detected (alert / user report / developer discovery), and who responded.
**Step 2: Reconstruct timeline.** Build a chronological event log with timestamps (detection → investigation → root cause identified → fix deployed → resolved).
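A minimal sketch of the timeline step, assuming events are kept one per line as an ISO-8601 UTC timestamp, a tab, and a description (a hypothetical format, not one the skill prescribes). Because same-format UTC timestamps sort correctly as plain strings, `sort` is enough to put them in chronological order; `sort_timeline` and the sample events are illustrative only.

```bash
# Hypothetical helper: print timeline events in chronological order.
# Input on stdin: "<ISO-8601 UTC timestamp><TAB><event description>" per line.
sort_timeline() {
  # Lexicographic order on the timestamp field equals chronological order
  # as long as every timestamp uses the same format and timezone.
  LC_ALL=C sort -t "$(printf '\t')" -k1,1
}

# Illustrative events, deliberately out of order.
printf '%s\t%s\n' \
  "2024-05-01T09:40:00Z" "Fix deployed" \
  "2024-05-01T09:02:00Z" "Alert fired: API 5xx rate above threshold" \
  "2024-05-01T09:55:00Z" "Resolved: error rate back to baseline" \
  "2024-05-01T09:15:00Z" "Root cause identified: connection pool exhausted" |
  sort_timeline
```

Mixing timezones or formats (for example local times pasted from chat) breaks the string-sort assumption, so normalize to UTC first.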
**Step 3: Assess impact.** Record affected users (count or %), duration in minutes, severity (SEV1/SEV2/SEV3), whether the SLA was breached, and revenue impact if known.
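The two quantities Step 3 asks for most often, duration in minutes and share of users affected, can be computed with a couple of small helpers. This is a sketch under stated assumptions: `duration_minutes` and `affected_percent` are hypothetical names, and the date arithmetic relies on GNU coreutils `date -d` (on macOS, install coreutils and use `gdate`).

```bash
# Hypothetical Step 3 helpers. Assumes GNU `date -d "YYYY-MM-DD HH:MM:SS"`.

# Whole minutes between two UTC timestamps: duration_minutes START END
duration_minutes() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( (end - start) / 60 ))
}

# Affected users as a percentage of total: affected_percent AFFECTED TOTAL
affected_percent() {
  awk -v a="$1" -v t="$2" 'BEGIN {
    if (t <= 0) { print "n/a"; exit }   # avoid dividing by zero
    printf "%.1f\n", a * 100 / t
  }'
}

duration_minutes "2024-05-01 09:02:00" "2024-05-01 09:55:00"   # prints 53
affected_percent 1200 8000                                       # prints 15.0
```

Converting to epoch seconds before subtracting keeps incidents that cross midnight correct.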
**Step 4: Root cause analysis**
Apply 5 Whys, and identify the primary root cause and contributing factors separately. When the Claude Code runtime is in use and the named-agent catalog at `claude-code/agents/` is installed, dispatch `Task(subagent_type: ywc-root-cause-analyst)` with a bounded packet (failure symptom + timeline excerpt + relevant code snippet) so that an Opus-tier analyst walks the 5 Whys with explicit primary-cause vs contributing-factor separation and per-level evidence citations (see `claude-code/agents/ywc-root-cause-analyst.md`). Dispatch at most once per postmortem. Runtimes without named-agent dispatch perform the walk inline with the same discipline.
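For the inline path, the shape of a 5 Whys walk can be sketched as follows. `five_whys` is a hypothetical helper and the example chain is invented for illustration: each answer explains the one before it, and the last answer is reported as the primary root cause. Contributing factors still belong in a separate list, which this sketch does not model.

```bash
# Hypothetical helper: print a 5 Whys chain for a symptom.
# Usage: five_whys "<symptom>" "<answer 1>" ... "<answer N>"
five_whys() {
  local symptom=$1 i=1 last=""
  shift
  echo "Symptom: $symptom"
  for answer in "$@"; do
    echo "Why $i: $answer"
    last=$answer
    i=$((i + 1))
  done
  # Stopping early is allowed, but often means the chain ended at a symptom.
  if [ "$#" -lt 5 ]; then
    echo "Note: chain stopped after $# whys; check it reaches a systemic cause" >&2
  fi
  echo "Primary root cause: $last"
}

five_whys "Checkout API returned HTTP 500s" \
  "The database connection pool was exhausted" \
  "A new batch job held connections open for minutes" \
  "The job's pool had no idle timeout configured" \
  "The job template does not set pool timeouts" \
  "No review checklist covers connection-pool settings"
```

Note how the chain ends at a process gap rather than at "someone forgot a setting", which is the point of continuing past the first obvious answer.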
**Step 4.5: Security advisor dispatch** (only when the incident crosses a security boundary). Run this step only when the Step 4 root cause involves one of: auth bypass, authorization failure, secret / token / credential leak, PII or sensitive-data exposure, data exfiltration, SSRF, IDOR, injection (SQL / command / template), or any OWASP Top 10 category (A01–A10). Skip it otherwise.
Procedure:

1. When the Claude Code runtime is in use and the named-agent catalog at `claude-code/agents/` is installed, dispatch `Task(subagent_type: ywc-security-engineer)` with the bounded payload. Otherwise, dispatch a `model: sonnet` subagent with the same payload plus the canonical persona prompt copied from the Mission section of `claude-code/agents/ywc-security-engineer.md`.
2. The client report (`--client`) keeps the redaction discipline: never expose the exploit chain, only the user-facing impact and the prevention class.
3. Budget: at most 1 dispatch per postmortem. A second security question signals a scope split; file a follow-up postmortem.
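Deciding whether Step 4.5 runs can be sketched as a simple category check. `needs_security_review` and its labels (`idor`, `owasp-a03`, and so on) are hypothetical names that mirror the trigger list above; the skill does not define a fixed category vocabulary.

```bash
# Hypothetical Step 4.5 gate: status 0 when the root-cause category
# crosses a security boundary, status 1 otherwise.
needs_security_review() {
  # Normalize case so "IDOR" and "idor" are treated the same.
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    auth-bypass|authz-failure|secret-leak|token-leak|credential-leak) return 0 ;;
    pii-exposure|data-exfiltration|ssrf|idor) return 0 ;;
    sql-injection|command-injection|template-injection) return 0 ;;
    owasp-a0[1-9]|owasp-a10) return 0 ;;
    *) return 1 ;;
  esac
}

if needs_security_review "IDOR"; then
  echo "Run Step 4.5: dispatch the security advisor"
else
  echo "Skip Step 4.5"
fi
```

Defaulting to "skip" for unknown labels matches the "skip otherwise" rule; a team that prefers to err on the side of review could flip the default instead.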
**Step 5: Actions taken during the incident.** List the mitigation steps taken in real time (rollback, hotfix, manual data correction).
**Step 6: Prevention action items.** Generate specific, assignable items with deadlines. Not "improve monitoring" but "Add DB connection pool alert by YYYY-MM-DD".
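The "assignable, with a deadline" rule is easy to lint mechanically. The sketch assumes action items are written one per line as `- <action> (owner: <name>, due: YYYY-MM-DD)`, a hypothetical convention chosen for this example; `check_action_items` prints every item missing either field and returns a non-zero status.

```bash
# Hypothetical Step 6 lint: reads Markdown list items on stdin.
# Prints items lacking "(owner: <name>, due: YYYY-MM-DD)" and returns 1.
check_action_items() {
  if grep -E '^- ' |
     grep -vE '\(owner: [^,)]+, due: [0-9]{4}-[0-9]{2}-[0-9]{2}\)$'; then
    return 1   # at least one offending item was printed above
  fi
  return 0
}

printf '%s\n' \
  "- Add DB connection pool alert (owner: alice, due: 2024-05-15)" \
  "- Improve monitoring" |
  check_action_items || echo "Some action items lack an owner or due date."
```

The check only verifies that the fields are present; it cannot tell whether an item is specific enough, so vague items still need a human read.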
**Step 7: Lessons learned.** Record what went well, what failed, and what was surprising.
**Step 8: Client report (if `--client`).** Generate a sanitized summary covering user impact, resolution, and prevention steps. Include no internal names, stack traces, or architecture details.
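Before sending a client report, a crude grep can catch the most obvious leaks. The patterns in `client_report_is_clean` (a hypothetical helper) are illustrative only: stack-trace markers, `.internal` and Kubernetes `.svc.cluster.local` hostnames, and dotted IPv4 addresses. Extend them with your own service names; a clean result does not prove the report is safe.

```bash
# Hypothetical pre-send check: reads a draft client report on stdin.
# Prints suspicious lines (with line numbers) and returns 1 if any match.
client_report_is_clean() {
  if grep -nE 'Traceback|Exception|\.internal|\.svc\.cluster\.local|([0-9]{1,3}\.){3}[0-9]{1,3}'; then
    return 1
  fi
  return 0
}

printf '%s\n' \
  "Between 09:02 and 09:55 UTC, some users could not complete checkout." \
  "Cause: db-primary.internal ran out of connections." |
  client_report_is_clean || echo "Redact the lines above before sending."
```

The IPv4 pattern will also flag four-part version numbers; treat every hit as a prompt for review, not an automatic failure.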
| Level | Definition | Example |
|---|---|---|
| SEV1 | Complete outage, data loss, or security breach | Payment service down |
| SEV2 | Major feature unavailable or >10% user degradation | Login fails for subset of users |
| SEV3 | Minor feature degraded, workaround available | Export button broken |
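The severity table above can be encoded as a small decision helper. `severity_for` and its arguments are hypothetical; the 10% threshold comes from the SEV2 row. As a simplification, anything that is neither a SEV1 event nor a major degradation falls through to SEV3, so confirm that a workaround actually exists before accepting that level.

```bash
# Hypothetical mapping from impact facts to SEV1/SEV2/SEV3.
# Usage: severity_for OUTAGE_DATALOSS_OR_BREACH MAJOR_FEATURE_DOWN PERCENT_DEGRADED
#   first two arguments are "yes" or "no"; the third is an integer percentage.
severity_for() {
  if [ "$1" = yes ]; then
    echo SEV1
  elif [ "$2" = yes ] || [ "$3" -gt 10 ]; then
    echo SEV2
  else
    echo SEV3
  fi
}

severity_for yes no 100   # prints SEV1
severity_for no no 15     # prints SEV2 (more than 10% of users degraded)
severity_for no no 2      # prints SEV3
```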
Produces one or two Markdown documents:

1. The internal postmortem.
2. A sanitized client-facing report (only with `--client`).

See references/postmortem-template.md for the full internal template and references/client-report-template.md for the sanitized client template.
**HTML mode** (`--format html`): writes the postmortem as a self-contained HTML report instead of Markdown, with a color-coded severity banner, a collapsible event timeline, and a `Copy as Markdown` button. Structure and conventions follow html-output.md. When `--client` is also set, the sanitized client report is produced as a separate HTML file. The Markdown content is preserved inside each file, so downstream integrations are unaffected.
- `ywc-security-audit`: If the audit reveals an active exploit or breach, use this skill to write the postmortem.
- `ywc-root-cause-analyst` (Step 4): an Opus-tier analyst walks the 5 Whys with per-level evidence citations and primary-cause vs contributing-factor separation. At most 1 dispatch per postmortem.
- `ywc-security-engineer` (Step 4.5): dispatched when the Step 4 root cause crosses a security boundary (auth / authz / secret / PII / OWASP A01–A10). The advisor returns OWASP / CWE-cited findings and concrete remediation steps that feed back into the Root Cause section and the Step 6 prevention action items. Skipped when the incident is not security-related.
- `ywc-changelog-release-notes`: Incident action items may drive a patch release; key fixes feed into the next changelog.

| Mistake | Better approach |
|---|---|
| Writing "human error" as root cause | Human error is a symptom. Ask why the system allowed the error to cause impact. |
| Action items without deadline or owner | Every item needs an assignee and target date. "Fix monitoring" → "Add DB alert by YYYY-MM-DD" |
| Writing postmortem days later from memory | Write within 24-48 hours while logs and chat history are available. |
| Skipping impact quantification | Always quantify: number of users, duration, revenue or SLA impact if known. |
| Client report exposing internal details | Describe user impact and resolution only — no stack traces, service names, or architecture. |
```bash
npx claudepluginhub yongwoon/ywc-agent-toolkit
```