From extended-rca
Run a thorough, methodology-aware post-incident root cause analysis on a software outage or production incident. Use this skill whenever the user wants to understand *why* something really broke — after the incident is resolved and they hand over some mix of incident summary, timeline, logs, or a draft postmortem. Also use it when they say things like "do an RCA," "5 whys on this," "extended postmortem," "fishbone this incident," "root cause analysis," "contributing factors," or "what really went wrong." The skill combines narrative 5-whys and fishbone categorization with explicit layer separation (trigger / proximate / contributing / systemic) and corrective actions tagged by type (Prevent / Detect / Mitigate / Respond), and produces a structured engineering-grade writeup. Distinct from active incident triage — use `engineering:incident` / `engineering:incident-response` for that.
Slash command: /extended-rca:extended-rca
Most RCAs stop too early. The team finds a plausible proximate cause ("the deploy introduced a bug"), writes it up, schedules one ticket, and moves on. Six weeks later the same class of incident recurs.
A good extended RCA goes deeper in three ways at once:
Under time pressure, engineers reach for the first plausible story. This skill exists to resist that pull and produce analyses a team can actually learn from.
Use this skill when the incident is already resolved and the user wants depth. For active triage or in-the-moment incident response, point them at engineering:incident / engineering:incident-response instead. If they just want a short status-page blurb, this is overkill — offer a one-paragraph summary instead.
Look at what the user actually has and what they want from it:
references/review-rubric.md. Output is a findings list, not a rewritten artifact. Distinct from deepen: review critiques, deepen extends.
references/trends-synthesis.md to cluster recurring systemic themes across incidents. The output is a trends document, not a per-incident RCA.
The slash command's modes (deepen, review, trends, hypotheses) map to these shapes. Default mode runs the full universal phases.
The analysis progresses through these phases. They build on each other.
Before reasoning about causes, pin down what actually happened. Write a clean version of:
If you can't construct any of these with confidence, stop and ask one or two targeted questions. A shaky factual base will poison everything downstream. But don't over-interview — work with what the user has and flag real gaps in the final artifact.
This is the single most important habit of a good RCA: do not commit to one story yet. List at least three candidate causal narratives that could plausibly explain the symptoms — even when one feels obviously right. Confirmation bias is real and extremely hard to notice from the inside.
A helpful trick when the incident touches multiple concerns: generate hypotheses by category, fishbone-style. The classic "6 M's" from manufacturing don't quite fit software, so use these categories instead. Full details and prompting questions for each category are in references/fishbone.md — read it when you need to break out of tunnel vision.
Not every category will have content for every incident. But forcing yourself to ask "what could People have contributed here? Process? Data?" uncovers hypotheses you'd otherwise miss.
For each hypothesis, note briefly what evidence would confirm it and what would disconfirm it. Then weigh the actual evidence. Keep the losing hypotheses in the final artifact as a short "considered and ruled out" section with the disconfirming evidence — this prevents re-exploration later and shows the analysis was rigorous.
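One way to keep the hypotheses honest is to record each one with its confirming and disconfirming evidence before weighing anything. The following is a minimal sketch, not part of the skill itself: the `Hypothesis` class, its field names, and `ruled_out_section` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """One candidate causal narrative (all field names are illustrative)."""
    narrative: str
    category: str          # fishbone category, e.g. "People", "Process", "Data"
    would_confirm: str     # evidence that would support this narrative
    would_disconfirm: str  # evidence that would rule it out
    observed_evidence: list[str] = field(default_factory=list)
    ruled_out: bool = False


def ruled_out_section(hypotheses: list[Hypothesis]) -> list[str]:
    """Render the 'considered and ruled out' lines for the final artifact."""
    if len(hypotheses) < 3:
        # The skill asks for at least three candidate narratives.
        raise ValueError("list at least three candidate narratives first")
    return [
        f"{h.narrative} (ruled out: {'; '.join(h.observed_evidence)})"
        for h in hypotheses
        if h.ruled_out
    ]
```

Keeping `would_confirm` and `would_disconfirm` as separate fields makes it harder to skip the step of asking what evidence would prove the favored story wrong.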
For the leading hypothesis (or hypotheses — sometimes there are two converging chains), build a why-chain. The classic 5-whys is a useful shape, but it has two failure modes this skill actively resists. They're covered in depth in references/five-whys.md; the short version:
A chain has reached enough depth when the terminal "why" points at something structural — a design choice, an ownership gap, a policy that doesn't exist, an incentive that works against correctness. "Alice merged a bug" is a trigger, not a root cause. "No test existed that could exercise real traffic patterns against staging, because investing in that tooling has lost every prioritization bet against feature work for the last three quarters" is a structural cause.
Structure findings into four distinct layers. Confusing them is the most common RCA failure mode — every layer gets smooshed into "the root cause" and the real systemic learning is lost.
Every significant incident has content at every layer. If the user's draft stops at trigger + proximate cause, your contribution as an extended RCA is most visible in the contributing and systemic layers.
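As a rough illustration of the layer separation, findings can be keyed by layer and checked for gaps. This is a sketch under assumptions: the `Layer` enum and `missing_layers` helper are hypothetical names, and only the four layer names come from the skill description.

```python
from enum import Enum


class Layer(Enum):
    """The four causal layers, ordered from most immediate to most structural."""
    TRIGGER = "trigger"
    PROXIMATE = "proximate"
    CONTRIBUTING = "contributing"
    SYSTEMIC = "systemic"


def missing_layers(findings: dict[Layer, list[str]]) -> list[Layer]:
    """Return the layers that have no findings yet, in definition order."""
    return [layer for layer in Layer if not findings.get(layer)]
```

A draft that stops at trigger and proximate cause would come back with `Layer.CONTRIBUTING` and `Layer.SYSTEMIC` missing, which is where the extended analysis should focus.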
Actions are more useful when tagged by what they do. Use this taxonomy:
A healthy set of actions has content in at least three of these buckets. A list of only "Prevent" actions is a red flag — it implicitly assumes the team will never fail again, which is not a safe assumption for any serious production system. For each action, note the layer of causation it addresses, an owner (or "TBD"), and whether it's must-do or should-consider. Three strong actions beat ten weak ones.
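The bucket check described above is mechanical enough to sketch in code. The example below is illustrative only: `ActionType`, `Action`, and `action_red_flags` are hypothetical names, and the flag wording is an assumption. The four types, the "TBD" owner default, and the must-do versus should-consider split follow the text.

```python
from dataclasses import dataclass
from enum import Enum


class ActionType(Enum):
    PREVENT = "Prevent"
    DETECT = "Detect"
    MITIGATE = "Mitigate"
    RESPOND = "Respond"


@dataclass
class Action:
    description: str
    action_type: ActionType
    layer: str             # causal layer addressed, e.g. "systemic"
    owner: str = "TBD"
    must_do: bool = False  # False means "should-consider"


def action_red_flags(actions: list[Action]) -> list[str]:
    """Flag action lists that fail the coverage heuristics in the text."""
    flags = []
    types_used = {a.action_type for a in actions}
    if types_used == {ActionType.PREVENT}:
        flags.append("only Prevent actions: assumes the team never fails again")
    if len(types_used) < 3:
        flags.append("fewer than three action types covered")
    return flags
```

An empty result means the list covers at least three of the four buckets. It says nothing about whether the individual actions are strong, which still needs human judgment.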
Before you declare the artifact done, run a verification pass against references/self-check.md. The checklist catches the failure modes this skill is designed to resist:
If any check fails, fix it before handing the artifact over. The self-check is not optional; it's what distinguishes an extended RCA from a shallow one that happens to follow the template.
Produce a Markdown document using the template in references/artifact-template.md. The structure there is deliberate — each section exists to resist a specific failure mode:
Read the template before writing so you internalize the structure. Then fill it in using the findings from the phases above.
Write in the past tense. Be specific — name services, endpoints, commits, config keys, flag names. Avoid vague verbs ("the system broke," "something went wrong") — say what broke and how.
Stay blameless in phrasing but specific in content. "The engineer who merged the bad commit" is bad style. "The change was merged without a reviewer familiar with the pricing client's nil-handling, because review-routing rules don't account for cross-service calls" is good style: it is specific, it names a process gap, and it points at something fixable.
When a contributing or systemic cause reflects poorly on a team or a decision, say it clearly. Softening systemic findings into vagueness ("communication could be improved") defeats the point.
A good RCA is partly defined by what it refuses to do. Watch for these in your output and correct them:
Extended RCA is reasoning over incomplete information. Be honest about confidence. If the evidence only weakly supports a systemic cause, say so and put it in "open questions" rather than stating it as established. Overstating confidence in systemic findings is how RCAs turn into political instruments rather than learning instruments.
references/five-whys.md: how to do 5-whys without falling into the usual traps
references/fishbone.md: the 6-category framework adapted for software incidents
references/artifact-template.md: the Markdown template for the final writeup
references/self-check.md: Phase 6 verification checklist, applied before handing any artifact over
references/review-rubric.md: grading rubric and output format for /rca review
references/trends-synthesis.md: how to do cross-incident synthesis for /rca trends
npx claudepluginhub sebdenes/extendedrca