From agent-almanac
Structured recovery for damaged systems using triage, stabilization, scaffolding, and progressive rebuild. Useful after incidents, failed migrations, or accumulated tech debt.
Slash command
/agent-almanac:repair-damage
Implement regenerative recovery for systems that have sustained structural damage — whether from incidents, failed migrations, accumulated neglect, or external disruption. Uses biological wound-healing as a framework: triage, stabilization, scaffolding, progressive rebuild, and scar tissue management.
[…] adapt-architecture) left the system in a damaged intermediate state
[…] defend-colony) when the colony sustained damage

Rapidly assess all damage and classify by severity and urgency.
Wound Classification:
┌──────────┬──────────────────────┬────────────────────────────────────┐
│ Class    │ Severity             │ Response                           │
├──────────┼──────────────────────┼────────────────────────────────────┤
│ Critical │ Core function lost,  │ Immediate: stop bleeding, activate │
│          │ data at risk,        │ backup, redirect traffic, page     │
│          │ actively spreading   │ on-call team                       │
├──────────┼──────────────────────┼────────────────────────────────────┤
│ Serious  │ Important function   │ Urgent: fix within hours/days,     │
│          │ degraded, no spread  │ workarounds acceptable short-term  │
├──────────┼──────────────────────┼────────────────────────────────────┤
│ Moderate │ Non-critical function│ Scheduled: fix within sprint,      │
│          │ affected, contained  │ prioritize against other work      │
├──────────┼──────────────────────┼────────────────────────────────────┤
│ Minor    │ Cosmetic or edge     │ Backlog: fix when convenient,      │
│          │ case, no user impact │ may self-resolve                   │
└──────────┴──────────────────────┴────────────────────────────────────┘
Expected: A complete wound inventory classified by severity, with a prioritized repair order that accounts for wound interactions.
On failure: If triage takes too long (the system is actively degrading), skip detailed classification and focus on: "What is the single most critical thing to stabilize?" Fix that first, then return to full triage.
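The classification table can be sketched as a small data model. This is a minimal illustration, not part of the skill itself: the `Wound` fields, the `classify` rules, and the choice to treat any one of the Critical conditions as sufficient are all assumptions made for the example, and wound interactions are not modeled.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Wound classes from the table; a lower value is repaired sooner."""
    CRITICAL = 0
    SERIOUS = 1
    MODERATE = 2
    MINOR = 3


@dataclass(frozen=True)
class Wound:
    # Hypothetical fields chosen to mirror the table's severity column.
    name: str
    core_function_lost: bool = False
    data_at_risk: bool = False
    spreading: bool = False
    important_function_degraded: bool = False
    user_impact: bool = True


def classify(wound: Wound) -> Severity:
    """Map a wound to a class. Assumption: any Critical signal is enough."""
    if wound.core_function_lost or wound.data_at_risk or wound.spreading:
        return Severity.CRITICAL
    if wound.important_function_degraded:
        return Severity.SERIOUS
    if wound.user_impact:
        return Severity.MODERATE
    return Severity.MINOR


def repair_order(wounds: list[Wound]) -> list[Wound]:
    """Return wounds sorted most severe first; ties keep their input order."""
    return sorted(wounds, key=classify)


if __name__ == "__main__":
    inventory = [
        Wound("misaligned footer", user_impact=False),
        Wound("orders database", data_at_risk=True),
        Wound("search latency", important_function_degraded=True),
    ]
    for wound in repair_order(inventory):
        print(classify(wound).name, wound.name)
```

Under time pressure, the same model supports the shortcut described above: take only the first element of `repair_order(...)` and stabilize that before finishing the inventory.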
Stop the damage from spreading before beginning repair.
Expected: The system is stable (not actively degrading) even if degraded. Damage is contained and not spreading. Evidence is preserved for root cause analysis.
On failure: If stabilization fails (damage continues spreading despite containment), escalate to full system fallback: activate disaster recovery, switch to backup system, or gracefully degrade to minimal viable operation. Stabilization that takes too long becomes the disaster.
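One way to keep stabilization from becoming the disaster is to give it a time budget. The sketch below is an assumption-laden illustration: `stabilize`, its parameters, and the idea of passing containment actions as plain callables are hypothetical, not something the skill prescribes.

```python
import time
from typing import Callable, Iterable


def stabilize(
    containment_actions: Iterable[Callable[[], None]],
    is_stable: Callable[[], bool],
    fallback: Callable[[], None],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Try containment actions in order until the system is stable.

    If the actions run out or the time budget is exhausted first,
    invoke the fallback (for example disaster recovery or a backup system).
    """
    if is_stable():
        return "stable"
    deadline = clock() + budget_seconds
    for action in containment_actions:
        if clock() >= deadline:
            break  # out of time: stop trying and escalate
        action()
        if is_stable():
            return "stable"
    fallback()
    return "fallback"
```

The `clock` parameter exists only so the deadline can be tested without waiting; in normal use the default monotonic clock is enough.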
Construct the temporary structures that support the repair process.
Expected: A repair environment with diagnostic capability, a sequenced repair plan, and awareness of scar tissue risk.
On failure: If setting up a proper repair environment is too slow (system urgency demands immediate production changes), apply changes directly but with extreme discipline: one change at a time, tested by the available means, rolled back if it doesn't help.
Repair damage systematically, verifying each fix before proceeding.
Expected: Critical and serious wounds are repaired with verified fixes. Emergency patches are removed. The system is restored to functional operation.
On failure: If a repair attempt fails or causes regression, roll back to the previous state and reassess. If multiple repair attempts fail for the same wound, the damage may be too deep for local repair — consider whether the affected component needs full replacement rather than repair (see dissolve-form).
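The verify, roll back, and reassess loop can be sketched as follows. The function name `repair_wound`, the callables it takes, and the default limit of three attempts are all illustrative assumptions; the skill does not specify a number of attempts.

```python
from typing import Callable, Sequence


def repair_wound(
    fixes: Sequence[Callable[[], None]],
    verify: Callable[[], bool],
    rollback: Callable[[], None],
    max_attempts: int = 3,
) -> str:
    """Apply candidate fixes one at a time, verifying each before moving on.

    A fix that fails verification (or raises) is rolled back before the
    next one is tried. After max_attempts failures, suggest replacement.
    """
    for attempt, fix in enumerate(fixes[:max_attempts], start=1):
        try:
            fix()
            ok = verify()
        except Exception:
            ok = False  # a crashing fix counts as a failed attempt
        if ok:
            return f"repaired on attempt {attempt}"
        rollback()  # return to the previous state before reassessing
    return "consider replacement"
```

Returning a plain string keeps the sketch short; a real workflow would more likely record each attempt so the reassessment has evidence to work from.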
Address the workarounds and shortcuts introduced during emergency repair, and strengthen against recurrence.
[…] defend-colony immune memory)
Expected: Scar tissue is managed (removed, replaced, or accepted with documentation). The system is not only repaired but more resilient than before the damage. Learnings are captured for future incidents.
On failure: If scar tissue management is deprioritized ("it works, don't touch it"), schedule it explicitly. Unmanaged scar tissue accumulates and eventually contributes to the next incident. If the root cause can't be identified, strengthen detection and recovery speed as compensating controls.
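Scheduling scar tissue explicitly is easier with a register that makes unmanaged items visible. The sketch below is hypothetical: the `ScarTissue` fields and the rule that "remove" and "replace" items must be scheduled while "accept" items must be documented are assumptions drawn from the Expected outcome above, not a defined format.

```python
from dataclasses import dataclass
from typing import Optional

# The three outcomes named in the Expected result.
DISPOSITIONS = {"remove", "replace", "accept"}


@dataclass
class ScarTissue:
    description: str
    disposition: Optional[str] = None  # one of DISPOSITIONS once decided
    documented: bool = False           # required when accepted
    scheduled: bool = False            # required when removing or replacing


def is_managed(item: ScarTissue) -> bool:
    """An item is managed once it has a decision and the matching follow-up."""
    if item.disposition == "accept":
        return item.documented
    if item.disposition in ("remove", "replace"):
        return item.scheduled
    return False  # undecided or unknown disposition


def unmanaged(register: list[ScarTissue]) -> list[str]:
    """List the workarounds that still need a decision or a follow-up."""
    return [item.description for item in register if not is_managed(item)]
```

Reviewing the output of `unmanaged(...)` at a fixed cadence is one way to keep "it works, don't touch it" from becoming the default decision.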
assess-form: damage assessment shares methodology with form assessment
adapt-architecture: architectural adaptation may be needed if damage reveals structural weakness
dissolve-form: for components too damaged to repair; dissolve and rebuild
defend-colony: defense triggers repair; post-incident recovery feeds back into defense
shift-camouflage: surface adaptation can mask damage while repair proceeds (with caution)
conduct-post-mortem: structured post-incident analysis complements root cause identification
write-incident-runbook: repair procedures should be captured as runbooks for future incidents

npx claudepluginhub pjt222/agent-almanac