From smorch-ops
SEV1-4 incident response runbook. Drives /smo-incident command. Enforces SOP-10 structure (detect, ack, mitigate, resolve, review). Provides operator with the right question at the right time. Replaces ad-hoc Slack triage with disciplined, repeatable flow.
Slash command
/smorch-ops:incident-runbook
**Goal:** Reduce MTTR. Produce post-mortems that actually prevent recurrence.
Entry points:
- check-config-drift.sh / check-skill-drift.py cron
- /smo-health red flag

Action: whoever sees first → acks in Telegram within SLA (SEV1 <5 min, SEV2 <30 min, SEV3 <2h)
Decision tree (5 questions):
Then declare in Telegram: "🔴 SEV{n}: {one-line-symptom}"
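A minimal sketch of building the declaration line, assuming a hypothetical helper named `format_declaration`. Only the message format comes from the runbook; how the message is actually sent to Telegram is not specified here and is left out.

```python
def format_declaration(sev: int, symptom: str) -> str:
    """Build the incident declaration line.

    Hypothetical helper: the runbook defines the message format only.
    """
    if sev not in (1, 2, 3, 4):
        raise ValueError(f"SEV must be 1-4, got {sev}")
    # Collapse all whitespace (including newlines) so the symptom stays on one line.
    one_line = " ".join(symptom.split())
    if not one_line:
        raise ValueError("symptom must not be empty")
    return f"🔴 SEV{sev}: {one_line}"


if __name__ == "__main__":
    print(format_declaration(2, "checkout API returning 502s"))
```

Rejecting an empty symptom or an out-of-range severity up front keeps a malformed declaration from reaching the channel during a live incident.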
SEV1/2: stop the bleeding FIRST, understand SECOND.
- /smo-rollback
- /smo-drift --target {host} → --fix if safe
- /smo-secrets --rotate

SEV3/4: understand first, mitigate deliberately.
- /smo-health returns all green
- /smo-incident to generate post-mortem
- docs/incidents/
- smorch-brain/canonical/lessons.md if pattern

Must include:
Abbreviated template — timeline + root cause + action items only.
/smo-drift --fix during unrelated SEV could hide state.)

- /smo-triage → live diagnostic (what's happening NOW)
- /smo-incident → write-up (what happened, why, prevent)
- incident-runbook (this) → the protocol the operator follows

| SEV | Ack | Mitigate | Resolve | Review |
|---|---|---|---|---|
| 1 | 5 min | 30 min | 2h | 48h |
| 2 | 30 min | 2h | 8h | 72h |
| 3 | 2h | 24h | 1 sprint | 1 sprint |
| 4 | 24h | next sprint | when convenient | optional |
Breached SLA → logged in docs/incidents/sla-breaches.csv. 3 breaches in 30 days → protocol review.
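The SLA table and the breach rule can be sketched as a small check. This is an illustration under stated assumptions, not the plugin's implementation: the function names are hypothetical, the clock is assumed to start at detection, and only table cells with a fixed duration are encoded (cells such as "1 sprint" or "when convenient" are skipped). The column layout of sla-breaches.csv is not given in the runbook, so the sketch takes breach timestamps as a plain list.

```python
from datetime import datetime, timedelta

# Targets copied from the SLA table; cells without a fixed duration are omitted.
SLA = {
    1: {"ack": timedelta(minutes=5), "mitigate": timedelta(minutes=30),
        "resolve": timedelta(hours=2), "review": timedelta(hours=48)},
    2: {"ack": timedelta(minutes=30), "mitigate": timedelta(hours=2),
        "resolve": timedelta(hours=8), "review": timedelta(hours=72)},
    3: {"ack": timedelta(hours=2), "mitigate": timedelta(hours=24)},
    4: {"ack": timedelta(hours=24)},
}


def is_breached(sev, stage, detected_at, reached_at):
    """Return True if `stage` was reached later than its SLA target.

    Returns None when the table gives no fixed duration for that cell.
    Assumption: elapsed time is measured from detection, and reaching the
    stage exactly at the target does not count as a breach.
    """
    target = SLA[sev].get(stage)
    if target is None:
        return None
    return (reached_at - detected_at) > target


def needs_protocol_review(breach_times, now, window_days=30, threshold=3):
    """Return True if at least `threshold` breaches fall within the window."""
    cutoff = now - timedelta(days=window_days)
    recent = [t for t in breach_times if cutoff <= t <= now]
    return len(recent) >= threshold


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, 12, 0)
    print(is_breached(1, "ack", t0, t0 + timedelta(minutes=7)))
```

Returning `None` for sprint-based cells keeps the check from inventing a sprint length; a caller can treat those stages as needing manual judgment.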
npx claudepluginhub smorchestra-ai/smorch-dev --plugin smorch-ops