From pm-engineering
Generates structured operational runbooks for services, incidents, or deployments with prerequisites, step-by-step procedures, rollback steps, and escalation paths.
Slash command: /pm-engineering:runbook-writer
Produces operational runbooks for services, incident types, and deployment procedures — structured so an on-call engineer who's never touched the system can follow them under pressure.
Ask for these if not provided:
Runbook: [Runbook Title]
Service: [Service Name]
Type: [Deployment / Incident Response / Maintenance / DR]
Last Updated: [Insert today's date in YYYY-MM-DD format]
Owner: [Team or person]
Severity: [P1 / P2 / P3, if incident-type]
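You may prefer to generate the metadata block with a script rather than by hand. In that case, Python's `date.isoformat()` already produces the YYYY-MM-DD form that the Last Updated field asks for. Below is a minimal sketch. The `render_header` helper and the sample values are hypothetical and are not part of this skill.

```python
from datetime import date
from typing import Optional


def render_header(title: str, service: str, runbook_type: str, owner: str,
                  severity: Optional[str] = None,
                  today: Optional[date] = None) -> str:
    """Render the runbook metadata block (hypothetical helper)."""
    today = today or date.today()
    lines = [
        f"Runbook: {title}",
        f"Service: {service}",
        f"Type: {runbook_type}",
        # date.isoformat() always yields YYYY-MM-DD
        f"Last Updated: {today.isoformat()}",
        f"Owner: {owner}",
    ]
    if severity is not None:  # Severity only applies to incident-type runbooks
        lines.append(f"Severity: {severity}")
    return "\n".join(lines)


# Hypothetical sample values
print(render_header("High error rate", "payment-service", "Incident Response",
                    "Payments SRE", severity="P1", today=date(2024, 5, 3)))
```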
What this runbook covers: [1–2 sentences on the scenario this runbook handles]
When to use this runbook:
- […high-error-rate-payment-service]
- […main]

Estimated time to complete: [X minutes / X–Y minutes depending on outcome]
Impact if not completed correctly: [e.g. Payment processing degraded / Data loss risk / Users locked out]
Access required:
- […production-account]
- […vault read secret/payment-service]

Tools required:
- […kubectl v1.28+]

Before you start:
- […#ops-live that you're starting]

Number every step. Use exact commands. Do not paraphrase tool names or flags.
Step 1: [Action name] [What you're doing and why — one sentence]
# Exact command
[command here]
Expected output: [what should appear if this worked]
If this fails: [Exact error message to look for] → [What to do, or see Troubleshooting]
Step 2: [Action name] [Same structure as Step 1]
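The pairing of "Expected output" with "If this fails" can be scripted. Below is a minimal Python sketch. It assumes a hypothetical `Step` record and a hypothetical `run_step` helper, neither of which is part of this skill. The sketch does three things:

- runs the exact command;
- treats a non-zero exit code or missing expected text as a failure;
- prints the step's failure hint when the step fails.

The usage line runs the current Python interpreter as a harmless stand-in for a real `kubectl` or `vault` command.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One numbered runbook step (hypothetical structure mirroring the template)."""
    number: int
    name: str
    command: List[str]  # exact argv to run; never paraphrase tool names or flags
    expected: str       # text that should appear in stdout if the step worked
    if_fails: str       # what the on-call engineer should do next


def run_step(step: Step, timeout_s: float = 60.0) -> bool:
    """Run one step, check stdout for the expected text, and surface the failure hint.

    subprocess.TimeoutExpired propagates if the command runs past timeout_s.
    """
    print(f"Step {step.number}: {step.name}")
    result = subprocess.run(step.command, capture_output=True, text=True,
                            timeout=timeout_s)
    ok = result.returncode == 0 and step.expected in result.stdout
    if not ok:
        print(f"  Failed (exit code {result.returncode}). If this fails: {step.if_fails}")
    return ok


# Harmless stand-in command; a real runbook step would call kubectl, vault, etc.
check = Step(1, "Confirm interpreter runs", [sys.executable, "-c", "print('ready')"],
             expected="ready", if_fails="Reinstall Python, then retry Step 1")
run_step(check)
```

Matching on a substring of stdout keeps the check tolerant of timestamps and other noise in command output. Tighten the match if a false positive would be dangerous.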
Step 3: Verify (always include a verification step after the main procedure)
[verification command]
Expected state: [What a healthy system looks like after this runbook completes]
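A verification step often has to wait, because a service can report unhealthy for a short time after a deploy or restart. Below is a minimal sketch of bounded polling. It assumes a hypothetical `wait_until_healthy` helper whose `check` callable wraps your verification command and returns True when the system is in the expected state.

```python
import time
from typing import Callable


def wait_until_healthy(check: Callable[[], bool], timeout_s: float = 300.0,
                       interval_s: float = 10.0) -> bool:
    """Re-run a verification check until it passes or the time budget is spent.

    Returns True as soon as check() passes, False once the deadline has passed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Set `timeout_s` from the runbook's estimated completion time. That gives the on-call engineer a clear point at which to stop waiting and move to rollback or escalation.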
How to undo this procedure if something went wrong:
Step R1: [Rollback action]
[rollback command]
Verify rollback: [command to confirm rollback succeeded]
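Rollback usually means undoing completed work in reverse order. Below is a minimal Python sketch. It assumes each hypothetical step is a `(name, forward, rollback)` triple of callables that return True on success.

```python
from typing import Callable, List, Tuple

Action = Callable[[], bool]  # returns True on success


def run_with_rollback(steps: List[Tuple[str, Action, Action]]) -> bool:
    """Run forward steps in order; on the first failure, undo completed steps in reverse."""
    done: List[Tuple[str, Action]] = []
    for name, forward, rollback in steps:
        if forward():
            done.append((name, rollback))
            continue
        print(f"{name} failed; rolling back {len(done)} completed step(s)")
        for done_name, undo in reversed(done):
            ok = undo()
            print(f"  Rollback {done_name}: {'ok' if ok else 'FAILED, escalate'}")
        return False
    return True
```

This sketch only undoes steps that completed. If a failed step can leave partial changes behind, give it its own entry in the rollback procedure as well.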
| Symptom | Likely Cause | Resolution |
|---|---|---|
| [Error message or observable symptom] | [Why this happens] | [Exact fix or next step] |
| [Another symptom] | [Cause] | [Resolution] |
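
The troubleshooting table works as a lookup from an observed symptom to a resolution. Below is a minimal sketch. It assumes hypothetical row values and case-insensitive substring matching against the error text the engineer sees.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    symptom: str      # error text or observable symptom, as written in the table
    cause: str
    resolution: str


def find_resolution(table: List[Row], observed: str) -> Optional[Row]:
    """Return the first row whose symptom text appears in the observed output."""
    observed_lower = observed.lower()
    for row in table:
        if row.symptom.lower() in observed_lower:
            return row
    return None  # no match: fall through to escalation
```

The first matching row wins, so list the most specific symptoms first.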
If this runbook does not resolve the issue:
| Condition | Who to Contact | How |
|---|---|---|
| [e.g. DB unavailable after 10 min] | [DBA on-call] | [PagerDuty policy: db-oncall] |
| [e.g. Payment provider unresponsive] | [Vendor contact] | [Contact in 1Password: vendor-escalation] |
Always update the incident timeline in [tool] before escalating.
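Time-based escalation conditions such as "DB unavailable after 10 min" can be expressed as rules. Below is a minimal sketch built on a hypothetical `EscalationRule` record. The contact and PagerDuty policy name in the usage come from the example row in the escalation table. The matching logic itself is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EscalationRule:
    condition: str      # label matching the Condition column
    after_minutes: int  # escalate once the issue has persisted this long
    contact: str
    how: str


def who_to_page(rules: List[EscalationRule], condition: str,
                minutes_elapsed: int) -> Optional[EscalationRule]:
    """Return the rule to act on, or None if it is too early (or no rule matches)."""
    for rule in rules:
        if rule.condition == condition and minutes_elapsed >= rule.after_minutes:
            return rule
    return None


rules = [EscalationRule("DB unavailable", 10, "DBA on-call", "PagerDuty policy: db-oncall")]
rule = who_to_page(rules, "DB unavailable", 12)
if rule is not None:
    # Update the incident timeline first, then contact rule.contact via rule.how
    print(f"Escalate to {rule.contact} ({rule.how})")
```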
After completing the runbook:
- […#ops-live with outcome]

npx claudepluginhub mohitagw15856/pm-claude-skills --plugin pm-engineering