From vercel-pack
Executes Vercel incident runbook: triages outages with status checks, deployment inspection, and logs; performs instant rollbacks; follows decision tree for platform vs app issues.
How this skill is triggered — by the user, by Claude, or both
Slash command
/vercel-pack:vercel-incident-runbookThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Step-by-step incident response for Vercel deployment failures, function errors, and platform outages. Covers rapid triage, instant rollback, communication templates, and postmortem procedures.
Step-by-step incident response for Vercel deployment failures, function errors, and platform outages. Covers rapid triage, instant rollback, communication templates, and postmortem procedures.
# 1. Check if it's a Vercel platform issue
curl -s "https://www.vercel-status.com/api/v2/summary.json" \
| jq '.status.description, [.components[] | select(.status != "operational") | {name, status}]'
# 2. Check current production deployment status
vercel ls --prod
vercel inspect $(vercel ls --prod --json | jq -r '.[0].url')
# 3. Check recent deployments — did a deploy just happen?
curl -s -H "Authorization: Bearer $VERCEL_TOKEN" \
"https://api.vercel.com/v6/deployments?target=production&limit=5&projectId=prj_xxx" \
| jq '.deployments[] | {uid, state, createdAt: (.createdAt/1000 | todate), url}'
# 4. Check function logs for errors
vercel logs $(vercel ls --prod --json | jq -r '.[0].url') --level=error --limit=20
Is vercel-status.com showing an incident?
├── YES → Vercel platform issue
│ ├── Subscribe to updates on status page
│ ├── Post internal status: "Vercel platform incident — monitoring"
│ └── No action needed from us — wait for Vercel resolution
│
└── NO → Issue is in our deployment
├── Did a deployment happen in the last 30 minutes?
│ ├── YES → Likely deployment regression
│ │ └── ROLLBACK immediately (Step 3)
│ └── NO → Application-level issue
│ ├── Check function logs for new errors
│ ├── Check external dependency status (DB, APIs)
│ └── Investigate and hotfix (Step 4)
│
└── Is the issue region-specific?
├── YES → Check function regions, possible edge issue
└── NO → Global issue, check code and env vars
# Option A: Rollback to previous production deployment (fastest)
vercel rollback
# This instantly swaps production traffic — no rebuild needed
# Option B: Rollback to a specific known-good deployment
vercel rollback dpl_xxxxxxxxxxxx
# Option C: Via API (for automation/PagerDuty integration)
curl -X POST "https://api.vercel.com/v9/projects/my-app/promote" \
-H "Authorization: Bearer $VERCEL_TOKEN" \
-H "Content-Type: application/json" \
-d '{"deploymentId": "dpl_known_good_id"}'
# Verify rollback succeeded
vercel ls --prod
curl -s https://yourdomain.com/api/health | jq .
# Collect evidence while it's fresh
mkdir incident-$(date +%Y%m%d)
cd incident-$(date +%Y%m%d)
# Function logs around the incident time
vercel logs https://yourdomain.com --limit=200 > function-logs.txt
# Deployment diff — what changed?
curl -s -H "Authorization: Bearer $VERCEL_TOKEN" \
"https://api.vercel.com/v13/deployments/dpl_broken" \
| jq '.meta' > broken-deployment-meta.json
# Compare env vars between working and broken deployments
vercel env ls > env-vars.txt
# Check git diff between last good and broken commit
git log --oneline -10
git diff dpl_good_commit..dpl_broken_commit -- api/ src/
// vercel.json — temporary maintenance mode via rewrite
{
"rewrites": [
{
"source": "/((?!_next|api/health).*)",
"destination": "/maintenance.html"
}
]
}
<!-- public/maintenance.html -->
<!DOCTYPE html>
<html>
<head><title>Maintenance</title></head>
<body>
<h1>We'll be right back</h1>
<p>We're performing scheduled maintenance. Please check back shortly.</p>
</body>
</html>
Internal — Slack (Incident Start)
:rotating_light: INCIDENT: [Project Name] production issue detected
Status: Investigating
Impact: [Description of user impact]
Start time: [UTC timestamp]
On-call: @[engineer]
Thread: replies here
Internal — Slack (Mitigation)
:white_check_mark: MITIGATED: [Project Name]
Action: Rolled back to deployment dpl_xxx
Impact duration: [X minutes]
Root cause: [Brief description]
Postmortem: [link] scheduled for [date]
External — Status Page
Title: Degraded performance on [service]
Body: We are investigating reports of [issue]. Some users may experience
[impact]. Our team is actively working on a resolution.
Update: The issue has been resolved. [Brief root cause].
# Incident Postmortem: [Title]
## Summary
- Duration: [start] to [end] ([X minutes])
- Impact: [users/requests affected]
- Severity: [P1/P2/P3]
## Timeline (UTC)
- HH:MM — [event]
- HH:MM — Alert fired
- HH:MM — On-call acknowledged
- HH:MM — Root cause identified
- HH:MM — Rollback executed
- HH:MM — Service restored
## Root Cause
[What broke and why]
## Resolution
[What was done to fix it]
## Action Items
- [ ] [Preventive action] — Owner: @xxx — Due: [date]
- [ ] [Detection improvement] — Owner: @xxx — Due: [date]
- [ ] [Process improvement] — Owner: @xxx — Due: [date]
| Severity | Definition | Response Time | Rollback? |
|---|---|---|---|
| P1 | Production down, all users affected | < 5 min | Immediate |
| P2 | Degraded, some users affected | < 15 min | If not fixable in 30 min |
| P3 | Minor issue, workaround exists | < 1 hour | No |
| P4 | Cosmetic or non-urgent | Next business day | No |
| Scenario | Action |
|---|---|
| Vercel status page shows incident | Monitor, communicate, no deployment changes |
vercel rollback fails | Use API promotion: POST to /v9/projects/.../promote |
| Rollback deployment also broken | Deploy from a known-good git tag |
| Cannot access Vercel dashboard | Use CLI with saved VERCEL_TOKEN |
| Log retention expired | Check external log drain provider |
For data handling and compliance, see vercel-data-handling.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin vercel-packCollects Vercel debug bundle with deployment inspect, function logs, build info, routes, and env vars for support tickets and troubleshooting.
Diagnoses and recovers from production incidents with structured triage, severity classification, and root cause analysis. Use when service is down, errors spike, or users report critical bugs.
Guides Vercel Agent for AI-powered code reviews on PRs, incident investigations via logs/metrics, and SDK installations. Use for automating PR analysis and anomaly debugging in Vercel projects.