From pdm-quality-testing
Define a structured monitoring plan for the 72 hours after a release — with clear metrics, alert thresholds, escalation paths, and a stabilization criteria.
How this skill is triggered — by the user, by Claude, or both
Slash command
/pdm-quality-testing:post-launch-monitoringThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Define a structured monitoring plan for the 72 hours after a release — with clear metrics, alert thresholds, escalation paths, and a stabilization criteria.
Define a structured monitoring plan for the 72 hours after a release — with clear metrics, alert thresholds, escalation paths, and a stabilization criteria.
| Metric | Normal Baseline | Alert Threshold | Escalation Threshold |
|---|---|---|---|
| Error rate | < 0.5% | > 1% | > 2% |
| p95 Response time | < 300ms | > 500ms | > 1000ms |
| Throughput | 1000 req/min | < 700 req/min | < 500 req/min |
| CPU / Memory | < 60% | > 80% | > 90% |
| Failed logins | < 5/min | > 20/min | > 50/min |
| Support tickets | Baseline | 2× baseline | 3× baseline |
| Period | Check Frequency | Who |
|---|---|---|
| First 2 hours | Every 15 minutes | PDM + On-call Dev |
| Hours 2–24 | Every hour | On-call Dev |
| Day 2–3 | Every 4 hours | On-call Dev |
| Day 4–14 | Daily | PDM |
Initiate rollback if:
Release is stable when:
/lessons-learned or /delivery-reportnpx claudepluginhub devmuslim/pdm-skills --plugin pdm-quality-testingDesign monitoring and alerting that catches production issues fast without creating alert fatigue. Use when establishing observability or improving incident response.
Designs monitoring systems: SLOs, uptime checks, error tracking, alert routing, on-call rotations. Use when setting up or fixing monitoring, alert fatigue, or incident gaps.
Monitors production after a 100% deploy completes, comparing metrics and screenshots against a pre-deploy baseline to detect silent regressions (console errors, perf drops, broken pages) during the first hours/days.