From qa-debug
This skill should be used when the user asks to "fix this bug", "how do I prevent this from happening again", "design a rollback plan", "add a circuit breaker", "implement retry with backoff", "harden this service", "what pattern should I use to handle this failure", "how do I make this more resilient", "graceful degradation", or needs guidance on reliability engineering patterns, rollback strategies, architectural hardening decisions, or fail-safe design. Also activate when running /debug (step 4: propose fix) or /postmortem (action items) commands.
How this skill is triggered — by the user, by Claude, or both
Slash command
/qa-debug:remediation-playbooksThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Match the failure mode to the right pattern before proposing code:
Match the failure mode to the right pattern before proposing code:
| Failure Mode | Primary Pattern | Secondary Pattern |
|---|---|---|
| External service goes down | Circuit Breaker | Fallback / Graceful Degradation |
| External service is slow | Timeout + Retry | Bulkhead |
| Burst traffic exceeds capacity | Rate Limiting + Queue | Bulkhead |
| Retry storms amplify outages | Exponential Backoff + Jitter | Circuit Breaker |
| Write fails mid-operation | Idempotency Key | Reversible Migration |
| Bad deploy corrupts state | Feature Flag | git revert + re-deploy |
Prevents cascading failure when a dependency is unhealthy:
wait = min(base * 2^attempt, max_wait) + random(0, base)
max_attempts ceiling (3–5 for synchronous; higher for async workers)Never let a network call block indefinitely:
Isolate resource pools per consumer to limit blast radius:
Return a safe default when the primary path fails:
| Change Type | Rollback Method | Estimated Time |
|---|---|---|
| Code deploy | git revert + re-deploy, or feature flag → off | Minutes |
| DB migration | Run reversible down() migration | Minutes–hours (depends on row count) |
| Config change | Revert to previous value in version control | Seconds |
| Dependency bump | Pin to previous version in lock file; re-install | Minutes |
| Infrastructure change | Terraform apply previous state | Minutes–hours |
Non-negotiable rollback rules:
down() migration — test it in CI before the PR mergesEscalate if:
Fix locally if:
See references/playbooks.md for extended patterns including distributed tracing, database-specific patterns, frontend resilience, and async/queue failure modes.
npx claudepluginhub luxcordia/qa-debug --plugin qa-debugAssesses and guides implementation of system resilience patterns: circuit breakers, retries, bulkheads, graceful degradation, health checks, timeouts. Prevents cascading failures in distributed services.
Assists implementing circuit breakers, retries, bulkheads, and resilience patterns for fault-tolerant distributed systems.
Detects missing resilience patterns (circuit breakers, retries, timeouts, bulkheads) in service dependencies and recommends production-grade fault tolerance configurations.