Define recovery objectives (RTO/RPO), backup strategies, failover procedures, and testing protocols. Use when planning disaster recovery or establishing continuity practices.
Slash command: /infrastructure-design:disaster-recovery-plan
Design recovery strategies with defined objectives, tested procedures, and regular validation.
You are planning disaster recovery. Define RTO/RPO requirements, design backup and failover strategies, and plan testing. Read the business impact analysis, the current backup setup, and any regulatory requirements.
Based on IT disaster recovery best practices (NIST, ISO 27031):
Define Business Requirements: For each critical system, set the RTO (maximum tolerable downtime) and the RPO (maximum tolerable data loss). Base both on business impact: lost revenue, SLA violations, and loss of customer trust.
Design Backup Strategy: Take a full backup daily plus hourly incrementals, or use continuous replication when the RPO is stricter than the incremental interval allows. Test recovery from backups monthly and document the recovery steps.
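As a rough illustration of how a backup schedule relates to RPO, the sketch below estimates the worst-case data loss of a schedule and compares it with a target. The names `BackupSchedule`, `worst_case_data_loss`, and `meets_rpo` are hypothetical, and the model makes a simplifying assumption: backups complete instantly and every backup is restorable.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class BackupSchedule:
    """Hypothetical description of a backup schedule (illustrative only)."""
    full_interval: timedelta                    # e.g. one full backup per day
    incremental_interval: Optional[timedelta]   # e.g. hourly; None if not used


def worst_case_data_loss(schedule: BackupSchedule) -> timedelta:
    """Longest gap between restorable points under this schedule.

    Assumption: backups finish instantly and always restore cleanly,
    so the gap equals the shortest backup interval in use.
    """
    if schedule.incremental_interval is None:
        return schedule.full_interval
    return min(schedule.full_interval, schedule.incremental_interval)


def meets_rpo(schedule: BackupSchedule, rpo: timedelta) -> bool:
    """True when the schedule's worst-case data loss fits within the RPO."""
    return worst_case_data_loss(schedule) <= rpo


if __name__ == "__main__":
    daily_plus_hourly = BackupSchedule(timedelta(days=1), timedelta(hours=1))
    print(meets_rpo(daily_plus_hourly, timedelta(hours=4)))     # within a 4-hour RPO
    print(meets_rpo(daily_plus_hourly, timedelta(minutes=15)))  # too coarse for 15 minutes
```

When the check fails for a required RPO, that is the signal to shorten the incremental interval or move to continuous replication.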
Plan Failover: For RTO < 1 hour, set up active-passive (a standby system ready to take over). For RTO < 5 minutes, use active-active (both systems live). Implement health checks and automatic failover.
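The RTO thresholds in the failover step can be sketched as a small selection function, together with a simple trigger rule for automatic failover. This is a minimal sketch, not a definitive implementation: `choose_topology` and `should_fail_over` are hypothetical names, the `"backup-restore"` fallback for an RTO of one hour or more is an assumption not stated in the text, and the default of three consecutive failed health checks is an illustrative value.

```python
from datetime import timedelta
from typing import List


def choose_topology(rto: timedelta) -> str:
    """Map an RTO to a failover topology using the thresholds from the text."""
    if rto < timedelta(minutes=5):
        return "active-active"    # both systems live
    if rto < timedelta(hours=1):
        return "active-passive"   # standby system
    # Assumption: with a looser RTO, restoring from backup may be enough.
    return "backup-restore"


def should_fail_over(check_results: List[bool], consecutive_failures: int = 3) -> bool:
    """True when the most recent `consecutive_failures` health checks all failed.

    `check_results` is ordered oldest to newest; True means the check passed.
    Requiring several failures in a row avoids failing over on a single blip.
    """
    if len(check_results) < consecutive_failures:
        return False
    return not any(check_results[-consecutive_failures:])


if __name__ == "__main__":
    print(choose_topology(timedelta(minutes=30)))
    print(should_fail_over([True, False, False, False]))
```

The consecutive-failure threshold trades detection speed against false positives, so it should be chosen together with the health check interval and counted against the RTO budget.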
Document Procedures: Who decides to fail over? What are the manual steps? How do you verify that failover succeeded? Test the documentation with dry runs and update it after each test.
Schedule Regular Testing: Run monthly failover drills for critical systems. Test both planned (maintenance window) and unplanned (kill a production server) scenarios. Document findings and improvements.
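One way to turn drill results into documented findings is to record the measured recovery time and data loss and compare them with the objectives. The sketch below assumes a hypothetical `DrillResult` record and `evaluate_drill` helper; the system name and figures in the usage example are invented for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class DrillResult:
    """Hypothetical record of one failover drill (illustrative only)."""
    system: str
    scenario: str              # "planned" or "unplanned"
    recovery_time: timedelta   # measured time until service was restored
    data_loss: timedelta       # measured window of lost data


def evaluate_drill(result: DrillResult, rto: timedelta, rpo: timedelta) -> List[str]:
    """Return a list of findings; an empty list means both objectives were met."""
    findings = []
    if result.recovery_time > rto:
        findings.append(
            f"{result.system} ({result.scenario}): RTO missed, "
            f"recovered in {result.recovery_time}, target {rto}"
        )
    if result.data_loss > rpo:
        findings.append(
            f"{result.system} ({result.scenario}): RPO missed, "
            f"lost {result.data_loss} of data, target {rpo}"
        )
    return findings


if __name__ == "__main__":
    drill = DrillResult("orders-db", "unplanned",
                        recovery_time=timedelta(minutes=90),
                        data_loss=timedelta(minutes=5))
    for finding in evaluate_drill(drill, rto=timedelta(hours=1), rpo=timedelta(minutes=15)):
        print(finding)
```

Keeping findings as plain text makes it easy to paste them into the drill report and track which ones lead to procedure updates.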
npx claudepluginhub sethdford/claude-skills --plugin architect-infrastructure-design
Designs cloud disaster recovery plans with RTO/RPO tiers, backup architecture, IaC, and communication procedures based on AWS, NIST, and ISO standards.
Defines RPO/RTO targets, designs backup architecture, and guides disaster recovery drills for full-region or platform outages. Also handles ransomware planning and post-incident restoration gap analysis.
Produces a complete disaster recovery plan with RPO/RTO targets, per-scenario runbooks, backup procedures, testing cadence, and communication templates for a service or system.