From build-like-amazon
Guides staged production deployments with expanding blast radius (one-box → one-AZ → regional → global), mandatory bake times, and automatic rollback on alarm breaches.
Slash command
/build-like-amazon:progressive-deployment
Every production change follows a staged rollout that expands blast radius gradually: One-Box → One-AZ → Regional → Global. Each stage has mandatory bake time during which alarms are monitored. Any alarm breach triggers automatic rollback to the last known-good state. The goal is to detect problems when they affect the fewest customers possible.
At Amazon, no service deploys to all hosts simultaneously. The pipeline enforces progressive deployment because the cost of a bad deployment to 100% of hosts is measured in customer trust and revenue. A one-box deployment that catches a bug costs almost nothing. A full-fleet deployment that causes a critical incident costs millions and erodes customer confidence. Teams that skip stages eventually cause high-severity events; this is not theoretical, it is observed repeatedly.
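The staged rollout described above can be sketched as a small loop. This is a minimal illustration, not the actual pipeline: the stage names come from the text, while `run_rollout`, `RolloutResult`, the three callbacks, and the `bake_checks` count are hypothetical names chosen for the example. A real bake would wait between alarm checks; the waits are omitted here so the sketch runs instantly.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Stage order taken from the text: blast radius widens at each step.
STAGES = ["one-box", "one-az", "regional", "global"]


@dataclass
class RolloutResult:
    completed: List[str] = field(default_factory=list)
    rolled_back: bool = False
    failed_stage: Optional[str] = None


def run_rollout(
    deploy: Callable[[str], None],
    alarms_breached: Callable[[str], bool],
    rollback: Callable[[], None],
    bake_checks: int = 3,  # hypothetical: number of alarm polls per bake
) -> RolloutResult:
    """Deploy stage by stage; roll back on the first alarm breach."""
    result = RolloutResult()
    for stage in STAGES:
        deploy(stage)
        # Bake: poll alarms before allowing the next, wider stage.
        for _ in range(bake_checks):
            if alarms_breached(stage):
                rollback()  # return to the last known-good state
                result.rolled_back = True
                result.failed_stage = stage
                return result
        result.completed.append(stage)
    return result


if __name__ == "__main__":
    events: List[str] = []
    outcome = run_rollout(
        deploy=lambda s: events.append(f"deploy {s}"),
        alarms_breached=lambda s: s == "one-az",  # simulate a breach in one AZ
        rollback=lambda: events.append("rollback"),
    )
    print(events)
    print(outcome)
```

In the simulated run, the breach is caught at the one-AZ stage, so the regional and global stages never receive the change.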
Automatic rollback fires when any of these conditions are met:
| Intention | Mechanism |
|---|---|
| "I'll watch the metrics after deploy" | Pipeline blocks progression until bake time completes with green metrics |
| "This change is too small to cause issues" | Every change, regardless of size, goes through all stages |
| "We need this out fast" | Emergency pipeline still has one-box + reduced bake, never zero stages |
| "I'll roll back manually if something breaks" | Rollback is automatic on alarm; human reaction time is too slow to protect customers |
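One way to turn the intentions in this table into a mechanism is to validate a pipeline definition before it is allowed to run. The sketch below is an assumption about how such a check could look: `Stage`, `validate_pipeline`, and the `bake_minutes` field are hypothetical names, and the only rules encoded are the ones stated above (all stages for normal changes, one-box plus non-zero bake for the emergency pipeline, never zero stages).

```python
from dataclasses import dataclass
from typing import List

REQUIRED_STAGES = ["one-box", "one-az", "regional", "global"]


@dataclass(frozen=True)
class Stage:
    name: str
    bake_minutes: int  # hypothetical unit; the source gives no durations


def validate_pipeline(stages: List[Stage], emergency: bool = False) -> List[str]:
    """Return a list of violations; an empty list means the pipeline may run."""
    if not stages:
        return ["pipeline has zero stages"]

    errors: List[str] = []
    names = [s.name for s in stages]
    if emergency:
        # Emergency pipelines may drop later stages but must start with one-box.
        if names[0] != "one-box":
            errors.append("emergency pipeline must start with one-box")
    elif names != REQUIRED_STAGES:
        errors.append("normal pipeline must include every stage in order")

    # Bake may be reduced, never removed.
    for stage in stages:
        if stage.bake_minutes <= 0:
            errors.append(f"stage {stage.name} has no bake time")
    return errors


if __name__ == "__main__":
    skipped = [Stage("one-box", 30), Stage("global", 30)]
    print(validate_pipeline(skipped))
    print(validate_pipeline([Stage("one-box", 5)], emergency=True))
```

Returning a list of violations rather than raising on the first one lets a reviewer see every problem with a proposed pipeline at once.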
| What They Say | Why It's Wrong | What To Do Instead |
|---|---|---|
| "It's just a config change" | Config changes cause more outages than code changes at Amazon | Same pipeline, same stages, same bake times |
| "We're blocking on this for a launch" | A failed launch is worse than a delayed launch | Use the emergency pipeline (reduced bake, not zero bake) |
| "The change was already tested in staging" | Staging doesn't have production traffic patterns, data volumes, or dependency behavior | Staging reduces risk; it doesn't eliminate it. Pipeline stages are still required |
| "It's Friday afternoon but this is urgent" | Weekend deploys have longer detection time due to reduced monitoring attention | Deploy Monday, or accept extended bake times (2x) for off-hours |
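The 2x off-hours rule in the last row can be expressed as a small calculation. The multiplier comes from the table; the definition of "off-hours" does not, so the window used here (weekends, Friday from 12:00, and anything outside 09:00 to 17:00) is purely an assumption for illustration, as are the names `is_off_hours` and `effective_bake_minutes`.

```python
from datetime import datetime

OFF_HOURS_MULTIPLIER = 2  # the "2x" from the table


def is_off_hours(when: datetime) -> bool:
    """Assumed definition: weekends, Friday afternoon, or outside 09:00-17:00."""
    weekday = when.weekday()  # Monday == 0, Sunday == 6
    if weekday >= 5:
        return True
    if weekday == 4 and when.hour >= 12:
        return True
    return not (9 <= when.hour < 17)


def effective_bake_minutes(base_minutes: int, when: datetime) -> int:
    """Double the bake time for deployments that start off-hours."""
    if is_off_hours(when):
        return base_minutes * OFF_HOURS_MULTIPLIER
    return base_minutes


if __name__ == "__main__":
    friday_afternoon = datetime(2024, 1, 5, 15, 0)  # a Friday
    monday_morning = datetime(2024, 1, 8, 10, 0)    # a Monday
    print(effective_bake_minutes(30, friday_afternoon))
    print(effective_bake_minutes(30, monday_morning))
```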
Before marking a deployment complete, confirm:
npx claudepluginhub robisson/build-like-amazon-agent-skills
Provides deployment plans for blue-green, canary releases, progressive rollouts, automated rollback, feature flag coordination, and zero-downtime migrations. For high-risk changes and rollouts.
Selects and designs deployment strategies (blue-green, canary, rolling) for safe production releases.
Provides rollback procedures, risk assessment, pre/post-deployment validation checklists, and contingency planning for safe deployments.