From grimoire
Models current resource utilization and traffic growth to project when infrastructure capacity will be exhausted, helping you provision proactively before SLO violations occur.
Slash command
/grimoire:calculate-capacity-plan
Model current resource utilization and traffic growth to determine when and how much additional infrastructure to provision to maintain SLO compliance.
Adopted by: Google SRE (capacity planning is a core SRE function), Netflix (capacity headroom model), AWS (Auto Scaling Group sizing methodology)
Impact: Allspaw (2008): organizations without capacity planning experience 2-3 avoidable capacity-related outages per year; over-provisioning without capacity plans wastes 30-40% of cloud spend; under-provisioning causes SLO violations
Why best: Infrastructure takes time to provision; without a forward-looking model, you react to capacity issues after they cause outages rather than before
Sources: Allspaw "The Art of Capacity Planning" O'Reilly (2008); Murphy et al. "Site Reliability Workbook" O'Reilly (2018) Ch. 17; Google SRE capacity planning practices
Measure current resource utilization — Collect 90-day p50 and p95 utilization for each resource dimension: CPU, memory, disk I/O, network bandwidth, database connections, and cache hit rate. Use percentiles, not averages — averages mask peaks. P95 utilization is your effective utilization baseline.
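One way to compute the two percentiles is sketched below. It assumes the utilization samples have already been exported from your monitoring system as a list of percentages; the function name `utilization_percentiles` and the sample data are hypothetical.

```python
import statistics

def utilization_percentiles(samples):
    """Return (p50, p95) for a list of utilization samples in percent."""
    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[49], cuts[94]

# Hypothetical CPU samples: mostly idle-ish, with peaks 10% of the time.
cpu_samples = [35.0] * 90 + [80.0] * 10
p50, p95 = utilization_percentiles(cpu_samples)
mean = statistics.fmean(cpu_samples)

print(f"mean={mean:.1f}% p50={p50:.1f}% p95={p95:.1f}%")
```

With this made-up data the mean sits close to the median, while p95 lands on the peak level, which is why the baseline is taken from p95.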
Identify the bottleneck resource — From your utilization data, identify which resource reaches saturation first as traffic increases. Common bottlenecks: CPU (compute-bound services), database connections (connection pool exhaustion), memory (high-memory workloads), I/O (storage-bound services). All capacity modeling focuses on the bottleneck resource.
Establish a traffic growth model — Analyze 90-day traffic trends (requests per second, active users, data volume). Fit a growth model: linear (constant traffic addition), exponential (percentage growth per period), or step-function (event-driven spikes). Get business context: planned marketing campaigns, product launches, seasonal peaks. Your growth model drives the capacity timeline.
Calculate resource-to-traffic ratio — Determine what resource utilization the current traffic produces. Example: 1000 RPS → 40% CPU on 8 vCPUs. Assuming utilization scales linearly with traffic, the saturation point is 1000 RPS / 0.40 = 2500 RPS. Apply a safety buffer: plan for a maximum of 70% utilization (30% headroom for traffic spikes and failover), which gives a safe capacity of 2500 × 0.70 = 1750 RPS.
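The arithmetic in this step can be written down as a small sketch. It assumes utilization scales linearly with traffic, which should be checked against load-test data; the function names are hypothetical.

```python
def saturation_rps(current_rps, utilization):
    """Traffic at which the resource reaches 100%, assuming linear scaling."""
    return current_rps / utilization

def safe_rps(current_rps, utilization, max_utilization=0.70):
    """Traffic at which the resource reaches the planned utilization ceiling."""
    return saturation_rps(current_rps, utilization) * max_utilization

# Figures from the example: 1000 RPS at 40% CPU, 70% ceiling.
saturation = saturation_rps(1000, 0.40)
safe = safe_rps(1000, 0.40)
print(f"saturation ~ {saturation:.0f} RPS, safe capacity ~ {safe:.0f} RPS")
```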
Project when capacity will be exhausted — Apply the growth model to the safe capacity threshold. Example: 1750 RPS safe limit (70% of the 2500 RPS saturation point), current 1000 RPS, growing 15% per month. Months to exhaustion = log(1750/1000) / log(1.15) ≈ 4.0 months. Subtract your infrastructure provisioning lead time (cloud: 0 days for on-demand, 2-4 weeks for reserved procurement) to find the latest date by which provisioning must start.
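A minimal sketch of the projection, assuming compound (exponential) monthly growth. The threshold is a parameter, so the same hypothetical helper `months_until` can be evaluated both at the 70% safe threshold (1750 RPS for a 2500 RPS saturation point) and at full saturation.

```python
import math

def months_until(threshold_rps, current_rps, monthly_growth):
    """Months of compound growth until traffic reaches threshold_rps."""
    if current_rps >= threshold_rps:
        return 0.0  # already at or past the threshold
    return math.log(threshold_rps / current_rps) / math.log(1 + monthly_growth)

to_safe_limit = months_until(1750, 1000, 0.15)  # about 4.0 months
to_saturation = months_until(2500, 1000, 0.15)  # about 6.6 months
print(f"safe limit in {to_safe_limit:.1f} months, "
      f"saturation in {to_saturation:.1f} months")
```

The gap between the two numbers is the time the 30% headroom buys at this growth rate.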
Model for N+1 and multi-region redundancy — Capacity must tolerate the loss of one node (N+1). If you have 3 web servers and one fails, the remaining 2 must handle 100% of traffic. Each server must therefore run at no more than 2/3 (about 67%) utilization in normal operation, so plan for 3 servers at 67%, not 90%. Multi-region: each region must be able to handle the full traffic load if the peer region fails.
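The N+1 rule generalizes to any node count. The sketch below is illustrative only: it assumes load is spread evenly across nodes, and the `ceiling` parameter (a hypothetical name) lets you require that survivors stay under a safety threshold instead of 100%.

```python
import math

def max_normal_utilization(nodes, failures=1, ceiling=1.0):
    """Highest per-node utilization that keeps the surviving nodes at or
    below `ceiling` after `failures` nodes are lost (even load assumed)."""
    if nodes <= failures:
        raise ValueError("need more nodes than tolerated failures")
    return ceiling * (nodes - failures) / nodes

def nodes_required(peak_rps, rps_per_node, failures=1, ceiling=1.0):
    """Nodes needed to serve peak_rps with `failures` spare nodes."""
    return math.ceil(peak_rps / (rps_per_node * ceiling)) + failures

print(f"{max_normal_utilization(3):.0%}")                # 3 nodes, survive 1 loss
print(f"{max_normal_utilization(3, ceiling=0.70):.0%}")  # survivors kept at 70%
print(nodes_required(2000, 1000))                        # hypothetical sizing
```

Holding survivors to a 70% ceiling lowers the normal-operation target for three nodes to roughly 47%, which is why small clusters are expensive to run with both redundancy and headroom.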
Account for efficiency initiatives — If planned optimizations will reduce resource consumption (caching, query optimization, algorithm improvement), model their impact. Example: adding read replicas is expected to reduce primary DB load by 40%. Apply these reductions conservatively (count only 50% of the planned gain, here 20%) to account for implementation delays.
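As a small worked example of the conservative discount, assume (hypothetically) that the primary database currently runs at 60% utilization. The helper name `discounted_load` and the `credit` parameter are illustrative.

```python
def discounted_load(current_load, planned_reduction, credit=0.5):
    """Load after an optimization, counting only `credit` of the planned gain."""
    return current_load * (1 - planned_reduction * credit)

# Planned 40% reduction, modeled at half credit (20%).
modeled = discounted_load(60.0, 0.40)
optimistic = discounted_load(60.0, 0.40, credit=1.0)
print(f"modeled {modeled:.0f}%, optimistic {optimistic:.0f}%")
```

Planning against the modeled figure rather than the optimistic one leaves room for the initiative to slip without invalidating the capacity timeline.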
Calculate the procurement/provisioning timeline — Determine lead times: on-demand cloud instances (minutes), reserved instances (1-3 days to activate), physical hardware (6-12 weeks), network capacity increases (2-4 weeks for ISP circuits). The capacity plan output is: "We need to provision X resources by date Y or we will breach our SLO."
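To turn a projected exhaustion date into a provision-by deadline, subtract the lead time. The sketch below uses the upper bound of each lead-time range listed above; the exhaustion date and the names `LEAD_TIME_DAYS` and `order_by` are hypothetical.

```python
from datetime import date, timedelta

# Upper bounds of the lead-time ranges from this step, in days.
LEAD_TIME_DAYS = {
    "on_demand_cloud": 0,     # minutes, rounded down to zero days
    "reserved_instances": 3,
    "isp_circuit": 28,        # 4 weeks
    "physical_hardware": 84,  # 12 weeks
}

def order_by(exhaustion_date, lead_time_days):
    """Latest date to start provisioning so capacity arrives in time."""
    return exhaustion_date - timedelta(days=lead_time_days)

exhaustion = date(2025, 7, 1)  # hypothetical projected exhaustion date
for resource, days in LEAD_TIME_DAYS.items():
    print(f"{resource}: provision by {order_by(exhaustion, days)}")
```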
Build a capacity model spreadsheet — Columns: date, traffic (RPS/users), resource utilization (by dimension), % of safe capacity, headroom remaining. Rows: monthly projections for 12 months. Include scenario analysis: baseline, +50% growth spike, major launch event. Share with engineering leadership monthly.
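If you prefer to generate the spreadsheet rather than maintain it by hand, a sketch like the following produces the monthly rows as CSV. It assumes compound monthly growth and interprets the "+50% growth spike" scenario as traffic 50% above baseline in every month, which is one possible reading; the column names and the functions `project`, `first_breach`, and `to_csv` are hypothetical, and a real model would use calendar dates and one utilization column per resource dimension.

```python
import csv
import io

def project(current_rps, monthly_growth, safe_rps, months=12, spike=0.0):
    """Monthly rows: traffic, percent of safe capacity, headroom remaining."""
    rows = []
    for month in range(1, months + 1):
        rps = current_rps * (1 + monthly_growth) ** month * (1 + spike)
        rows.append({
            "month": month,
            "rps": round(rps),
            "pct_of_safe_capacity": round(100 * rps / safe_rps, 1),
            "headroom_rps": round(safe_rps - rps),
            "breached": rps > safe_rps,
        })
    return rows

def first_breach(rows):
    """First month whose traffic exceeds safe capacity, or None."""
    return next((r["month"] for r in rows if r["breached"]), None)

def to_csv(rows):
    """Render rows as CSV text that a spreadsheet can import."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

scenarios = {
    "baseline": project(1000, 0.15, 1750),
    "growth_spike_50pct": project(1000, 0.15, 1750, spike=0.5),
}
for name, rows in scenarios.items():
    print(name, "first breach in month", first_breach(rows))
print(to_csv(scenarios["baseline"]))
```

Keeping each scenario as a separate sheet or file makes it easier to compare breach months side by side in the monthly review.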
Review and update quarterly — Traffic growth rates change; efficiency initiatives complete or slip; product roadmaps shift. Review the capacity model quarterly against actuals. Adjust the growth model based on observed trends. Capacity plans that aren't updated against actuals drift from reality within 3 months.
npx claudepluginhub jeffreytse/grimoire --plugin grimoire
Forecast infrastructure resource needs (compute, storage, network). Model growth scenarios and utilization targets. Use when planning infrastructure investments or optimizing costs.
Produces a structured capacity planning document covering traffic forecasts, resource requirements, scaling strategy, cost projections, and infrastructure action roadmap.
Analyzes CPU, memory, storage, network utilization; forecasts growth trends; recommends scaling strategies with cost estimates. Useful for infrastructure capacity planning and bottleneck identification.