From grimoire
Defines service reliability targets, error budgets, and SLI/SLO/SLA structures based on Google SRE practices. Use when designing or reviewing reliability commitments.
Slash command
/grimoire:design-slo
Define measurable reliability commitments that align engineering effort with what users actually care about.
Adopted by: Google, Spotify, Dropbox, Netflix, and the majority of mature SRE organizations; a core practice of the Google SRE model, which thousands of engineering organizations globally follow
Impact: Google's SRE book reports error budget policies cut feature/reliability conflict resolution time significantly; Dropbox publicly documented SLO adoption reducing production incidents by 30% within one year
Why best: Without SLOs, reliability work is driven by gut feel and the loudest complaint. SLOs make reliability a quantified engineering decision: while error budget remains, ship features; when it burns fast, halt releases and fix reliability. This makes the reliability/feature trade-off objective and takes it out of politics.
SLI: Percentage of HTTP requests to /api/checkout completing in under 800ms, measured at the load balancer.
SLO: 99.5% of checkout requests complete under 800ms over a rolling 28-day window.
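Error budget: 0.5% of checkout requests over the 28-day window may miss the 800ms target. At uniform traffic that equals roughly 3.4 hours (0.005 × 672 hours) of fully failing requests before the policy triggers a feature freeze.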
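The arithmetic behind the example above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the `Slo` class, its method names, and the request counts are hypothetical, and the freeze rule (freeze once the budget is fully spent) is an assumed policy.

```python
from dataclasses import dataclass
from fractions import Fraction
import math


@dataclass(frozen=True)
class Slo:
    """Hypothetical request-based SLO; assumes 0 < target < 1."""
    target: Fraction   # fraction of requests that must be good, e.g. 0.995
    window_days: int   # rolling window length, e.g. 28

    @property
    def budget(self) -> Fraction:
        # Error budget is whatever the target leaves over.
        return 1 - self.target

    def budget_hours(self) -> float:
        # Time-equivalent of the budget, assuming uniform traffic.
        return float(self.budget * self.window_days * 24)

    def allowed_bad(self, total_requests: int) -> int:
        # Number of requests that may miss the SLI threshold in the window.
        return math.floor(self.budget * total_requests)

    def budget_consumed(self, good: int, total: int) -> float:
        # Share of the budget already spent (1.0 means exhausted).
        if total == 0:
            return 0.0
        return float(Fraction(total - good) / (self.budget * total))

    def feature_freeze(self, good: int, total: int) -> bool:
        # Assumed policy: freeze once the budget is fully spent.
        return self.budget_consumed(good, total) >= 1.0


checkout = Slo(target=Fraction("0.995"), window_days=28)
print(checkout.budget_hours())                        # time-equivalent budget in hours
print(checkout.allowed_bad(1_000_000))                # bad requests tolerated per 1M
print(checkout.budget_consumed(997_000, 1_000_000))   # share of budget used
print(checkout.feature_freeze(994_000, 1_000_000))    # freeze decision
```

`Fraction` is used here so that `1 - 0.995` stays exact; with plain floats the budget picks up rounding noise that can shift a floor or a threshold comparison by one.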
npx claudepluginhub jeffreytse/grimoire --plugin grimoire
Defines Service Level Objectives (SLOs) and error budget policies for services. Creates documents with SLIs, targets, burn rate alerts, and review cadences.
Defines SLOs and error budgets for service reliability, enabling data-driven trade-offs between feature velocity and system stability.
Designs SLOs with SLIs, targets, alerting thresholds, and error budgets following Google SRE best practices. Use for defining reliability targets or service indicators.