From grimoire
Audits error budget consumption, sets burn rate alerts, and guides reliability investment decisions based on SLO budget status. Use when balancing feature velocity vs. reliability.
Slash command
/grimoire:audit-error-budget
Analyze error budget consumption to decide whether to invest in reliability work or continue shipping features.
Adopted by: Google SRE teams; Spotify engineering; Atlassian reliability organization; formalized in Google's SRE Workbook as the standard method for balancing reliability against feature velocity
Impact: Google's original SRE book documents error budget policies eliminating the feature/reliability conflict by making trade-offs data-driven; organizations that implement error budget policies report reduced escalation frequency and faster incident resolution prioritization
Why best: Error budget is the mathematical expression of acceptable unreliability. Consuming it is not inherently bad — it means features shipped. Consuming it too fast is bad — it means users suffered unexpectedly. The audit identifies whether consumption is within the expected rate and triggers defined policy responses when it is not.
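Budget remaining: budget_total = (1 - SLO_target) × window_length; budget_remaining = budget_total - budget_consumed, where budget_consumed is the error volume observed so far in the window. Express the result in minutes of downtime equivalent and as a percentage of budget consumed.
Burn rate: burn_rate = actual_error_rate / error_rate_allowed_by_SLO. A burn rate of 1.0 spends the budget exactly over the window; above 1.0 exhausts it before the window ends; below 1.0 leaves headroom and, on a rolling window, lets the remaining budget recover.
Scenario: SLO is 99.9% over 28 days. After 14 days, 60% of the budget is consumed with 50% of the window remaining, and the current error rate is 0.3% (3× the allowed 0.1%), so the burn rate is 3.0. At that rate the remaining 40% of the budget lasts about 3.7 days. Policy trigger: reliability work enters the next sprint; no new feature releases until the burn rate drops below 1.0.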
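The budget and burn-rate arithmetic above can be sketched in Python. This is a minimal illustration, not part of the skill itself: the function name audit_error_budget, the BudgetStatus fields, and the days_until_exhausted projection are all hypothetical, and the projection assumes the current error rate holds steady for the rest of the window.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BudgetStatus:
    budget_total_minutes: float      # full budget as downtime-equivalent minutes
    budget_consumed_pct: float       # share of the budget already spent
    burn_rate: float                 # current error rate / allowed error rate
    days_until_exhausted: Optional[float]  # None when nothing is being burned


def audit_error_budget(slo_target: float, window_days: float,
                       consumed_fraction: float,
                       current_error_rate: float) -> BudgetStatus:
    """Hypothetical helper: summarize error budget status for one SLO."""
    allowed_error_rate = 1.0 - slo_target
    # Total budget expressed as minutes of full downtime over the window.
    budget_total_minutes = allowed_error_rate * window_days * 24 * 60
    burn_rate = current_error_rate / allowed_error_rate

    # Assumption: the current burn rate stays constant. At burn rate 1.0 the
    # whole budget lasts exactly window_days, so the remainder scales linearly.
    remaining_fraction = max(0.0, 1.0 - consumed_fraction)
    if burn_rate > 0:
        days_until_exhausted = remaining_fraction * window_days / burn_rate
    else:
        days_until_exhausted = None

    return BudgetStatus(
        budget_total_minutes=budget_total_minutes,
        budget_consumed_pct=consumed_fraction * 100.0,
        burn_rate=burn_rate,
        days_until_exhausted=days_until_exhausted,
    )


# The scenario from the text: 99.9% over 28 days, 60% consumed, 0.3% errors now.
status = audit_error_budget(0.999, 28, 0.60, 0.003)
print(status)
```

For the scenario's inputs this gives a total budget of roughly 40.3 minutes, a burn rate of about 3.0, and about 3.7 days until the remaining budget runs out. Taking consumed_fraction as an input rather than deriving it from the current error rate keeps cumulative consumption separate from the short-window burn rate, which is the distinction the scenario depends on.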
Run design-slo first to establish the baseline before attempting an audit.
npx claudepluginhub jeffreytse/grimoire --plugin grimoire
Guides defining SLOs, selecting SLIs, and implementing error budget policies for service reliability, alerting, and balancing velocity.
Defines Service Level Objectives (SLOs) and error budget policies for services. Creates documents with SLIs, targets, burn rate alerts, and review cadences.
Helps define SLOs, SLIs, and SLAs with error budget tracking and burn rate alerts. Use when implementing SRE practices or setting data-driven reliability targets.