Skill

slos-and-triggers

Guides interpreting Honeycomb SLO compliance, error budgets, burn rates, and trigger status. Detects misconfigured SLIs, advises deploy freezes vs on-call paging, designs burn alerts.

monitoring

devops

Invocation

Slash command: /honeycomb:slos-and-triggers

Supporting Files

references/alerting-strategy.md
references/slo-design-guide.md
references/trigger-examples.md

Stats

Language: Python

Parent stars: 10

Parent forks: 1

Maintenance: Good

Last commit: Mar 12, 2026

Honeycomb SLOs and Triggers

Guidance for configuring and reasoning about reliability in Honeycomb. The get_slos and get_triggers tools document their own parameters — this skill focuses on designing effective SLOs, choosing between SLOs and triggers, and interpreting what the numbers mean.

Availability: SLOs require a Pro or Enterprise plan. Triggers are available on all plans.

SLO vs Trigger — When to Use Which

Question	SLO	Trigger
"Are we meeting our reliability commitments?"	Yes	No
"Is something broken right now?"	No	Yes
"How fast are we burning our error budget?"	Yes (burn alerts)	No
"Did error count exceed a threshold?"	No	Yes
"Should we slow down deploys?"	Yes (budget remaining)	No

Rule of thumb: SLOs measure reliability against commitments over time. Triggers catch immediate operational issues.

Designing Effective SLOs

Define the SLI

An SLI is a per-event boolean that answers one question: was this event successful? It is implemented as a calculated field that returns 1 (success) or 0 (failure).

Latency SLI: LTE($duration_ms, 500) (requests that complete in 500 ms or less)
Availability SLI: LTE($http.status_code, 499) (non-5xx responses)
Business logic SLI: EQUALS($checkout.status, "completed") (successful checkouts)
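
To make the per-event idea concrete, here is a minimal Python sketch of how an SLI turns into a compliance number. It only mirrors the logic of the expressions above; it is not Honeycomb's implementation. The function names and the in-memory event dictionaries are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable

Event = Dict[str, object]  # hypothetical in-memory stand-in for a Honeycomb event


def latency_sli(event: Event, threshold_ms: float = 500) -> int:
    """Mirrors the latency SLI: 1 if the request took 500 ms or less, else 0."""
    return 1 if event["duration_ms"] <= threshold_ms else 0


def availability_sli(event: Event) -> int:
    """Mirrors the availability SLI: 1 for any non-5xx status code, else 0."""
    return 1 if event["http.status_code"] <= 499 else 0


def compliance(events: Iterable[Event], sli: Callable[[Event], int]) -> float:
    """Fraction of events for which the SLI returned 1."""
    results = [sli(e) for e in events]
    if not results:
        raise ValueError("no matching events, so compliance is undefined")
    return sum(results) / len(results)


events = [
    {"duration_ms": 120, "http.status_code": 200},
    {"duration_ms": 480, "http.status_code": 404},
    {"duration_ms": 500, "http.status_code": 503},
    {"duration_ms": 900, "http.status_code": 500},
]
print(compliance(events, latency_sli))       # 0.75
print(compliance(events, availability_sli))  # 0.5
```

Note that the 404 counts as a success for availability, which is what a "non-5xx" definition implies.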

Set the Target

Start conservative (99% before 99.99%)
Measure current baseline first with P50/P99 queries
Set target slightly above current performance
Ask: what reliability do users actually need?
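
The arithmetic behind "start conservative" can be sketched as follows, assuming an event-based budget where the allowed failures are simply the non-target share of all events in the SLO window. The function name is hypothetical, and exact fractions are used only to avoid floating-point rounding.

```python
import math
from fractions import Fraction


def allowed_failures(target_percent: str, total_events: int) -> int:
    """Failed events the budget tolerates for a given target (assumed event-based)."""
    target = Fraction(target_percent) / 100
    return math.floor((1 - target) * total_events)


print(allowed_failures("99", 1_000_000))     # 10000
print(allowed_failures("99.99", 1_000_000))  # 100
```

Moving from 99% to 99.99% shrinks the budget by a factor of 100, which is why it helps to confirm the baseline before tightening the target.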

Configure Burn Alerts

At minimum, two alerts:

Fast burn (exhaustion time ~4h): pages on-call via PagerDuty
Slow burn (budget rate over 24h): notifies team via Slack
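
The two-alert split can be sketched as a small routing function. This is a simplified model, not Honeycomb's actual burn alert calculation: it assumes exhaustion time is a linear extrapolation of the current burn rate, and the 5% budget drop used for the slow-burn alert is a hypothetical threshold. All names are illustrative.

```python
import math
from typing import Optional


def hours_to_exhaustion(remaining_pct: float, burn_pct_per_hour: float) -> float:
    """Linear extrapolation: hours until the remaining budget reaches zero."""
    if burn_pct_per_hour <= 0:
        return math.inf  # not burning, so the budget never runs out
    return remaining_pct / burn_pct_per_hour


def route_burn_alert(
    remaining_pct: float,
    burn_pct_per_hour: float,
    drop_24h_pct: float,
    fast_hours: float = 4.0,     # from the text: exhaustion time of about 4h
    slow_drop_pct: float = 5.0,  # assumption: budget drop over 24h worth a notification
) -> Optional[str]:
    """Return the channel to notify, or None if neither alert fires."""
    if hours_to_exhaustion(remaining_pct, burn_pct_per_hour) <= fast_hours:
        return "pagerduty"  # fast burn: page on-call
    if drop_24h_pct >= slow_drop_pct:
        return "slack"      # slow burn: notify the team
    return None


print(route_burn_alert(20.0, 10.0, 30.0))  # pagerduty
print(route_burn_alert(80.0, 0.5, 8.0))    # slack
print(route_burn_alert(80.0, 0.1, 1.0))    # None
```

The fast-burn check runs first so that an incident that qualifies for both alerts always pages rather than only posting to chat.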

Best Practices

Measure close to the user (at the edge, not deep in the stack)
Design around user workflows, not team boundaries
Favor broad SLOs over many narrow ones
Start with one SLO, reduce noise, then expand

Interpreting SLO Status

When reviewing SLOs with get_slos:

Budget remaining > 50%: Healthy — room for risk
Budget remaining 10-50%: Caution — slow down changes
Budget remaining < 10%: At risk — freeze non-critical deploys
Budget negative: Breached — investigate immediately with the production-investigation skill
Compliance at 0%: Likely misconfigured SLI (wrong column, inverted logic, no matching events) — check the SLI definition
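
The thresholds above can be expressed as a small decision function. This is a sketch only: the parameter names are hypothetical and do not reflect the actual shape of the get_slos output, and the handling of the exact boundaries (10% and 50%) is an assumption.

```python
def interpret_slo(budget_remaining_pct: float, compliance_pct: float) -> str:
    """Map SLO numbers to the status categories listed above."""
    if compliance_pct == 0:
        # Checked first: a broken SLI also drives the budget negative.
        return "misconfigured: check the SLI definition"
    if budget_remaining_pct < 0:
        return "breached: investigate immediately"
    if budget_remaining_pct < 10:
        return "at risk: freeze non-critical deploys"
    if budget_remaining_pct <= 50:
        return "caution: slow down changes"
    return "healthy: room for risk"


print(interpret_slo(72.0, 99.95))  # healthy: room for risk
print(interpret_slo(-3.0, 98.2))   # breached: investigate immediately
print(interpret_slo(-400.0, 0.0))  # misconfigured: check the SLI definition
```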

Configuring Triggers

Prefer Count-Based Over Percentile-Based

"50 requests slower than 2s" is more actionable than "P99 is 2100ms." Use COUNT WHERE duration_ms > threshold instead of P99 triggers.

Common Patterns

Error spike: COUNT WHERE error = true, threshold > N in 5 min
Slow requests: COUNT WHERE duration_ms > 2000, threshold > N in 5 min
Traffic drop: COUNT WHERE is_root, threshold < N in 10 min (below normal)
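
As an illustration of how these count-based patterns evaluate, the sketch below counts matching events in one window and compares the count to a threshold. It models the logic only; the helper names, the sample window, and the threshold values are assumptions, and real triggers are evaluated by Honeycomb rather than by client code.

```python
import operator
from typing import Callable, Dict, Iterable

Event = Dict[str, object]  # hypothetical in-memory stand-in for a Honeycomb event


def count_where(events: Iterable[Event], predicate: Callable[[Event], bool]) -> int:
    """COUNT WHERE <predicate> over the events of one evaluation window."""
    return sum(1 for e in events if predicate(e))


def trigger_fires(count: int, threshold: int,
                  op: Callable[[int, int], bool] = operator.gt) -> bool:
    """Compare the window's count to the threshold (default: count > threshold)."""
    return op(count, threshold)


# One 5-minute window of sample events (made-up data).
window = [
    {"duration_ms": 150, "error": False},
    {"duration_ms": 2400, "error": False},
    {"duration_ms": 3100, "error": True},
    {"duration_ms": 90, "error": False},
]

slow = count_where(window, lambda e: e["duration_ms"] > 2000)
errors = count_where(window, lambda e: e["error"] is True)

print(slow, trigger_fires(slow, 1))      # 2 True
print(errors, trigger_fires(errors, 1))  # 1 False
# Traffic drop: fire when the total count falls below the expected minimum.
print(trigger_fires(len(window), 10, operator.lt))  # True
```

The count tells the responder how many requests were affected, which a percentile value on its own does not.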

Best Practices

Name: What the alert is. Description: What to do (link to runbook).
Set duration 5-10 min minimum to avoid flapping
Start less sensitive, tighten based on false positive rate

Multi-Service SLOs

Share a single error budget across up to 10 services.

SLI must be an environment-level calculated field
Events from included services weighted equally
Use cases: multiple edge services, monolith-to-microservices migration
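
One consequence of weighting events equally is that high-traffic services dominate the shared budget. The sketch below assumes "weighted equally" means every event counts the same regardless of its service; the function name, service names, and counts are made up for illustration.

```python
from typing import Dict, Tuple

MAX_SERVICES = 10  # from the text: up to 10 services per multi-service SLO


def pooled_compliance(per_service: Dict[str, Tuple[int, int]]) -> float:
    """Compliance over all events, given {service: (good_events, total_events)}."""
    if len(per_service) > MAX_SERVICES:
        raise ValueError("a multi-service SLO covers at most 10 services")
    good = sum(g for g, _ in per_service.values())
    total = sum(t for _, t in per_service.values())
    if total == 0:
        raise ValueError("no events to evaluate")
    return good / total


counts = {"edge-a": (9900, 10000), "edge-b": (90, 100)}
print(round(pooled_compliance(counts), 3))  # 0.989
```

Here edge-b is only 90% successful, yet the pooled figure stays near edge-a's 99% because edge-a sends 100 times more events. A low-traffic service can therefore degrade without noticeably burning the shared budget.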

Additional Resources

Reference Files

${CLAUDE_PLUGIN_ROOT}/skills/slos-and-triggers/references/slo-design-guide.md — Detailed SLO design methodology, multi-service SLOs, error budget math
${CLAUDE_PLUGIN_ROOT}/skills/slos-and-triggers/references/trigger-examples.md — Complete trigger example library organized by use case
${CLAUDE_PLUGIN_ROOT}/skills/slos-and-triggers/references/alerting-strategy.md — How to combine SLO burn alerts and triggers into a cohesive alerting strategy

Cross-References

For constructing SLI queries and calculated fields, see the query-patterns skill
For investigating SLO budget burn, see the production-investigation skill
