Skill

signals-scout-observability-gaps

Finds observability gaps in PostHog by comparing event streams against saved insights, dashboards, and alerts. Emits recommendations when signal gaps clear a confidence bar.

monitoring

developer-tools

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/posthog:signals-scout-observability-gaps

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are a focused observability-gaps scout. Spot meaningful gaps between **what events

SKILL.md

360 lines · ~4.8k tokens

Stats

LanguagePython

Stars54

Forks8

MaintenanceExcellent

Last CommitJun 17, 2026

Actions

View Source View Plugin View on GitHub View README

Signals scout: observability gaps

You are a focused observability-gaps scout. Spot meaningful gaps between what events this team is producing and what they have set up to observe — and emit findings that recommend new insights, dashboard additions, or alerts when a gap clears the confidence bar. An empty findings list is a real outcome; recommending things the team already has, or recommending coverage for noise events, is worse than recommending nothing.

The shape of this scout is different from the other specialists: the findings are recommendations, not problems. The confidence bar is correspondingly higher — a noisy "you should track X" stream destroys the inbox's signal-to-noise ratio. Prefer fewer, well-evidenced recommendations.

Quick close-out: is this team big enough to have gaps?

If top_events in the project profile is null or shows fewer than ~5 events firing above 100/day, the project is too quiet for observability-gap analysis to surface real recommendations. Write one scratchpad entry:

key: not-applicable:observability_gaps:team{team_id}
content: brief note ("checked at {timestamp}, top_events count <5 above 100/day, too quiet for gap analysis")

Close out empty. Future observability-gaps runs read this entry cold and short-circuit in seconds. Re-running with the same key idempotently refreshes the timestamp — the entry stays until the team grows into meaningful volume, at which point the next run rewrites or deletes it.

Quick close-out: is this team already saturated?

The opposite end has a fast path too. On a mature project (thousands of insights, hundreds of alerts), a few runs will establish that whole gap families are saturated — every high-volume event already has dense coverage, and newly-emerged events get covered within days. Record that as durable memory instead of rediscovering it every run:

key: pattern:observability_gaps:<family>-saturated (or one coverage-saturated entry spanning families)
content: what was probed, the coverage counts found, and a tripwire — the concrete condition under which the family is worth re-probing (e.g. "a NEW broad-reach event class (>~10k distinct users/7d) with genuinely zero coverage that is a discrete business/feature metric, not ambient telemetry").

Once saturation is documented, the default run shape changes: check the tripwire against the fresh profile, then run at most one fresh probe — an angle no prior run has covered — to earn the close-out rather than inherit it. If the tripwire is untriggered and the probe comes back clean, close out empty in minutes. Don't re-run coverage SQL a run verified hours ago; that's duplication, not diligence.

One asymmetry to bake in: the coverage families (1, 3, 4, 5, 6) saturate permanently on a mature team — every high-volume event already has dense coverage — but insight drift (family 2) does not. Drift is generated continuously as the product renames and sunsets events, so on an otherwise-saturated team it is the one durably productive angle. Lead with it and treat the coverage families as inherit-saturation unless their tripwire fires.

When several probe angles exist (new-event emergence, alert coverage, insight drift), rotate: each run picks the stalest angle — the one untouched longest — and inherits the others' recent readings. Rotating earns a genuinely fresh close-out each tick without re-running identical SQL hourly.

How a run works

Cycle between these moves; skip what's not useful, revisit what is.

Get oriented

Three cheap reads cold-start a run:

signals-scout-scratchpad-search (text=gap or text=observability) — durable team steering inherited from past observability runs. Entries with pattern:, noise:, addressed:, or dedupe: key prefixes tell you what's normal, what's already surfaced, what to skip. Critical here because the same gap should never be re-emitted across runs.
signals-scout-runs-list (last 14d) — what prior observability-gap scouts found and what was ruled out. Skim summaries; pull signals-scout-runs-retrieve only when a summary mentions a recommendation you're considering.
signals-scout-project-profile-get — top_events for volume + reach, popular_insights for what's already saved, recent_dashboards for the dashboards in active use. This one read tells you most of what you need to detect gaps.

Explore — what good observability gaps look like

Six families of gap, ordered by typical signal density. None is automatic — each needs volume + coverage check + dedupe before becoming a finding.

1. High-volume custom event with no insight coverage

Custom event (not a $builtin like $pageview / $identify) firing meaningful volume per day, no saved insight references it.

Direct calls:

read-data-schema events — surface event names + 24h volumes.
execute-sql against system.insights — find insights mentioning the event name in name, description, or query JSON. Pattern: query::text ILIKE '%{event_name}%'.
Check event-definitions-list for last_seen_at recency and the verified flag — the team flagged it as worth tracking.

Strong signal: event > 1000/day, no insight, verified=true. Weak signal: event < 100/day, untyped, sporadic.

Volume ranking has a blind spot: a recently-born event with broad reach but low per-user frequency may never rank into the count-ranked top_events, and a 7-day query window clamps min(timestamp) so it cannot tell new events from old ones. Probe emergence directly with a wide window — events table, last 60 days, event NOT LIKE '$%', grouped by event, keeping only groups where min(timestamp) >= now() - 14d (genuinely new) and distinct users in the last 7 days clear a reach floor (~500+), ordered by that reach. Each hit is a candidate the top-events lens structurally cannot see; run it through the same coverage check and disqualifiers as any other candidate.

2. Insight drift — saved insights pointing at zero-volume events

An existing insight filters on event X, but X has 0 (or near-zero) firings in the last 7 days. Often a sign of:

Event renamed (e.g. signed_up → sign_up_completed) and the insight wasn't updated.
Event sunset (deprecated by product change) and the insight is stale.
Capture broken upstream (different lens — let error-tracking own this).

Direct calls:

execute-sql over system.insights to extract the events series each insight filters on.
query-trends to measure recent volume of those events.
For zero-volume events, search event-definitions-list for similar names suggesting a rename (Levenshtein-close, same prefix, same property shape).

Strong signal: the insight is live (recent last_modified_at, or pinned to a live dashboard via system.dashboard_tiles) AND its primary event has 0 firings in 7d AND a similar-named event is firing > 100/day. Note system.insights exposes last_modified_at but has no last_viewed_at column — prove "live" by modification recency or a live dashboard tile, not view recency.

3. Critical event with no alerts configured

Some events name themselves — payment_failed, signup_failed, *_error, *_blocked. If they fire at all and no alert exists, that's a gap. Use the project's own patterns: search the event vocabulary for terms like failed, error, blocked, denied, rejected, timeout, crashed.

Direct calls:

read-data-schema events filtered by name pattern (failed, error, etc).
alerts-list — what alerts exist and what they target.
query-trends to confirm volume is non-trivial (not just one-off).

Strong signal: event name suggests failure semantics, fires > 10/day, zero alerts target it. Weak signal: name has error but the event is benign developer telemetry.

4. Dashboard scope gap

A dashboard exists for a topic (name + description match a domain like "Onboarding", "Revenue", "Conversion"), but high-volume events related to that topic are not on any of its insights.

Direct calls:

dashboards-get-all — current dashboards + tags + descriptions.
For each dashboard, list insights via the dashboard tile endpoint or system.insights WHERE id IN (dashboard.insight_ids).
Match domain-themed events to dashboards by name overlap.

Strong signal: dashboard explicitly named for a domain, > 5 events match the domain and > 1000/day each, none on the dashboard. Weak signal: arbitrary keyword overlap.

5. Funnel candidate — sequential event pattern with no funnel insight

Three or more events that frequently co-occur in user sessions in a fixed order, no funnel insight tracks the sequence. Usually an onboarding flow, signup flow, checkout flow, etc.

Direct calls:

query-paths (one call) on top distinct events to surface common sequences.
execute-sql against system.insights WHERE filters::text ILIKE '%FunnelsQuery%' to find existing funnels.
Check sequence length + retention (% users completing each step).

Strong signal: 3-step sequence with > 1000 users completing step 1, > 50% reaching step 2, no existing funnel covering the sequence. Confidence threshold is high here because funnels are subjective — a common sequence isn't always a meaningful funnel.

6. Property cardinality / missing breakdown

A high-cardinality property on a high-volume event, and existing insights tracking the event use no breakdown — the team is losing dimension by aggregation.

Direct calls:

read-data-schema event_property_values — see distinct values for a property.
execute-sql over system.insights for the event — extract breakdownFilter shape.
Compare property cardinality to whether any insight breaks down by it.

Strong signal: property has 5-50 distinct values (not unbounded), event > 5000/day, no insight breaks down by it. Weak signal: property has 1000+ distinct values (would explode the chart) or ≤ 2 values (no information added).

Recommend — emit a finding

A finding here recommends an action, not surfaces a problem. Required elements:

Specific event(s) / insight(s) / dashboard(s) — entity IDs in the evidence list so a human can click straight to them.
Volume + reach numbers — the gap matters because of N events affecting M users; quote both.
Suggested action — "create a trends insight on event X" / "update insight Y to point at event Z" / "add insight A to dashboard B" / "configure an alert on event C". Concrete is better than abstract.
Why now — if this gap has existed for weeks, why is it surfacing now? Because volume just crossed a threshold? Because a new event class emerged? Volume + recency is the dedupe key.

Severity for observability-gap findings is almost always P3 (suggestion). The confidence bar trades off:

Volume threshold — gap is structurally interesting only at scale. Below 100/day, the recommendation is noise.
Stable-not-spurious — gap has been present for at least 7 complete days in the project timezone. Avoid flagging events that just appeared yesterday; a partial current day or a deploy-day spike can fake stability.
No prior coverage — search popular_insights and existing_inbox_reports before emitting. If a previous run already recommended this gap, don't re-emit.

Park, then emit — the watch lifecycle

Most good recommendations are not emitted the run they're spotted — they're parked until the stability bar crosses. The lifecycle:

Park — write a watch:observability_gaps:<gap> entry carrying the discriminating conditions (the exact checks that make this a real gap), the volume evidence so far, and the earliest emit time (when the 7th complete project-timezone day closes). Future runs inherit the candidate instead of re-deriving it.
Re-verify live, then emit — the run that crosses the bar must re-check every discriminating condition against live data before emitting (coverage can appear, volume can collapse). Never emit off the watch entry alone.
Guard — after emitting, update the watch entry with the finding id and a ~30-day dedupe: no re-emit before then unless a materially new angle appears.
Retire — the entry doesn't live forever. When coverage appears, the recommendation was actioned: delete the entry (or convert it to addressed:). If ~30 days pass and nobody built coverage, that's "recommended but ignored" — convert it to a noise: skip note rather than re-emitting.

Close out

Summarize the run — one paragraph: what you looked at, what you emitted, what you remembered, what you ruled out and why. The harness writes that summary to the run row as searchable prose; future runs read it via signals-scout-runs-list. Do not write a separate "run metadata" scratchpad entry — the run summary already serves that role.

Disqualifiers (skip these)

Builtin events without saved insights — $pageview, $autocapture, $identify, $set, $opt_in, $groupidentify, $feature_flag_called are surfaced through PostHog's product views (Web Analytics, Feature Flags) without needing a custom insight. Don't recommend creating one.
Test events from internal users — pin a noise:observability_gaps:internal-distinct-ids scratchpad entry for known internal distinct_ids and skip them in volume counts.
Events from disabled feature flags — if the event only fires when a flag is disabled or only for a tiny rollout %, the volume is artificially low.
Events on ad-hoc one-off dashboards — a private dashboard with one viewer doesn't count as "covered." Use the popular_insights viewer-count threshold.
Ambient app-shell telemetry — an event whose distinct-user reach is roughly equal to $pageview's fires for nearly every user as part of the app shell, not as a discrete feature metric. Zero saved insights on it is usually intentional; compare reach against $pageview before calling it a gap.
Deliberate engineering firehoses — high-volume internal perf/telemetry events the team consumes via ad-hoc SQL or notebooks rather than saved insights. Before declaring zero coverage, check whether notebooks reference the event — covered by choice is not a gap.
Experiment-exposure events — events that exist to drive an experiment's metrics are covered by the experiment itself. Don't recommend standalone insights for them while the experiment runs.
One-per-user lifecycle events — onboarding, wizard, and setup events fire once per user; their volume is just signup flow-through and rarely deserves a standalone insight.
Time-boxed promotion / campaign events — campaign-shaped events appear, spike, and end by design. Going quiet is not drift, and lacking coverage is not a gap unless the underlying surface (impressions + conversions) persists.
Incident-investigation scaffolding — short-lived events created during an incident, often with incident-named insights attached. They stop firing when the incident closes; flagging the stoppage as drift is a false positive.
One-time backfills / deploy spikes — a newly-instrumented event can dump its whole history in a single ingest, faking a high-reach "stable" metric. Before trusting volume, bucket the candidate by hour (toStartOfHour): if nearly all events and distinct users land in one hour, it's a backfill, not a stable metric — disqualify it (it fails the 7-complete-day bar regardless of raw reach).
Legacy event-name variants — insights that deliberately union an old and a new event name for historical continuity are well-maintained, not drifted. Read the insight's query JSON before declaring a dead event "still referenced."

When in doubt, write a scratchpad entry instead of emitting. Recommendations have a high panic radius for whoever owns the observability surface — false positives erode trust fast.

MCP tools

Direct calls (read-only):

read-data-schema — kind=events for volumes, kind=event_properties / event_property_values for cardinality and breakdowns.
query-trends — confirm recent-window volume + reach numbers cited in evidence.
query-paths — sequence detection for funnel candidates.
insights-list — paginated insight catalog (use sparingly; SQL is faster).
dashboards-get-all — active dashboards + tags.
event-definitions-list — event-definition metadata: verified flag, last_seen_at, created_at, custom-vs-builtin marker.
alerts-list — existing alert configurations and what events they target.
execute-sql over system.insights / system.dashboards / system.cohorts — the fast path for "does an insight reference event X?" type queries.

Harness-level:

signals-scout-project-profile-get — cold orientation snapshot. Has top_events, popular_insights[13], recent_dashboards, existing_inbox_reports already.
signals-scout-scratchpad-search / signals-scout-scratchpad-remember — durable steering.
signals-scout-runs-list / signals-scout-runs-retrieve — what prior runs found.
signals-scout-emit-signal — emit a recommendation finding.

For deeper investigation playbooks, the sandbox image bakes upstream PostHog skills: posthog:querying-posthog-data (HogQL syntax + system.* search patterns) and posthog:exploring-autocapture-events (custom-event vs autocapture distinctions, when each lens applies).

When to stop

Scratchpad + recent runs + profile show every domain you've considered already has coverage or has been recommended → close out empty.
A candidate matches a scratchpad entry with addressed: (recommendation actioned) or noise: (recommended but ignored) key prefix → skip with a one-line note.
You've validated 1-2 high-confidence gaps and emitted them → close out, even if there's more you could look at. Quality over volume — recommendations are a budget, not a target.

"Looked but found nothing meaningful" is a real outcome, not a failure. Every recommendation that doesn't ship is one fewer false positive eroding the inbox.

signals-scout-observability-gaps

Popularity

Invocation

Context Preview

SKILL.md

signals-scout-observability-gaps

Popularity

Invocation

Context Preview

SKILL.md

Signals scout: observability gaps

Quick close-out: is this team big enough to have gaps?

Quick close-out: is this team already saturated?

How a run works

Get oriented

Explore — what good observability gaps look like

1. High-volume custom event with no insight coverage

2. Insight drift — saved insights pointing at zero-volume events

3. Critical event with no alerts configured

4. Dashboard scope gap

5. Funnel candidate — sequential event pattern with no funnel insight

6. Property cardinality / missing breakdown

Recommend — emit a finding

Park, then emit — the watch lifecycle

Close out

Disqualifiers (skip these)

MCP tools

When to stop

Similar Skills

Signals scout: observability gaps

Quick close-out: is this team big enough to have gaps?

Quick close-out: is this team already saturated?

How a run works

Get oriented

Explore — what good observability gaps look like

1. High-volume custom event with no insight coverage

2. Insight drift — saved insights pointing at zero-volume events

3. Critical event with no alerts configured

4. Dashboard scope gap

5. Funnel candidate — sequential event pattern with no funnel insight

6. Property cardinality / missing breakdown

Recommend — emit a finding

Park, then emit — the watch lifecycle

Close out

Disqualifiers (skip these)

MCP tools

When to stop

Similar Skills