From posthog
Finds observability gaps in PostHog by comparing event streams against saved insights, dashboards, and alerts. Emits recommendations when signal gaps clear a confidence bar.
How this skill is triggered — by the user, by Claude, or both
Slash command
/posthog:signals-scout-observability-gapsThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are a focused observability-gaps scout. Spot meaningful gaps between **what events
You are a focused observability-gaps scout. Spot meaningful gaps between what events this team is producing and what they have set up to observe — and emit findings that recommend new insights, dashboard additions, or alerts when a gap clears the confidence bar. An empty findings list is a real outcome; recommending things the team already has, or recommending coverage for noise events, is worse than recommending nothing.
The shape of this scout is different from the other specialists: the findings are recommendations, not problems. The confidence bar is correspondingly higher — a noisy "you should track X" stream destroys the inbox's signal-to-noise ratio. Prefer fewer, well-evidenced recommendations.
If top_events in the project profile is null or shows fewer than ~5 events firing
above 100/day, the project is too quiet for observability-gap analysis to surface real
recommendations. Write one scratchpad entry:
not-applicable:observability_gaps:team{team_id}Close out empty. Future observability-gaps runs read this entry cold and short-circuit in seconds. Re-running with the same key idempotently refreshes the timestamp — the entry stays until the team grows into meaningful volume, at which point the next run rewrites or deletes it.
The opposite end has a fast path too. On a mature project (thousands of insights, hundreds of alerts), a few runs will establish that whole gap families are saturated — every high-volume event already has dense coverage, and newly-emerged events get covered within days. Record that as durable memory instead of rediscovering it every run:
pattern:observability_gaps:<family>-saturated (or one coverage-saturated
entry spanning families)Once saturation is documented, the default run shape changes: check the tripwire against the fresh profile, then run at most one fresh probe — an angle no prior run has covered — to earn the close-out rather than inherit it. If the tripwire is untriggered and the probe comes back clean, close out empty in minutes. Don't re-run coverage SQL a run verified hours ago; that's duplication, not diligence.
One asymmetry to bake in: the coverage families (1, 3, 4, 5, 6) saturate permanently on a mature team — every high-volume event already has dense coverage — but insight drift (family 2) does not. Drift is generated continuously as the product renames and sunsets events, so on an otherwise-saturated team it is the one durably productive angle. Lead with it and treat the coverage families as inherit-saturation unless their tripwire fires.
When several probe angles exist (new-event emergence, alert coverage, insight drift), rotate: each run picks the stalest angle — the one untouched longest — and inherits the others' recent readings. Rotating earns a genuinely fresh close-out each tick without re-running identical SQL hourly.
Cycle between these moves; skip what's not useful, revisit what is.
Three cheap reads cold-start a run:
signals-scout-scratchpad-search (text=gap or text=observability) — durable team
steering inherited from past observability runs. Entries with pattern:, noise:,
addressed:, or dedupe: key prefixes tell you what's normal, what's already
surfaced, what to skip. Critical here because the same gap should never be re-emitted
across runs.signals-scout-runs-list (last 14d) — what prior observability-gap scouts found and
what was ruled out. Skim summaries; pull signals-scout-runs-retrieve only when a
summary mentions a recommendation you're considering.signals-scout-project-profile-get — top_events for volume + reach, popular_insights
for what's already saved, recent_dashboards for the dashboards in active use. This
one read tells you most of what you need to detect gaps.Six families of gap, ordered by typical signal density. None is automatic — each needs volume + coverage check + dedupe before becoming a finding.
Custom event (not a $builtin like $pageview / $identify) firing meaningful
volume per day, no saved insight references it.
Direct calls:
read-data-schema events — surface event names + 24h volumes.execute-sql against system.insights — find insights mentioning the event name in
name, description, or query JSON. Pattern: query::text ILIKE '%{event_name}%'.event-definitions-list for last_seen_at recency and the verified flag —
the team flagged it as worth tracking.Strong signal: event > 1000/day, no insight, verified=true. Weak signal: event
< 100/day, untyped, sporadic.
Volume ranking has a blind spot: a recently-born event with broad reach but low
per-user frequency may never rank into the count-ranked top_events, and a 7-day
query window clamps min(timestamp) so it cannot tell new events from old ones.
Probe emergence directly with a wide window — events table, last 60 days,
event NOT LIKE '$%', grouped by event, keeping only groups where
min(timestamp) >= now() - 14d (genuinely new) and distinct users in the last 7
days clear a reach floor (~500+), ordered by that reach. Each hit is a candidate the
top-events lens structurally cannot see; run it through the same coverage check and
disqualifiers as any other candidate.
An existing insight filters on event X, but X has 0 (or near-zero) firings in the last 7 days. Often a sign of:
signed_up → sign_up_completed) and the insight wasn't updated.Direct calls:
execute-sql over system.insights to extract the events series each insight
filters on.query-trends to measure recent volume of those events.event-definitions-list for similar names suggesting
a rename (Levenshtein-close, same prefix, same property shape).Strong signal: the insight is live (recent last_modified_at, or pinned to a live
dashboard via system.dashboard_tiles) AND its primary event has 0 firings in 7d AND
a similar-named event is firing > 100/day. Note system.insights exposes
last_modified_at but has no last_viewed_at column — prove "live" by
modification recency or a live dashboard tile, not view recency.
Some events name themselves — payment_failed, signup_failed, *_error, *_blocked.
If they fire at all and no alert exists, that's a gap. Use the project's own
patterns: search the event vocabulary for terms like failed, error, blocked,
denied, rejected, timeout, crashed.
Direct calls:
read-data-schema events filtered by name pattern (failed, error, etc).alerts-list — what alerts exist and what they target.query-trends to confirm volume is non-trivial (not just one-off).Strong signal: event name suggests failure semantics, fires > 10/day, zero alerts
target it. Weak signal: name has error but the event is benign developer telemetry.
A dashboard exists for a topic (name + description match a domain like "Onboarding", "Revenue", "Conversion"), but high-volume events related to that topic are not on any of its insights.
Direct calls:
dashboards-get-all — current dashboards + tags + descriptions.system.insights WHERE id IN (dashboard.insight_ids).Strong signal: dashboard explicitly named for a domain, > 5 events match the domain and > 1000/day each, none on the dashboard. Weak signal: arbitrary keyword overlap.
Three or more events that frequently co-occur in user sessions in a fixed order, no funnel insight tracks the sequence. Usually an onboarding flow, signup flow, checkout flow, etc.
Direct calls:
query-paths (one call) on top distinct events to surface common sequences.execute-sql against system.insights WHERE filters::text ILIKE '%FunnelsQuery%'
to find existing funnels.Strong signal: 3-step sequence with > 1000 users completing step 1, > 50% reaching step 2, no existing funnel covering the sequence. Confidence threshold is high here because funnels are subjective — a common sequence isn't always a meaningful funnel.
A high-cardinality property on a high-volume event, and existing insights tracking the event use no breakdown — the team is losing dimension by aggregation.
Direct calls:
read-data-schema event_property_values — see distinct values for a property.execute-sql over system.insights for the event — extract breakdownFilter shape.Strong signal: property has 5-50 distinct values (not unbounded), event > 5000/day, no insight breaks down by it. Weak signal: property has 1000+ distinct values (would explode the chart) or ≤ 2 values (no information added).
A finding here recommends an action, not surfaces a problem. Required elements:
Severity for observability-gap findings is almost always P3 (suggestion). The confidence bar trades off:
popular_insights and existing_inbox_reports
before emitting. If a previous run already recommended this gap, don't re-emit.Most good recommendations are not emitted the run they're spotted — they're parked until the stability bar crosses. The lifecycle:
watch:observability_gaps:<gap> entry carrying the
discriminating conditions (the exact checks that make this a real gap), the
volume evidence so far, and the earliest emit time (when the 7th complete
project-timezone day closes). Future runs inherit the candidate instead of
re-deriving it.addressed:).
If ~30 days pass and nobody built coverage, that's "recommended but ignored" —
convert it to a noise: skip note rather than re-emitting.Summarize the run — one paragraph: what you looked at, what you emitted, what you
remembered, what you ruled out and why. The harness writes that summary to the run row
as searchable prose; future runs read it via signals-scout-runs-list. Do not write
a separate "run metadata" scratchpad entry — the run summary already serves that role.
$pageview, $autocapture, $identify,
$set, $opt_in, $groupidentify, $feature_flag_called are surfaced through
PostHog's product views (Web Analytics, Feature Flags) without needing a custom
insight. Don't recommend creating one.noise:observability_gaps:internal-distinct-ids
scratchpad entry for known internal distinct_ids and skip them in volume counts.popular_insights viewer-count threshold.$pageview's fires for nearly every user as part of the app shell, not
as a discrete feature metric. Zero saved insights on it is usually intentional;
compare reach against $pageview before calling it a gap.toStartOfHour): if nearly all
events and distinct users land in one hour, it's a backfill, not a stable metric —
disqualify it (it fails the 7-complete-day bar regardless of raw reach).When in doubt, write a scratchpad entry instead of emitting. Recommendations have a high panic radius for whoever owns the observability surface — false positives erode trust fast.
Direct calls (read-only):
read-data-schema — kind=events for volumes, kind=event_properties /
event_property_values for cardinality and breakdowns.query-trends — confirm recent-window volume + reach numbers cited in evidence.query-paths — sequence detection for funnel candidates.insights-list — paginated insight catalog (use sparingly; SQL is faster).dashboards-get-all — active dashboards + tags.event-definitions-list — event-definition metadata: verified flag, last_seen_at,
created_at, custom-vs-builtin marker.alerts-list — existing alert configurations and what events they target.execute-sql over system.insights / system.dashboards / system.cohorts —
the fast path for "does an insight reference event X?" type queries.Harness-level:
signals-scout-project-profile-get — cold orientation snapshot. Has top_events,
popular_insights[13], recent_dashboards, existing_inbox_reports already.signals-scout-scratchpad-search / signals-scout-scratchpad-remember — durable steering.signals-scout-runs-list / signals-scout-runs-retrieve — what prior runs found.signals-scout-emit-signal — emit a recommendation finding.For deeper investigation playbooks, the sandbox image bakes upstream PostHog skills:
posthog:querying-posthog-data (HogQL syntax + system.* search patterns) and
posthog:exploring-autocapture-events (custom-event vs autocapture distinctions, when
each lens applies).
addressed: (recommendation actioned) or
noise: (recommended but ignored) key prefix → skip with a one-line note."Looked but found nothing meaningful" is a real outcome, not a failure. Every recommendation that doesn't ship is one fewer false positive eroding the inbox.
npx claudepluginhub anthropics/claude-plugins-official --plugin posthogWatches PostHog dashboards and insights for recent anomalies (spikes, drops, flat-lines, trend breaks) using PostHog's alert-simulate or a MAD-based fallback. Builds a durable watchlist and balances exploit vs. explore across runs.
Delivers a daily briefing of recent changes across an Amplitude instance, surfacing anomalies, trends, and experiments from the last 1-2 days.
Discovers product opportunities by analyzing Amplitude analytics, experiments, session replays, and customer feedback. Synthesizes evidence into RICE-scored, actionable priorities.