From posthog
Analyzes session replay patterns across experiment variants to identify user behavior differences between control and test groups.
Slash command: /posthog:analyzing-experiment-session-replays
This skill guides you through analyzing session recordings for experiment variants to understand behavioral differences between control and test groups.
Use this skill when:
Before analyzing session replays:
First, retrieve the experiment information, then the variants from the feature flag (the source of truth).
Step 1a: Get experiment metadata
You can either:
- Use the experiment-get tool if you already have the experiment ID from context, or
- Query the experiments table via HogQL with the execute-sql tool:
SELECT
e.id,
e.name,
f.key AS feature_flag_key,
e.start_date,
e.end_date
FROM system.experiments e
JOIN system.feature_flags f ON f.id = e.feature_flag_id
WHERE e.id = <experiment_id>
From the experiment data, extract:
- feature_flag_key: the feature flag controlling the experiment
- start_date and end_date: the experiment's time range
Step 1b: Get variants from the feature flag
IMPORTANT: Always get variants from the feature flag, NOT from experiment.parameters.feature_flag_variants.
The parameters can be out of sync or deprecated. The feature flag is the source of truth.
Query the feature flag to get the current variants:
SELECT filters.multivariate.variants AS variants
FROM system.feature_flags
WHERE key = '<feature_flag_key>'
Select the variants path directly — selecting the whole filters object gets truncated in results for flags with large targeting configs.
Example structure: [{"key": "control", "name": "Control", "rollout_percentage": 50}, {"key": "test", ...}]
The variant key values (e.g., "control", "test", "variant_a") are what you'll use to filter session recordings.
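Here is a minimal sketch of turning the variants query result into the keys used for filtering. It assumes the `variants` column comes back as a JSON string shaped like the example structure above. The helper name `variant_keys` is hypothetical, and the boolean-flag fallback is an assumption: a flag with no multivariate variants is treated as boolean and matched on "true"/"false".

```python
import json
from typing import List


def variant_keys(variants_json: str) -> List[str]:
    """Extract variant keys from a filters.multivariate.variants result.

    Assumes the column comes back as a JSON string shaped like
    [{"key": "control", "rollout_percentage": 50}, ...].
    """
    variants = json.loads(variants_json)
    if not variants:
        # Assumption: no multivariate variants means a boolean flag, which
        # is filtered on the values "true" and "false" instead.
        return ["true", "false"]
    return [variant["key"] for variant in variants]


# Illustrative input, not real query output.
print(variant_keys('[{"key": "control", "rollout_percentage": 50}, {"key": "test", "rollout_percentage": 50}]'))
# ['control', 'test']
```

Only the `key` field is read, so extra fields such as `name` or `rollout_percentage` do not affect the result.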
For each variant in the experiment, construct recording filters that match users exposed to that variant.
Filter structure for a variant (input to query-session-recordings-list):
{
"date_from": "<experiment.start_date>",
"date_to": "<experiment.end_date or current time>",
"filter_test_accounts": true,
"properties": [
{
"type": "event",
"key": "$feature/<feature_flag_key>",
"operator": "exact",
"value": ["<variant_key>"]
}
]
}
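The filter structure above can be produced per variant by a small builder. This is a sketch, not PostHog's API. The function name is hypothetical, and using the current UTC time as an ISO 8601 string for a still-running experiment assumes the tool accepts that date format. Boolean flag values are converted to the strings "true"/"false", as the key points below require.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Union


def variant_recording_filters(
    flag_key: str,
    variant_key: Union[str, bool],
    start_date: str,
    end_date: Optional[str] = None,
) -> Dict[str, Any]:
    """Build filters that scope session recordings to one experiment variant."""
    # Boolean flags are matched on the strings "true"/"false", not Python bools.
    if isinstance(variant_key, bool):
        value = "true" if variant_key else "false"
    else:
        value = variant_key
    return {
        "date_from": start_date,
        # A running experiment has no end date yet, so fall back to now (UTC).
        "date_to": end_date or datetime.now(timezone.utc).isoformat(),
        "filter_test_accounts": True,  # exclude test users
        "properties": [
            {
                # Scope by the $feature/<flag_key> event property recorded on events.
                "type": "event",
                "key": f"$feature/{flag_key}",
                "operator": "exact",
                "value": [value],  # always a list of strings
            }
        ],
    }


# Usage with the checkout example from this skill:
control_filters = variant_recording_filters(
    "checkout-flow-test", "control", "2025-01-01", "2025-01-31"
)
```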
Key points:
- The $feature/<flag_key> event property records which variant the user saw. Filtering on it matches recordings that contain at least one event from that variant.
- value is an array of variant key strings (e.g. ["control"]); for boolean flags use ["true"] or ["false"].
- Avoid the type: "flag" / flag_evaluates_to property filter for variant scoping. The recordings query accepts it but silently ignores it and returns unfiltered results (last verified 2026-06-10). If you want to try it anyway, first verify that it actually filters: a query with a nonexistent flag key should return zero recordings.
- Set filter_test_accounts: true to exclude test users.
Use the query-session-recordings-list tool with the filters constructed in step 2.
Call the tool once per variant to get recordings for each group:
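The per-variant calls, plus the "nonexistent flag key returns zero recordings" check described in the key points, can be sketched as follows. `QueryFn` is a hypothetical stand-in for the query-session-recordings-list tool, passed in as a plain callable. The source describes the check for the flag-type filter; here the same idea is applied to the $feature/ event-property filter. The probe key name is an arbitrary assumption.

```python
import copy
from typing import Any, Callable, Dict, List

Filters = Dict[str, Any]
Recording = Dict[str, Any]
# Hypothetical stand-in for the query-session-recordings-list tool:
# takes a filters dict, returns a list of recording metadata dicts.
QueryFn = Callable[[Filters], List[Recording]]


def recordings_by_variant(
    query: QueryFn, filters_by_variant: Dict[str, Filters]
) -> Dict[str, List[Recording]]:
    """Call the recordings query once per variant, keeping groups separate."""
    return {variant: query(filters) for variant, filters in filters_by_variant.items()}


def filter_is_applied(
    query: QueryFn, filters: Filters, bogus_flag_key: str = "nonexistent-flag-check"
) -> bool:
    """Return True if the query respects the $feature/ property filter.

    Re-runs the query with every property pointed at a flag key that should
    not exist. A filter that is really applied must then match nothing.
    """
    probe = copy.deepcopy(filters)  # leave the caller's filters untouched
    for prop in probe.get("properties", []):
        prop["key"] = f"$feature/{bogus_flag_key}"
    return len(query(probe)) == 0
```

Run `filter_is_applied` once before trusting the per-variant groups. If it returns False, every variant is probably getting the same unfiltered recordings.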
The tool returns a list of recordings with metadata including:
- distinct_id: the person's distinct ID
- recording_duration, active_seconds, inactive_seconds
- click_count, keypress_count, mouse_activity_count
- console_log_count, console_warn_count, console_error_count
- start_url: the first page URL visited
- start_time / end_time, activity_score
Compare the recordings between variants by looking for:
Quantitative patterns:
Qualitative insights:
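For the quantitative side of the comparison, here is a minimal sketch that reduces one variant's recordings to a few summary numbers using the metadata fields listed above. It assumes durations are in seconds and counts are integers. The summary keys are hypothetical names. "Percent with console errors" is defined here as the share of recordings with at least one console error.

```python
from statistics import mean
from typing import Any, Dict, List


def summarize(recordings: List[Dict[str, Any]]) -> Dict[str, float]:
    """Summarize one variant's recordings from the tool's metadata fields.

    Assumes durations are in seconds and counts are integers.
    """
    if not recordings:
        return {"recordings": 0}
    n = len(recordings)
    return {
        "recordings": n,
        "avg_duration_s": mean(r["recording_duration"] for r in recordings),
        "avg_active_s": mean(r["active_seconds"] for r in recordings),
        "avg_clicks": mean(r["click_count"] for r in recordings),
        # Share of recordings with at least one console error, in percent.
        "pct_with_console_errors": 100 * sum(r["console_error_count"] > 0 for r in recordings) / n,
    }
```

These numbers are descriptive only. Statistical significance comes from experiment-results-get, and qualitative insights still require watching the replays themselves.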
Summarize the behavioral differences between variants, highlighting:
User: "How are users behaving in my checkout experiment?"
Agent steps:
1. Query experiment details (ID: 123, feature_flag_key: "checkout-flow-test", date range: 2025-01-01 to 2025-01-31)
2. Query feature flag "checkout-flow-test" to get variants from filters.multivariate.variants
3. Extract variant keys: "control" and "new-checkout"
4. Build filters for control variant:
- Property filter: { type: "event", key: "$feature/checkout-flow-test", operator: "exact", value: ["control"] }
- Date range: 2025-01-01 to 2025-01-31
5. Call query-session-recordings-list with control filters → 147 recordings found
6. Build filters for new-checkout variant and call query-session-recordings-list → 152 recordings found
7. Compare patterns:
- Control: average session duration 3m 45s, 12% of sessions with console errors
- New-checkout: average session duration 2m 30s, 5% of sessions with console errors
8. Present findings:
"I analyzed session replays for your checkout experiment. The new checkout flow shows:
- 33% shorter average session duration (2m 30s vs 3m 45s)
- 58% fewer sessions with console errors (5% vs 12%)
- Users in the new variant navigate directly to payment, while control users often backtrack to review the cart
- Recommendation: the new checkout flow appears to reduce friction and errors"
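The percentages in the findings are relative differences against the control values. Durations are converted to seconds first: 3m 45s = 225 s and 2m 30s = 150 s. The sketch below reproduces that arithmetic. The helper name is hypothetical.

```python
def relative_change_pct(control: float, test: float) -> float:
    """Percent change of the test value relative to control (negative = lower)."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return (test - control) / control * 100


duration_change = relative_change_pct(225, 150)  # seconds
error_change = relative_change_pct(12, 5)        # % of sessions with errors
print(f"{duration_change:.0f}% duration, {error_change:.0f}% console errors")
# -33% duration, -58% console errors
```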
Do not make assumptions:
Filter construction:
- The $feature/<flag_key> event property is how you scope recordings to a variant.
- For boolean flags, use ["true"]/["false"] as the value instead of a variant key.
Error handling:
Tools:
- query-session-recordings-list: the core tool for retrieving session recordings with filters
- experiment-get: gets experiment metadata; use experiment-results-get for statistical results
- execute-sql: queries the experiments table for details via HogQL