qa-experimentation

Name: qa-experimentation
Author: testland


Experimentation harness testing: SDK-specific testing for Statsig, Optimizely, VWO, and Amplitude Experiment; sample-ratio-mismatch (SRM) detection; an A/B-test validity checklist; and reference material on guardrail metrics and the peeking problem. Distinct from qa-shift-right/feature-flag-experiment-validator, which validates experiment results: this plugin tests the experimentation harness itself (SDK behaviour, assignment integrity, statistical-validity gates).

npx claudepluginhub testland/qa --plugin qa-experimentation

Popularity

Stars

Med: 0·Avg: 285

Installs

Med: 0·Avg: 1

What's Inside

Agents (1)

sample-ratio-mismatch-detector

/sample-ratio-mismatch-detector

Read-only specialist that detects Sample Ratio Mismatch (SRM) in an A/B test by running a chi-square test against the observed-vs-expected allocation. Returns a verdict (clean / SRM detected) and, if SRM detected, a taxonomy of likely root causes per the Microsoft Research KDD 2019 paper 'Diagnosing Sample Ratio Mismatch' (logging bugs, bot filtering, redirects, telemetry drops, randomization bugs). Use proactively at experiment-end before any ship decision, or when investigating surprising results. Preloads guardrail-metrics-reference + peeking-problem-reference.
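The chi-square check described above can be sketched in a few lines. This is a minimal illustration for a two-arm experiment, not the agent's actual implementation: the function name `srm_check`, the 50/50 default split, and the `alpha=0.001` threshold (a commonly used SRM cut-off) are assumptions, since the listing does not state them.

```python
import math


def srm_check(control_n, treatment_n, expected_control_share=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test on a two-arm allocation (1 degree of freedom).

    Hypothetical helper: the name, defaults, and return shape are illustrative.
    """
    total = control_n + treatment_n
    observed = [control_n, treatment_n]
    expected = [total * expected_control_share,
                total * (1 - expected_control_share)]

    # Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # With 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)), so no external library is needed.
    p_value = math.erfc(math.sqrt(chi2 / 2))

    verdict = "SRM detected" if p_value < alpha else "clean"
    return {"chi2": chi2, "p_value": p_value, "verdict": verdict}


if __name__ == "__main__":
    print(srm_check(5050, 4950))  # small imbalance
    print(srm_check(5000, 4500))  # large imbalance
```

An experiment with more than two arms has more degrees of freedom and would need a general chi-square survival function (for example from a statistics library) instead of the `erfc` shortcut.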

Skills (9)

amplitude-experiment-test

/amplitude-experiment-test

Wraps Amplitude Experiment SDK testing patterns: client initialization with API key (or local-flags JSON), the fetch / variant API, exposure-event suppression in tests, and assignment-integrity tests. Use when writing tests for code that uses Amplitude Experiment for A/B testing or flag management. Composes guardrail-metrics-reference + peeking-problem-reference + ab-test-validity-checklist.
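As a rough illustration of what an assignment-integrity test checks, the sketch below uses a stand-in bucketing function rather than any real SDK. `assign_variant`, the experiment key, and the tolerance are all hypothetical; Amplitude Experiment's own assignment logic and API are not reproduced here.

```python
import hashlib
from collections import Counter


def assign_variant(user_id, experiment_key, variants=("control", "treatment")):
    """Hypothetical deterministic bucketing: hash (experiment, user) into a variant.

    Stand-in for an SDK call; it is NOT how any particular vendor assigns users.
    """
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def test_assignment_is_sticky():
    # The same user must land in the same arm on every evaluation.
    first = assign_variant("user-42", "checkout-redesign")
    assert all(
        assign_variant("user-42", "checkout-redesign") == first
        for _ in range(100)
    )


def test_split_is_roughly_even():
    # Across many users the arms should be close to the configured 50/50 split.
    counts = Counter(
        assign_variant(f"user-{i}", "checkout-redesign") for i in range(10_000)
    )
    # Tolerance is an assumption: about 6 standard deviations of the difference.
    assert abs(counts["control"] - counts["treatment"]) < 600


if __name__ == "__main__":
    test_assignment_is_sticky()
    test_split_is_roughly_even()
    print("assignment integrity checks passed")
```

In a real test suite the stand-in would be replaced by the SDK client under test, with exposure events suppressed so that test traffic does not pollute experiment data.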

ab-test-validity-checklist

/ab-test-validity-checklist

Workflow-driven skill that builds an A/B test validity checklist from an experiment proposal. Walks through the canonical validity gates (pre-registration of OEC + power calc + guardrails, randomization unit + SRM check, assignment integrity, telemetry correctness, peeking discipline per peeking-problem-reference, novelty/primacy assessment, post-experiment SRM re-check, results-interpretation guardrails per Kohavi et al.) and emits a per-experiment checklist + a sign-off form. Use when launching a new experiment, auditing an existing one, or building experimentation governance. Composes guardrail-metrics-reference + peeking-problem-reference.
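The power-calculation gate can be made concrete with the standard two-proportion sample-size approximation. This is a sketch under stated assumptions: the function name `sample_size_per_arm`, the two-sided `alpha=0.05`, and `power=0.8` defaults are illustrative and are not taken from the skill itself.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, minimum_detectable_effect,
                        alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-sided two-proportion z-test.

    Hypothetical helper; the effect is an absolute difference in conversion rate.
    """
    p1 = baseline_rate
    p2 = baseline_rate + minimum_detectable_effect

    # Critical values for the significance level and the desired power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    # Sum of the Bernoulli variances of the two arms.
    variance = p1 * (1 - p1) + p2 * (1 - p2)

    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Example: 10% baseline conversion, detect an absolute lift of 1 point.
    print(sample_size_per_arm(0.10, 0.01))
```

Fixing this number before launch is what makes the peeking discipline enforceable: the experiment has a pre-registered stopping point rather than one chosen after looking at the data.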

experiment-results-interpreter

/experiment-results-interpreter

Pure-reference catalog for interpreting the results of an online controlled experiment after harness validity is confirmed. Covers the distinction between practical and statistical significance, reading confidence intervals instead of binary p-values, novelty and primacy effects that cause post-ship reversion, interaction effects from concurrent experiments, Simpson's paradox in segmented results, and the ordered guardrail-check sequence required before a ship decision. Use when a data scientist or PM is ready to draw conclusions from an experiment whose telemetry and randomisation have already passed the ab-test-validity-checklist.
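To illustrate reading a confidence interval against a practical-significance threshold rather than a binary p-value, here is a minimal sketch using a Wald interval for the difference in conversion rates. The function names, the threshold values, and the verdict strings are assumptions made for this example.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(conv_c, n_c, conv_t, n_t, confidence=0.95):
    """Wald interval for (treatment rate - control rate). Hypothetical helper."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = p_t - p_c
    return diff - z * se, diff + z * se


def interpret(ci, practical_threshold):
    """Compare the interval with zero and with the smallest lift worth shipping."""
    low, high = ci
    if low > practical_threshold:
        return "statistically and practically significant"
    if low > 0:
        return "statistically significant, practical value uncertain"
    if high < 0:
        return "significant degradation"
    return "inconclusive"


if __name__ == "__main__":
    ci = diff_confidence_interval(1000, 10_000, 1100, 10_000)
    print(ci)
    print(interpret(ci, practical_threshold=0.005))
```

The same observed lift can lead to different decisions depending on the threshold, which is the distinction between statistical and practical significance that the catalog describes.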

guardrail-metrics-reference

/guardrail-metrics-reference

Pure-reference catalog of guardrail-metric methodology for online controlled experiments. Defines guardrail metrics (metrics that must NOT degrade for an experiment to ship, even if the primary metric improves), the standard guardrail set (latency / errors / engagement / opt-out), the relationship to OEC (Overall Evaluation Criterion) per Kohavi et al., and the trustworthy-experiments framework (Microsoft Experimentation Platform). Use when designing the metric set for a new experiment, auditing existing experiment configs, or reviewing experiment results before ship-decisions. Composes peeking-problem-reference + ab-test-validity-checklist.
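The guardrail definition above (metrics that must not degrade even if the primary metric improves) can be expressed as a simple ship gate. This is an illustrative sketch only: `GuardrailResult`, `ship_decision`, and the tolerance values are hypothetical, and a production gate would normally test against a confidence interval rather than a point estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardrailResult:
    """One guardrail metric's outcome. Hypothetical structure for illustration."""
    name: str
    relative_change: float   # treatment vs control, e.g. 0.02 means +2%
    higher_is_worse: bool    # True for latency or errors, False for engagement
    tolerance: float         # maximum tolerated relative degradation

    def degraded(self) -> bool:
        # Convert the change into "harm" so every metric is compared the same way.
        harm = self.relative_change if self.higher_is_worse else -self.relative_change
        return harm > self.tolerance


def ship_decision(primary_metric_improved, guardrails):
    """Ship only if the primary metric improved AND no guardrail degraded."""
    violated = [g.name for g in guardrails if g.degraded()]
    return primary_metric_improved and not violated, violated


if __name__ == "__main__":
    results = [
        GuardrailResult("p95_latency", 0.05, higher_is_worse=True, tolerance=0.01),
        GuardrailResult("error_rate", 0.00, higher_is_worse=True, tolerance=0.01),
        GuardrailResult("engagement", -0.005, higher_is_worse=False, tolerance=0.01),
    ]
    print(ship_decision(True, results))
```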

optimizely-test

/optimizely-test

Wraps Optimizely Feature Experimentation SDK testing patterns: client initialization with a datafile (offline-friendly), the decide / decideAll API (the v5 API), forced-decisions for per-test arm pinning, OptimizelyUserContext + activate / track events, and assignment-integrity tests. Use when writing tests for Optimizely-instrumented application code. Composes guardrail-metrics-reference + peeking-problem-reference + ab-test-validity-checklist.

Stats

Version: 1.1.0

Language: Python

Stars: 0

Maintenance: Excellent

License: MIT

Last Commit: Jun 4, 2026

Added: Jun 9, 2026

Available In

testland-qa

Safety Signals

Caution

Uses power tools

Uses Bash, Write, or Edit tools

README

testland-qa

A rigorously curated quality-engineering plugin marketplace for Claude Code. 77 plugins, 695 components, every one rating-gated before merge.

Why testland-qa

6-dimension quality rubric (D1–D6) applied before merge, with a hard reject for uncited claims (citation theater) via the D6 floor
CI-validated composition: every agent's preloaded skills are reference-checked, no dangling deps
Differentiation required: every component must articulate how it differs from its nearest neighbors. Generic, persona-shaped scopes that can't name a trigger condition get sent back for reshaping
Reviewer-calibrated: two-evaluator rubric, A/C/F-grade exemplars in docs/REVIEWER_TRAINING.md

See Quality bar and docs/REVIEWER_CHECKLIST.md.

How it works

The marketplace ships three kinds of building block:

Plugin — an installable bundle scoped to one QA area (e.g. qa-api-testing, qa-load-testing). You install only the plugins your stack needs.
Skill — an atomic, self-contained capability inside a plugin, usually wrapping one tool or one technique (e.g. great-expectations, oauth-flow-test-author). Claude loads a skill when your request matches its trigger; you can also ask for it by name.
Agent — a task-scoped subagent that runs one focused job (e.g. schema-diff-reviewer reviews a migration diff and returns a findings table). An agent may preload one or more skills to do its work.

Installed components stay dormant until a matching task comes up, so adding a plugin doesn't add noise — it adds capability that activates on demand.

Install

Claude Code marketplace (recommended)

/plugin marketplace add testland/qa
/plugin install <plugin-name>@testland-qa

For example:

/plugin install qa-data-quality@testland-qa

Direct URL

/plugin marketplace add https://github.com/testland/qa

Manual / hermetic environments

git clone https://github.com/testland/qa ~/.claude/marketplaces/testland-qa

Before you install: plugins run inside your Claude Code session and ship agent instructions and tool wrappers. Anthropic doesn't vet marketplace contents — review a plugin's components before installing it into a sensitive project. Every component here is rating-gated (see Quality bar), but you remain in control of what runs.

Start here

New to the marketplace? Install one or two plugins for your role rather than everything — components activate on demand, so a focused set keeps things sharp.

If you're a…	Try first
Manual / exploratory tester	qa-manual-testing · qa-bdd · qa-bug-repro
Test automation engineer	qa-web-e2e · qa-api-testing · qa-unit-tests-js
Performance engineer	qa-load-testing · qa-chaos-resilience
Security tester	qa-sast · qa-secrets · qa-dast
Lead / manager / head of quality	qa-roles · qa-test-management · qa-process

The full catalog is below; for versions and component counts see CATALOG.md.

Using an installed plugin

Once a plugin is installed, its skills and agents are available to Claude Code — invoke them by describing the task in plain language. Example with qa-data-quality:

/plugin install qa-data-quality@testland-qa

Ask "add Great Expectations checks to my orders pipeline" → the great-expectations skill scaffolds an ExpectationSuite + Checkpoint and wires the results into a CI gate.
On a database change, ask "review this migration's schema diff" → the schema-diff-reviewer agent returns a Critical / Warning / Info findings table covering breaking-vs-additive changes and downstream impact.

Each plugin's README.md lists its skills and agents and what each one does.

Plugin catalog

View full README on GitHub

Similar Plugins

fullstack-dev-skills

godot-skills

pr-review-toolkit

feature-dev

nature-skills

unity-dev-toolkit

More by testland

qa-visual-regression

qa-contract-testing

qa-flake-triage

qa-bug-repro

qa-data-quality
