By testland
ML model testing: 5 skills (giskard-tests, deepchecks-tests, evidently-monitoring, fairlearn-fairness, alibi-explainability) and 1 agent (model-fairness-reviewer). Covers tabular + NLP + vision validation, drift monitoring, group fairness, and per-prediction explainability.
Receives a live Evidently drift alert (HTML or JSON report) and produces a ranked root-cause hypothesis list plus a remediation checklist. Distinguishes upstream schema change, seasonality, training-serving skew, pipeline bug, and genuine population shift; recommends rollback, retrain, quarantine, feature investigation, or alert re-tuning as appropriate. Use when a DataDriftPreset or TestColumnDrift alert fires in production monitoring and the on-call engineer needs a structured triage before acting.
Adversarial reviewer of ML model fairness + explainability evidence before promotion. Validates that fairness metrics (Fairlearn MetricFrame), drift detectors (Evidently/Deepchecks), vulnerability scans (Giskard), and per-prediction explanations (Alibi) collectively cover the model's risk class. Refuses to approve (✅) when sensitive features are missing, when intersectional analysis is absent, or when a high-risk model lacks per-prediction explanation logging.
Use Alibi Explain to generate model explanations - Anchors, Integrated Gradients, Kernel/Tree SHAP, ALE, Counterfactual Instances. Wires explainer.fit + explainer.explain into model-evaluation pipelines so that every flagged prediction ships with a "why" record auditors can reason about.
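To make the "why" record idea concrete, here is a minimal library-free sketch of the Shapley attributions that Kernel SHAP and Tree SHAP estimate. It enumerates every feature subset, so it is only practical for a handful of features. The function names, the toy model, the feature names, and the record layout are all assumptions made for illustration; none of this is Alibi's API.

```python
import json
from itertools import combinations
from math import factorial


def exact_shapley(predict, instance, baseline):
    """Exact Shapley values by enumerating all feature subsets (small n only)."""
    n = len(instance)

    def value(subset):
        # Features in `subset` take the instance's values; the rest stay at baseline.
        mixed = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    attributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        attributions.append(phi)
    return attributions


def why_record(predict, instance, baseline, feature_names):
    """Bundle a prediction with its attributions (hypothetical record layout)."""
    phi = exact_shapley(predict, instance, baseline)
    return {
        "prediction": predict(instance),
        "baseline_prediction": predict(baseline),
        "attributions": dict(zip(feature_names, phi)),
    }


# Toy linear scorer standing in for a real model (an assumption for this sketch).
def toy_model(x):
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2] + 3.0


record = why_record(
    toy_model,
    [1.0, 2.0, 4.0],
    [0.0, 0.0, 0.0],
    ["income", "debt", "tenure"],
)
print(json.dumps(record, indent=2))
```

For a linear model each attribution reduces to the weight times the feature's distance from the baseline, and the attributions sum to the gap between the prediction and the baseline prediction. That makes the toy case easy to check by hand before swapping in a real explainer.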
Run Deepchecks suites (data integrity, train-test validation, model evaluation) on tabular / NLP / vision data + models. Pass `result.passed_conditions()` to CI to gate on regressions; the same checks run during research, CI, and production monitoring per the Deepchecks lifecycle posture.
Use Evidently OSS (100+ evaluation metrics, declarative testing API) to detect data drift, target drift, and model performance regression - wired into CI as a gate and into production monitoring as a continuous check. Reports as HTML + JSON for both human review and pipeline assertions.
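As a rough picture of what a drift gate checks, the sketch below computes the Population Stability Index (PSI), one commonly used drift statistic, between a reference sample and a current sample in plain Python. It is not Evidently's implementation; the bin count, the `eps` floor for empty bins, and the 0.2 threshold are assumptions chosen for the example.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if it is constant.
    width = (hi - lo) / bins or 1.0

    def histogram(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range land in the first or last bin.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref_share = histogram(reference)
    cur_share = histogram(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_share, cur_share))


def drift_gate(reference, current, threshold=0.2):
    """True when drift stays at or below the (assumed) threshold."""
    return psi(reference, current) <= threshold


reference = [i / 100 for i in range(100)]
shifted = [v + 0.5 for v in reference]

print("same data passes gate:", drift_gate(reference, reference))
print("shifted data passes gate:", drift_gate(reference, shifted))
```

In CI the boolean would typically decide the exit code, while the raw PSI value goes into the JSON report for later inspection.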
Compute group fairness metrics (selection rate, demographic parity, equalized odds) per sensitive feature with `MetricFrame`, then mitigate disparities using Reductions algorithms (`ExponentiatedGradient` with constraint = `DemographicParity`/`EqualizedOdds`). Wire group-disaggregated assertions into the model-evaluation gate.
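As a library-free illustration of what these group metrics measure, the sketch below computes the per-group selection rate and the demographic parity difference (largest minus smallest group selection rate) in plain Python. The function names and the `max_gap` threshold are assumptions for this example; in Fairlearn itself, `MetricFrame` handles the disaggregation.

```python
from collections import defaultdict


def selection_rates(y_pred, sensitive):
    """Fraction of positive (1) predictions within each sensitive group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, sensitive):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(y_pred, sensitive):
    """Gap between the highest and lowest group selection rates."""
    rates = selection_rates(y_pred, sensitive)
    return max(rates.values()) - min(rates.values())


def fairness_gate(y_pred, sensitive, max_gap=0.2):
    """True when the selection-rate gap is within the (assumed) tolerance."""
    return demographic_parity_difference(y_pred, sensitive) <= max_gap


# Hypothetical predictions for two groups, "a" and "b".
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
sensitive = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(y_pred, sensitive)
print(rates)
print(demographic_parity_difference(y_pred, sensitive))
```

Here group "a" is selected three times out of four and group "b" once out of four, so the gap is 0.5 and a gate with `max_gap=0.2` would fail the evaluation.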
Test ML models with Giskard's scan() vulnerability detector + test catalog (performance, robustness, fairness, data leakage, ethical issues) for tabular and NLP models. Wrap a prediction function in giskard.Model + a DataFrame in giskard.Dataset; emit test suites that pass/fail in CI.
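The robustness category can be pictured as a metamorphic test: perturb each input in a way that should not change the label, then measure how often the prediction survives. The sketch below is a minimal plain-Python version of that idea, not Giskard's API; both classifiers, the upper-casing perturbation, and the function name are hypothetical.

```python
def invariance_pass_rate(predict, inputs, perturb):
    """Share of inputs whose prediction is unchanged after perturbation."""
    unchanged = sum(predict(x) == predict(perturb(x)) for x in inputs)
    return unchanged / len(inputs)


# Hypothetical keyword classifier that is accidentally case-sensitive.
def brittle_classifier(text):
    return "positive" if "good" in text else "negative"


# The same classifier with input normalisation applied first.
def robust_classifier(text):
    return "positive" if "good" in text.lower() else "negative"


samples = ["good service", "bad service", "Good value", "really good"]

brittle_rate = invariance_pass_rate(brittle_classifier, samples, str.upper)
robust_rate = invariance_pass_rate(robust_classifier, samples, str.upper)
print(brittle_rate, robust_rate)
```

A CI suite would compare the pass rate against a minimum and fail the build when it drops below, which is the pass/fail behaviour the skill description refers to.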
Uses power tools
Uses Bash, Write, or Edit tools
A rigorously curated quality-engineering plugin marketplace for Claude Code. 77 plugins, 695 components, every one rating-gated before merge.
d6 floor · docs/REVIEWER_TRAINING.md · See Quality bar and docs/REVIEWER_CHECKLIST.md.
The marketplace ships three kinds of building block:
- Plugins (for example qa-api-testing, qa-load-testing). You install only the plugins your stack needs.
- Skills (for example great-expectations, oauth-flow-test-author). Claude loads a skill when your request matches its trigger; you can also ask for it by name.
- Agents (for example, schema-diff-reviewer reviews a migration diff and returns a findings table). An agent may preload one or more skills to do its work.

Installed components stay dormant until a matching task comes up, so adding a plugin doesn't add noise; it adds capability that activates on demand.
/plugin marketplace add testland/qa
/plugin install <plugin-name>@testland-qa
For example:
/plugin install qa-data-quality@testland-qa
/plugin marketplace add https://github.com/testland/qa
git clone https://github.com/testland/qa ~/.claude/marketplaces/testland-qa
Before you install: plugins run inside your Claude Code session and ship agent instructions and tool wrappers. Anthropic doesn't vet marketplace contents — review a plugin's components before installing it into a sensitive project. Every component here is rating-gated (see Quality bar), but you remain in control of what runs.
New to the marketplace? Install one or two plugins for your role rather than everything — components activate on demand, so a focused set keeps things sharp.
| If you're a… | Try first |
|---|---|
| Manual / exploratory tester | qa-manual-testing · qa-bdd · qa-bug-repro |
| Test automation engineer | qa-web-e2e · qa-api-testing · qa-unit-tests-js |
| Performance engineer | qa-load-testing · qa-chaos-resilience |
| Security tester | qa-sast · qa-secrets · qa-dast |
| Lead / manager / head of quality | qa-roles · qa-test-management · qa-process |
The full catalog is below; for versions and component counts see
CATALOG.md.
Once a plugin is installed, its skills and agents are available to Claude
Code — invoke them by describing the task in plain language. Example with
qa-data-quality:
/plugin install qa-data-quality@testland-qa
- The great-expectations skill scaffolds an ExpectationSuite + Checkpoint and wires the results into a CI gate.
- The schema-diff-reviewer agent returns a Critical / Warning / Info findings table covering breaking-vs-additive changes and downstream impact.

Each plugin's README.md lists its skills and agents and what each one does.
npx claudepluginhub testland/qa --plugin qa-ml-models
Visual regression testing: 7 skills (percy-visual-regression-testing, chromatic-visual-regression-testing, playwright-snapshots, storybook-visual-regression-testing, responsive-breakpoint-runner, visual-baseline-conventions, visual-baseline-gate) and 2 agents (visual-diff-classifier, visual-baseline-curator).
Contract testing for microservices: 5 skills (pact-contract-testing, openapi-contract-diff, graphql-schema-regression, protobuf-compat-checking, contract-compatibility-gate) and 2 agents (contract-drift-investigator, contract-test-scaffolder).
Flake triage: 2 skills (flaky-test-quarantine, flake-pattern-reference) and 5 agents (e2e-flake-bisector, parallel-isolation-checker, regression-bisector, ai-flake-detector, e2e-test-trend-reporter).
Bug reproduction workflow: 1 skill (bug-report-template) and 8 agents (bug-report-from-recording, bug-repro-builder, crash-stack-trace-analyzer, defect-clusterer, defect-trend-narrator, escape-defect-analyzer, failure-classifier, test-failure-debugger).
Data quality testing for analytical pipelines: 5 skills (dbt-testing, great-expectations, soda-checks, data-quality-gate, data-quality-conventions) and 2 agents (schema-diff-reviewer, data-anomaly-triager).