By testland
Search relevance testing: 3 skills (elasticsearch-relevance-tests, opensearch-relevance-tests, vector-search-precision-tests) and 1 agent (relevance-regression-reviewer). IR-metrics-driven term + vector + hybrid relevance regression detection.
Author Elasticsearch relevance regression tests using the Ranking Evaluation API (`POST <index>/_rank_eval`) - judgment lists (query + expected docs at ranks), per-query metrics (Precision@K, Recall@K, MRR, DCG, ERR), reproducible test corpora; pair with Quepid + Splainer for interactive judgment authoring.
Evaluates hybrid retrieval pipelines (BM25 + vector + reranker) end-to-end: authors ground-truth judgment sets, computes nDCG@k and MRR over fused results, measures the lift from Reciprocal Rank Fusion vs weighted fusion vs single-stage retrieval, and quantifies reranker (cross-encoder/Cohere/bge) impact. Use when a production system combines lexical and semantic retrieval and you need a numeric relevance baseline, fusion-strategy comparison, or evidence that a reranker is earning its latency cost.
Bootstraps human-relevance judgment lists (query sets, grading scales, rater guidelines, inter-rater agreement, Quepid tooling, TREC-style pooling, and refresh cadence) that serve as ground truth for all three search-relevance skills and the relevance-regression-reviewer agent. Use when a team needs to create or refresh the judgment corpus before running NDCG / MRR / Recall@k evaluations.
Author OpenSearch relevance tests with Search Relevance Workbench (judgment lists, query sets, experiments), `_rank_eval` API (Elasticsearch-fork-compatible), and hybrid BM25 + neural ranking eval. Reuse Elasticsearch judgment list format; document the differences (neural search query DSL, hybrid weighting via `neural_query_enricher`).
Tests Apache Solr search relevance by querying a test core, asserting ranking and score expectations, uploading LTR feature stores and models via the `/schema/feature-store` and `/schema/model-store` REST APIs, using `debugQuery` for per-document score explain, tuning eDisMax parameters (`qf`, `pf`, `mm`, `bq`), and computing judgment-driven nDCG checks against pinned corpora. Use when the search stack runs Apache Solr (enterprise, SolrCloud, or embedded) and you need a pre-deploy relevance gate or LTR model verification.
Uses power tools
Uses Bash, Write, or Edit tools
Own this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimOwn this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimBased on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.
A rigorously curated quality-engineering plugin marketplace for Claude Code. 77 plugins, 695 components, every one rating-gated before merge.
d6 floordocs/REVIEWER_TRAINING.mdSee Quality bar and docs/REVIEWER_CHECKLIST.md.
The marketplace ships three kinds of building block:
qa-api-testing, qa-load-testing). You install only the plugins your
stack needs.great-expectations,
oauth-flow-test-author). Claude loads a skill when your request matches
its trigger; you can also ask for it by name.schema-diff-reviewer reviews a migration diff and returns a findings
table). An agent may preload one or more skills to do its work.Installed components stay dormant until a matching task comes up, so adding a plugin doesn't add noise — it adds capability that activates on demand.
/plugin marketplace add testland/qa
/plugin install <plugin-name>@testland-qa
For example:
/plugin install qa-data-quality@testland-qa
/plugin marketplace add https://github.com/testland/qa
git clone https://github.com/testland/qa ~/.claude/marketplaces/testland-qa
Before you install: plugins run inside your Claude Code session and ship agent instructions and tool wrappers. Anthropic doesn't vet marketplace contents — review a plugin's components before installing it into a sensitive project. Every component here is rating-gated (see Quality bar), but you remain in control of what runs.
New to the marketplace? Install one or two plugins for your role rather than everything — components activate on demand, so a focused set keeps things sharp.
| If you're a… | Try first |
|---|---|
| Manual / exploratory tester | qa-manual-testing · qa-bdd · qa-bug-repro |
| Test automation engineer | qa-web-e2e · qa-api-testing · qa-unit-tests-js |
| Performance engineer | qa-load-testing · qa-chaos-resilience |
| Security tester | qa-sast · qa-secrets · qa-dast |
| Lead / manager / head of quality | qa-roles · qa-test-management · qa-process |
The full catalog is below; for versions and component counts see
CATALOG.md.
Once a plugin is installed, its skills and agents are available to Claude
Code — invoke them by describing the task in plain language. Example with
qa-data-quality:
/plugin install qa-data-quality@testland-qa
great-expectations skill scaffolds an ExpectationSuite + Checkpoint and
wires the results into a CI gate.schema-diff-reviewer agent returns a Critical / Warning / Info findings
table covering breaking-vs-additive changes and downstream impact.Each plugin's README.md lists its skills and agents and what each one does.
Visual regression testing: 7 skills (percy-visual-regression-testing, chromatic-visual-regression-testing, playwright-snapshots, storybook-visual-regression-testing, responsive-breakpoint-runner, visual-baseline-conventions, visual-baseline-gate) and 2 agents (visual-diff-classifier, visual-baseline-curator).
Contract testing for microservices: 5 skills (pact-contract-testing, openapi-contract-diff, graphql-schema-regression, protobuf-compat-checking, contract-compatibility-gate) and 2 agents (contract-drift-investigator, contract-test-scaffolder).
Flake triage: 2 skills (flaky-test-quarantine, flake-pattern-reference) and 5 agents (e2e-flake-bisector, parallel-isolation-checker, regression-bisector, ai-flake-detector, e2e-test-trend-reporter).
Bug reproduction workflow: 1 skill (bug-report-template) and 8 agents (bug-report-from-recording, bug-repro-builder, crash-stack-trace-analyzer, defect-clusterer, defect-trend-narrator, escape-defect-analyzer, failure-classifier, test-failure-debugger).
Data quality testing for analytical pipelines: 5 skills (dbt-testing, great-expectations, soda-checks, data-quality-gate, data-quality-conventions) and 2 agents (schema-diff-reviewer, data-anomaly-triager).
npx claudepluginhub testland/qa --plugin qa-search-relevanceComprehensive skill pack with 66 specialized skills for full-stack developers: 12 language experts (Python, TypeScript, Go, Rust, C++, Swift, Kotlin, C#, PHP, Java, SQL, JavaScript), 10 backend frameworks, 6 frontend/mobile, plus infrastructure, DevOps, security, and testing. Features progressive disclosure architecture for 50% faster loading.
Develop, test, build, and deploy Godot 4.x games with Claude Code. Includes GdUnit4 testing, web/desktop exports, CI/CD pipelines, and deployment to Vercel/GitHub Pages/itch.io.
Comprehensive PR review agents specializing in comments, tests, error handling, type design, code quality, and code simplification
Comprehensive feature development workflow with specialized agents for codebase exploration, architecture design, and quality review
A growing collection of Claude-compatible academic workflow bundles. Covers scientific figures, manuscript writing and polishing, reviewer assessment, citation retrieval, data availability, paper reading, literature search, response letters, paper-to-PPTX conversion, and evidence-grounded Chinese invention patent drafting. Rules are organized as reusable skill folders with explicit workflows and quality checks.
Unity Development Toolkit - Expert agents for scripting/refactoring/optimization, script templates, and Agent Skills for Unity C# development