Standalone QE library — 5-core testing surface (plan/authoring/execution/review/insight), SQLite ledger with fixed-SQL oracle, optional wicked-bus + wicked-brain integration.
Run the 3-agent acceptance pipeline (Writer → Executor → Reviewer) on a scenario file
Author scenarios and generate test code — dispatches the authoring skill
Run tests and capture evidence — dispatches the execution skill
Query the ledger — stats, reports, flake detection, coverage gaps
Generate a test strategy — dispatches the plan skill's 4-way router (strategist / risk / testability / AC-quality)
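The plan skill's 4-way router can be pictured as a small dispatch table that maps a planning request to one of the four specialists (test-strategist, risk-assessor, testability-reviewer, requirements-quality-analyst). This is a minimal sketch, assuming plain keyword matching on the request text. The keyword lists and the `route_plan_request` function are hypothetical; they are not the skill's actual routing logic.

```python
# Hypothetical sketch of a 4-way router in the spirit of the plan skill.
# Keyword lists are illustrative assumptions; checks run in insertion order.
ROUTES = {
    "requirements-quality-analyst": ("acceptance criteria", "requirements", "smart"),
    "risk-assessor": ("risk", "likelihood", "impact"),
    "testability-reviewer": ("testable", "testability", "design"),
}
DEFAULT_AGENT = "test-strategist"  # general test strategy when nothing more specific matches


def route_plan_request(request: str) -> str:
    """Return the specialist agent name for a planning request."""
    text = request.lower()
    for agent, keywords in ROUTES.items():
        if any(keyword in text for keyword in keywords):
            return agent
    return DEFAULT_AGENT
```

Ordering matters here: "are these requirements testable" matches the requirements route before the testability route, which is one reasonable way to break ties.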
Follows structured wicked-testing test plans step-by-step, collecting evidence artifacts. Executes and captures only — does not judge or grade pass/fail. Writes evidence files to .wicked-testing/evidence/{run-id}/. Use when: acceptance test execution, evidence collection, test plan execution <example> Context: Test plan is ready and needs to be executed step by step. user: "Execute the acceptance test plan for the file upload feature." <commentary>Use acceptance-test-executor for mechanical step execution and evidence capture without judging results.</commentary> </example>
Evaluates evidence artifacts against test plan assertions independently. CRITICAL ISOLATION: Receives ONLY evidence file paths. Never sees execution context. Catches semantic bugs that self-grading misses. Use when: acceptance test review, evidence evaluation, test verdict <example> Context: Executor produced evidence and it needs independent evaluation. user: "Review the evidence from the file upload acceptance tests and render a verdict." <commentary>Use acceptance-test-reviewer for independent, unbiased verdict on test evidence.</commentary> </example>
Reads wicked-testing acceptance scenarios and produces structured, evidence-gated test plans. Transforms qualitative criteria into concrete, verifiable artifact requirements. Use when: acceptance testing, test plan generation, scenario verification design <example> Context: New feature scenario needs a structured test plan. user: "Write an acceptance test plan for the 'user can export data as CSV' scenario." <commentary>Use acceptance-test-writer to produce structured, evidence-gated test plans from scenarios.</commentary> </example>
Static code analysis for testability, quality, and maintainability. Reviews code structure, identifies test-coverage gaps, and flags risky areas. Use when: static analysis, code-quality metrics, testability assessment, maintainability review, coverage-gap identification. Runs on arbitrary source code at any time; it does not require a spec or an active build phase.
NOT THIS WHEN:
- Reviewing acceptance criteria for SMART+T (pre-code, no implementation yet): use `requirements-quality-analyst`
- Judging whether the implementation matches a spec (post-code divergence detection): use `semantic-reviewer`
- Rendering a full acceptance verdict (writer + executor + reviewer pipeline): use `/wicked-testing:acceptance`
API contract testing specialist. Designs and reviews consumer-driven contracts, Pact-style tests, OpenAPI contract verification, schema versioning, and breaking-change detection across service boundaries. Use when: API contract tests, CDC, Pact, OpenAPI verification, schema versioning, breaking-change detection, provider/consumer negotiation.
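The core of breaking-change detection is comparing what consumers already rely on with what the provider now returns. A minimal sketch, assuming a response schema flattened to `field -> type` pairs; `breaking_changes` is a hypothetical helper, and real contract tooling works on full OpenAPI documents or Pact contracts rather than flat dictionaries.

```python
from typing import Dict, List

# Simplified stand-in for a response schema: field name -> type name.
Schema = Dict[str, str]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List response-schema changes that would break existing consumers."""
    problems = []
    for field, old_type in sorted(old.items()):
        if field not in new:
            problems.append(f"removed field '{field}'")
        elif new[field] != old_type:
            problems.append(f"field '{field}' changed type {old_type} -> {new[field]}")
    # Fields that exist only in `new` are additive and treated as non-breaking here.
    return problems
```

Treating additions as safe reflects the usual consumer-driven view: a consumer only breaks when something it reads disappears or changes shape.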
Evidence-gated acceptance testing with three-agent separation of concerns. Writer designs test plans, Executor collects artifacts, Reviewer evaluates independently. Eliminates false positives from self-grading. Use when: "acceptance test", "verify it works", "did it pass", "run acceptance", "test this scenario", "acceptance criteria", "validate the feature", "/wicked-testing:acceptance"
Tier-1 orchestrator for producing tests. Writes scenario files, generates test code (unit / integration / E2E), creates fixtures and test data. The "make me tests" skill. Use when: "write tests", "generate test code", "author scenarios", "create a scenario file", "add fixtures", "test data setup", "automate this scenario".
Tier-1 orchestrator for running tests and capturing evidence. Executes scenarios, invokes framework runners, collects artifacts, and writes the run + verdict to the ledger. Use when: "run the test", "execute this scenario", "run the suite", "acceptance test this", "capture evidence", "prove it works".
Tier-1 orchestrator for reading the ledger. Stats, reports, flake detection, coverage gaps, historical queries. Never writes — only reads. Use when: "has this passed recently", "flake rate", "show me the last N runs", "coverage gaps", "generate a report", "stats", "exploratory session".
Tier-1 orchestrator for test planning. Covers test strategy, risk, testability review, and requirements quality. Dispatches specialist agents based on what the target needs. Use when: "what should I test", "test strategy", "test plan", "risk matrix", "is this testable", "are these requirements testable", "coverage strategy", "shift-left testing".
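A risk matrix of the kind the risk-assessor produces scores each risk as likelihood × impact and ranks the results. A minimal sketch, assuming 1-to-5 scales and made-up high/medium/low thresholds; the `Risk` class and `risk_matrix` function are hypothetical, not the agent's real output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    def __post_init__(self) -> None:
        for value in (self.likelihood, self.impact):
            if not 1 <= value <= 5:
                raise ValueError(f"{self.name}: scores must be between 1 and 5")

    @property
    def score(self) -> int:
        return self.likelihood * self.impact

    @property
    def level(self) -> str:
        # Illustrative thresholds only; not taken from risk-assessor.
        if self.score >= 15:
            return "high"
        if self.score >= 8:
            return "medium"
        return "low"


def risk_matrix(risks: List[Risk]) -> List[Risk]:
    """Rank risks highest score first, so mitigation work starts at the top."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)
```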
Uses power tools: Bash, Write, or Edit.
wicked-testing
40 specialist agents. 5 coordinating skills. A 3-agent acceptance pipeline that eliminates self-grading.
```bash
npx wicked-testing install
```
Works with Claude Code, Gemini CLI, Cursor, Codex, and Kiro.
When you ask an AI agent to test its own work, it grades its own homework. Self-reported PASS rates on agentic test runs sit 80%+ above human-reviewed rates. The agent that wrote the code also runs the tests and evaluates the results — there is no independence at any layer.
The industry answer has been scripted test frameworks: Playwright, pytest, k6, axe-core. But those only run what you already thought to test. They don't tell you what to test, whether the tests are any good, whether the results mean anything, or why the suite keeps failing intermittently on CI.
wicked-testing gives your AI CLI a complete QE team — from planning through execution through judgment — with enforced separation between the agent that runs tests and the agent that evaluates them.
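The separation between running tests and judging them can be illustrated in a few lines. This is a minimal sketch, not wicked-testing's implementation: `Step`, `execute`, and `review` are hypothetical names, each step's "action" is a stand-in for a real command, and isolation here is only a function boundary (in the plugin it is enforced by the reviewer's tool permissions). The writer's plan declares an assertion per step; the executor writes raw evidence files into a per-run directory, mirroring the `.wicked-testing/evidence/{run-id}/` layout; the reviewer receives only evidence paths and assertions.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Step:
    """One writer-authored plan step: an action plus its evidence assertion."""
    step_id: str
    action: Callable[[], str]  # stands in for a real command or UI interaction
    expected_substring: str    # assertion the reviewer checks against evidence


def execute(plan: List[Step], evidence_dir: Path) -> List[Path]:
    """Executor: run each step and write raw evidence. Renders no verdict."""
    evidence_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for step in plan:
        path = evidence_dir / f"{step.step_id}.json"
        record = {"step_id": step.step_id, "output": step.action()}
        path.write_text(json.dumps(record))
        paths.append(path)
    return paths


def review(evidence_paths: List[Path], assertions: Dict[str, str]) -> str:
    """Reviewer: reads only evidence files and assertions, never executor state."""
    seen = set()
    for path in evidence_paths:
        record = json.loads(path.read_text())
        seen.add(record["step_id"])
        expected = assertions.get(record["step_id"])
        if expected is not None and expected not in record["output"]:
            return "FAIL"
    # Evidence-gated: a step with no evidence file cannot pass.
    if set(assertions) - seen:
        return "FAIL"
    return "PASS"


if __name__ == "__main__":
    plan = [Step("login", lambda: "HTTP 200 welcome back", "HTTP 200")]
    assertions = {s.step_id: s.expected_substring for s in plan}  # the writer's assertions
    with tempfile.TemporaryDirectory() as tmp:
        paths = execute(plan, Path(tmp) / "run-001")
        print(review(paths, assertions))  # prints PASS
```

The design point is that `review` never sees the `Step` objects or anything the executor did in memory, so it cannot be swayed by the executor's own account of what happened.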
```bash
claude plugins marketplace add mikeparcewski/wicked-testing
claude plugins install wicked-testing
```

Then:

```text
# Generate a shift-left test strategy from your codebase
/wicked-testing:plan src/auth/ --project auth-service

# Run the 3-agent acceptance pipeline with enforced reviewer isolation
/wicked-testing:acceptance scenarios/login-positive.md

# Ask plain-English questions about your test history
/wicked-testing:insight "what was the last verdict for the login scenario?"
```
Under the hood: a project-local SQLite ledger, 40 specialist agents grouped into 5 Tier-1 skills, and a public event contract for wicked-garden integration.
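A project-local SQLite ledger can be as small as one append-only table of runs and verdicts. A minimal sketch using Python's standard `sqlite3` module, assuming a hypothetical `runs` table; the real wicked-testing schema, file location, and column names may differ.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Hypothetical ledger schema; the real wicked-testing tables may differ.
LEDGER_SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    run_id     TEXT PRIMARY KEY,
    scenario   TEXT NOT NULL,
    commit_sha TEXT NOT NULL,
    started_at TEXT NOT NULL,
    verdict    TEXT NOT NULL CHECK (verdict IN ('PASS', 'FAIL'))
)
"""


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a ledger database and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.execute(LEDGER_SCHEMA)
    return conn


def record_run(conn: sqlite3.Connection, run_id: str, scenario: str,
               commit_sha: str, verdict: str, started_at: Optional[str] = None) -> None:
    """Append one run and its verdict; the CHECK constraint rejects unknown verdicts."""
    started_at = started_at or datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back and re-raises on error
        conn.execute(
            "INSERT INTO runs (run_id, scenario, commit_sha, started_at, verdict) "
            "VALUES (?, ?, ?, ?, ?)",
            (run_id, scenario, commit_sha, started_at, verdict),
        )
```

A read-only consumer, such as a reporting step that must never write, could open the same file with `sqlite3.connect("file:ledger.db?mode=ro", uri=True)` (the file name here is hypothetical).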
The 15 Tier-1 agents form the stable integration surface. wicked-garden and other consumers depend only on these.
| Agent | Invoked By | What It Does |
|---|---|---|
| test-strategist | plan | Maps codebase to test scenarios: positive, negative, edge cases |
| testability-reviewer | plan | Blocks designs that will be hard to test before a line is written |
| requirements-quality-analyst | plan | Applies SMART+T to acceptance criteria: ready-for-design or needs-iteration |
| risk-assessor | plan | Scores risks by likelihood × impact, produces a mitigation matrix |
| test-designer | authoring | Full write→execute→analyze→verdict loop from a scenario file |
| test-automation-engineer | authoring | Generates test code in the project's detected framework |
| contract-testing-engineer | authoring | Consumer-driven contract tests (Pact-style), breaking-change detection |
| code-analyzer | authoring | Static quality + testability signals, ship/fix/refactor verdict |
| acceptance-test-writer | execution | Evidence-gated test plan: every step declares expected evidence and an assertion |
| acceptance-test-executor | execution | Executes plan mechanically, captures artifacts, makes no judgment |
| acceptance-test-reviewer | review | Reads cold evidence only (`allowed-tools: Read`); never sees executor context |
| scenario-executor | execution | Runs a scenario markdown file step-by-step |
| semantic-reviewer | review | Gap Report per AC: aligned / divergent / missing |
| production-quality-engineer | insight | Post-deploy health: healthy / degraded / unhealthy + next action |
| test-oracle | insight | Plain-English questions → 12 named parameterized SQL queries. No ad-hoc SQL. |
25 domain specialists are routed by the Tier-1 skills. They are not part of the public contract, so changing them never breaks downstream consumers.
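The test-oracle's "no ad-hoc SQL" rule amounts to a closed catalogue: a question is mapped to a named query, and user input only ever fills bound parameters. A minimal sketch, assuming a hypothetical `runs` table; the three query names and their SQL are illustrative stand-ins for the oracle's 12 real queries, and `ask` is a hypothetical function. The `flaky_scenarios` query uses one common definition of flakiness: different verdicts for the same scenario on the same commit.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

# Hypothetical ledger table; real wicked-testing columns may differ.
LEDGER_SCHEMA = (
    "CREATE TABLE IF NOT EXISTS runs (run_id TEXT PRIMARY KEY, scenario TEXT NOT NULL, "
    "commit_sha TEXT NOT NULL, started_at TEXT NOT NULL, verdict TEXT NOT NULL)"
)

# Closed catalogue of named, parameterized queries (illustrative names).
NAMED_QUERIES: Dict[str, str] = {
    "last_verdict": (
        "SELECT verdict FROM runs WHERE scenario = :scenario "
        "ORDER BY started_at DESC LIMIT 1"
    ),
    "pass_rate": (
        "SELECT ROUND(AVG(verdict = 'PASS'), 2) FROM runs WHERE scenario = :scenario"
    ),
    # Flaky: the same scenario produced different verdicts on the same commit.
    "flaky_scenarios": (
        "SELECT DISTINCT scenario FROM runs GROUP BY scenario, commit_sha "
        "HAVING COUNT(DISTINCT verdict) > 1 ORDER BY scenario"
    ),
}


def ask(conn: sqlite3.Connection, query_name: str, **params: Any) -> List[Tuple[Any, ...]]:
    """Run one catalogued query; user input only ever fills bound parameters."""
    if query_name not in NAMED_QUERIES:
        raise KeyError(f"unknown query {query_name!r}: ad-hoc SQL is not allowed")
    return conn.execute(NAMED_QUERIES[query_name], params).fetchall()
```

Because the SQL text is fixed, a question like "what was the last verdict for the login scenario?" can only ever become `ask(conn, "last_verdict", scenario="login")`, never an arbitrary statement.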