Orchestrates five testing workflows: scan for high-value test candidates, write focused tests, generate realistic workloads, review test value, and diagnose test-suite health. Triggered by prompts about what to test next, improving tests, flaky tests, or workload scenarios.
Slash command
/wio:wio [scan|test|workload|review|doctor] [target]
Reference files:

- references/index.md
- references/behavior-to-test-map/overview.md
- references/behavior-to-test-map/tools.md
- references/flaky-test-detection-and-management/overview.md
- references/flaky-test-detection-and-management/tools.md
- references/fuzz-testing-continuous-fuzzing/overview.md
- references/fuzz-testing-continuous-fuzzing/tools.md
- references/mocking-and-test-doubles/overview.md
- references/mocking-and-test-doubles/tools.md
- references/mutation-testing/overview.md
- references/mutation-testing/tools.md
- references/performance-load-and-stress-testing/overview.md
- references/performance-load-and-stress-testing/tools.md
- references/property-based-testing/overview.md
- references/property-based-testing/tools.md
- references/regression-test/overview.md
- references/regression-test/tools.md
- references/resilience-testing-and-fault-injection/overview.md
- references/resilience-testing-and-fault-injection/tools.md
- references/risk-based-testing/overview.md

WIO is one testing workflow skill with five command modes:

- scan: find the highest-value test candidates for a codebase, change, or scope.
- test: write one focused high-value test for a selected behavior, code path, or regression risk.
- workload: generate a realistic, adversarial workload that adds a new failure surface, oracle, sequence, or coverage dimension with controlled variance, replay, and correctness invariants.
- review: review a newly written or existing test for customer value, developer value, signal quality, and maintainability.
- doctor: diagnose test-suite health problems in a codebase or scope.

Commands are accessed through $wio:
| Command | What it does | Default reference |
|---|---|---|
| $wio scan [target] | Find the highest-value test candidates for a codebase, change, or scope. | Behavior To Test Map |
| $wio test [target] | Discover a valuable candidate, pick strategy, write one test, validate, review, and keep only if valuable. | Test Level Selection |
| $wio workload [target] | Generate a realistic workload that adds new bug-finding value beyond existing workloads, with important user tasks, adversarial edge cases, assertions, invariants, and controlled variance. | Workload Modeling |
| $wio review [target] | Review a test for meaningful customer or developer value and return KEEP, REDO, or REMOVE. | Test Oracles And Assertions |
| $wio doctor [target] | Diagnose test-suite health problems in a codebase or scope. | Test Suite Health Diagnostics |
Use references/index.md to route from code evidence and candidate failure modes to the right strategy references.
Do not pick a test strategy from memory or from the nearest existing test alone. First inspect the target code, public behavior, existing tests, fixtures, and test commands. Then identify candidate behaviors or workloads and their likely failure mechanisms. Only after candidates exist, load the references needed to choose the strategy.
For every selected candidate, load at least one strategy reference that matches the failure mechanism before recommending or writing a test:
For commands and repo signals in a topic, load the sibling tools.md after the matching overview.md shows that topic is relevant. State which reference files informed the chosen strategy.
Use scan when the user asks what to test next, where coverage would matter, how to prioritize testing work, or which tests would reduce user, production, support, or team risk.
Use test when the user asks to add a test, improve a specific test, cover a bug, or validate a change with a meaningful automated test.
Use workload when the user asks for a realistic user-session workload, scenario generator, traffic model, load/performance scenario, browser journey mix, synthetic user flow, or varied workload that still preserves a stable task goal.
If the user asks to generate a workload, treat existing workloads as evidence and reusable infrastructure, not as the deliverable. A generated workload must add at least one new failure surface, adversarial class, oracle/invariant, state model, dependency fault, user/session path, or coverage dimension. A wrapper, runner, seed sweep, parameter expansion, or documentation-only change around an existing workload is not a generated workload unless the user explicitly asked for a runner or the wrapper adds a new oracle or adversarial model.
Use review when the user asks whether a test is worth keeping, asks for test review, or after $wio test writes or changes a test.
Use doctor when the user asks to audit tests, review suite quality, find flaky or low-value tests, inspect CI test health, or explain why a test suite is slow, noisy, or low-signal.
If the user explicitly names a WIO command, follow that mode. If the command is omitted, infer the mode from the request. If no command or target is provided, show the command table and ask what they want to do. If multiple modes apply, start with scan before test or workload, and use doctor only for existing suite health.
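The mode-selection rules above can be sketched as a small routing function. This is only an illustration under stated assumptions: `choose_mode`, the `HINTS` keyword table, and the `"ask"` return value are hypothetical names, and the real skill infers the mode from the whole request rather than from keyword matching. The rule that doctor applies only to existing suite health is not modeled here.

```python
from typing import Optional

MODES = ("scan", "test", "workload", "review", "doctor")

# Hypothetical keyword hints, used only to make the sketch runnable.
HINTS = {
    "scan": ("what to test", "prioritize", "test next"),
    "test": ("add a test", "write a test", "cover a bug"),
    "workload": ("workload", "traffic model", "user flow"),
    "review": ("worth keeping", "review this test"),
    "doctor": ("flaky", "audit", "suite health"),
}


def choose_mode(command: Optional[str], request: str) -> str:
    """Return a mode name, or "ask" when the command table should be shown."""
    if command in MODES:
        return command  # an explicitly named command always wins
    text = request.lower()
    matches = [m for m in MODES if any(h in text for h in HINTS[m])]
    if not matches:
        return "ask"  # nothing to infer from: show the table and ask
    if len(matches) > 1:
        return "scan"  # several modes apply: start with scan
    return matches[0]
```

For example, `choose_mode(None, "Our CI has flaky tests")` selects doctor, while a request that mentions both adding a test and building a workload falls back to scan first.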
- … REDO or REMOVE, not KEEP.
- … $wio workload generate. If the change only reruns, parameterizes, documents, or sweeps an existing workload, call it a runner and explain the missing new failure surface.
- … KEEP, REDO, or REMOVE.
- scripts/test-review-reminder.py: hook helper that reminds the active agent to validate and apply $wio review after test files change.
- scripts/check-wio-report.py: optional checker for saved markdown reports. Use python3 scripts/check-wio-report.py review report.md when a WIO output is written to a file and you need a quick structure check.

When the host supports subagents or parallel agents and user/host policy permits them, use the WIO subagent specs from the official host locations to improve quality without duplicating guidance:
- wio-candidate-scout: read-only discovery of high-value test candidates and real risk.
- wio-strategy-critic: read-only challenge of the chosen test level, oracle, doubles, fixtures, and validation loop before implementation.
- wio-test-reviewer: post-implementation review that returns KEEP, REDO, or REMOVE.

Subagents must inspect targeted code and tests before loading targeted WIO references. They return findings to the main agent; they do not write reports or copy reference content. Claude plugin subagents live in plugins/wio/agents/; Claude project subagents can live in .claude/agents/; Codex custom agents live in .codex/agents/. Installing WIO with skills add does not create those runtime files.
For $wio test, use this sequence:
- … wio-candidate-scout if available.
- … wio-strategy-critic to challenge the proposed strategy before editing when available.
- … wio-test-reviewer if available.
- … KEEP; otherwise revise (REDO) or remove (REMOVE).

If subagents are unavailable, perform the same stages in the main agent and explicitly label the review stage.
Find the best parts to test next, the right strategy for each, and the ROI of testing them. This mode is read-only: inspect the repo and existing tests before loading strategy references; do not edit files.
Start with code evidence: inspect product surfaces, target implementation, existing tests, fixtures, and commands. Then use Behavior To Test Map to organize candidates, Risk-Based Testing for ranking tradeoffs, and Test Level Selection only after candidates exist.
Workflow:
Output template:
## Scope And Evidence
[target, files/commands inspected, tests/CI found]
## Ranked Candidates
1. [behavior] - impact: [why it matters], risk: [fault mechanism], references: [files used], strategy: [level/tool], cost: [small/medium/large]
## Best Next Test
[first investment and why it beats the alternatives]
## Avoid
[coverage-padding or low-signal tests to skip]
## Open Questions
[only questions that would materially change the ranking]
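The ranking fields in the scan template (impact, risk, cost) can be turned into a rough ordering. The sketch below is one possible heuristic, not the skill's method: the 1 to 5 scales, the `COST_WEIGHT` values, the `roi` formula, and the example behaviors are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed relative effort for the template's cost labels.
COST_WEIGHT = {"small": 1, "medium": 2, "large": 4}


@dataclass
class Candidate:
    behavior: str
    impact: int  # assumed scale: 1 (minor) to 5 (severe)
    risk: int    # assumed scale: 1 (unlikely fault) to 5 (likely fault)
    cost: str    # "small", "medium", or "large", as in the template

    def roi(self) -> float:
        """Hypothetical score: expected harm avoided per unit of effort."""
        return self.impact * self.risk / COST_WEIGHT[self.cost]


def rank(candidates):
    """Order candidates from highest to lowest score."""
    return sorted(candidates, key=lambda c: c.roi(), reverse=True)


# Hypothetical candidates for illustration only.
candidates = [
    Candidate("profile avatar upload", impact=2, risk=2, cost="medium"),
    Candidate("refund rounding", impact=5, risk=4, cost="small"),
    Candidate("tenant isolation in search", impact=5, risk=3, cost="large"),
]
ranked = rank(candidates)
```

A numeric score like this is only a starting point; the template still asks for the reasoning behind each impact and risk judgment.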
Write tests only when they protect meaningful behavior. A useful test reduces future user errors, production incidents, support work, debugging time, review time, or release risk. Do not jump straight to implementation or strategy selection.
Start with code evidence: inspect the target behavior, implementation, existing tests, fixtures, and commands. After selecting a candidate, load Test Level Selection before choosing the test layer and Test Oracles And Assertions before writing assertions. Add data, doubles, feedback, or specialized references when those decisions affect signal.
Workflow:
- … KEEP, REDO, or REMOVE.

Output template:
## Candidate
[behavior/failure mode chosen and why it beat alternatives]
## Strategy
[test level, oracle, data/fixtures, doubles, feedback loop, references used, and why this preserves the real risk]
## Changes
[files changed and concise implementation summary]
## Validation
[command run and result, or why it was not run]
## Review
Verdict: KEEP | REDO | REMOVE
Protected behavior: [...]
Value: [...]
Signal strengths: [...]
False-confidence risks: [...]
Falsification check: [plausible bug and assertion/invariant that would fail]
## Remaining Risk
[what this test does not cover]
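The falsification check in the template asks for a plausible bug and the assertion that would fail on it. A minimal sketch of that idea follows, assuming a hypothetical `apply_discount` function; the mutant is a hand-written stand-in for a realistic regression, not output from a mutation-testing tool.

```python
def apply_discount(total_cents: int, percent: int) -> int:
    """Hypothetical production function: the discount is capped at 100%."""
    percent = max(0, min(percent, 100))
    return total_cents - total_cents * percent // 100


def apply_discount_mutant(total_cents: int, percent: int) -> int:
    """Plausible bug: the cap is missing, so totals can go negative."""
    return total_cents - total_cents * percent // 100


def check_never_negative(fn) -> bool:
    """Invariant under test: a discounted total is never below zero."""
    return all(fn(1000, p) >= 0 for p in (0, 50, 100, 150))


# The invariant holds for the real function and fails for the mutant,
# which is the evidence a falsification check is looking for.
holds_for_real = check_never_negative(apply_discount)
holds_for_mutant = check_never_negative(apply_discount_mutant)
```

If the check passed for both versions, the assertion would not be protecting the capped-discount behavior and the test would be a REDO candidate.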
Generate workloads that exercise meaningful user sessions, not one-off happy-path tests or wrappers around existing runners. A workload should cover important tasks a real user, API client, operator, or background process performs during a session, with adversarial but realistic misuse and controlled variance that changes data, ordering, scale, timing, or optional branches while preserving the same core task.
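Controlled variance with replay can be sketched with a seeded generator. The example below is an assumption-laden toy: the cart domain, `generate_session`, and `run_session` are hypothetical and stand in for whatever session model the target system has. It shows the shape only: a seed drives the variation, the same seed replays the same session, and an invariant is checked after every step.

```python
import random


def generate_session(seed: int, steps: int = 20):
    """Build a reproducible list of (action, item, qty) steps for one shopper."""
    rng = random.Random(seed)  # local RNG: the same seed gives the same session
    items = ["apple", "pear", "plum"]
    session = []
    for _ in range(steps):
        action = rng.choice(["add", "add", "remove"])  # adds are twice as likely
        session.append((action, rng.choice(items), rng.randint(1, 3)))
    return session


def run_session(session):
    """Apply the steps to a toy cart and check an invariant after each one."""
    cart = {}
    for action, item, qty in session:
        if action == "add":
            cart[item] = cart.get(item, 0) + qty
        else:
            # Adversarial but realistic: removing more than the cart holds.
            cart[item] = max(0, cart.get(item, 0) - qty)
        assert all(q >= 0 for q in cart.values()), "quantity went negative"
    return cart
```

Recording the seed alongside a failure is what makes the run replayable: calling `generate_session` again with that seed reproduces the exact step sequence.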
Start with code evidence: inspect the user/session entry points, implementation, existing tests, fixtures, commands, and workload/E2E tooling. Existing workloads should reveal gaps, not become the default implementation. After identifying the actor, goal, bug-prone interactions, and existing workload coverage, load Workload Modeling and Test Oracles And Assertions. Add performance, resilience, security, user-behavior, property-based, fuzzing, or data references only when the workload's risk depends on those dimensions.
Originality gate: before editing, state what existing workloads already cover and what the new workload adds. The new workload must introduce at least one of: a new failure surface, adversarial class, oracle/invariant, state model, dependency fault, user/session path, data shape, timing/order dimension, or replay artifact. If the best next step is only a wrapper, runner, seed sweep, or parameter expansion, say that plainly and do not call it a generated workload unless the user asked for that.
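The originality gate can be read as a simple classification over what a change adds. The sketch below encodes that reading; `classify_change` and its return labels are hypothetical, and the dimension names are copied from the paragraph above.

```python
# Dimensions that count as new bug-finding value, per the originality gate.
NEW_VALUE_DIMENSIONS = {
    "failure surface", "adversarial class", "oracle/invariant", "state model",
    "dependency fault", "user/session path", "data shape",
    "timing/order dimension", "replay artifact",
}


def classify_change(adds, user_asked_for_runner: bool = False) -> str:
    """Label a proposed change by what it adds beyond existing workloads."""
    if set(adds) & NEW_VALUE_DIMENSIONS:
        return "generated workload"
    if user_asked_for_runner:
        return "requested runner"  # acceptable, but named for what it is
    return "not a generated workload"
```

Under this reading, `classify_change({"seed sweep"})` is reported as not a generated workload, while adding an oracle/invariant to the same sweep changes the label.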
Workflow:
Output template:
## Workload
Actor: [...]
Goal: [...]
Shape: [browser/API/CLI/job/load/synthetic/stateful]
References used: [...]
## Existing Coverage
Existing workloads found: [...]
What they cover: [...]
Gap this workload fills: [...]
Why this is not only a wrapper/runner/seed sweep: [...]
## Coverage
Interactions: [...]
Bug-prone areas: [...]
New failure surface/adversarial class/oracle: [...]
Invariants/assertions: [...]
## Adversarial Model
Misuse paths: [...]
Invalid transitions: [...]
Boundary inputs: [...]
Duplicate/replayed actions: [...]
Permission/tenant edges: [...]
Dependency/time/concurrency faults: [...]
## Variance And Replay
Seed: [...]
Variable inputs/branches/timing/scale: [...]
Replay command or notes: [...]
## Implementation And Validation
Tooling: [...]
Files changed: [...]
Command/result: [...]
## Falsification Check
Plausible bug caught: [...]
Assertion/invariant that fails: [...]
Manual mutation or fault tried, if any: [...]
## Limits
[environment, data, dependency, runtime, cleanup, or flake risk]
Review a test as a quality gate, not as a rubber stamp. The test must justify its existence through customer value, production value, support/debugging value, review value, or release confidence.
Start with code evidence: inspect the test diff and protected production behavior. Then load Test Oracles And Assertions. Add data, doubles, feedback-loop, or mutation references only when the test's value depends on those concerns.
Workflow:
- … KEEP, REDO, or REMOVE with evidence.

Output template:
Verdict: KEEP | REDO | REMOVE
Protected behavior:
[what behavior or failure mode this test claims to protect]
Value:
[customer, operator, production, support, release, review, or developer-flow value]
Signal strengths:
[why it would fail for the meaningful regression]
False-confidence risks:
[weak assertions, unrealistic setup, over-mocking, snapshots, flake risk, wrong feedback loop]
References used:
[files that informed the decision]
Falsification check:
[plausible bug and assertion/invariant that would fail]
Required action:
[none for KEEP; exact redesign for REDO; removal reason for REMOVE]
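A saved review report can be given a quick structure check against the fields in the template above. The following is a hypothetical sketch written for illustration; it is not the contents of scripts/check-wio-report.py, whose actual behavior is not shown in this document.

```python
# Field labels taken from the review output template.
REVIEW_FIELDS = [
    "Verdict:", "Protected behavior:", "Value:", "Signal strengths:",
    "False-confidence risks:", "References used:", "Falsification check:",
    "Required action:",
]
VERDICTS = {"KEEP", "REDO", "REMOVE"}


def check_review_report(text: str) -> list:
    """Return a list of problems; an empty list means every field is present."""
    lines = [line.strip() for line in text.splitlines()]
    problems = [
        f"missing field: {field}"
        for field in REVIEW_FIELDS
        if not any(line.startswith(field) for line in lines)
    ]
    verdict_lines = [line for line in lines if line.startswith("Verdict:")]
    if verdict_lines:
        verdict = verdict_lines[0].split(":", 1)[1].strip()
        if verdict not in VERDICTS:
            problems.append(f"invalid verdict: {verdict!r}")
    return problems
```

Because the verdict must be exactly one of the three values, an unfilled template line such as `Verdict: KEEP | REDO | REMOVE` is flagged as invalid.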
Run a read-only test-suite health scan and report likely concerns with evidence. Do not edit, delete, rewrite, quarantine, or disable tests.
Start with: Test Suite Health Diagnostics. Add flake, feedback-loop, pyramid, mutation, data, doubles, or oracle references only after the suite evidence points there.
Workflow:
Output template:
## Scope And Evidence
[stack, frameworks, CI, test commands, files inspected, tests run/not run]
## Overall
Grade: [A-F or Low/Medium/High trust]
Confidence: [low/medium/high with reason]
## Top Concerns
1. Severity: [P0-P3]
   Concern: [...]
   Evidence: [...]
   Why it matters: [...]
   Suggested action: [...]
## Rubric
Reliability: [...]
Speed: [...]
Signal: [...]
Diagnostic value: [...]
Maintainability: [...]
Risk coverage: [...]
Monitoring/CI fit: [...]
## Quick Wins And Questions
[small actions and only material follow-up questions]
npx claudepluginhub workersio/skills --plugin wio