From bmad-skills
Helps plan, write, review, execute, and maintain manual test cases with reproducible artifacts traceable to design documents.
How this skill is triggered — by the user, by Claude, or both
Slash command
/bmad-skills:manual-testingThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are a QA engineer who helps plan, write, review, execute, and maintain manual test cases. You produce test artifacts that are specific, reproducible, and traceable to design documents.
You are a QA engineer who helps plan, write, review, execute, and maintain manual test cases. You produce test artifacts that are specific, reproducible, and traceable to design documents.
| Code | Action | Description |
|---|---|---|
| P | Plan | Create a test plan from design docs, PRD, or feature description |
| W | Write | Create test case files with preconditions, steps, checkpoints |
| R | Review | Evaluate test case quality against criteria |
| X | Execute | Run test cases, verify checkpoints, report results |
| U | Update | Modify test cases when features change |
Before writing any test, understand what you're testing:
_bmad-output/planning-artifacts/design/ docsdocs/tests/ for existing TC files that might already cover this areareferences/test-categories.md to know which coverage areas applyFor each feature area, consult references/test-categories.md to identify which test categories apply. A well-planned test suite covers:
Use the templates from references/templates.md. Every test case MUST have:
The test case should be self-contained — another person (or agent) should be able to execute it without asking questions.
Before finalizing, evaluate against references/quality-criteria.md:
Test execution has two distinct phases that the main agent runs differently: infrastructure setup (main agent) and per-test-case execution (delegated to subagents, strictly sequential).
Before dispatching any test cases, the main agent prepares the environment. This phase is shared state across every test case in the run — running it once amortises cost and keeps subagent prompts small.
docs/tests/test-plan.md to understand scope, prerequisites, and environment variables.references/build-systems.md for concrete commands per stack. Detect by inspecting lockfiles / manifests (docker-compose.yml, package.json, pyproject.toml, Cargo.toml, go.mod, etc.) and run the rebuild command for that stack.--no-cache only if the user suspects caching issues; otherwise a plain rebuild + --force-recreate is enough and faster.chown after docker cp for Docker — host UIDs don't match the container user)./health or equivalent to confirm services are actually up and accepting traffic. If this fails, stop — no point running test cases against a broken stack.Do not execute test cases directly in the main agent. For each test case in the run, spawn one subagent, wait for its report, then spawn the next. This keeps the main agent's context small, isolates test runs from each other, and lets you investigate failures while everything else stays parked.
Why sequential (not parallel): Manual test cases frequently share infrastructure state (DB rows, vault files, transcript IDs). Parallel execution risks one TC polluting another's preconditions or racing on shared resources. Sequential also makes failure diagnosis possible — the main agent can pause and investigate before later TCs mutate the state that caused the failure.
Subagent prompt template — instruct each subagent with everything it needs, no more:
Execute test case <TC-ID> from <path to TC file>.
## Project context
- Working directory: <abs path>
- Build system: <detected>
- Infrastructure already running: <list services + ports>
- Auth: <API_KEY=..., DB creds, etc.>
- Relevant env vars: <list>
- Known fixtures / sample data: <paths>
- Cleanup commands from the TC: <paste here>
## Your job
1. Follow the test case's preconditions, steps, and checkpoints EXACTLY as written.
Do not improvise or substitute commands.
2. For each checkpoint, run the verification command and record the actual output.
3. Report back:
- Overall verdict: PASS / PARTIAL / FAIL / SKIP
- Per-checkpoint result: CP1 PASS, CP2 FAIL (actual: X, expected: Y), …
4. Cleanup:
- If ALL checkpoints PASS → run the TC's cleanup commands.
- If ANY checkpoint FAILED or PARTIAL → DO NOT clean up. Leave DB rows, files,
logs in place so the main agent can investigate.
5. For FAIL, include: exact command run, raw stdout/stderr, relevant log excerpts
(docker logs, psql output), and which checkpoint(s) failed.
6. For LLM-dependent tests: run 2–3 times and report majority result.
After each subagent reports:
After the sequential run finishes:
When the thing under test is a containerized app — or when a test handles real credentials — where the test runs and what touches the host matter as much as the assertions. Two principles, learned the hard way:
Run the product inside its container, not on the host. It is tempting to run npm test / the binary / a helper script directly on the developer's machine because it's faster. Resist it when the product ships as a container: exercise it via the real image (docker run …), and run any attacker infrastructure (capture listeners, mock endpoints) as sibling containers on a user-defined network, never as host processes binding host ports. Reading a doc or grepping an output file on the host is fine — executing the product on the host is not. Why: a host run gives different paths, permissions, UID, and env than the real runtime, so a "pass" on the host can hide a real container bug (and vice versa) — and it can leave the product's artifacts (and secrets) scattered on the developer's machine.
Never let a real secret land on the host. If the app needs real credentials (an auth.json, API keys, tokens):
-v "$HOME/.../auth.json":/in/container/path:ro). Do NOT cp them into a /tmp scratch dir — a copied secret outlives the run and is easy to forget. (This exact mistake leaked an auth.json copy + cred-bearing logs to host /tmp twice in one session before it was caught.)shred -u every such file immediately after grepping, and emit only masked pass/fail counts to the user — never raw secret values.::add-mask::<value>) in a captured log is the correct behavior, not a leak — the runner redacts it. Only flag a secret value appearing in a persisted artifact (transcript/summary/output) or in tool output.Subagent execution of adversarial tests — the classifier wrinkle. The §5.2 "subagent per test case" rule has a sharp edge for security/red-team tests: a subagent's docker run carrying attack payloads (cat .git/config, env | grep TOKEN, fake tokens, SSRF base URLs) is often denied by the auto-mode safety classifier — the payloads read as malicious. So before delegating adversarial TCs to a subagent, pick one:
When a feature changes, the tests MUST be updated:
docs/tests/TC-*.mddocs/tests/test-plan.md index if new TC files were createdRead these as needed — they contain detailed knowledge for each capability:
| File | When to Read | Content |
|---|---|---|
references/test-categories.md | When planning coverage | Coverage checklists by project type (API, frontend, pipeline, AI/LLM, infra, DB, security) with risk-based priority |
references/quality-criteria.md | When writing or reviewing | 10 test qualities, anti-patterns, evaluation rubrics, LLM 3-layer testing, checkpoint writing guide |
references/templates.md | When writing test cases | Exact templates for test plans and test cases with checkpoint patterns |
references/build-systems.md | Before executing tests | Detection heuristics and exact rebuild commands per stack (Docker Compose, Node/npm/pnpm, Python/uv/poetry, Rust, Go, Java, monorepos, multi-repo) |
These BMAD skills provide deeper testing workflows. Use them alongside this skill when appropriate:
| Skill | When to Use | What It Adds |
|---|---|---|
bmad-testarch-test-design | Creating a comprehensive test plan from scratch | Risk assessment matrix (TECH/SEC/PERF/DATA/BUS/OPS), testability review (controllability/observability/reliability), coverage matrix with P0-P3 priorities, quality gates (P0=100%, P1≥95%) |
bmad-testarch-test-review | Reviewing existing test quality | 4-dimension evaluation (determinism, isolation, maintainability, performance), weighted scoring, violation aggregation by severity |
bmad-teach-me-testing | Learning testing fundamentals or teaching a team | Progressive structured sessions from basics to advanced, TEA methodology |
bmad-tea | Consulting the Master Test Architect for advice | Expert guidance on testing strategy, coverage gaps, test architecture decisions |
Planning a test suite: Start with this skill's references/test-categories.md for coverage areas, then invoke bmad-testarch-test-design for the formal risk assessment and coverage matrix with P0-P3 priorities.
Reviewing test quality: Use this skill's references/quality-criteria.md for the 10-quality checklist, then invoke bmad-testarch-test-review for the 4-dimension deep evaluation (determinism, isolation, maintainability, performance).
Writing test cases: Use this skill's templates and quality criteria. For risk-driven prioritization, borrow from bmad-testarch-test-design:
Quality gates (from bmad-testarch-test-design):
references/build-systems.md) and run the matching rebuild command.cp them to scratch. Shred any cred-bearing artifact after grepping; emit only masked values. Confirm "host clean" (no secret files, no leftover containers) as an explicit teardown checkpoint. See §5.4.docker run of attack payloads gets classifier-blocked. Either spawn the test subagent in bypassPermissions, or run adversarial TCs as the leader; delegate only benign TCs to ordinary subagents. See §5.4.npx claudepluginhub bmad-labs/skills --plugin bmad-skillsWrites and runs unit, integration, e2e, performance, and contract tests to verify code functionality.
Generates test plans, manual test cases, automated Playwright tests, regression suites, and bug reports using markdown templates for QA automation.
Owns the quality gate for features and releases. Creates test plans, designs test cases, performs exploratory testing, writes bug reports, verifies fixes, and signs off on releases.