From atdd
Adds mutation testing as a third validation layer to the ATDD workflow, verifying that tests actually catch bugs after acceptance and unit tests pass.
Slash command: /atdd:atdd-mutate
Add a third validation layer to the ATDD two-stream testing approach. Acceptance tests verify WHAT, unit tests verify HOW, mutation testing verifies that the tests actually catch bugs.
Mutation testing introduces deliberate bugs (mutants) into source code, then runs the test suite. If tests fail, the mutant is killed (good). If tests pass despite the bug, the mutant survives (test gap found).
Source code → introduce mutation → run tests
├── tests FAIL → mutant killed ✓
└── tests PASS → mutant survived ✗
A project with 100% code coverage can still have a 60% mutation score — meaning 40% of introduced bugs go undetected by the test suite.
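The gap between coverage and mutation score can be illustrated with a minimal sketch. The function `is_adult`, its hand-written mutant, and both test suites are hypothetical names invented for this illustration; a real tool generates the mutant automatically.

```python
def is_adult(age):
    """Original implementation."""
    return age >= 18


def is_adult_mutant(age):
    """Mutant: the comparison operator >= was changed to >."""
    return age > 18


def weak_suite(fn):
    """Executes every line of fn (full coverage) but never checks the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn):
    """Adds the boundary case age == 18, which distinguishes the mutant."""
    return weak_suite(fn) and fn(18) is True


def verdict(suite, mutant):
    # A mutant is killed when the suite fails against it.
    return "survived" if suite(mutant) else "killed"


print(verdict(weak_suite, is_adult_mutant))    # survived
print(verdict(strong_suite, is_adult_mutant))  # killed
```

Both suites pass against the original function and both cover it fully; only the suite with the boundary case detects the mutated comparison.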
Run mutation testing after both test streams (acceptance tests and unit tests) are green.
This is Phase 6 in the team-based ATDD workflow, or a standalone quality check at any point during development.
The preferred approach is to build a custom mutation tool for the project. This follows the methodology Uncle Bob developed for empire-2025 — a project-specific tool that walks the AST/source tree, applies one mutation at a time, runs targeted tests, and reports survivors.
- Mutation operators (e.g. + → -, true → false, >= → >) plus matching logic that walks the AST/form tree
- Use dae_mutmap.py to mutate only changed functions and to update
  the manifest after the run. See the Differential Mutation Testing section.

| Category | Examples |
|---|---|
| Arithmetic | + ↔ -, * ↔ /, ++ ↔ -- |
| Comparison | > ↔ >=, < ↔ <= |
| Equality | == ↔ != |
| Boolean | true ↔ false, && ↔ \|\| |
| Conditional | negate conditions, swap if/if-not |
| Constant | 0 ↔ 1, "" ↔ "mutant" |
| Return value | return true → return false |
| Void method | remove method call entirely |
For the full architecture and detailed reference, see
references/frameworks.md.
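As a rough illustration of the custom-tool idea (walk the tree, apply one mutation at a time, run tests, report survivors), here is a minimal sketch built on Python's standard `ast` module. It is not the tool described in references/frameworks.md: the names `SWAPS`, `OneMutation`, `generate_mutants`, and `run_mutants` are hypothetical, only a few operators from the table are covered, and tests run in-process instead of through a real test runner. It assumes Python 3.9+ for `ast.unparse`.

```python
import ast

# Hypothetical operator table: a small subset of the categories listed above.
SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add, ast.GtE: ast.Gt, ast.Gt: ast.GtE}


class OneMutation(ast.NodeTransformer):
    """Applies the swap at the target-th mutable site and nowhere else."""

    def __init__(self, target):
        self.target = target
        self.seen = 0
        self.description = None

    def _swap(self, op, lineno):
        new_type = SWAPS.get(type(op))
        if new_type is None:
            return op  # operator not covered by this sketch
        index = self.seen
        self.seen += 1
        if index != self.target:
            return op
        self.description = f"line {lineno}: {type(op).__name__} -> {new_type.__name__}"
        return new_type()

    def visit_BinOp(self, node):
        self.generic_visit(node)
        node.op = self._swap(node.op, node.lineno)
        return node

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [self._swap(op, node.lineno) for op in node.ops]
        return node


def generate_mutants(source):
    """Yield (description, mutated_source) pairs, one mutation per mutant."""
    target = 0
    while True:
        mutator = OneMutation(target)
        tree = mutator.visit(ast.parse(source))
        if mutator.description is None:
            return  # no mutable site left at this index
        yield mutator.description, ast.unparse(tree)
        target += 1


def run_mutants(source, test):
    """Return descriptions of survivors; test(namespace) raises AssertionError on failure."""
    survivors = []
    for description, mutated in generate_mutants(source):
        namespace = {}
        exec(compile(mutated, "<mutant>", "exec"), namespace)
        try:
            test(namespace)
        except AssertionError:
            continue  # tests failed, so the mutant was killed
        survivors.append(description)
    return survivors


SOURCE = """
def price(qty, unit):
    total = qty * unit
    if qty >= 10:
        total = total - 5
    return total
"""


def weak_test(ns):
    assert ns["price"](20, 1) == 15
    assert ns["price"](1, 1) == 1


print(run_mutants(SOURCE, weak_test))  # ['line 4: GtE -> Gt']
```

The surviving mutant points at a missing boundary test for `qty == 10`. A production tool would also need timeouts and handling for mutants that raise other exceptions, which this sketch omits.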
When speed of setup is more important than tight integration, use an established mutation framework as a secondary option:
| Language | Framework |
|---|---|
| JavaScript/TypeScript | Stryker |
| Python | mutmut |
| Java/JVM | PIT (pitest) |
| C# | Stryker.NET |
| Rust | cargo-mutants |
| Go | go-mutesting |
| Ruby | mutant |
| Scala | Stryker4s |
For install commands, configuration, and CLI reference, see
references/frameworks.md.
Mutation testing is slow — re-running it after a small change re-mutates every function. Differential mutation testing re-mutates only the functions whose code or covering tests changed, reusing cached results for the rest.
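The selection rule can be sketched as a fingerprint comparison: hash each function's code, its covering tests, and the operator set, then re-mutate only where the hash differs from the cached one. This is an assumption-laden illustration, not the actual logic or manifest layout of dae_mutmap.py; `fingerprint`, `select`, and the dictionary shapes are hypothetical.

```python
import hashlib
import json


def fingerprint(code, tests, operators):
    """Hash the inputs that decide whether a cached result is still valid."""
    payload = json.dumps(
        {"code": code, "tests": sorted(tests), "operators": sorted(operators)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def select(functions, manifest, operators):
    """Return the names of functions that need to be re-mutated.

    functions: {name: {"code": str, "tests": [str, ...]}}
    manifest:  {name: {"fingerprint": str}}   (hypothetical layout)
    """
    stale = []
    for name, info in functions.items():
        current = fingerprint(info["code"], info["tests"], operators)
        cached = manifest.get(name, {}).get("fingerprint")
        if cached != current:
            stale.append(name)
    return sorted(stale)


OPERATORS = ["arithmetic", "comparison"]
functions = {
    "price": {"code": "def price(q): return q * 2", "tests": ["assert price(2) == 4"]},
    "tax": {"code": "def tax(x): return x * 0.2", "tests": ["assert tax(10) == 2"]},
}
# Simulate a manifest written after a previous full run.
manifest = {
    name: {"fingerprint": fingerprint(f["code"], f["tests"], OPERATORS)}
    for name, f in functions.items()
}

functions["tax"]["code"] = "def tax(x): return x * 0.25"  # only tax changes
print(select(functions, manifest, OPERATORS))                # ['tax']
print(select(functions, manifest, OPERATORS + ["boolean"]))  # ['price', 'tax']
```

Changing the operator set invalidates every entry, because results cached under the old operators say nothing about mutants the new operators would produce.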
- Custom-tool path: dae_mutmap.py (select before the run,
  update after). Results live in a committed mutation-manifest.json
  beside the tool, so the saving reaches CI and every clone. A function is
  re-mutated when its code, its covering tests, or the mutation operator set
  changed. See ${CLAUDE_PLUGIN_ROOT}/references/differential-mutation.md.
- Framework path: Stryker (--incremental), PIT (withHistory), and
  mutmut have native incremental modes; enable the framework's incremental flag
  and commit its history file. Do not build a separate manifest for the
  framework path.

Before Step 1, create one TodoWrite todo per step of this workflow (Steps 1–6),
all at once: the full list up front, as a roadmap. Flip each todo to
in_progress / completed as you go. See
${CLAUDE_PLUGIN_ROOT}/references/progress-indicator.md.
Before running mutation testing, confirm that both test streams are green and that a mutation tool is configured.
If no mutation tool is configured:
- Exclude .build/ (generated tests and IR) and the acceptance/ pipeline code from mutation

Important: Configure mutation testing to target source code only. Never mutate test files, spec files, or generated pipeline code.
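One way to enforce the source-only rule in a custom tool is a path filter applied before any file is mutated. The sketch below is hypothetical: `is_mutation_target` is an invented helper, the excluded directories come from this workflow, and the test-file and spec-file naming patterns are assumptions that should be adapted to the project's conventions.

```python
from pathlib import PurePosixPath

# Directories named in this workflow as off-limits for mutation.
EXCLUDED_DIRS = {".build", "acceptance"}


def looks_like_test(name):
    """Assumed naming conventions for test and spec files; adjust per project."""
    stem = name.split(".", 1)[0]
    return (
        stem.startswith("test_")
        or stem.endswith("_test")
        or ".spec." in name
        or ".test." in name
    )


def is_mutation_target(path):
    """True only for source files outside generated and pipeline directories."""
    p = PurePosixPath(path)
    if any(part in EXCLUDED_DIRS for part in p.parts[:-1]):
        return False
    return not looks_like_test(p.name)


for candidate in [
    "src/utils.js",
    "src/utils.spec.ts",
    "src/test_utils.py",
    ".build/generated/steps.py",
    "acceptance/pipeline/parser.py",
]:
    print(candidate, is_mutation_target(candidate))
```

With an established framework, the equivalent is its own include/exclude configuration; see references/frameworks.md for the framework-specific settings.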
On the custom-tool path, run dae_mutmap.py select first and mutate only the
functions it returns — or all of them when it returns ALL. On the framework
path, the incremental flag handles this. Then execute and collect results:
For each surviving mutant:
- Note the mutation applied (e.g. >= → >, removed function call)

Equivalent mutants are mutations that don't change observable behavior
(e.g., changing x + 0 to x - 0). These can be ignored.
For each real survivor, add a unit test that kills it.
On the custom-tool path, run dae_mutmap.py update to refresh
mutation-manifest.json. The report combines this run's fresh results with the
manifest's cached entries for unchanged functions — mark the cached ones
("unchanged since last_mutated"). Present a summary:
Mutation Testing Report
═══════════════════════
Score: 87% → 95% (after killing survivors)
Killed: 190 / 200
Survived: 10 → 5 (5 equivalent mutants ignored)
New tests: 5 unit tests added
Remaining survivors (equivalent mutants):
- src/utils.js:42: changed `x + 0` to `x - 0` (no-op mutation)
- ...
| Score | Assessment |
|---|---|
| 90%+ | Strong test suite — minor gaps only |
| 70-89% | Moderate — meaningful gaps to address |
| < 70% | Weak — significant untested behavior |
A 100% mutation score is not always practical or necessary. Focus on killing mutants that represent real behavioral gaps, not chasing equivalent mutants.
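The score and its assessment band can be computed with a few lines. This is a small sketch, assuming the score is killed mutants divided by all mutants (equivalent mutants included in the denominator) and that the band boundaries fall at exactly 90 and 70; `mutation_score` and `assess` are hypothetical helper names.

```python
def mutation_score(killed, total):
    """Percentage of mutants killed; total includes equivalent mutants."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * killed / total


def assess(score):
    """Map a score to the bands in the table above."""
    if score >= 90:
        return "Strong"
    if score >= 70:
        return "Moderate"
    return "Weak"


score = mutation_score(killed=190, total=200)
print(score, assess(score))  # 95.0 Strong
print(assess(60.0))          # Weak
```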
Mutation testing extends the existing two-stream approach:
1. Write specs (WHAT) ← acceptance tests
2. Implement with TDD (HOW) ← unit tests
3. Verify test quality (REAL?) ← mutation testing
When using the atdd-team skill, mutation testing is part of Phase 6
(Verify & Harden), run by the architect — an agent whose agent_id
is independent of the implementer and the refiner.
Do not run mutation testing while tests are failing. Fix failing tests first; mutation testing assumes a green baseline.
Do not chase a 100% mutation score. It is not practical, because equivalent mutants inflate the denominator. Aim for 90%+ and document the equivalent mutants that remain.
Never mutate generated test files or the acceptance pipeline. Only mutate source code under development.
For detailed framework setup and configuration:
references/frameworks.md: installation, configuration, and CLI
reference for each supported mutation testing framework