From agentic-qe-fleet
Validates test suite quality via mutation testing: introduces code mutations, runs tests against them, measures kill rate to identify weak tests and assertion gaps. Use for evaluating test effectiveness or before releases.
Slash command: /agentic-qe-fleet:mutation-testing
<default_to_action> When validating test quality or improving test effectiveness:
Quick Mutation Metrics:
Critical Success Factors:
| Score | Interpretation |
|---|---|
| 90%+ | Excellent test quality |
| 80-89% | Good, minor improvements |
| 60-79% | Needs attention |
| < 60% | Significant gaps |
| Category | Original | Mutant |
|---|---|---|
| Arithmetic | a + b | a - b |
| Relational | x >= 18 | x > 18 |
| Logical | a && b | a || b |
| Conditional | if (x) | if (true) |
| Statement | return x | (removed) |
// Original code
function isAdult(age) {
  return age >= 18; // ← Mutant: change >= to >
}
// Strong test (catches mutation)
test('18 is adult', () => {
  expect(isAdult(18)).toBe(true); // Kills mutant!
});
// Weak test (mutation survives)
test('19 is adult', () => {
  expect(isAdult(19)).toBe(true); // Doesn't catch >= vs >
});
// Surviving mutant → Test needs boundary value
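The kill/survive distinction above can be reproduced without a test runner. The following is a minimal sketch, not how Stryker works internally: the mutant is written by hand, and the helper name `killers` and the `tests` map are hypothetical names introduced only for illustration.

```javascript
// Original predicate and a hand-written mutant (>= replaced by >).
const isAdult = (age) => age >= 18;
const isAdultMutant = (age) => age > 18;

// Each "test" takes an implementation and returns true when it passes.
const tests = {
  '19 is adult': (impl) => impl(19) === true, // weak: off the boundary
  '18 is adult': (impl) => impl(18) === true, // strong: on the boundary
};

// A mutant is killed when at least one test that passes on the
// original fails on the mutant. Returns the names of those tests.
function killers(original, mutant, suite) {
  return Object.entries(suite)
    .filter(([, test]) => test(original) && !test(mutant))
    .map(([name]) => name);
}

const weakSuite = { '19 is adult': tests['19 is adult'] };
console.log(killers(isAdult, isAdultMutant, weakSuite)); // [] (mutant survives)
console.log(killers(isAdult, isAdultMutant, tests));     // [ '18 is adult' ]
```

Only the test that sits exactly on the boundary value distinguishes the original from the mutant, which is why a surviving relational mutant usually points to a missing boundary case.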
# Install
npm install --save-dev @stryker-mutator/core @stryker-mutator/jest-runner
# Initialize
npx stryker init
Configuration:
{
  "packageManager": "npm",
  "reporters": ["html", "clear-text", "progress"],
  "testRunner": "jest",
  "coverageAnalysis": "perTest",
  "mutate": [
    "src/**/*.ts",
    "!src/**/*.spec.ts"
  ],
  "thresholds": {
    "high": 90,
    "low": 70,
    "break": 60
  }
}
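The thresholds block can be read as three bands plus a build gate. The sketch below assumes Stryker's documented semantics (a score at or above `high` is reported as good, between `low` and `high` as a warning, below `low` as danger, and a score below `break` makes the run exit with a failure). The function `classifyScore` and the band labels are hypothetical and exist only for illustration.

```javascript
// Threshold values copied from the configuration above.
const thresholds = { high: 90, low: 70, break: 60 };

// Returns the report band for a score and whether the run should fail.
function classifyScore(score, { high, low, break: breakAt }) {
  const band = score >= high ? 'high' : score >= low ? 'warning' : 'danger';
  // A missing or null break threshold is treated as "never fail the run".
  const failsBuild = breakAt != null && score < breakAt;
  return { band, failsBuild };
}

console.log(classifyScore(92, thresholds));   // { band: 'high', failsBuild: false }
console.log(classifyScore(87.3, thresholds)); // { band: 'warning', failsBuild: false }
console.log(classifyScore(65, thresholds));   // { band: 'danger', failsBuild: false }
console.log(classifyScore(55, thresholds));   // { band: 'danger', failsBuild: true }
```

With these values a score between 60 and 70 is flagged as danger in the report but does not fail the build; only a score below 60 does.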
Run:
npx stryker run
Output:
Mutation Score: 87.3%
Killed: 124
Survived: 18
No Coverage: 3
Timeout: 1
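How the percentage relates to the counts depends on which mutants are included. The sketch below assumes Stryker's documented metric definitions (detected = killed + timeout, undetected = survived + no coverage) and also computes the simpler killed / (killed + survived) ratio for comparison. The function name `mutationScores` and its result keys are hypothetical.

```javascript
// Counts taken from the sample output above.
const counts = { killed: 124, survived: 18, noCoverage: 3, timeout: 1 };

function mutationScores({ killed, survived, noCoverage, timeout }) {
  const detected = killed + timeout;        // mutants the tests caught
  const undetected = survived + noCoverage; // mutants the tests missed
  const pct = (n, d) => (d === 0 ? NaN : (100 * n) / d);
  return {
    // Score over all valid mutants.
    total: pct(detected, detected + undetected),
    // Score over mutants that at least one test covers.
    covered: pct(detected, detected + survived),
    // Simple ratio that ignores timeouts and uncovered mutants.
    killedOverKilledPlusSurvived: pct(killed, killed + survived),
  };
}

const scores = mutationScores(counts);
for (const [name, value] of Object.entries(scores)) {
  console.log(name, value.toFixed(1));
}
// total 85.6
// covered 87.4
// killedOverKilledPlusSurvived 87.3
```

For these counts, the 87.3% figure matches the simple killed / (killed + survived) ratio, while counting uncovered mutants as undetected gives a lower score, so it is worth checking which definition a report uses before comparing runs.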
// Surviving mutant: >= changed to >
function calculateDiscount(quantity) {
  if (quantity >= 10) { // Mutant survives!
    return 0.1;
  }
  return 0;
}
// Original weak test
test('large order gets discount', () => {
  expect(calculateDiscount(15)).toBe(0.1); // Doesn't test boundary
});
// Fixed: Add boundary test
test('exactly 10 gets discount', () => {
  expect(calculateDiscount(10)).toBe(0.1); // Kills mutant!
});
test('9 does not get discount', () => {
  expect(calculateDiscount(9)).toBe(0); // Tests below boundary
});
// Analyze mutation score and generate fixes
await Task("Mutation Analysis", {
  targetFile: 'src/payment.ts',
  generateMissingTests: true,
  minScore: 80
}, "qe-test-generator");
// Returns:
// {
//   mutationScore: 0.65,
//   survivedMutations: [
//     { line: 45, operator: '>=', mutant: '>', killedBy: null }
//   ],
//   generatedTests: [
//     'test for boundary at line 45'
//   ]
// }
// Coverage + mutation correlation
await Task("Coverage Quality Analysis", {
  coverageData: coverageReport,
  mutationData: mutationReport,
  identifyWeakCoverage: true
}, "qe-coverage-analyzer");
aqe/mutation-testing/
├── mutation-results/* - Stryker reports
├── surviving/* - Surviving mutants
├── generated-tests/* - Tests to kill mutants
└── trends/* - Mutation score over time
const mutationFleet = await FleetManager.coordinate({
  strategy: 'mutation-testing',
  agents: [
    'qe-test-generator',    // Generate tests for survivors
    'qe-coverage-analyzer', // Coverage correlation
    'qe-quality-analyzer'   // Quality assessment
  ],
  topology: 'sequential'
});
High code coverage ≠ good tests. A suite can reach 100% coverage with weak assertions and still catch nothing. Mutation testing measures whether tests actually detect injected faults.
Focus on critical paths first. Don't mutation test everything - prioritize payment, authentication, data integrity code.
With Agents: Agents run mutation analysis, identify surviving mutants, and generate the missing test cases needed to kill them, automating the improvement of test quality.
After each mutation test run, append the results to run-history.json in this skill directory, replacing SCORE, KILLED, and SURVIVED with the values from the run:
node -e "
const fs = require('fs');
const h = JSON.parse(fs.readFileSync('.claude/skills/mutation-testing/run-history.json'));
h.runs.push({date: new Date().toISOString().split('T')[0], mutation_score_pct: SCORE, killed: KILLED, survived: SURVIVED});
fs.writeFileSync('.claude/skills/mutation-testing/run-history.json', JSON.stringify(h, null, 2));
"
Read run-history.json before each run to track score improvements over time.
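The one-liner above assumes run-history.json already exists and contains a `runs` array. A slightly more defensive version might look like the following sketch. The function names `appendRun` and `scoreDelta` are hypothetical, and the record fields mirror those used in the one-liner.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Appends one run to a history file, creating the file if it is missing.
function appendRun(historyPath, { score, killed, survived }, now = new Date()) {
  let history = { runs: [] };
  if (fs.existsSync(historyPath)) {
    history = JSON.parse(fs.readFileSync(historyPath, 'utf8'));
    if (!Array.isArray(history.runs)) history.runs = [];
  }
  history.runs.push({
    date: now.toISOString().split('T')[0], // YYYY-MM-DD
    mutation_score_pct: score,
    killed,
    survived,
  });
  fs.mkdirSync(path.dirname(historyPath), { recursive: true });
  fs.writeFileSync(historyPath, JSON.stringify(history, null, 2));
  return history;
}

// Difference between the last two recorded scores, or null with fewer than two runs.
function scoreDelta(history) {
  const runs = history.runs;
  if (runs.length < 2) return null;
  return runs[runs.length - 1].mutation_score_pct - runs[runs.length - 2].mutation_score_pct;
}
```

A call such as `appendRun('.claude/skills/mutation-testing/run-history.json', { score: 87.3, killed: 124, survived: 18 })` records a run, and passing the returned object to `scoreDelta` shows whether the score moved since the previous run.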
- /qe-test-generation to ensure tests exist
- /qe-coverage-analysis to prioritize improvement areas
- /qe-quality-assessment for ship/no-ship decision

- Pass --testRunner jest explicitly if both jest and vitest are installed
- A >= to > mutation in date comparisons rarely gets killed: add boundary tests
- Use --mutate to target specific functions
- --concurrency defaults to the CPU count, which can cause out-of-memory failures in containers: set it to 2