From mutation-testing
Run comprehensive mutation testing to audit test quality, find zombie tests, and propose refactoring
How this skill is triggered — by the user, by Claude, or both
Slash command
/mutation-testing:mutation-testThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Run mutation testing to identify weak tests through semantic code mutations and parallel test execution.
Run mutation testing to identify weak tests through semantic code mutations and parallel test execution.
/mutation-test stripe_handler.py # Standard mode (15 mutations)
/mutation-test --quick api/payments/ # Quick mode (5 mutations)
/mutation-test --deep billing/ # Deep mode (30+ mutations)
/mutation-test # Smart mode (auto-detects target)
When invoked without a path (/mutation-test), the agent will:
Example:
User: /mutation-test
Agent: "I found several recently modified files with tests:
1. stripe_handler.py (modified 5 min ago, 200 tests)
2. payment_processor.py (modified 1 hour ago, 50 tests)
Which would you like to mutation test?"
Mutation testing is the gold standard for measuring test quality. It works by:
Traditional coverage is misleading: 100% line coverage ≠ good tests
Mutation score is truth: % of realistic bugs your tests actually catch
Mutation Score: 23%
This means only 23% of realistic bugs would be caught by your tests.
Target: >80% for critical code, >60% for standard code.
Zombie Tests: 183/200 (91%)
These tests run and pass, but don't actually test anything meaningful.
Example:
- test_retry_validation_1 (line 47)
- Passed despite changing retry_count >= 3 to retry_count > 3
- Missing boundary condition test
Before: 200 tests, 23% mutation score, 12s execution time
After: 20 tests, 85% mutation score, 1.5s execution time
Changes:
- Consolidate 150 redundant tests → 1 parameterized test
- Remove 183 zombie tests
- Add 3 boundary condition tests
Apply refactoring? [Y/n]
The skill launches the test-quality-reviewer agent, which orchestrates:
# Original
if retry_count >= 3:
raise MaxRetriesExceeded()
# Mutations
if retry_count > 3: # Catches off-by-one bugs
if retry_count == 3: # Tests exact boundary
# Original
return subscription.status
# Mutations
return None # Do callers validate?
return "" # Do callers check empty?
# Original
if active and subscribed:
# Mutations
if active or subscribed: # Tests logical correctness
if not (active and subscribed): # Tests negation
/mutation-test stripe_handler.py
Output:
Mutation Score: 23%
Zombie Tests: 183
Your test suite has significant quality issues:
- 91% of tests never failed despite code being broken
- Most are redundant Django model validation tests
Consolidate 150 tests → 1 parameterized test?
/mutation-test --quick payments.py
Output:
Quick mutation test (5 mutations):
Mutation Score: 60% (3/5 caught)
Missing boundary test for discount calculation.
Add this test:
```python
def test_discount_at_boundary():
assert calculate_discount(100) == 10
/mutation-test --deep billing/
Output:
Deep mutation test (35 mutations):
Mutation Score: 78% (27/35 caught)
Good coverage! Minor gaps:
- Add test for subscription renewal edge case
- Strengthen payment validation assertions
Estimated improvement: 78% → 85%
# Target specific file or directory
/mutation-test stripe_handler.py
/mutation-test api/payments/
# Choose mutation count
/mutation-test --quick # 5 mutations (fast)
/mutation-test # 15 mutations (default)
/mutation-test --deep # 30+ mutations (thorough)
# Focus on specific areas
/mutation-test --focus=retry_logic api/
# Skip test removal confirmation
/mutation-test --auto-approve
Track mutation testing progress:
# Create tracking issue
bd create --title="Improve test quality - Stripe" --type=task
# Run mutation testing
/mutation-test stripe_handler.py
# Mutation testing completes, updates beads issue automatically:
# Notes: "Mutation score: 23% → 85%, Tests: 200 → 20"
# Close when done
bd close beads-xxx
✅ Mutation Score: 85%
Your tests catch most realistic bugs. Minor improvements possible.
👍 Mutation Score: 67%
Solid test coverage. Focus on boundary conditions and edge cases.
⚠️ Mutation Score: 52%
Moderate coverage. Review zombie tests and add missing assertions.
🚨 Mutation Score: 23%
Significant test quality issues. Many zombie tests detected.
Recommend: Apply proposed refactoring.
# 150 tests that all look like this:
def test_status_is_active():
assert model.status == "active"
# Mutation testing reveals: All redundant!
# Consolidate → 1 parameterized test
# Zombie test (always passes)
def test_process_payment():
result = process_payment(user)
assert result is not None # Too weak!
# Should be:
def test_process_payment():
result = process_payment(user)
assert result.status == "success"
assert result.amount == expected_amount
# 8 mocks - testing mocks, not real behavior
@patch('stripe.Customer')
@patch('stripe.Subscription')
@patch('stripe.Payment')
# ... 5 more mocks
# Mutation testing catches this: Tests pass despite broken logic
# Recommendation: Replace with integration test using test Stripe account
Parallelization: Runs 15 test suites simultaneously (15x speedup vs sequential)
| Tool | Mutation Score | Refactoring | Zombie Detection | Time |
|---|---|---|---|---|
| mutmut | ✅ Yes | ❌ No | ⚠️ Implicit | Hours |
| Stryker | ✅ Yes | ❌ No | ⚠️ Implicit | Hours |
| /mutation-test | ✅ Yes | ✅ Auto-generated | ✅ Explicit | Minutes |
Q: Will this modify my code? A: No. Mutations are in isolated git worktrees. Main working tree is never touched.
Q: What if I disagree with zombie test identification? A: Review the diff and reject specific changes. You have full control.
Q: Can I mutation test my entire codebase? A: Yes, but start with high-risk areas (payments, auth, etc.). Full codebase mutation testing can take hours.
Q: How is this different from code coverage? A: Coverage measures lines executed. Mutation testing measures if tests actually validate correctness. You can have 100% coverage with 0% mutation score (all zombie tests).
Q: Should I aim for 100% mutation score? A: No. 80%+ is excellent. Diminishing returns above 85%. Some mutations are academic, not practical.
✅ Use mutation testing when:
❌ Don't use mutation testing when:
User: /mutation-test stripe_handler.py
Claude: Running mutation testing on stripe_handler.py...
[3 minutes later]
# Test Quality Audit Report
Mutation Score: 23% (Poor)
Zombie Tests: 183/200 (91%)
Critical Finding:
150 tests validate Django model fields - all redundant
Proposed Refactoring:
Before: 200 tests, 23% score, 12s
After: 20 tests, 85% score, 1.5s
Would you like me to apply the refactoring?
User: Yes
Claude: Applied refactoring. Re-running mutation testing...
[2 minutes later]
Verified! New mutation score: 83%
Tests reduced from 200 → 22
All original passing tests still pass ✅
Committed changes and updated beads issue.
This skill launches the test-quality-reviewer agent which orchestrates the full mutation testing workflow using 4 specialized sub-agents.
npx claudepluginhub citadelgrad/scott-cc --plugin mutation-testingPerforms mutation testing using Claude as the mutation engine: generates code mutants, runs tests, tracks kill/survive rates, identifies test gaps, and recommends test improvements. No external mutation tools required.
Runs mutation testing workflow: mutates source code module-by-module, executes tests per mutation, writes tests for survivors, verifies, commits. Tracks multi-session progress.
Runs mutation testing to validate test suite quality across multiple stacks (Stryker, Infection, go-mutesting, mutmut, Vitest). Use when verifying test effectiveness or after generating tests.