From ai-dev-flow
Use when a test suite needs cleanup, when tests are brittle or break on refactors that do not change behavior, when tests are coupled to implementation details, or when the test maintenance burden is high.
Slash command: /ai-dev-flow:cleaning-tests
Multi-agent test analysis that identifies low-value tests, presents a categorized report for user approval, and removes approved tests with cleanup of leftover dead code.
Core principle: A test's value is determined by whether it verifies observable behavior. Tests coupled to implementation details create maintenance burden without catching real bugs.
Announce at start: "I'm using the cleaning-tests skill to analyze the test suite for low-value tests."
All tools are functional and will work without error. Do not test tools or make exploratory calls. Every tool call should have a clear purpose. Make this clear to every subagent you launch.
digraph process {
rankdir=TB;
"Step 0: Discovery\n(controller)" [shape=box];
"No test files?" [shape=diamond];
"STOP" [shape=box, style=filled, fillcolor=lightyellow];
subgraph cluster_parallel {
label="Parallel Analysis";
"Agent 1: Analyze batch 1" [shape=box];
"Agent 2: Analyze batch 2" [shape=box];
"Agent N: Analyze batch N" [shape=box];
}
"Step 2: Consolidate + Report\n(controller)" [shape=box];
"No issues found?" [shape=diamond];
"STOP: Report clean" [shape=box, style=filled, fillcolor=lightyellow];
"Step 3: User approval\n(AskUserQuestion)" [shape=box];
"User approves?" [shape=diamond];
"STOP: Keep as-is" [shape=box, style=filled, fillcolor=lightyellow];
subgraph cluster_cleanup {
label="Parallel Cleanup";
"Remove flagged tests" [shape=box];
"Clean up dead code" [shape=box];
}
"Step 5: Verify remaining\ntests pass" [shape=box];
"Step 0: Discovery\n(controller)" -> "No test files?";
"No test files?" -> "STOP" [label="yes"];
"No test files?" -> "Agent 1: Analyze batch 1" [label="no"];
"No test files?" -> "Agent 2: Analyze batch 2" [label="no"];
"No test files?" -> "Agent N: Analyze batch N" [label="no"];
"Agent 1: Analyze batch 1" -> "Step 2: Consolidate + Report\n(controller)";
"Agent 2: Analyze batch 2" -> "Step 2: Consolidate + Report\n(controller)";
"Agent N: Analyze batch N" -> "Step 2: Consolidate + Report\n(controller)";
"Step 2: Consolidate + Report\n(controller)" -> "No issues found?";
"No issues found?" -> "STOP: Report clean" [label="yes"];
"No issues found?" -> "Step 3: User approval\n(AskUserQuestion)" [label="no"];
"Step 3: User approval\n(AskUserQuestion)" -> "User approves?";
"User approves?" -> "STOP: Keep as-is" [label="no"];
"User approves?" -> "Remove flagged tests" [label="yes"];
"User approves?" -> "Clean up dead code" [label="yes"];
"Remove flagged tests" -> "Step 5: Verify remaining\ntests pass";
"Clean up dead code" -> "Step 5: Verify remaining\ntests pass";
}
You (the controller) handle discovery directly. Do NOT delegate to a subagent.
- Detect the test framework from its config file: `jest.config.*`, `vitest.config.*`, `pytest.ini`, `pyproject.toml [tool.pytest]`, `.mocharc.*`, `karma.conf.*`, `phpunit.xml`, etc.
- Find test files by glob (e.g., `**/*.test.{ts,js}`, `**/*.spec.{ts,js}`, `**/test_*.py`, `**/*_test.go`)
- If no test files are found, stop ("No test files found in the project").
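As a rough illustration of the discovery step, the test-file globs listed above can be approximated with regular expressions. This is a minimal sketch, not part of the skill: `TEST_FILE_PATTERNS`, `isTestFile`, and `findTestFiles` are hypothetical names, and a real run would more likely use the controller's own file-search tool.

```typescript
// Hypothetical helpers (not part of the skill). Each regex approximates one of
// the glob patterns named in the discovery step.
const TEST_FILE_PATTERNS: RegExp[] = [
  /\.test\.(ts|js)$/,       // **/*.test.{ts,js}
  /\.spec\.(ts|js)$/,       // **/*.spec.{ts,js}
  /(^|\/)test_[^/]*\.py$/,  // **/test_*.py
  /_test\.go$/,             // **/*_test.go
];

// True when the path matches at least one test-file pattern.
function isTestFile(path: string): boolean {
  return TEST_FILE_PATTERNS.some((pattern) => pattern.test(path));
}

// Keeps only the paths that look like test files.
function findTestFiles(paths: string[]): string[] {
  return paths.filter(isTestFile);
}
```

An empty result from `findTestFiles` corresponds to the "No test files found in the project" stop condition.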
Split test files evenly among agents. Each agent receives:
Each agent reads its assigned test files AND their corresponding source code, then returns a structured list:
For each flagged test:
- File path and test name (describe + it text)
- Category (from Flag Criteria below)
- Explanation: WHY this test is low-value (1-2 sentences)
- Verdict: REMOVE or BORDERLINE
Agents must also list tests they reviewed and classified as KEEP (just file path + test name, no explanation needed).
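The even split of test files among analysis agents could be sketched as a round-robin assignment. This is an illustrative assumption, not the skill's prescribed method: `splitIntoBatches` is a hypothetical helper, and the source does not say how the agent count is chosen.

```typescript
// Hypothetical helper: distributes items round-robin so that batch sizes
// differ by at most one. Never returns empty batches.
function splitIntoBatches<T>(items: T[], agentCount: number): T[][] {
  if (agentCount < 1) {
    throw new Error("agentCount must be at least 1");
  }
  // Do not create more batches than there are items.
  const batchCount = Math.min(Math.floor(agentCount), items.length);
  const batches: T[][] = [];
  for (let i = 0; i < batchCount; i++) {
    batches.push([]);
  }
  items.forEach((item, index) => {
    batches[index % batchCount].push(item);
  });
  return batches;
}
```

With five files and two agents, one batch gets three files and the other gets two. With fewer files than agents, each file gets its own batch.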
Write the report to docs/reviews/test-cleanup-YYYY-MM-DD.md using the template below.
Use AskUserQuestion to present the summary and ask the user to approve. Options:
- …docs/reviews/test-cleanup-YYYY-MM-DD.md", then ask again with the same options

NEVER remove tests without explicit user approval.
After approval, split flagged tests among agents. Each agent:
- Removes the flagged tests (the it(...) blocks)
- Removes empty describe blocks left behind
- Removes stale comments (e.g., // BAD TESTS markers that no longer apply)

Run the test suite to confirm that the remaining tests still pass. If tests fail, investigate: the likely cause is a cleanup error (for example, removing a shared helper that was still needed).
Share this with every analysis agent.
| Category | Signal | Example |
|---|---|---|
| Private state access | Accessing internals via as any, reflection, __private, @VisibleForTesting | (service as any).cache.get(id) |
| Mock call counts | Asserting exact call counts on internal dependencies (not side effects) | expect(repo.save).toHaveBeenCalledTimes(1) |
| Call ordering | Tracking sequence of internal method calls | expect(callOrder).toEqual(['findByEmail', 'save']) |
| Internal args | Asserting exact arguments passed to internal collaborators | expect(repo.save).toHaveBeenCalledWith(exactObj) |
| Log format | Asserting exact log message strings or call counts | expect(logger.log).toHaveBeenCalledWith('Created user 1') |
| Tautological | Testing that a mock returns its configured value through a passthrough with no conditional logic or side effects | Mock returns X → assert result is X. If the code path has branching logic (even if output matches the mock), consider whether it exercises a meaningful path — classify as BORDERLINE if unsure |
| Constant testing | Asserting values of exported constants | expect(MAX_LENGTH).toBe(100) |
| Type/existence | Testing typeof fn === 'function' — always flag in typed languages (TS/Java/Go), flag in JS only if other tests already exercise the export | expect(typeof capitalize).toBe('function') |
| Redundant snapshot | Snapshot on simple deterministic output already covered by direct assertions | toMatchInlineSnapshot on a pure function output |
| Redundant coverage | Multiple tests covering the exact same code path with no new branch | Two tests for canCancel that both exercise the true branch |
| No assertions | Test body has no assertions or only trivially-true assertions | it('works', () => { createThing(); }) |
These are legitimate tests even if they use mock assertions:
- Rollback or cleanup verification of critical side effects (e.g., expect(inventory.release).toHaveBeenCalled() after payment failure)
- …true cases in a boolean return)

When in doubt, classify as BORDERLINE, not REMOVE. False positives erode user trust.
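To make the rollback case concrete, here is a small framework-free sketch. Everything in it is hypothetical and invented for illustration: `placeOrder`, the `Inventory` and `Payments` interfaces, and `FakeInventory` do not come from the skill. It uses a hand-rolled fake instead of Jest mocks so that it runs on its own.

```typescript
// Hypothetical collaborators, invented for this example only.
interface Inventory {
  reserve(sku: string): void;
  release(sku: string): void;
}

interface Payments {
  charge(amountCents: number): boolean; // false means the charge failed
}

// Hypothetical code under test: reserves stock, then rolls the reservation
// back if the payment fails.
function placeOrder(
  sku: string,
  amountCents: number,
  inventory: Inventory,
  payments: Payments
): "placed" | "payment_failed" {
  inventory.reserve(sku);
  if (!payments.charge(amountCents)) {
    inventory.release(sku); // rollback: a side effect callers depend on
    return "payment_failed";
  }
  return "placed";
}

// Hand-rolled fake that records which SKUs were reserved and released.
class FakeInventory implements Inventory {
  reserved: string[] = [];
  released: string[] = [];
  reserve(sku: string): void { this.reserved.push(sku); }
  release(sku: string): void { this.released.push(sku); }
}

// Runs the failure path and returns what a behavioral test would assert on.
function runFailedPaymentScenario(): { result: string; released: string[] } {
  const inventory = new FakeInventory();
  const decliningPayments: Payments = { charge: () => false };
  const result = placeOrder("sku-1", 500, inventory, decliningPayments);
  return { result: result, released: inventory.released };
}
```

Asserting that the reserved SKU ends up released after a failed payment checks an outcome that matters to the system, so it belongs in the keep list. By contrast, asserting that `reserve` was called exactly once in the happy path would fall under the "Mock call counts" category.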
# Test Cleanup Report
**Date**: YYYY-MM-DD
**Project**: <project path>
**Test framework**: <detected framework>
**Total test files**: N
**Total tests analyzed**: N
**Tests flagged for removal**: N
**Tests flagged as borderline**: M
## Summary by Category
| Category | Count | Files Affected |
|----------|-------|----------------|
| Implementation detail: private state access | N | file1, file2 |
| Implementation detail: mock call counts | N | ... |
| ... | ... | ... |
## Flagged Tests
### <file-path>
#### REMOVE: <test name>
- **Category**: <category>
- **Why**: <1-2 sentence explanation>
#### BORDERLINE: <test name>
- **Category**: <category>
- **Why**: <explanation>
- **Note**: <why it's borderline>
## Tests Kept
<N tests across M files classified as valuable behavioral tests.>
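One possible way for the controller to build the report path and the "Summary by Category" table from the agents' results is sketched below. The `FlaggedTest` shape mirrors the fields agents are asked to return, but the type and both function names are assumptions. The date is taken in UTC, which the source does not specify.

```typescript
type Verdict = "REMOVE" | "BORDERLINE";

// Hypothetical record for one flagged test, mirroring the agent output fields.
interface FlaggedTest {
  file: string;
  name: string;
  category: string;
  verdict: Verdict;
  why: string;
}

// Builds docs/reviews/test-cleanup-YYYY-MM-DD.md from a date (UTC assumed).
function reportPath(date: Date): string {
  return "docs/reviews/test-cleanup-" + date.toISOString().slice(0, 10) + ".md";
}

// Renders the summary table: one row per category, sorted by category name,
// with the count of flagged tests and the distinct files affected.
function renderSummaryTable(flagged: FlaggedTest[]): string {
  const counts: Record<string, number> = {};
  const files: Record<string, string[]> = {};
  for (const test of flagged) {
    counts[test.category] = (counts[test.category] || 0) + 1;
    const seen = files[test.category] || (files[test.category] = []);
    if (seen.indexOf(test.file) === -1) {
      seen.push(test.file);
    }
  }
  const rows = Object.keys(counts).sort().map((category) =>
    "| " + category + " | " + counts[category] + " | " + files[category].join(", ") + " |"
  );
  const header = [
    "| Category | Count | Files Affected |",
    "|----------|-------|----------------|",
  ];
  return header.concat(rows).join("\n");
}
```

Sorting the categories keeps the table stable between runs, which makes successive reports easier to compare.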
| Mistake | Fix |
|---|---|
| Removing tests without user approval | ALWAYS use AskUserQuestion before any removal |
| Flagging mock assertions that verify critical side effects | Read the DO NOT FLAG section — rollback/cleanup verification is legitimate |
| Modifying source code during cleanup | Only modify test files — source code is out of scope |
| Not running tests after cleanup | Always verify remaining tests pass in Step 5 |
| Flagging all mock-based assertions | Mock assertions are fine when verifying observable behavior or external contracts |
| Running analysis sequentially | Split test files among parallel agents for large suites |
| Removing test helpers still used by other tests | Check all usages before removing any helper/fixture |
| Flagging internal args tests when no other test verifies persistence | If removing the test would leave a gap where wrong data could be saved undetected, classify as BORDERLINE |
| Not writing the report file | Always write to docs/reviews/test-cleanup-YYYY-MM-DD.md |
STOP if you catch yourself: