Skill

cleaning-tests

Use when test suite needs cleanup, when tests are brittle or break on refactors without behavior change, when there are tests coupled to implementation details, or when test maintenance burden is high

Popularity

Stars

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/ai-dev-flow:cleaning-tests

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Multi-agent test analysis that identifies low-value tests, presents a categorized report for user approval, and removes approved tests with cleanup of leftover dead code.

SKILL.md

237 lines · ~2.9k tokens

Stats

LanguageJavaScript

Stars3

MaintenanceExcellent

Last CommitMar 10, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Cleaning Tests

Overview

Multi-agent test analysis that identifies low-value tests, presents a categorized report for user approval, and removes approved tests with cleanup of leftover dead code.

Core principle: A test's value is determined by whether it verifies observable behavior. Tests coupled to implementation details create maintenance burden without catching real bugs.

Announce at start: "I'm using the cleaning-tests skill to analyze the test suite for low-value tests."

When to Use

Tests frequently break when refactoring without behavior changes
High mock count makes tests hard to maintain
Test suite is slow and bloated
User explicitly asks for test cleanup
Preparing a codebase for major refactoring

Agent Assumptions

All tools are functional and will work without error. Do not test tools or make exploratory calls. Every tool call should have a clear purpose. Make this clear to every subagent you launch.

The Process

digraph process {
    rankdir=TB;

    "Step 0: Discovery\n(controller)" [shape=box];
    "No test files?" [shape=diamond];
    "STOP" [shape=box, style=filled, fillcolor=lightyellow];

    subgraph cluster_parallel {
        label="Parallel Analysis";
        "Agent 1: Analyze batch 1" [shape=box];
        "Agent 2: Analyze batch 2" [shape=box];
        "Agent N: Analyze batch N" [shape=box];
    }

    "Step 2: Consolidate + Report\n(controller)" [shape=box];
    "No issues found?" [shape=diamond];
    "STOP: Report clean" [shape=box, style=filled, fillcolor=lightyellow];
    "Step 3: User approval\n(AskUserQuestion)" [shape=box];
    "User approves?" [shape=diamond];
    "STOP: Keep as-is" [shape=box, style=filled, fillcolor=lightyellow];

    subgraph cluster_cleanup {
        label="Parallel Cleanup";
        "Remove flagged tests" [shape=box];
        "Clean up dead code" [shape=box];
    }

    "Step 5: Verify remaining\ntests pass" [shape=box];

    "Step 0: Discovery\n(controller)" -> "No test files?";
    "No test files?" -> "STOP" [label="yes"];
    "No test files?" -> "Agent 1: Analyze batch 1" [label="no"];
    "No test files?" -> "Agent 2: Analyze batch 2" [label="no"];
    "No test files?" -> "Agent N: Analyze batch N" [label="no"];
    "Agent 1: Analyze batch 1" -> "Step 2: Consolidate + Report\n(controller)";
    "Agent 2: Analyze batch 2" -> "Step 2: Consolidate + Report\n(controller)";
    "Agent N: Analyze batch N" -> "Step 2: Consolidate + Report\n(controller)";
    "Step 2: Consolidate + Report\n(controller)" -> "No issues found?";
    "No issues found?" -> "STOP: Report clean" [label="yes"];
    "No issues found?" -> "Step 3: User approval\n(AskUserQuestion)" [label="no"];
    "Step 3: User approval\n(AskUserQuestion)" -> "User approves?";
    "User approves?" -> "STOP: Keep as-is" [label="no"];
    "User approves?" -> "Remove flagged tests" [label="yes"];
    "User approves?" -> "Clean up dead code" [label="yes"];
    "Remove flagged tests" -> "Step 5: Verify remaining\ntests pass";
    "Clean up dead code" -> "Step 5: Verify remaining\ntests pass";
}

Step 0: Discovery (Controller)

You (the controller) handle discovery directly. Do NOT delegate to a subagent.

Detect test framework: look for jest.config.*, vitest.config.*, pytest.ini, pyproject.toml [tool.pytest], .mocharc.*, karma.conf.*, phpunit.xml, etc.
Find all test files using framework conventions (e.g., **/*.test.{ts,js}, **/*.spec.{ts,js}, **/test_*.py, **/*_test.go)
Count total test files and estimate scope
Find corresponding source files (for agents to understand what's being tested)

If no test files found → stop ("No test files found in the project").

Step 1: Parallel Analysis (N Agents)

Split test files evenly among agents. Each agent receives:

Its batch of test files to analyze
Paths to corresponding source files
The Flag Criteria from this skill (copy the full section below)

Each agent reads its assigned test files AND their corresponding source code, then returns a structured list:

For each flagged test:
- File path and test name (describe + it text)
- Category (from Flag Criteria below)
- Explanation: WHY this test is low-value (1-2 sentences)
- Verdict: REMOVE or BORDERLINE

Agents must also list tests they reviewed and classified as KEEP (just file path + test name, no explanation needed).

Step 2: Consolidate + Report (Controller)

Merge findings from all agents
Deduplicate (same test flagged by multiple agents = one entry)
Count by category
Write report to docs/reviews/test-cleanup-YYYY-MM-DD.md using the template below
Present summary to user in terminal

Step 3: User Approval

Use AskUserQuestion to present the summary and ask the user to approve. Options:

"Remove all flagged tests" — proceed with full cleanup (only REMOVE tests are deleted, BORDERLINE tests are kept)
"Let me choose from the borderline tests too" — show BORDERLINE tests one by one or in a list, let user decide which to also remove
"Show me the full report" — tell user: "Full report written to docs/reviews/test-cleanup-YYYY-MM-DD.md", then ask again with the same options
"Cancel" — keep everything as-is

NEVER remove tests without explicit user approval.

Step 4: Parallel Cleanup (Agents)

After approval, split flagged tests among agents. Each agent:

Removes the flagged test cases (the it(...) blocks)
Removes empty describe blocks left behind
Removes test helper functions/fixtures only used by removed tests — verify zero remaining usages with grep/find-references before removing any helper
Removes unused imports left behind
Removes orphaned section comments (e.g., // BAD TESTS markers that no longer apply)
Deletes test files entirely if all their tests were removed
Does NOT modify any source code — only test files

Step 5: Verify (Controller)

Run the test suite to confirm remaining tests still pass. If tests fail, investigate — likely a cleanup error (removed a shared helper that was still needed).

Flag Criteria

Share this with every analysis agent.

DO Flag

Category	Signal	Example
Private state access	Accessing internals via `as any`, reflection, `__private`, `@VisibleForTesting`	`(service as any).cache.get(id)`
Mock call counts	Asserting exact call counts on internal dependencies (not side effects)	`expect(repo.save).toHaveBeenCalledTimes(1)`
Call ordering	Tracking sequence of internal method calls	`expect(callOrder).toEqual(['findByEmail', 'save'])`
Internal args	Asserting exact arguments passed to internal collaborators	`expect(repo.save).toHaveBeenCalledWith(exactObj)`
Log format	Asserting exact log message strings or call counts	`expect(logger.log).toHaveBeenCalledWith('Created user 1')`
Tautological	Testing that a mock returns its configured value through a passthrough with no conditional logic or side effects	Mock returns X → assert result is X. If the code path has branching logic (even if output matches the mock), consider whether it exercises a meaningful path — classify as BORDERLINE if unsure
Constant testing	Asserting values of exported constants	`expect(MAX_LENGTH).toBe(100)`
Type/existence	Testing `typeof fn === 'function'` — always flag in typed languages (TS/Java/Go), flag in JS only if other tests already exercise the export	`expect(typeof capitalize).toBe('function')`
Redundant snapshot	Snapshot on simple deterministic output already covered by direct assertions	`toMatchInlineSnapshot` on a pure function output
Redundant coverage	Multiple tests covering the exact same code path with no new branch	Two tests for `canCancel` that both exercise the `true` branch
No assertions	Test body has no assertions or only trivially-true assertions	`it('works', () => { createThing(); })`

Do NOT Flag

These are legitimate tests even if they use mock assertions:

Critical side effects: Verifying rollback, cleanup, or compensating actions happened (e.g., expect(inventory.release).toHaveBeenCalled() after payment failure)
External API contracts: Asserting arguments passed to external services (payment gateways, email providers, third-party APIs) — this verifies the contract. Heuristic: A dependency is "external" if it crosses a system boundary (network call, filesystem, third-party service). Injected interfaces that represent in-process collaborators (repositories, caches, internal services) are "internal" even if abstracted behind an interface. When ambiguous, classify as BORDERLINE.
Enum boundary values: Testing all values of a small enum for complete branch coverage — but flag if multiple tests exercise the exact same branch (e.g., two true cases in a boolean return)
Complex snapshots: Snapshots for complex/hard-to-assert outputs (UI trees, large serialized structures)
Error message contracts: When the error message IS the public API contract (e.g., user-facing error codes)
Any test whose removal would reduce behavior coverage

When in doubt, classify as BORDERLINE, not REMOVE. False positives erode user trust.

Report Template

# Test Cleanup Report

**Date**: YYYY-MM-DD
**Project**: <project path>
**Test framework**: <detected framework>
**Total test files**: N
**Total tests analyzed**: N
**Tests flagged for removal**: N
**Tests flagged as borderline**: M

## Summary by Category

| Category | Count | Files Affected |
|----------|-------|----------------|
| Implementation detail: private state access | N | file1, file2 |
| Implementation detail: mock call counts | N | ... |
| ... | ... | ... |

## Flagged Tests

### <file-path>

#### REMOVE: <test name>
- **Category**: <category>
- **Why**: <1-2 sentence explanation>

#### BORDERLINE: <test name>
- **Category**: <category>
- **Why**: <explanation>
- **Note**: <why it's borderline>

## Tests Kept

<N tests across M files classified as valuable behavioral tests.>

Common Mistakes

Mistake	Fix
Removing tests without user approval	ALWAYS use AskUserQuestion before any removal
Flagging mock assertions that verify critical side effects	Read the DO NOT FLAG section — rollback/cleanup verification is legitimate
Modifying source code during cleanup	Only modify test files — source code is out of scope
Not running tests after cleanup	Always verify remaining tests pass in Step 5
Flagging all mock-based assertions	Mock assertions are fine when verifying observable behavior or external contracts
Running analysis sequentially	Split test files among parallel agents for large suites
Removing test helpers still used by other tests	Check all usages before removing any helper/fixture
Flagging internal args tests when no other test verifies persistence	If removing the test would leave a gap where wrong data could be saved undetected, classify as BORDERLINE
Not writing the report file	Always write to `docs/reviews/test-cleanup-YYYY-MM-DD.md`

Red Flags

STOP if you catch yourself:

Removing tests without asking the user first
Flagging a test just because it uses mocks (mocks are a tool, not a smell)
Modifying production/source code
Skipping the verification step (running remaining tests)
Being too aggressive — when in doubt, mark BORDERLINE not REMOVE

cleaning-tests

Popularity

Invocation

Context Preview

SKILL.md

cleaning-tests

Popularity

Invocation

Context Preview

SKILL.md

Cleaning Tests

Overview

When to Use

Agent Assumptions

The Process

Step 0: Discovery (Controller)

Step 1: Parallel Analysis (N Agents)

Step 2: Consolidate + Report (Controller)

Step 3: User Approval

Step 4: Parallel Cleanup (Agents)

Step 5: Verify (Controller)

Flag Criteria

DO Flag

Do NOT Flag

Report Template

Common Mistakes

Red Flags

Similar Skills

Cleaning Tests

Overview

When to Use

Agent Assumptions

The Process

Step 0: Discovery (Controller)

Step 1: Parallel Analysis (N Agents)

Step 2: Consolidate + Report (Controller)

Step 3: User Approval

Step 4: Parallel Cleanup (Agents)

Step 5: Verify (Controller)

Flag Criteria

DO Flag

Do NOT Flag

Report Template

Common Mistakes

Red Flags

Similar Skills