From odin
Evaluates tests for deletion by checking if removing them would let a real bug reach production. Use on legacy suites, slow CI, or post-refactor sweeps.
How this skill is triggered — by the user, by Claude, or both
Slash command
/odin:tests-purge-unneededThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Tests are not assets. Tests are liabilities that pay rent by catching real bugs. Volume is not a quality signal — coverage percentage is not a quality signal — only the counterfactual matters: **if I delete this test, can a real bug now reach prod?** If the answer is no, the test is dead weight, and dead weight slows CI, breeds noise, and trains reviewers to ignore failures.
Tests are not assets. Tests are liabilities that pay rent by catching real bugs. Volume is not a quality signal — coverage percentage is not a quality signal — only the counterfactual matters: if I delete this test, can a real bug now reach prod? If the answer is no, the test is dead weight, and dead weight slows CI, breeds noise, and trains reviewers to ignore failures.
Modern insight (2025): TDD pairs with purge discipline. The same rigor that earns RED before GREEN earns deletion before keep — a test that cannot describe the bug it would catch should not exist. Mutation testing exposes which tests are actually load-bearing; the rest are cargo cult.
See python for pytest examples (dynamic-language carve-out). See typescript for jest/vitest examples (static-language redundancy). See rust for cargo test examples (compile-time-guaranteed redundancy). See keep-vs-delete-table for the language-agnostic decision rubric.
The principles below are load-bearing. Internalize them; do not paraphrase.
This is the discriminator. Before keeping any test, ask: what bug, specifically, does this fail on? If the answer is "none I can name" or "a bug the compiler would catch", the test does not earn its keep. The bar is a real bug — not a hypothetical, not a "what if the implementation changes" — a concrete failure mode that a real change could plausibly introduce and that this test would catch.
Tests earn their keep at boundaries. Protocol compliance (HTTP status codes, message formats, retry semantics), error semantics (what happens on malformed input, partial failure, timeout), security invariants (authz/authn enforcement, input validation, rate limits), and real-I/O integration (DB transactions, file I/O, network calls) are exactly where bugs hide and where the type system cannot help. These tests stay.
In statically-typed languages, the compiler already proves that a constructor returns the type it claims, that a struct has the fields it declares, and that a config object has the shape its type asserts. A test like assert User(name="x").name == "x" does not catch any bug a compiler does not — the only way it can fail is if User's type signature changes, in which case the test itself fails to compile and the assertion is moot. Delete it.
Identity-passthrough tests (assert echo(x) == x, expect(passthrough(value)).toEqual(value)) prove nothing — they describe the function's signature, not its behavior. If echo is supposed to validate or transform x, test the validation/transformation. If echo is genuinely identity, the function is dead code.
~/.claude/claude/system-prompt-baseline.md <directives>)The carve-out for mandate 3 is language-dependent and must mirror the user's system-prompt-baseline.md testing charter exactly:
The split is not aesthetic — it is about what guarantee the language provides. See references/python.md vs references/typescript.md for the contrasting examples.
references/keep-vs-delete-table.md)| Pattern | Static lang | Dynamic lang |
|---|---|---|
| Constructor returns expected fields | Delete — type system covers | Keep — boundary shape test |
| Function passes input through verbatim | Delete | Delete (test the dead code, then delete it) |
| HTTP handler returns 401 on missing auth | Keep — security invariant | Keep |
| Parser rejects malformed input | Keep — boundary | Keep |
| Two structs equal after struct-assembly | Delete | Keep if shape changes are plausible |
| Mock returns fixture, test asserts the fixture | Delete — testing the mock | |
| Real DB transaction commits + rolls back | Keep — real I/O integration |
~/.claude/claude/system-prompt-baseline.md, system-prompt-baseline.md wins — this skill mirrors the user's testing charter; if drift is detected, system-prompt-baseline.md is the source of truth.| Gate | Pass Criteria | Blocking |
|---|---|---|
| Bug articulation | Each candidate has a one-sentence failure-mode description | Yes |
| Static-guarantee check | Confirmed compiler/type-checker does or does not cover the test | Yes |
| Bug-injection check | Test verified to NOT catch the bug, before deletion | Yes |
| Atomic commit | Deletion is its own commit with rationale | Yes |
| Suite still passes | Remaining tests green after deletion | Yes |
| Code | Meaning |
|---|---|
| 0 | Deletion safe — bug-injection confirmed test catches nothing, atomic commit landed |
| 11 | Test framework not detected — cannot run bug-injection check |
| 12 | Bug articulation failed — candidate kept pending review |
| 13 | Static guarantee unclear — language carve-out not resolvable; kept pending |
| 14 | Test caught the injected bug — load-bearing; kept |
| 15 | Suite regressed after deletion — rollback required |
cleanup-codebase — sibling deletion discipline for non-test code (dead fields, redundant wrappers, stale config)tests-adversarial — the complement: writing tests that do catch bugs, especially in failure paths and silent-failure regionstest-driven — the design-side discipline (RED → GREEN → REFACTOR); purge runs in the REFACTOR phase~/.claude/claude/system-prompt-baseline.md <directives> testing charter — the source-of-truth principle this skill mirrorsnpx claudepluginhub outlinedriven/odin-claude-plugin --plugin odinAudits existing test suite alignment after code changes, identifying stale assertions, tests for deleted code paths, and coincidence tests. Use after any code modification.
Enforces iron-law red-green-refactor TDD with dual-runtime examples (Vitest for TypeScript/Next.js, pytest for Python/FastAPI), regression-for-every-bug pattern, and property-based testing for invariants.
Performs mutation testing using Claude as the mutation engine: generates code mutants, runs tests, tracks kill/survive rates, identifies test gaps, and recommends test improvements. No external mutation tools required.