From break-things
Use when evaluating whether an existing test suite adequately guards a system. Triggers on "is this well-tested?", "are we testing the right things?", "our tests pass but production breaks", "what's missing from our test coverage?", after a refactor, before a major release, when inheriting an unfamiliar codebase.
Slash command
/break-things:is-it-tested
Evaluate whether a test suite's variety matches the failure space it guards. Sources: Ashby (1956), Beizer (1990).
Not every suite needs a full audit. If the codebase is small, the failure space is obvious, and you can state "the tests cover X, Y, Z and miss nothing critical" in a sentence — do that. Scale the audit to the cost of an undetected failure reaching production.
| Question | What it produces | Source |
|---|---|---|
| What's the failure space? | Enumerate what can go wrong — not lines of code, but failure modes. Group by category: data corruption, auth bypass, silent wrong answer, crash, performance regression. | Beizer (1990): fault-based analysis. |
| What does the suite guard? | Map existing tests to failure modes from question 1. For each test, state the causal claim it embodies. Tests that can't be mapped to a failure mode may not be guarding anything. | — |
| Where are the gaps? | Failure modes with no corresponding test. This is the variety analysis: D (failure space) vs R (test suite). Unguarded failure modes are the gaps. | Ashby (1956): requisite variety. |
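This is requisite variety applied to testing: a regulator can only absorb as much variety as it has itself, so the suite (R) needs at least as much variety as the failure space (D) it guards.
If D > R, the suite has gaps. The question is whether the unguarded failure modes matter. Prioritize by: (1) severity of the failure, (2) likelihood given anticipated changes, (3) cost to add the guard.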
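
The D versus R comparison can be sketched as a set difference followed by a sort. This is a minimal illustration, not a prescribed tool: the `FailureMode` fields, the 1 to 5 scales, the example failure modes, and the strict severity-then-likelihood-then-cost ordering are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    # Hypothetical record; the 1-5 scales are an assumption for illustration.
    name: str
    severity: int    # 1 (minor) .. 5 (catastrophic)
    likelihood: int  # 1 (unlikely) .. 5 (expected given planned changes)
    guard_cost: int  # 1 (cheap to test) .. 5 (expensive to test)


def find_gaps(failure_space, guarded_names):
    """Return failure modes with no corresponding test, highest priority first."""
    gaps = [fm for fm in failure_space if fm.name not in guarded_names]
    # Sort by severity, then likelihood (both descending), then cost (ascending).
    return sorted(gaps, key=lambda fm: (-fm.severity, -fm.likelihood, fm.guard_cost))


# D: the enumerated failure space (question 1).
failure_space = [
    FailureMode("data corruption on partial write", 5, 2, 3),
    FailureMode("auth bypass via expired token", 5, 3, 2),
    FailureMode("silent wrong total on empty cart", 3, 4, 1),
    FailureMode("crash on malformed input", 2, 4, 1),
]

# R: failure modes that existing tests map to (question 2).
guarded = {"crash on malformed input", "data corruption on partial write"}

gaps = find_gaps(failure_space, guarded)
for fm in gaps:
    print(fm.name)
```

A weighted score would work as well as the lexicographic sort used here; the point is only that gaps are ranked rather than treated as equally urgent.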
what-to-test's targeted gate.

| Thought | Reality |
|---|---|
| "We have good coverage" | Coverage measures execution, not falsification. 90% coverage with weak assertions is 90% false confidence. |
| "The tests pass" | Passing tests prove the tests pass. They don't prove the system is correct. |
| "We'd catch it in code review" | Code review catches what reviewers think to look for. Tests catch what you predicted once and encoded permanently. |
| "Property-based testing is overkill" | For failure spaces too large to enumerate, generative testing is the only way to reach adequate variety. It's not overkill — it's requisite. |
| "We'll add tests when something breaks" | Reactive testing means every bug ships once. Predictive testing means it doesn't. |
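
The gap between coverage and falsification in the first row can be shown in a few lines. In this sketch, `apply_discount` and both tests are hypothetical, and the bug is planted deliberately: the weak test executes every line of the function yet cannot fail, while the strong test encodes a causal claim and does.

```python
def apply_discount(price, percent):
    # Deliberate bug for illustration: adds the discount instead of subtracting it.
    return price + price * percent / 100


def weak_test():
    # Executes every line of apply_discount, so line coverage is 100%,
    # but the assertion holds for any numeric result.
    result = apply_discount(100, 10)
    assert result is not None


def strong_test():
    # Encodes a falsifiable claim: a 10% discount on 100 yields 90.
    assert apply_discount(100, 10) == 90


def passes(test):
    """Run a test function and report whether its assertions held."""
    try:
        test()
        return True
    except AssertionError:
        return False


print(passes(weak_test), passes(strong_test))  # prints: True False
```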
A test suite audit reveals known unknowns — failure modes you can name but haven't tested. It cannot reveal unknown unknowns. For exploring failure spaces beyond what you can enumerate, consider property-based testing (Claessen & Hughes, 2000), fuzzing, or chaos engineering. These are generative approaches that complement the predictive approach of what-to-test and the evaluative approach of is-it-tested.
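
As a rough illustration of the generative approach, the sketch below hand-rolls a tiny random-input check using only the standard library. It is not the API of any property-based testing library; `dedupe`, `has_no_duplicates`, and `find_counterexample` are hypothetical names, and the bug is planted on purpose. The example-based tests pass, while random inputs surface a failure mode nobody enumerated.

```python
import random


def dedupe(items):
    # Deliberate bug for illustration: only removes *adjacent* duplicates.
    out = []
    for x in items:
        if not out or out[-1] != x:
            out.append(x)
    return out


def has_no_duplicates(items):
    """Property the output of dedupe should always satisfy."""
    return len(items) == len(set(items))


def find_counterexample(fn, prop, trials=500, seed=0):
    """Feed random small lists to fn; return the first input whose output breaks prop."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    for _ in range(trials):
        data = [rng.randint(0, 3) for _ in range(rng.randint(0, 6))]
        if not prop(fn(data)):
            return data
    return None


# Hand-picked examples pass, so the suite looks healthy.
assert dedupe([1, 1, 2]) == [1, 2]
assert dedupe([]) == []

# Generated inputs reach cases such as [0, 1, 0] that the examples never tried.
counterexample = find_counterexample(dedupe, has_no_duplicates)
print(counterexample)
```

A dedicated property-based testing library adds input shrinking and richer generators; the hand-rolled loop only shows why generated inputs widen the variety of a suite.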
what-to-test are seeds for question 1, but the failure space is broader than any single change.
is-it-tested is evaluative: does the suite's variety match the failure space?
npx claudepluginhub jackwillis/claude-plugins --plugin break-things