Write, organize, name, and tame unit tests in the style of Roy Osherove's "The Art of Unit Testing" — trustworthy, maintainable, readable tests built around a unit of work's entry point and exit points. Use this skill whenever the user asks you to write or scaffold unit tests, name or structure a test, choose between a value / state / interaction test, set up or avoid a mocking (isolation) framework, or clean up tests that are brittle, flaky, over-mocked, or unreadable — even if they never mention the book or Osherove. Also use it for questions like "how should I name this test", "why is this test so brittle", "is this a stub or a mock", or "how many asserts per test". Enforces: assert an exit point (observable behavior), never an internal call; never verify a stub; one concern per test with no logic in the test body; and prefer value/state tests over interaction tests.
Slash command: /testing-canon:art-of-unit-testing
This skill makes you write and clean up unit tests the way Roy Osherove teaches in
The Art of Unit Testing. It is the practical on-ramp companion to
khorikov-unit-testing: where Khorikov gives you the scoring theory (the four
pillars, managed vs. unmanaged dependencies), Osherove gives you a concrete mental
model for what a test is shaped like and three qualities to judge it by. The two
agree on almost everything; where a rule here has a deeper justification, this skill
points to the Khorikov skill rather than repeating it.
The book's organizing goal: a test is only worth having if it is trustworthy, maintainable, and readable. A test you don't trust sends you back to manual testing and debugging; a test that's painful to maintain gets deleted or ignored; a test you can't read can't tell you what broke. Most LLM- and junior-written tests fail exactly these three — they over-mock, assert on internals, hide intent behind helpers, and carry logic in the test body. This skill exists to stop that.
Examples below are language-agnostic pseudocode. Translate them to the user's stack (Jest/Vitest, JUnit, pytest, xUnit, RSpec…) and match their existing conventions.
Osherove's central idea. Stop thinking of a "unit" as "a method" or "a class."
A unit of work is everything that happens between the moment you trigger an entry point and the moment a noticeable result appears at an exit point. A unit of work may be one function, several functions, or a few collaborating modules — its size is defined by the entry/exit pair you test through, not by class boundaries.
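There are exactly three kinds of exit point, three ways a unit of work can produce a noticeable result:
Return value: the entry point returns something the caller can use.
State change: a noticeable change to the system's state or behavior, observable without interrogating private internals.
Call to a third party: the unit of work calls out to a dependency the test does not control (a logger, a web service, a database).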
This model drives every later decision. A test triggers one entry point and checks one exit point. If a unit of work has three exit points, that's (at least) three tests — one each — not one test asserting all three. Splitting by exit point is what keeps tests focused, named clearly, and easy to debug.
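As an illustration of splitting by exit point, here is a minimal sketch in Python. `PasswordVerifier`, `FakeLogger`, and `make_verifier` are hypothetical names invented for this example, not taken from the book or this repo. The single entry point `verify()` has three exit points, so it gets three tests.

```python
class FakeLogger:
    """Hand-written stand-in for an outgoing logger (hypothetical)."""

    def __init__(self):
        self.messages = []

    def info(self, message):
        self.messages.append(message)


class PasswordVerifier:
    """Hypothetical unit of work; verify() is its entry point."""

    def __init__(self, rules, logger):
        self._rules = rules
        self._logger = logger
        self.total_checked = 0  # exit point: observable state

    def verify(self, password):
        self.total_checked += 1
        passed = all(rule(password) for rule in self._rules)
        self._logger.info("PASSED" if passed else "FAILED")  # exit point: third-party call
        return passed  # exit point: return value


def has_digit(password):
    return any(ch.isdigit() for ch in password)


def make_verifier(logger=None):
    """Factory so each test states only what it cares about."""
    if logger is None:
        logger = FakeLogger()
    return PasswordVerifier([has_digit], logger)


def test_a_password_without_a_digit_is_rejected():  # value test
    verifier = make_verifier()

    result = verifier.verify("abc")

    assert result is False


def test_each_verification_is_counted():  # state test
    verifier = make_verifier()

    verifier.verify("abc")

    assert verifier.total_checked == 1


def test_a_rejected_password_is_logged_as_failed():  # interaction test
    logger = FakeLogger()
    verifier = make_verifier(logger)

    verifier.verify("abc")

    assert logger.messages == ["FAILED"]
```

When one of these fails, its name alone says which exit point broke.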
Why "exit point" and not "behavior"? Because behaviors can be purely internal, and internal behavior is exactly what you must not test. An exit point is, by definition, externally noticeable. Asserting anything that is not an exit point is the root of brittle tests.
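Each exit-point type calls for a different technique. Prefer them in this order, the same ordering Khorikov calls output > state > communication:
Value test: for a return-value exit point. Call the entry point and assert on what it returns.
State test: for a state-change exit point. Act, then check the new state through the public API.
Interaction test: for a third-party-call exit point. Use a mock to verify that the outgoing call was made.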
Osherove's own rule of thumb is that interaction/mock tests should be a small
fraction of a suite (he aims for well under ~10%). If you find yourself writing
mostly interaction tests, the code probably needs restructuring so its important
results surface as values or state instead. See
khorikov-unit-testing/references/four-pillars-and-styles.md for the deeper "why,"
and §5 of the repo README for the classical-vs-mockist position this reflects.
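One way to do that restructuring, sketched under assumptions: move the decision into a pure function that returns what should happen, and keep the outgoing call in a thin shell. `Notification`, `overdue_notifications`, `send_overdue_notifications`, and the `mailer.send` signature are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Notification:
    recipient: str
    subject: str


def overdue_notifications(invoices, today):
    """Pure decision logic: returns what should be sent instead of sending it."""
    return [
        Notification(inv["email"], f"Invoice {inv['id']} is overdue")
        for inv in invoices
        if inv["due"] < today
    ]


def send_overdue_notifications(invoices, today, mailer):
    """Thin shell: the only place that touches the outgoing mailer (hypothetical API)."""
    for notification in overdue_notifications(invoices, today):
        mailer.send(notification.recipient, notification.subject)


def test_only_invoices_past_their_due_date_produce_a_notification():
    invoices = [
        {"id": 1, "email": "a@example.com", "due": date(2024, 1, 10)},
        {"id": 2, "email": "b@example.com", "due": date(2024, 3, 1)},
    ]

    result = overdue_notifications(invoices, today=date(2024, 2, 1))

    assert result == [Notification("a@example.com", "Invoice 1 is overdue")]
```

The filtering rule is now covered by a value test with no test double at all. The shell would need at most one interaction test to confirm that it forwards to the mailer.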
Judge every test you write or review against three qualities: is it trustworthy, is it
maintainable, and is it readable? They are the readable summary of
the whole book; depth is in references/good-unit-test-qualities.md.
A test that sacrifices any one of these is usually a net cost. Maintainability and readability are not "nice to have" — an unmaintainable suite gets abandoned, and an abandoned suite protects nothing.
Step 1 — Find the entry point and the exit points. Name the entry point you'll trigger and list the unit of work's exit points. This is the slow part the first time against unfamiliar code; once mapped, each test is quick.
Step 2 — One test per exit point. Don't cram multiple exit points into one test. Each becomes its own test with its own behavior-revealing name.
Step 3 — Pick the technique by exit-point type (value > state > interaction, above). Default to value/state. Only reach for a mock when the exit point is a call to a third party.
Step 4 — Decide doubles deliberately (see "Test doubles" below). Substitute a dependency only to gain control or isolation — not reflexively. Stub incoming dependencies; mock outgoing ones; never assert on a stub.
Step 5 — Structure it (see "Structure & naming"): one Arrange-Act-Assert, one concern, no logic in the test, hard-coded expected values, a name that reads like a sentence.
If the code makes this hard — you can't reach an exit point without faking five things, or the only observable result is a private field — that's a design signal, not a reason to over-mock. Introduce a seam or extract the logic (see "Isolation frameworks & seams").
When asked to review, fix flaky tests, or "make these stop breaking on every
refactor," produce findings, not just a rewrite. For each problem test, report what
it couples to, which quality it sacrifices, and the fix. Full rubric and the
anti-pattern list in references/maintainable-readable-tests.md.
The highest-value things to hunt for first:
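Logic in the test body (if/loops/switch/computation). → Remove it; parameterize instead. Logic in a test means the test itself can have bugs.
Test doubles
Osherove's vocabulary, by the direction of the dependency relative to the unit of work:
Fake: the generic term for any stand-in, whether it ends up used as a stub or as a mock.
Stub: fakes an incoming dependency, one that feeds data or behavior into the unit of work. Never assert on it.
Mock: fakes an outgoing dependency, one the unit of work calls as an exit point. This is what an interaction test asserts on.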
This is the same incoming/outgoing split as khorikov-unit-testing. The canonical,
finer-grained taxonomy — Dummy, Stub, Spy, Mock, Fake — is Gerard Meszaros's;
this repo treats those five as the reference vocabulary. When you need the precise
distinctions (e.g. Spy vs. Mock, or what "Fake" strictly means), read
xunit-test-patterns/references/test-doubles-taxonomy.md. Osherove's "fake/stub/
mock" maps onto it: his stub = Meszaros's Stub (and Dummy/Fake on the incoming
side); his mock = Meszaros's Mock (and Spy).
Structure & naming
AAA. Arrange (set up inputs and the unit), Act (trigger the one entry point), Assert (check the one exit point). Keep the three visually separate.
One concern per test. If you can't name the test without "and," it's testing two things — split it. A test that checks one concern is the one that, when it fails, tells you exactly what broke.
No logic in the test body. No conditionals, loops, or arithmetic over the result. Hard-code expected values; never recompute them with the production algorithm (that test passes by construction and catches nothing). Parameterize similar cases instead of looping.
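A small sketch of these structure rules, assuming a hypothetical `total_with_tax` function made up for this example. The first test recomputes its expectation with the production formula and so cannot fail; the other two use Arrange-Act-Assert with hard-coded expected values.

```python
def total_with_tax(net_cents, tax_percent):
    """Hypothetical production code: gross price in cents, tax rounded half up."""
    return net_cents + (net_cents * tax_percent + 50) // 100


# Avoid: the expectation repeats the production algorithm, so the test
# passes by construction and would repeat any bug in the formula.
def test_total_recomputed_with_the_production_formula():
    net, tax = 1000, 20
    expected = net + (net * tax + 50) // 100
    assert total_with_tax(net, tax) == expected


# Prefer: one concern, no logic, a literal expected value.
def test_a_20_percent_tax_is_added_to_the_net_price():
    # Arrange
    net_cents = 1000

    # Act
    total = total_with_tax(net_cents, tax_percent=20)

    # Assert
    assert total == 1200


def test_tax_is_rounded_half_up_to_the_nearest_cent():
    total = total_with_tax(net_cents=105, tax_percent=10)

    assert total == 116
```

If many cases differ only in their inputs and expected value, a parameterized test in the project's own framework is usually a better fit than a loop inside one test body.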
Naming — house style note. Osherove teaches the USE convention — a name in
three parts: Unit under test, Scenario, Expectation (e.g.
verifyPassword_withAFailedRule_returnsErrorWithReason). It's a real improvement over
a bare method name, and you'll see it throughout the book. This repo's default,
though, is Khorikov's behavior-sentence naming — a plain sentence a domain expert
would understand (A_password_failing_a_rule_is_rejected_with_a_reason). Both encode
the same three pieces of information; the sentence form reads better and couples less
to the method name. Use behavior-sentence naming unless the project already
standardizes on USE. See khorikov-unit-testing ("Test structure & naming").
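The two naming styles side by side, as a sketch adapted to Python's snake_case. `verify_password` and `always_fails` are hypothetical stand-ins invented for this example. Both tests check the same behavior and differ only in their name.

```python
def verify_password(password, rules):
    """Hypothetical unit: returns the reason for every rule the password fails."""
    return [reason for passes, reason in rules if not passes(password)]


def always_fails(_password):
    """Fake rule that rejects any input."""
    return False


# USE convention: Unit under test, Scenario, Expectation.
def test_verify_password_with_a_failed_rule_returns_error_with_reason():
    errors = verify_password("any value", [(always_fails, "fake reason")])

    assert errors == ["fake reason"]


# Behavior-sentence naming: reads as a plain statement and does not mention
# the function name, so renaming verify_password would not make it stale.
def test_a_password_failing_a_rule_is_rejected_with_a_reason():
    errors = verify_password("any value", [(always_fails, "fake reason")])

    assert errors == ["fake reason"]
```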
Readability over DRY. Some duplication across tests is fine and often clearer. Extract object construction into builders/factory methods when it removes noise, but never hide what a test asserts behind a shared helper.
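A sketch of where extraction helps: a factory hides irrelevant construction detail, while the act and the assertion stay in the test body. `Order`, `shipping_cents`, and `an_order` are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    customer_tier: str
    subtotal_cents: int
    country: str
    coupon: Optional[str]


def shipping_cents(order):
    """Hypothetical production rule: gold customers ship free, others pay a flat fee."""
    return 0 if order.customer_tier == "gold" else 499


def an_order(**overrides):
    """Factory: supplies defaults so each test names only the field that matters."""
    fields = dict(customer_tier="standard", subtotal_cents=2000, country="DE", coupon=None)
    fields.update(overrides)
    return Order(**fields)


def test_gold_customers_ship_for_free():
    order = an_order(customer_tier="gold")

    cost = shipping_cents(order)

    assert cost == 0  # the assertion stays visible, not buried in a helper


def test_standard_customers_pay_flat_shipping():
    order = an_order(customer_tier="standard")

    cost = shipping_cents(order)

    assert cost == 499
```

The two tests repeat their three-line shape on purpose. A shared helper that both built the order and asserted the cost would be shorter, but a reader could no longer see what each test checks.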
Isolation frameworks & seams
A seam is a place where two pieces of code meet and you can substitute one side
without editing the other — an injected dependency, an overridable function, a module
boundary. Seams are what make a unit of work testable in the first place; designing
for testability is largely about putting seams where you need control or isolation.
(This is Michael Feathers' concept; for getting existing untestable code under
test, see legacy-code-testing.)
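A minimal sketch of a seam, assuming a hypothetical `is_session_expired` function made up for this example. The `now` parameter is the seam: production code uses the default, and a test substitutes a fixed clock without editing the function.

```python
import time


def is_session_expired(started_at, max_age_seconds, now=time.time):
    """Hypothetical unit; `now` is an injected seam that defaults to the real clock."""
    return now() - started_at > max_age_seconds


def test_a_session_older_than_its_max_age_is_expired():
    expired = is_session_expired(started_at=1000, max_age_seconds=60, now=lambda: 1061)

    assert expired is True


def test_a_session_exactly_at_its_max_age_is_still_valid():
    expired = is_session_expired(started_at=1000, max_age_seconds=60, now=lambda: 1060)

    assert expired is False
```

Without the seam, the tests would depend on the real clock and could only cover the boundary by sleeping or by patching `time.time` globally.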
Isolation (mocking) frameworks automate creating fakes. Use them for what they're good at — quickly standing up stubs and the occasional mock at a real seam. But heed the trap: because they make it trivial to fake anything, they nudge you toward over-mocking and interaction tests you didn't need. Before reaching for the framework, ask whether a value or state test would do. The framework is a convenience, not a license to mock.
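As a sketch of restrained framework use, the example below relies on `unittest.mock.Mock` from Python's standard library. `close_account`, `balances`, and `audit_log` are hypothetical names invented for this example. The first test needs the framework only to stand up a stub and remains a value test; the second asserts on a mock because the audit-log call is the exit point.

```python
from unittest.mock import Mock


def close_account(account_id, balances, audit_log):
    """Hypothetical unit of work: refuse to close accounts that still hold money."""
    if balances.balance_of(account_id) != 0:
        return False
    audit_log.record("closed", account_id)
    return True


def test_an_account_with_a_remaining_balance_is_not_closed():
    balances = Mock()
    balances.balance_of.return_value = 250  # stub: incoming data, never asserted on

    result = close_account("acc-1", balances, audit_log=Mock())

    assert result is False  # value test


def test_closing_an_empty_account_is_recorded_in_the_audit_log():
    balances = Mock()
    balances.balance_of.return_value = 0
    audit_log = Mock()

    close_account("acc-1", balances, audit_log)

    # Interaction test: the outgoing call is the exit point being checked.
    audit_log.record.assert_called_once_with("closed", "acc-1")
```

Neither test verifies that `balances.balance_of` was called. That call is how the unit gets its input, not a result it produces.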
Read the relevant file when you need depth:
references/good-unit-test-qualities.md: the trustworthy/maintainable/readable qualities in full, unit vs. integration, and the entry/exit-point model applied.
references/test-doubles-and-isolation.md: fakes/stubs/mocks by direction, the Meszaros mapping, isolation frameworks, seams, and when faking is justified.
references/maintainable-readable-tests.md: the over-specification catalogue, taming brittle/flaky tests, naming, and a review rubric.
../../EXAMPLES.md (repo root) has worked before/after pairs under "The Art of Unit Testing."
npx claudepluginhub arcboxlabs/testing-canon --plugin testing-canon