Skill

art-of-unit-testing

Write, organize, name, and tame unit tests in the style of Roy Osherove's "The Art of Unit Testing" — trustworthy, maintainable, readable tests built around a unit of work's entry point and exit points. Use this skill whenever the user asks you to write or scaffold unit tests, name or structure a test, choose between a value / state / interaction test, set up or avoid a mocking (isolation) framework, or clean up tests that are brittle, flaky, over-mocked, or unreadable — even if they never mention the book or Osherove. Also use it for questions like "how should I name this test", "why is this test so brittle", "is this a stub or a mock", or "how many asserts per test". Enforces: assert an exit point (observable behavior), never an internal call; never verify a stub; one concern per test with no logic in the test body; and prefer value/state tests over interaction tests.

Slash command

/testing-canon:art-of-unit-testing


Supporting Files

references/good-unit-test-qualities.md
references/maintainable-readable-tests.md
references/test-doubles-and-isolation.md

SKILL.md

256 lines · ~3.4k tokens

Stats

Stars: 1

Maintenance: Good

Last Commit: Jun 9, 2026

The Art of Unit Testing (Osherove)

This skill makes you write and clean up unit tests the way Roy Osherove teaches in The Art of Unit Testing. It is the practical on-ramp companion to khorikov-unit-testing: where Khorikov gives you the scoring theory (the four pillars, managed vs. unmanaged dependencies), Osherove gives you a concrete mental model for what a test is shaped like and three qualities to judge it by. The two agree on almost everything; where a rule here has a deeper justification, this skill points to the Khorikov skill rather than repeating it.

The book's organizing goal: a test is only worth having if it is trustworthy, maintainable, and readable. A test you don't trust sends you back to manual testing and debugging; a test that's painful to maintain gets deleted or ignored; a test you can't read can't tell you what broke. Most LLM- and junior-written tests fail exactly these three — they over-mock, assert on internals, hide intent behind helpers, and carry logic in the test body. This skill exists to stop that.

Examples below are language-agnostic pseudocode. Translate them to the user's stack (Jest/Vitest, JUnit, pytest, xUnit, RSpec…) and match their existing conventions.

The core model: unit of work, entry point, exit points

This is Osherove's central idea: stop thinking of a "unit" as "a method" or "a class."

A unit of work is everything that happens between the moment you trigger an entry point and the moment a noticeable result appears at an exit point. A unit of work may be one function, several functions, or a few collaborating modules — its size is defined by the entry/exit pair you test through, not by class boundaries.

There are exactly three kinds of exit point — three ways a unit of work can produce a noticeable result:

1. A return value (or thrown error): the unit computes and hands something back.
2. A noticeable state change: afterward, you can query the system and observe something changed (a flag flipped, an item added).
3. A call to a third party: the unit tells an external dependency to do something (send an email, publish a message).

This model drives every later decision. A test triggers one entry point and checks one exit point. If a unit of work has three exit points, that's (at least) three tests — one each — not one test asserting all three. Splitting by exit point is what keeps tests focused, named clearly, and easy to debug.

Why "exit point" and not "behavior"? Because behaviors can be purely internal, and internal behavior is exactly what you must not test. An exit point is, by definition, externally noticeable. Asserting anything that is not an exit point is the root of brittle tests.
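As a rough illustration of the model above, here is a minimal Python sketch of one unit of work that spans several functions. Python is used only for illustration, and `Order`, `RecordingGateway`, `checkout`, and the discount rule are hypothetical names invented for this sketch, not taken from the book. The comments mark the single entry point and the three exit points.

```python
from dataclasses import dataclass


@dataclass
class Order:
    prices: list          # prices in cents (hypothetical domain object)
    paid: bool = False    # state a real client can query


class RecordingGateway:
    """Hand-written stand-in for a payment provider (hypothetical)."""

    def __init__(self):
        self.charges = []

    def charge(self, amount):
        self.charges.append(amount)


def _subtotal(order):
    # Internal helper: part of the unit of work, not an entry point.
    return sum(order.prices)


def _with_discount(amount):
    # Internal helper: 10% off orders of 100.00 or more (assumed rule).
    return amount * 90 // 100 if amount >= 10_000 else amount


def checkout(order, gateway):
    # Entry point: the only thing a test triggers.
    total = _with_discount(_subtotal(order))
    gateway.charge(total)   # exit point: call to a third party
    order.paid = True       # exit point: noticeable state change
    return total            # exit point: return value
```

The helpers `_subtotal` and `_with_discount` never get tests of their own in this sketch. They are covered through `checkout`, so they can be renamed, merged, or inlined without touching a test.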

Three exit points → three kinds of test (in preference order)

Each exit-point type calls for a different technique. Prefer them in this order — the same ordering Khorikov calls output > state > communication:

1. Value test (return-value exit): trigger the entry point, assert on what comes back. Easiest to write, hardest to break. Reach for this first.
2. State test (state-change exit): perform the action, then make a second, legitimate observation to confirm the change. Good, as long as you observe state a real client could see, never a private field exposed just for the test.
3. Interaction test (third-party exit): replace the external dependency with a mock and verify the unit called it. The most fragile kind: "who called whom" is usually an implementation detail. Use it only when the call itself is the result that matters (the email is the outcome), and keep it rare.

Osherove's own rule of thumb is that interaction/mock tests should be a small fraction of a suite (he aims for well under ~10%). If you find yourself writing mostly interaction tests, the code probably needs restructuring so its important results surface as values or state instead. See khorikov-unit-testing/references/four-pillars-and-styles.md for the deeper "why," and §5 of the repo README for the classical-vs-mockist position this reflects.
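The three kinds of test might look like this in Python. This is a sketch under assumptions: `Newsletter`, `NullMailer`, and `RecordingMailer` are hypothetical, and the tests are written as plain functions with bare `assert` so they should run under pytest or be called directly.

```python
class Newsletter:
    """Hypothetical unit of work with all three kinds of exit point."""

    def __init__(self, mailer):
        self._mailer = mailer
        self._subscribers = set()

    def subscribe(self, email):
        address = email.strip().lower()
        self._subscribers.add(address)        # state change
        self._mailer.send_welcome(address)    # call to a third party
        return address                        # return value

    def is_subscribed(self, email):
        return email.strip().lower() in self._subscribers


class NullMailer:
    """Does nothing; used when the email is not what the test is about."""

    def send_welcome(self, address):
        pass


class RecordingMailer:
    """Hand-written mock: records outgoing calls so a test can check them."""

    def __init__(self):
        self.welcomed = []

    def send_welcome(self, address):
        self.welcomed.append(address)


def test_subscribing_returns_the_normalized_address():
    # Value test: assert on what comes back.
    newsletter = Newsletter(NullMailer())

    result = newsletter.subscribe("  Ada@Example.com ")

    assert result == "ada@example.com"


def test_a_subscribed_address_is_reported_as_subscribed():
    # State test: act, then observe through a public query.
    newsletter = Newsletter(NullMailer())

    newsletter.subscribe("ada@example.com")

    assert newsletter.is_subscribed("ada@example.com")


def test_subscribing_sends_a_welcome_email():
    # Interaction test: the outgoing call is the result that matters.
    mailer = RecordingMailer()
    newsletter = Newsletter(mailer)

    newsletter.subscribe("ada@example.com")

    assert mailer.welcomed == ["ada@example.com"]
```

Only the third test inspects a double. The first two pass a do-nothing mailer and never look at it.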

The three qualities (what you are optimizing for)

Judge every test you write or review against these. They are the readable summary of the whole book; depth is in references/good-unit-test-qualities.md.

Trustworthy — when it's red there's a real bug, and when it's green you can ship without re-checking by hand. Killed by: flakiness, non-determinism, testing the wrong thing, logic in the test, tests that depend on each other or on run order.
Maintainable — it doesn't break when you refactor internals, and it's cheap to keep running. Killed by: over-specification (asserting how instead of what), over-mocking, duplicated setup with no factoring.
Readable — a person can tell at a glance what scenario it covers and what it expects. Killed by: cryptic names, magic values, hidden assertions in shared helpers, multi-purpose tests.

A test that sacrifices any one of these is usually a net cost. Maintainability and readability are not "nice to have" — an unmaintainable suite gets abandoned, and an abandoned suite protects nothing.

Workflow A — Writing a test you can trust

Step 1 — Find the entry point and the exit points. Name the entry point you'll trigger and list the unit of work's exit points. This is the slow part the first time against unfamiliar code; once mapped, each test is quick.

Step 2 — One test per exit point. Don't cram multiple exit points into one test. Each becomes its own test with its own behavior-revealing name.

Step 3 — Pick the technique by exit-point type (value > state > interaction, above). Default to value/state. Only reach for a mock when the exit point is a call to a third party.

Step 4 — Decide doubles deliberately (see "Test doubles" below). Substitute a dependency only to gain control or isolation — not reflexively. Stub incoming dependencies; mock outgoing ones; never assert on a stub.

Step 5 — Structure it (see "Structure & naming"): one Arrange-Act-Assert, one concern, no logic in the test, hard-coded expected values, a name that reads like a sentence.

If the code makes this hard — you can't reach an exit point without faking five things, or the only observable result is a private field — that's a design signal, not a reason to over-mock. Introduce a seam or extract the logic (see "Isolation frameworks & seams").

Workflow B — Reviewing / taming brittle and flaky tests

When asked to review, fix flaky tests, or "make these stop breaking on every refactor," produce findings, not just a rewrite. For each problem test, report what it couples to, which quality it sacrifices, and the fix. The full rubric and the anti-pattern list are in references/maintainable-readable-tests.md.

The highest-value things to hunt for first:

Over-specification — the single biggest cause of brittle tests. A test is overspecified when it asserts how the unit works instead of what it produces. The four classic forms:
1. asserting on purely internal state of the object under test,
2. using multiple mocks in one test,
3. using a stub as a mock (asserting a call into a data-providing double),
4. assuming an exact call order or exact string when the behavior doesn't require it.
→ Re-point each assertion at an exit point; delete the rest.
Flakiness / non-determinism — tests touching real time, real network, shared state, or run-order dependence. → Inject time and randomness as values; isolate tests from each other. A test that isn't consistent isn't trustworthy.
Logic in the test — if/loops/switch/computation in the test body. → Remove it; parameterize instead. Logic in a test means the test itself can have bugs.
Unreadable tests — cryptic names, magic numbers, assertions buried in helpers. → Name the behavior, surface the meaningful values, keep the assertion visible.
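To make the "inject time as a value" fix concrete, here is a small Python sketch. The `greeting` function and its noon cutoff are hypothetical; the point is only that the unit receives the current time as an argument instead of reading the system clock itself.

```python
from datetime import datetime


def greeting(now: datetime) -> str:
    # The caller supplies "now", so a test can pin it to any moment.
    # A flaky variant would call datetime.now() in here and pass or
    # fail depending on when the suite happens to run.
    return "Good morning" if now.hour < 12 else "Good afternoon"


def test_a_visitor_before_noon_is_greeted_with_good_morning():
    half_past_nine = datetime(2024, 1, 15, 9, 30)

    assert greeting(half_past_nine) == "Good morning"


def test_a_visitor_at_noon_is_greeted_with_good_afternoon():
    noon = datetime(2024, 1, 15, 12, 0)

    assert greeting(noon) == "Good afternoon"
```

Production code would call `greeting(datetime.now())` at the outer edge of the application. The same approach applies to randomness: pass in the seed or the generated value.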

Test doubles in one screen

Osherove's vocabulary, by the direction of the dependency relative to the unit of work:

Fake (a.k.a. test double) — the umbrella term for any stand-in. When unsure what to call it, call it a fake. The same fake can act as a stub in one test and a mock in another.
Stub — replaces an incoming dependency (an indirect input): it feeds data into the unit of work. You set it up but never assert on it. Verifying a call into a stub pins down an implementation detail and makes the test brittle.
Mock — replaces an outgoing dependency (a third-party exit point): the unit sends something to it. This is the one double you may assert on — and only when that outgoing call is the exit point you're testing.

This is the same incoming/outgoing split as khorikov-unit-testing. The canonical, finer-grained taxonomy (Dummy, Stub, Spy, Mock, Fake) is Gerard Meszaros's; this repo treats those five as the reference vocabulary. When you need the precise distinctions (e.g. Spy vs. Mock, or what "Fake" strictly means), read xunit-test-patterns/references/test-doubles-taxonomy.md. Osherove's "fake/stub/mock" maps onto it: his stub = Meszaros's Stub (and Dummy/Fake on the incoming side); his mock = Meszaros's Mock (and Spy).
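A hand-written stub and mock, sketched in Python, may help fix the direction rule in mind. All names here (`StubBalances`, `MockNotifier`, `warn_if_overdrawn`) are hypothetical. In Meszaros's terms the recording notifier is closer to a Spy, which falls under "mock" in Osherove's vocabulary.

```python
class StubBalances:
    """Stub: an incoming dependency that feeds data into the unit."""

    def __init__(self, balance):
        self._balance = balance

    def balance_of(self, account_id):
        return self._balance


class MockNotifier:
    """Mock: an outgoing dependency whose calls are the exit point."""

    def __init__(self):
        self.notified = []

    def notify(self, account_id):
        self.notified.append(account_id)


def warn_if_overdrawn(account_id, balances, notifier):
    # Unit of work: reads a balance (incoming), may send a warning (outgoing).
    if balances.balance_of(account_id) < 0:
        notifier.notify(account_id)


def test_an_overdrawn_account_holder_is_notified():
    balances = StubBalances(balance=-50)   # set up, never asserted on
    notifier = MockNotifier()

    warn_if_overdrawn("acc-1", balances, notifier)

    assert notifier.notified == ["acc-1"]  # the only assertion


def test_an_account_in_credit_triggers_no_notification():
    balances = StubBalances(balance=50)
    notifier = MockNotifier()

    warn_if_overdrawn("acc-1", balances, notifier)

    assert notifier.notified == []
```

Neither test checks whether `balance_of` was called, or how often. That leaves the unit free to cache or batch balance reads without breaking a test.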

Structure & naming

AAA. Arrange (set up inputs and the unit), Act (trigger the one entry point), Assert (check the one exit point). Keep the three visually separate.

One concern per test. If you can't name the test without "and," it's testing two things — split it. A test that checks one concern is the one that, when it fails, tells you exactly what broke.

No logic in the test body. No conditionals, loops, or arithmetic over the result. Hard-code expected values; never recompute them with the production algorithm (that test passes by construction and catches nothing). Parameterize similar cases instead of looping.
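The recomputation trap is easy to demonstrate. In the Python sketch below, the hypothetical `total_with_tax` contains a deliberate bug (2% instead of an assumed intended 20%). The two checks are written as functions returning a boolean rather than as asserting tests, purely so the block runs to completion and both outcomes can be inspected.

```python
def total_with_tax(net_cents):
    # Hypothetical production code with a deliberate bug:
    # the intended tax rate is assumed to be 20%, not 2%.
    return net_cents + net_cents * 2 // 100


def recomputed_expectation_passes():
    # Anti-pattern: the expected value is rebuilt with the same formula
    # as production, so the bug is copied into the test.
    net = 1000
    expected = net + net * 2 // 100
    return total_with_tax(net) == expected


def hard_coded_expectation_passes():
    # The expected value is worked out by hand: 1000 plus 20% is 1200.
    return total_with_tax(1000) == 1200
```

Here `recomputed_expectation_passes()` returns `True` despite the bug, while `hard_coded_expectation_passes()` returns `False` and exposes it.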

Naming — house style note. Osherove teaches the USE convention — a name in three parts: Unit under test, Scenario, Expectation (e.g. verifyPassword_withAFailedRule_returnsErrorWithReason). It's a real improvement over a bare method name, and you'll see it throughout the book. This repo's default, though, is Khorikov's behavior-sentence naming — a plain sentence a domain expert would understand (A_password_failing_a_rule_is_rejected_with_a_reason). Both encode the same three pieces of information; the sentence form reads better and couples less to the method name. Use behavior-sentence naming unless the project already standardizes on USE. See khorikov-unit-testing ("Test structure & naming").

Readability over DRY. Some duplication across tests is fine and often clearer. Extract object construction into builders/factory methods when it removes noise, but never hide what a test asserts behind a shared helper.
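One way to apply this in Python is a small factory function that supplies defaults and lets each test override only the value it cares about. `User`, `is_adult`, and `a_user` are hypothetical names for this sketch.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int
    country: str


def is_adult(user):
    # Hypothetical rule under test.
    return user.age >= 18


def a_user(**overrides):
    # Factory: hides irrelevant construction detail, never the assertion.
    defaults = {"name": "Ada", "age": 30, "country": "GB"}
    return User(**{**defaults, **overrides})


def test_a_seventeen_year_old_is_not_an_adult():
    user = a_user(age=17)           # the one value that matters is visible

    assert is_adult(user) is False  # the assertion stays in the test


def test_an_eighteen_year_old_is_an_adult():
    user = a_user(age=18)

    assert is_adult(user) is True
```

The two tests repeat a little structure, which is fine. What would hurt readability is a shared helper such as `check_adult(17, False)` that buries the assertion.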

Isolation frameworks & seams

A seam is a place where two pieces of code meet and you can substitute one side without editing the other — an injected dependency, an overridable function, a module boundary. Seams are what make a unit of work testable in the first place; designing for testability is largely about putting seams where you need control or isolation. (This is Michael Feathers' concept; for getting existing untestable code under test, see legacy-code-testing.)
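A seam can be as small as a function parameter with a production default. The Python sketch below is one possible shape, assuming a hypothetical `load_mode` function and an `APP_MODE` environment variable invented for the example.

```python
import os


def load_mode(read_env=os.environ.get):
    # Seam: callers may substitute read_env without editing this function.
    # In production the default reads the real process environment.
    raw = read_env("APP_MODE")
    return raw.lower() if raw else "production"


def test_a_configured_mode_is_returned_in_lower_case():
    mode = load_mode(read_env=lambda key: "DEBUG")

    assert mode == "debug"


def test_a_missing_mode_falls_back_to_production():
    mode = load_mode(read_env=lambda key: None)

    assert mode == "production"
```

Neither test touches the real environment, so they cannot interfere with each other or with the machine they run on.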

Isolation (mocking) frameworks automate creating fakes. Use them for what they're good at — quickly standing up stubs and the occasional mock at a real seam. But heed the trap: because they make it trivial to fake anything, they nudge you toward over-mocking and interaction tests you didn't need. Before reaching for the framework, ask whether a value or state test would do. The framework is a convenience, not a license to mock.
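For comparison, here is roughly how an isolation framework stands up the same two roles, using Python's standard-library `unittest.mock`. The unit `publish_row_count` and its collaborators are hypothetical.

```python
from unittest.mock import Mock


def publish_row_count(source, publisher):
    # Hypothetical unit: reads rows (incoming), publishes a count (outgoing).
    rows = source.fetch_rows()
    publisher.publish(len(rows))


def test_the_row_count_is_published():
    source = Mock()                                   # acts as a stub
    source.fetch_rows.return_value = ["a", "b", "c"]
    publisher = Mock()                                # acts as a mock

    publish_row_count(source, publisher)

    publisher.publish.assert_called_once_with(3)
    # Deliberately no assertion on source.fetch_rows: it is a stub.
```

The framework would just as happily verify `source.fetch_rows`, which is the trap described above. If `publish_row_count` returned the count instead of publishing it, a plain value test would do and no framework would be needed.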

Reference files

Read the relevant file when you need depth:

references/good-unit-test-qualities.md — the trustworthy/maintainable/readable qualities in full, unit vs. integration, and the entry/exit-point model applied.
references/test-doubles-and-isolation.md — fakes/stubs/mocks by direction, the Meszaros mapping, isolation frameworks, seams, and when faking is justified.
references/maintainable-readable-tests.md — the over-specification catalogue, taming brittle/flaky tests, naming, and a review rubric.

../../EXAMPLES.md (repo root) has worked before/after pairs under "The Art of Unit Testing."
