Skill

khorikov-unit-testing

Write and review unit tests in the style of Vladimir Khorikov's "Unit Testing Principles, Practices, and Patterns" (the classical/Detroit school). Use this skill whenever the user asks you to write, add, generate, scaffold, fix, refactor, audit, or review unit tests or a test suite — even if they never mention Khorikov or the book — and whenever you are about to write tests as part of a larger coding task. Also use it for questions like "what should I test here", "should I mock this", "why is this test brittle/flaky", "is this a good test", or "how should I structure this test". The skill enforces four rules above all: test OBSERVABLE BEHAVIOR not implementation details; maximize resistance to refactoring (no false failures); mock ONLY unmanaged out-of-process dependencies; and do NOT write tests for trivial code.

Slash command

/testing-canon:khorikov-unit-testing

Last commit: Jun 9, 2026

Khorikov Unit Testing

This skill makes you write and review unit tests the way Vladimir Khorikov argues for in Unit Testing Principles, Practices, and Patterns. The book's central claim is that the goal of testing is to enable sustainable growth of a project — tests should let people refactor and add features without fear. A test that breaks every time the code is reorganized (even when behavior is unchanged) actively works against that goal. Bad tests can be worse than no tests at all.

Most LLM- and junior-written test suites fail in the same predictable ways: they mock every collaborator, assert on internal calls, pin down implementation details, test trivial getters and setters, re-implement the production algorithm inside the test, and chase a coverage number. This skill exists to stop exactly those habits.

The principles below are language- and framework-agnostic. Translate the example pseudocode to the user's actual stack (xUnit/JUnit/pytest/Jest/RSpec/etc.) and match their existing conventions.

The one rule above all

A test should verify a unit of observable behavior, not a unit of code, and not how that behavior is implemented.

Observable behavior = something a client of the code actually cares about: a return value, a change to observable state, or a call to an out-of-process collaborator that the outside world can see (an email sent, a message published). Everything else — which private methods run, which internal helpers get called, in what order — is an implementation detail. Coupling tests to implementation details is the single largest source of brittle tests.

Ask of every assertion: "If I refactored the internals without changing what a client observes, would this assertion still pass?" If the honest answer is "no," the test is coupled to an implementation detail and must change.
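As a minimal sketch of that question in practice, the Python test below asserts only on the rendered output. `Message`, `MessageRenderer`, and its private helpers are hypothetical names invented for this illustration, not part of any real codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    header: str
    body: str
    footer: str


class MessageRenderer:
    """Hypothetical class for illustration; the private helpers are implementation details."""

    def render(self, message: Message) -> str:
        return self._header(message) + self._body(message) + self._footer(message)

    def _header(self, message: Message) -> str:
        return f"<h1>{message.header}</h1>"

    def _body(self, message: Message) -> str:
        return f"<b>{message.body}</b>"

    def _footer(self, message: Message) -> str:
        return f"<i>{message.footer}</i>"


def test_rendering_a_message() -> None:
    renderer = MessageRenderer()
    message = Message(header="h", body="b", footer="f")

    html = renderer.render(message)

    # Asserts only on what a client receives. Merging or renaming the
    # private helpers would not break this test.
    assert html == "<h1>h</h1><b>b</b><i>f</i>"
```

A test that instead checked which helpers ran, or in what order, would fail after a harmless reorganization of `MessageRenderer` even though the HTML stayed the same.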

The four pillars (the scoring frame)

Every test is judged on four dimensions. See references/four-pillars-and-styles.md for the full treatment; the short version:

Protection against regressions — does it actually catch bugs? Grows with the amount and significance of the code it exercises.
Resistance to refactoring — can it survive a refactoring of the code under test without failing? A test that fails when behavior is still correct is a false positive (false alarm), and false alarms destroy trust in the suite.
Fast feedback — how quickly it runs.
Maintainability — how easy it is to read and to keep running.

The trade-off that drives almost every decision: you cannot maximize the first three at once (maintainability is largely independent of them). Resistance to refactoring is effectively non-negotiable (you either couple to implementation details or you don't), so in practice you trade protection against regressions against fast feedback. Test value ≈ the product of the four scores: if any one is near zero, the test is near worthless.

When you write or review a test, you are implicitly buying regression protection and resistance to refactoring; never sacrifice resistance to refactoring to get a little more coverage.

Workflow A — Writing a test

Follow this order. Do not jump straight to "mock the dependencies and assert the calls" — that is the habit this skill is designed to break.

Step 1 — Decide whether this code should be unit-tested at all. Categorize the code under test (full detail in references/what-to-test.md):

Domain model / algorithms (rich logic, few collaborators) → the highest-value target. Test thoroughly with unit tests.
Trivial code (getters/setters, one-line constructors, plain DTOs) → do not test. Tests here are pure cost with no protection.
Controllers / orchestrators (little logic, many collaborators — glue code) → cover with a few integration tests, not exhaustive unit tests.
Overcomplicated code (rich logic and many collaborators — e.g. a fat service mixing decisions with I/O) → do not test it as-is. First refactor: push the logic into a domain model and the orchestration into a thin controller (the Humble Object pattern). Then test the extracted domain model.

If the user asks you to test trivial or overcomplicated code, say so and propose the better target (skip it, or extract-then-test) instead of mechanically generating tests.
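The extract-then-test move for overcomplicated code could look roughly like the sketch below. The late-fee rule, `Invoice`, `late_fee_cents`, `LateFeeController`, and the `invoices` / `billing` collaborators are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    amount_cents: int
    days_overdue: int


def late_fee_cents(invoice: Invoice) -> int:
    """Domain logic: a pure decision with no I/O.

    Assumed rule for this sketch: 1% per full 30 days overdue, capped at 5%.
    """
    periods = min(invoice.days_overdue // 30, 5)
    return invoice.amount_cents * periods // 100


class LateFeeController:
    """Humble object: loads, delegates the decision, passes the result on."""

    def __init__(self, invoices, billing) -> None:
        self._invoices = invoices  # hypothetical repository (managed dependency)
        self._billing = billing    # hypothetical payment gateway (unmanaged dependency)

    def charge_late_fee(self, invoice_id: str) -> None:
        invoice = self._invoices.get(invoice_id)
        self._billing.charge(invoice_id, late_fee_cents(invoice))


# Unit tests target the extracted domain logic only.
def test_invoice_overdue_by_two_months_is_charged_two_percent() -> None:
    invoice = Invoice(amount_cents=10_000, days_overdue=65)

    fee = late_fee_cents(invoice)

    assert fee == 200


def test_late_fee_is_capped_at_five_percent() -> None:
    invoice = Invoice(amount_cents=10_000, days_overdue=400)

    fee = late_fee_cents(invoice)

    assert fee == 500
```

The controller is left with almost no logic, so it needs only a small number of integration tests rather than unit tests of its own.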

Step 2 — Pick the test style. Prefer, in order (detail in references/four-pillars-and-styles.md):

Output-based — call a (pure) function, assert on its return value. Highest resistance to refactoring and best maintainability. If the code can be made pure, prefer this.
State-based — perform an operation, then assert on the resulting observable state. Good, as long as you check state a client could see, not private internals.
Communication-based — assert that the code called a collaborator (a mock). The most fragile style; reserve it for the unmanaged-dependency boundary only (Step 3).
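To make the first two styles concrete, here is a small sketch with one output-based and one state-based test. `price_with_tax_cents` and `Cart` are hypothetical, chosen only to keep the example self-contained.

```python
def price_with_tax_cents(net_cents: int, tax_percent: int) -> int:
    """Hypothetical pure function: same inputs always give the same output."""
    return net_cents + net_cents * tax_percent // 100


class Cart:
    """Hypothetical stateful class with a public way to observe its state."""

    def __init__(self) -> None:
        self._items: dict[str, int] = {}

    def add(self, sku: str, quantity: int) -> None:
        self._items[sku] = self._items.get(sku, 0) + quantity

    def quantity_of(self, sku: str) -> int:
        return self._items.get(sku, 0)


# Output-based: call the function, assert on the return value.
def test_twenty_percent_tax_is_added_to_the_net_price() -> None:
    gross = price_with_tax_cents(net_cents=1000, tax_percent=20)

    assert gross == 1200


# State-based: act once, then assert on state a client can see.
def test_adding_more_of_a_product_accumulates_its_quantity() -> None:
    cart = Cart()
    cart.add("book", 1)

    cart.add("book", 2)

    # Uses the public query, not the private _items dict.
    assert cart.quantity_of("book") == 3
```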

Step 3 — Decide test doubles deliberately. This is where most suites go wrong (full detail in references/test-doubles.md). Four rules:

Replace a dependency with a stub when it only provides data to the code under test (an incoming query). Never assert on a stub — verifying calls into a stub pins down an implementation detail.
Replace a dependency with a mock (and assert on it) only when it is an unmanaged out-of-process dependency — something outside your control whose communication is itself observable behavior (SMTP server, payment gateway, message bus to other systems).
Do not mock managed dependencies — out-of-process dependencies you fully control and that are not observable to outsiders, chiefly your own database. Use the real thing in an integration test instead.
Default for in-process collaborators: don't replace them at all. Use the real objects. In the classical school you isolate tests from each other, not the system under test from its in-process collaborators.
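The stub/mock split above might be sketched as follows with Python's standard `unittest.mock`. `SignupService`, the `blocklist` lookup (assumed here to be an external service that only supplies data), and the `email_gateway` (assumed to be an unmanaged SMTP-like dependency) are hypothetical.

```python
from unittest.mock import Mock


class SignupService:
    """Hypothetical service used only for this illustration."""

    def __init__(self, blocklist, email_gateway) -> None:
        self._blocklist = blocklist          # incoming query: stub it
        self._email_gateway = email_gateway  # outgoing, externally visible: mock it

    def sign_up(self, email: str) -> bool:
        if self._blocklist.is_blocked(email):
            return False
        self._email_gateway.send_welcome(email)
        return True


def test_new_user_receives_a_welcome_email() -> None:
    blocklist = Mock()
    blocklist.is_blocked.return_value = False  # stub: canned answer only
    email_gateway = Mock()                     # mock: the interaction is verified
    service = SignupService(blocklist, email_gateway)

    signed_up = service.sign_up("ann@example.com")

    assert signed_up is True
    email_gateway.send_welcome.assert_called_once_with("ann@example.com")
    # Deliberately no assertion on blocklist.is_blocked: it is a stub.


def test_blocked_address_cannot_sign_up() -> None:
    blocklist = Mock()
    blocklist.is_blocked.return_value = True
    email_gateway = Mock()
    service = SignupService(blocklist, email_gateway)

    signed_up = service.sign_up("spam@example.com")

    assert signed_up is False
    email_gateway.send_welcome.assert_not_called()
```

How the blocklist is consulted (once, twice, cached) can change freely without breaking either test; only the email that leaves the system is pinned down.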

Step 4 — Structure the test (see "Test structure & naming" below): one Arrange-Act-Assert, one unit of behavior, no branching logic, a behavior-revealing name, and expected values hard-coded (never recomputed by re-running the production algorithm).

Workflow B — Reviewing / refactoring existing tests

When asked to review, audit, or improve tests, score each test against the four pillars and scan for the anti-patterns in references/anti-patterns-and-review.md. Produce findings, not just a rewrite. For each problem test, report:

What it couples to — quote the assertion(s) that pin down implementation details, mock-call verifications on stubs, or asserted internal interactions.
Which pillar it sacrifices — usually resistance to refactoring (brittle) or maintainability (unreadable), sometimes regression protection (tests nothing real, e.g. asserting a mock was called with what you just told it to return).
The fix — rewrite toward output- or state-based verification of observable behavior; delete tests of trivial code; collapse interaction assertions; replace mocked managed dependencies with a real instance in an integration test.

Highest-priority red flags to catch first:

Mocks/spies asserting on in-process collaborators or on stubs → brittle, often tautological. Fix or remove.
A mocked database/repository with assertions on its calls → convert to an integration test with a real (or in-memory-but-equivalent) DB, assert on state.
Tests that re-implement the production logic to compute the expected result → replace with hard-coded expected values.
Tests of trivial code or private methods → delete; if a private method is complex enough to "need" a test, that's a signal to extract it into its own unit with a public API.
Production code that only exists to enable tests (test-only flags, branches, if (testing)) → "code pollution"; remove it and find another seam.
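As an illustration of the first red flag and its fix, the sketch below shows a test that mocks an in-process collaborator next to a rewrite that uses the real object. `DiscountPolicy`, `Checkout`, and the 10% rule are invented for this example.

```python
from unittest.mock import Mock


class DiscountPolicy:
    """Hypothetical in-process collaborator."""

    def percent_for(self, items: int) -> int:
        return 10 if items >= 10 else 0


class Checkout:
    def __init__(self, policy) -> None:
        self._policy = policy

    def total_cents(self, items: int, unit_price_cents: int) -> int:
        subtotal = items * unit_price_cents
        return subtotal - subtotal * self._policy.percent_for(items) // 100


# Red flag: mocks an in-process collaborator and asserts on the call.
# It passes even if the discount arithmetic is wrong, and it breaks if
# Checkout starts computing the discount some other way.
def test_checkout_calls_discount_policy() -> None:
    policy = Mock()
    policy.percent_for.return_value = 10

    Checkout(policy).total_cents(items=10, unit_price_cents=200)

    policy.percent_for.assert_called_once_with(10)


# Fix: real collaborator, assertion on the value a client receives.
def test_ten_items_are_discounted_by_ten_percent() -> None:
    checkout = Checkout(DiscountPolicy())

    total = checkout.total_cents(items=10, unit_price_cents=200)

    assert total == 1800
```

In a review, the finding for the first test would name the `assert_called_once_with` line as the coupling, resistance to refactoring as the sacrificed pillar, and the second test as the proposed replacement.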

Test structure & naming

AAA. Every test has three sections: Arrange (set up inputs and the system under test), Act (invoke the one operation being tested), Assert (verify the outcome). Separate them clearly.

One unit of behavior per test. If you need two Act sections, you're testing two behaviors — split into two tests.
A multi-line Act section is usually a smell: it means a single operation requires several calls, which points to a missing encapsulation in the production code. Note it.
No branching in tests. No if/switch/loops over logic. A test should be a flat, obvious sequence. Branching means the test is trying to cover multiple cases — parameterize instead.
Don't recompute expected values. Hard-code them. If the test calls the same algorithm the code uses to produce the "expected" result, it tests nothing and leaks domain knowledge into the test.
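A flat Arrange-Act-Assert test with a hard-coded expectation might look like the sketch below. `shipping_cost_cents` and its pricing rule are assumptions made up for the example.

```python
def shipping_cost_cents(weight_grams: int) -> int:
    """Hypothetical rule: 500 cents base fee plus 100 cents per started kilogram."""
    started_kilograms = -(-weight_grams // 1000)  # ceiling division
    return 500 + 100 * started_kilograms


def test_parcel_of_two_and_a_half_kilograms_costs_800_cents() -> None:
    # Arrange
    weight_grams = 2500

    # Act: exactly one call
    cost = shipping_cost_cents(weight_grams)

    # Assert: the expected value is a literal, not the formula re-run
    assert cost == 800


def test_parcel_of_exactly_one_kilogram_costs_600_cents() -> None:
    # Arrange
    weight_grams = 1000

    # Act
    cost = shipping_cost_cents(weight_grams)

    # Assert
    assert cost == 600
```

With pytest, the two cases could be merged into one parameterized test via `pytest.mark.parametrize`, which keeps the test body free of branches and loops.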

Naming. Do not use the rigid Method_Scenario_Result pattern (e.g. IsDeliveryValid_PastDate_ReturnsFalse). Name the test as a plain sentence describing the behavior to a non-programmer or domain expert:

Example 1:
Bad: Sum_TwoNumbers_ReturnsCorrectSum
Good: Sum_of_two_numbers

Example 2:
Bad: IsDeliveryValid_InvalidDate_ReturnsFalse
Good: Delivery_with_a_past_date_is_invalid

The name should describe what the system does in a scenario, not mention the method under test or the literal return value. Use underscores (or your stack's idiom) for readability; the production code's naming rules don't apply to test names.

Readability over DRY. Some duplication in tests is fine and often better — tests are read far more than they're refactored. It's OK to extract object creation into test data builders / factory methods, but don't hide the meaning of a test behind shared helpers. Never extract the assertions into a shared method that obscures what each test actually checks.
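One way to extract object creation without hiding the test's meaning is a small factory with overridable defaults, sketched below. `Customer`, `a_customer`, and `shipping_is_free` are hypothetical names for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    name: str
    country: str
    is_premium: bool


def a_customer(**overrides) -> Customer:
    """Test factory: sensible defaults, overridden only where the test cares."""
    defaults = {"name": "Ann", "country": "DE", "is_premium": False}
    return Customer(**{**defaults, **overrides})


def shipping_is_free(customer: Customer) -> bool:
    """Hypothetical production rule under test."""
    return customer.is_premium


def test_premium_customers_get_free_shipping() -> None:
    # The one detail that matters to this test stays visible in the test.
    customer = a_customer(is_premium=True)

    result = shipping_is_free(customer)

    # The assertion stays inline rather than in a shared helper.
    assert result is True
```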

Integration tests (the other half)

Unit tests cover the domain model. Integration tests cover the controllers — the glue that orchestrates the domain model and out-of-process dependencies.

Test the longest happy path through a use case, plus any edge cases the domain-model unit tests can't reach.
Use real managed dependencies (the real database) — that's the point of an integration test.
Mock unmanaged dependencies (SMTP, message bus, third-party APIs) and assert on those interactions, because they are observable behavior and you must not hit them for real in a test.
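A rough sketch of such an integration test follows. It uses an in-memory `sqlite3` database purely so the example is self-contained; a real suite would run against the same database engine as production. `UserController`, the `users` table, and the `message_bus.publish` signature are assumptions for illustration.

```python
import sqlite3
from unittest.mock import Mock


class UserController:
    """Hypothetical controller: glue between the database and a message bus."""

    def __init__(self, db: sqlite3.Connection, message_bus) -> None:
        self._db = db
        self._bus = message_bus

    def change_email(self, user_id: int, new_email: str) -> None:
        self._db.execute(
            "UPDATE users SET email = ? WHERE id = ?", (new_email, user_id)
        )
        self._db.commit()
        self._bus.publish(
            "user.email_changed", {"user_id": user_id, "email": new_email}
        )


def test_changing_email_updates_the_user_and_notifies_other_systems() -> None:
    # Arrange: real managed dependency, mocked unmanaged dependency
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    db.execute("INSERT INTO users (id, email) VALUES (1, 'old@example.com')")
    bus = Mock()
    controller = UserController(db, bus)

    # Act
    controller.change_email(1, "new@example.com")

    # Assert on database state, not on how the controller talked to it
    stored = db.execute("SELECT email FROM users WHERE id = 1").fetchone()[0]
    assert stored == "new@example.com"
    # Assert on the message that leaves the system: observable behavior
    bus.publish.assert_called_once_with(
        "user.email_changed", {"user_id": 1, "email": "new@example.com"}
    )
```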

Reference files

Read the relevant file when you need depth on that topic:

references/four-pillars-and-styles.md — the four pillars in full, the trade-off and the test-value formula, the three styles of testing, and why code coverage is a poor target.
references/what-to-test.md — the four-quadrant code categorization, the Humble Object pattern, and how hexagonal / functional-core-imperative-shell architectures make code testable.
references/test-doubles.md — mocks vs. stubs, Command-Query Separation, managed vs. unmanaged dependencies, and exactly when mocking is justified.
references/anti-patterns-and-review.md — the full anti-pattern catalogue and a concrete rubric for reviewing an existing test suite.

../../EXAMPLES.md (repo root) has worked before/after examples for each rule.

Related skills in this collection

This skill is the classical-school core; others in the repo extend it. Reach for them when the task shifts:

art-of-unit-testing — the fundamentals companion (good-test qualities, test organization, the same stub/mock-by-direction split in Osherove's vocabulary).
effective-software-testing — how to derive the test cases (partitions, boundaries, coverage as a guide) once you know what makes a test good.
xunit-test-patterns — the canonical 5-name Test Double taxonomy and the smell→pattern catalogue when a suite is a mess.
legacy-code-testing — get untested code under test first (characterization tests), then grow real behavior tests with these four principles.
agile-testing-quadrants — where unit tests fit in a whole team's test strategy.
context-driven-testing — the investigative mindset and "how much testing is enough for this context" judgment these rules assume.
