Slash command: `/backend-engineer:backend-testing`
Tests exist to let you change code with confidence. The whole discipline reduces to one rule: **assert observable behavior at the right level, fast and deterministically.** A suite that is slow, flaky, or coupled to implementation is a liability, not an asset.
Spend your test budget by level. Most value comes from many fast tests near the base.
| Level | Volume | Scope | Dependencies |
|---|---|---|---|
| Unit | Many | One unit of business logic | All boundaries mocked |
| Integration | Fewer | Wiring across the unit and its real infra | Real DB; external APIs stubbed |
| E2E | Few | Critical end-to-end user paths only | Full stack |
Reject the inverted pyramid (mostly E2E, few unit). It is slow, flaky, and gives vague failures — a red E2E tells you something broke, not what. Push every assertion to the lowest level that can prove it: business rules belong in unit tests, wiring belongs in integration, and only the handful of paths where a broken flow is catastrophic earn an E2E.
Assert the observable outcome and the enforced rule, not the mechanics that produced it.
Compare two assertions, the first on behavior and the second on mechanics: "… `LimitExceeded`." versus "`creditService.check()` was called once with `(orderId, amount)`."

The second test fails the moment you refactor internals even though behavior is unchanged, so it punishes exactly the work tests should protect. Verify the return value, the persisted state, the emitted event, or the error type. Reserve "was this called" assertions for genuine side effects you cannot observe any other way (an email was sent, a payment was charged).
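The contrast can be sketched in Python with `unittest.mock`. `OrderService`, its `place` method, and the use of `LimitExceeded` as an exception type are assumptions made for illustration; only `check(orderId, amount)` comes from the text above.

```python
from unittest.mock import Mock


class LimitExceeded(Exception):
    """Assumed error type: the order would exceed the credit limit."""


class OrderService:
    """Hypothetical unit under test; the credit service is its boundary."""

    def __init__(self, credit_service):
        self._credit = credit_service

    def place(self, order_id: str, amount: int) -> str:
        if not self._credit.check(order_id, amount):
            raise LimitExceeded(order_id)
        return "accepted"


def test_order_over_limit_is_rejected():
    # Behavior: asserts only what a caller can observe (the error type).
    credit = Mock()
    credit.check.return_value = False
    service = OrderService(credit)
    try:
        service.place("order-1", 500)
    except LimitExceeded:
        return
    raise AssertionError("expected LimitExceeded")


def test_credit_check_called_once():
    # Mechanics: breaks if place() later caches or batches credit checks,
    # even though callers would see exactly the same result.
    credit = Mock()
    credit.check.return_value = True
    OrderService(credit).place("order-1", 500)
    credit.check.assert_called_once_with("order-1", 500)
```

Both tests pass today. Only the first keeps passing through a refactor that changes how often or in what shape the credit check is made.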
In unit tests, mock at the unit's boundaries — repositories, external service clients, clocks, queues — and drive logic through fakes. Keep real DB connections out of unit tests: they are slow, order-dependent, and flaky, which is the opposite of what a unit test must be. A unit test runs in milliseconds, in any order, in isolation.
Mock the boundary, never the thing under test. If a test mocks so much that it only re-asserts the mock setup, it tests nothing — see Pitfalls.
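As one illustration of driving logic through a fake at a boundary, the sketch below replaces the clock. `FakeClock` and `SessionPolicy` are hypothetical names invented for this example, not part of any library.

```python
from datetime import datetime, timedelta, timezone


class FakeClock:
    """Test double for the clock boundary; time moves only when told to."""

    def __init__(self, now: datetime):
        self._now = now

    def now(self) -> datetime:
        return self._now

    def advance(self, delta: timedelta) -> None:
        self._now += delta


class SessionPolicy:
    """Hypothetical unit under test: sessions expire after a fixed TTL."""

    def __init__(self, clock, ttl: timedelta):
        self._clock = clock
        self._ttl = ttl

    def is_expired(self, issued_at: datetime) -> bool:
        return self._clock.now() - issued_at >= self._ttl


def test_session_expires_after_ttl():
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    clock = FakeClock(start)
    policy = SessionPolicy(clock, ttl=timedelta(minutes=30))

    assert not policy.is_expired(start)
    clock.advance(timedelta(minutes=30))  # no sleeping, no wall clock
    assert policy.is_expired(start)
```

The real `SessionPolicy` logic runs unmodified; only the boundary it depends on is substituted, so the test stays fast and order-independent.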
Wiring bugs — SQL dialect quirks, transaction boundaries, serialization, migrations — only surface against the real engine. Use testcontainers to spin up real PostgreSQL/Redis/Kafka in Docker for integration tests.
Prefer real dependencies over in-memory substitutes (H2, SQLite) unless that engine is what you run in production. Substitutes accept behavior that production rejects: a query that passes on H2 but fails on real Postgres ships green. Stub only what you do not own and cannot run cheaply (third-party HTTP APIs), and use a contract-faithful stub.
Isolate test state so tests stay independent: a fresh schema per run plus a transaction rollback or truncation per test. Shared mutable state across tests is a top cause of flakiness.
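Per-test rollback can be sketched as a small context manager. The example uses the standard-library `sqlite3` module only so it runs without Docker; in a real integration suite the connection would point at the containerized production engine. The `accounts` table and the helper names are assumptions.

```python
import sqlite3
from contextlib import contextmanager


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the schema once per run (hypothetical accounts table)."""
    conn.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.commit()


@contextmanager
def rolled_back(conn: sqlite3.Connection):
    """Run one test's writes, then discard them so the next test starts clean."""
    try:
        yield conn
    finally:
        conn.rollback()


def count_accounts(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
```

A test wraps its body in `with rolled_back(conn):`, so whatever rows it inserts are gone before the next test runs, regardless of whether it passed or failed.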
A flaky test erodes trust in the whole suite — once people ignore red, real bugs hide behind the noise. Fixing flakes outranks writing new tests. Common causes and fixes:
| Cause | Fix |
|---|---|
| `sleep`/timing assumptions | Explicit waits / poll-until-condition |
| Real external calls | Stub the boundary |
| Shared mutable state | Per-test isolation (fresh state, rollback) |
| Unordered collections | Assert on sets, not positions |
Quarantine a flake only as a stopgap with a tracking ticket — never as the resolution.
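The poll-until-condition fix from the table can be sketched as a small helper. `wait_until` is a hypothetical name, and the default timeout and interval are arbitrary choices for illustration.

```python
import time


def wait_until(predicate, timeout: float = 2.0, interval: float = 0.01) -> bool:
    """Poll predicate until it returns True or timeout seconds elapse.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test then writes `assert wait_until(lambda: job.is_done())` instead of sleeping for a guessed duration: it finishes as soon as the condition holds and fails only after the full timeout. For unordered results, comparing `set(actual) == {"a", "b"}` avoids depending on position.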
Resilience that is never tested is a hope, not a guarantee. Assert the behavior directly.
Cross-reference the resilience skill for the patterns themselves; this skill covers proving they work.
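As one illustration of proving a resilience pattern works, the sketch below tests a retry wrapper against a stub that fails a fixed number of times. Retry is chosen here as an assumed example; `call_with_retry`, `TransientError`, and `FlakyDependency` are hypothetical names, and the wrapper omits backoff to keep the test fast.

```python
class TransientError(Exception):
    """Hypothetical error type marking a failure worth retrying."""


def call_with_retry(operation, max_attempts: int = 3):
    """Call operation, retrying on TransientError up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return operation()
        except TransientError as exc:
            last_error = exc
    raise last_error


class FlakyDependency:
    """Stub boundary that fails a set number of times, then succeeds."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise TransientError("temporarily unavailable")
        return "ok"
```

Two assertions cover the behavior: with two failures and three attempts the call returns `"ok"`, and with more failures than attempts the error propagates after exactly `max_attempts` calls.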
A service unit test asserts a business rule against a mocked repository — no DB, no network, deterministic (language-agnostic pseudocode):
```
test "transfer rejected when balance is insufficient":
    accounts = MockAccountRepository()
    accounts.find("A").returns(Account(id: "A", balance: 50))
    service = TransferService(accounts)

    result = service.transfer(from: "A", to: "B", amount: 100)

    assert result.isError()
    assert result.error == "InsufficientFunds"
    assert accounts.save.wasNotCalled()  # side effect we cannot otherwise observe
```
The test names a rule, drives it through a mocked boundary, and asserts the observable outcome (error type) plus the one side effect that has no return value (nothing was persisted). It says nothing about how the balance was checked — a refactor of the internals keeps it green.
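For readers who want something runnable, here is one possible Python rendering of the same test using `unittest.mock`. The `TransferService`, `Account`, and `Result` implementations are assumptions supplied so the test has something to run against, and `from`/`to` become `source`/`target` because `from` is a reserved word in Python.

```python
from dataclasses import dataclass
from typing import Optional
from unittest.mock import Mock


@dataclass
class Account:
    id: str
    balance: int


@dataclass
class Result:
    error: Optional[str] = None

    def is_error(self) -> bool:
        return self.error is not None


class TransferService:
    """Hypothetical implementation of the unit under test."""

    def __init__(self, accounts):
        self._accounts = accounts

    def transfer(self, source: str, target: str, amount: int) -> Result:
        src = self._accounts.find(source)
        if src.balance < amount:
            return Result(error="InsufficientFunds")
        dst = self._accounts.find(target)
        src.balance -= amount
        dst.balance += amount
        self._accounts.save(src)
        self._accounts.save(dst)
        return Result()


def test_transfer_rejected_when_balance_is_insufficient():
    accounts = Mock()  # stands in for the repository boundary
    accounts.find.return_value = Account(id="A", balance=50)
    service = TransferService(accounts)

    result = service.transfer(source="A", target="B", amount=100)

    assert result.is_error()
    assert result.error == "InsufficientFunds"
    accounts.save.assert_not_called()  # nothing was persisted
```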
For the senior judgment on what is worth testing at all — coverage as a means, not a target — defer to senior-mindset.
`npx claudepluginhub shoto290/shoto --plugin backend-engineer`