From claudekit
Enforces red-green-refactor TDD for features, bugfixes, refactors with testable outcomes. Requires pasting red (failing) and green (passing) test runner outputs as evidence.
How this skill is triggered — by the user, by Claude, or both
Slash command
/claudekit:test-firstThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Red-green-refactor TDD with strict evidence requirements. The skill exists because
Red-green-refactor TDD with strict evidence requirements. The skill exists because the most common testing failure isn't missing tests — it's tests written after the code, designed to pass against the implementation rather than to specify it. Test-first inverts the order: a failing test asserts the desired behavior, the smallest implementation makes it pass, and refactoring runs with the test as a safety net. Each step produces test runner output that goes into the PR. The skill is for engineers shipping production code — not a TDD evangelism doc.
Goal: Identify one observable behavior to assert, smaller than the task.
Inputs: A task from your plan with an Acceptance: line.
Actions:
it <verb>s <subject> when <condition>.Output: A test name and a one-line description of the input/output pair.
Goal: A test that currently fails for the right reason.
Inputs: The test name from Step 1.
Actions:
Output: Test code committed in a test: commit (or staged), red runner
output captured.
Goal: The test passes after a minimal implementation.
Inputs: The failing test.
Actions:
Output: Implementation committed (or staged) in a separate commit from the test. Green runner output captured.
Goal: Improve the code's structure with the test as a safety net.
Inputs: Passing test and implementation.
Actions:
Output: Refactored implementation, all tests still green.
Goal: Cycle back to Step 1 with a new case.
Inputs: Acceptance criteria not yet covered.
Actions:
Output: A complete test suite for the task, with red→green evidence for each step.
| Excuse | Why it sounds reasonable | Why it's wrong | What to do instead |
|---|---|---|---|
| "I'll write the tests after the implementation — same outcome." | The tests do exist either way; order seems like ceremony. | Tests written after implementation are designed against the code that exists, not against the behavior the code should have. They confirm what was built; they don't catch what should have been. The "same outcome" claim only holds if you'd write the same tests both ways, and the literature (and most engineers' experience) shows you don't. | Write the test first. Take the 5 minutes to capture the failing output. The discipline cost is low and the test you write is meaningfully different. |
| "This is too simple to test — it's just a getter." | Trivial code is genuinely common, and writing tests for it does feel like make-work. | "Too simple" is the line said before someone changes the getter to compute something derived, and the absence of a test means the change ships without a check. The test is documentation: when the next person modifies the function, they see what callers expect. | If the code has any behavior — even returning a stored value with a computed default — write the test. Three lines of test code are not the cost they feel like. If it's truly inert (a constant), skip the test and don't lose sleep. |
| "Tests slow me down — I'll add them at the end." | Writing tests during a feature does add minutes to the cycle. | "At the end" usually means after the PR is open, after the reviewer is waiting, after the feature is "done in your head." At that point tests get written under time pressure, against the implementation that exists, with the corners-cut they always have under that condition. The minutes saved by deferring are paid back at 3-5x in the PR cycle. | Write the test first. Each red→green cycle is 10-15 minutes. By the end of the task, the tests are real and the PR is short, not a 200-line rewrite of a hand-tested feature. |
| "The test is hard to write — must be the wrong abstraction." | Test difficulty is genuinely a signal of design problems. | True at the limit, but "hard to write" sometimes just means the case is genuinely complex and the test deserves the work. Treating every hard-to-test case as an architectural smell becomes an excuse to skip tests on actual complexity. | Distinguish: does the test require setting up 10 mocks (architectural smell) vs is the assertion logic complex (legitimate complexity)? Skip the test only in the first case, and only after writing down the architectural concern in the spec or a follow-up. |
| "I'll write one big integration test instead of 10 unit tests." | Integration tests cover more in fewer lines. | One big test that covers 10 cases is one big test that fails opaquely when any of the 10 break. The failure message says "the integration test is red"; you spend 30 minutes finding which case. Ten unit tests that each cover one case fail with the case name in the report. | Write one unit test per case. Use integration tests for cross-component behavior, not as a substitute for unit-level coverage. |
| "I'll mock everything — it'll be fast." | Speed of test runs matters; mocks are how you get there. | Over-mocking produces tests that pass while the integration is broken. The test exercised your mock; the production code exercises a real database, a real HTTP client, a real clock — and they don't match the mock. The fast-but-wrong test is worse than no test because it provides false confidence. | Mock external services (HTTP, DB, third-party APIs) with named contract assertions. Don't mock language primitives, your own modules, or anything within the unit you're testing. Time-skew tests deserve a fake clock, not a real one. |
| Checkpoint | Required artifact | What "no evidence" looks like |
|---|---|---|
| End of Step 1 | A test name in it <verb>s <subject> when <condition> form | "I'll figure out what to test as I write." |
| End of Step 2 | Failing test code + red runner output (paste) | "I wrote a test; it should fail." |
| End of Step 3 | Passing test + green runner output (paste) + full-suite output | "Tests pass on my machine." |
| End of Step 4 | Post-refactor green output (paste) | "I cleaned things up." |
| End of Step 5 | All acceptance criteria covered by named tests | "Coverage is good." |
If the runner output isn't pasted somewhere in the PR or scratch artifact, you have not satisfied this skill.
expect(x).toBe('hello world') against return 'hello world'). It's a
tautology, not a behavioral check.test: commits. The tests were retrofitted.npx claudepluginhub duthaho/claudekit --plugin claudekitEnforces RED-GREEN-REFACTOR TDD cycle: write a failing test first, then minimal code to pass, then refactor. Use when implementing features or fixing bugs during the implement phase.
Enforces test-driven development for features, bug fixes, and refactoring. Requires failing tests before any production code, with guidance on test types and spec-to-test mapping.
Enforces strict Test-Driven Development (TDD): write failing test first for features, bug fixes, refactors before any production code.