From ai-dev-toolkit
Use when the user asks to add tests, expand test coverage, gate quality with tests, test against specs, write unit tests for deterministic behavior, or verify the app matches its design specs. Also use when the user says "cover the app with tests", "make sure the specs are enforced by tests", or "what test gaps do we have". Use this for any project with design specs, PRDs, or implementation plans that define testable behavior.
Slash command
/ai-dev-toolkit:spec-driven-tests
Write deterministic unit tests that enforce an app's design specs, PRD, and implementation plans as CI-gated quality checks.
Core principle: Tests validate what the specs say the app does — not just what the code happens to do. Every test traces back to a documented behavior.
REQUIRED BACKGROUND: Follow superpowers:test-driven-development for the red-green-refactor cycle when writing tests for new behavior. For covering existing behavior, tests passing immediately is expected.
digraph spec_tests {
rankdir=TB;
discover [label="1. Discover\nproject context", shape=box];
read_specs [label="2. Read specs\n& PRD", shape=box];
explore_gaps [label="3. Explore\ncurrent tests", shape=box];
identify [label="4. Build gap\ntable", shape=box];
plan [label="5. Plan test\nphases", shape=box];
write [label="6. Write tests\n(TDD cycle)", shape=box];
verify [label="7. Verify spec\nalignment", shape=box];
update_docs [label="8. Update docs\n& counts", shape=box];
discover -> read_specs -> explore_gaps -> identify -> plan -> write -> verify -> update_docs;
verify -> write [label="misaligned"];
}
Before anything else, understand the project's testing setup:
- Test framework and configuration (vitest.config.*, jest.config.*, pytest.ini, tsconfig.json, package.json scripts)
- Existing test files (**/*.test.*, **/*.spec.*, __tests__/, tests/)
- CI configuration: .github/workflows/, .gitlab-ci.yml, Jenkinsfile, etc.; understand what gates merges
- Spec and documentation locations (docs/, specs/, *.md)

Run the existing test suite to get baseline counts:
# Find and run the test command (varies by project)
npm run test:run # or: pytest, cargo test, go test ./..., etc.
Read all design docs to extract deterministic behaviors. Look for:
For each behavior, note the source document and section for traceability.
For each test file, understand what behaviors are already covered. Build an inventory:
Map every spec requirement to test coverage:
| # | Spec requirement | Source doc | Tested? | Test file | Priority |
|---|---|---|---|---|---|
| R01 | Done tasks hidden from overview | PRD §3.2 | YES | task-groups.test.ts | — |
| R02 | Rate limit 30/member/hour | PRD §5.1 | YES | rate-limiter.test.ts | — |
| R03 | Tags max 2 + "+N more" | UX spec §4 | NO | — | High |
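The gap table can also be kept as data, which makes the untested requirements easy to list. The following is a minimal TypeScript sketch, not part of the skill itself: the `SpecRequirement` shape, the `findGaps` helper, and the "Medium" and "Low" priority levels are assumptions made for illustration. The rows mirror the example table above.

```typescript
// Hypothetical shape for one gap-table row; field names are illustrative.
type Priority = "High" | "Medium" | "Low"; // only "High" appears in the example table

interface SpecRequirement {
  id: string;
  requirement: string;
  sourceDoc: string;
  testFile: string | null; // null means no test covers the requirement yet
  priority: Priority | null; // only set for untested requirements
}

const requirements: SpecRequirement[] = [
  { id: "R01", requirement: "Done tasks hidden from overview", sourceDoc: "PRD §3.2", testFile: "task-groups.test.ts", priority: null },
  { id: "R02", requirement: "Rate limit 30/member/hour", sourceDoc: "PRD §5.1", testFile: "rate-limiter.test.ts", priority: null },
  { id: "R03", requirement: 'Tags max 2 + "+N more"', sourceDoc: "UX spec §4", testFile: null, priority: "High" },
];

// Return the untested requirements, highest priority first.
function findGaps(rows: SpecRequirement[]): SpecRequirement[] {
  const rank: Record<Priority, number> = { High: 0, Medium: 1, Low: 2 };
  return rows
    .filter((row) => row.testFile === null)
    .sort((a, b) => rank[a.priority ?? "Low"] - rank[b.priority ?? "Low"]);
}

const gaps = findGaps(requirements);
```

With the three example rows, `gaps` contains only R03, the one requirement without a test file.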
Priority rules:
Group tests for incremental progress:
Key principles:
To test a private helper, add export. This is a source change but not a behavioral change.
Common patterns across frameworks:
For private helpers that need testing:
// Before: function pad(n) { ... }
// After: export function pad(n) { ... }
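Once the helper is exported, it can be checked directly. The source elides the body of `pad`, so the zero-padding implementation below is a hypothetical stand-in, and `checkPad` is an illustrative framework-neutral check rather than the skill's own test. In a real suite the cases would live in the project's test framework.

```typescript
// Hypothetical body: the source only shows `function pad(n) { ... }`.
// Assumed behavior: left-pad a number to at least two digits with zeros.
export function pad(n: number): string {
  return String(n).padStart(2, "0");
}

// Framework-neutral check; returns a list of failure messages (empty when all pass).
function checkPad(): string[] {
  const failures: string[] = [];
  const cases: Array<[number, string]> = [
    [0, "00"],
    [7, "07"],
    [12, "12"],
    [123, "123"], // already wider than two digits, left unchanged
  ];
  for (const [input, expected] of cases) {
    const actual = pad(input);
    if (actual !== expected) {
      failures.push(`pad(${input}) returned "${actual}", expected "${expected}"`);
    }
  }
  return failures;
}
```

Table-driven cases like these keep each input and its expected output side by side, which makes it easier to trace a case back to the spec line it enforces.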
For components with browser APIs not available in test environment (scrollIntoView, visualViewport, etc.):
// Stub missing APIs in beforeEach
beforeEach(() => { Element.prototype.scrollIntoView = mockFn() })
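The stubbing pattern above can be sketched without committing to a framework. In the example below, `stubMethod` is a hypothetical helper, and `FakeElement` stands in for `Element` so the sketch runs where no DOM exists. In a real suite the target would be `Element.prototype`, installed in `beforeEach` and restored in `afterEach`.

```typescript
// Install a stub on an object (for example a prototype) and return a function
// that restores the previous state. `stubMethod` is a hypothetical helper.
function stubMethod(
  target: object,
  name: string,
  stub: (...args: unknown[]) => unknown,
): () => void {
  const obj = target as Record<string, unknown>;
  const hadOwn = Object.prototype.hasOwnProperty.call(obj, name);
  const previous = obj[name];
  obj[name] = stub;
  return () => {
    if (hadOwn) {
      obj[name] = previous; // put the original implementation back
    } else {
      delete obj[name]; // the property did not exist before stubbing
    }
  };
}

// Stand-in for Element in an environment without a DOM (assumption for this demo).
class FakeElement {}

const calls: unknown[][] = [];
const restore = stubMethod(FakeElement.prototype, "scrollIntoView", (...args) => {
  calls.push(args); // record arguments so a test can assert on them
});

// Code under test would call the method; here it is called directly.
const el = new FakeElement() as FakeElement & { scrollIntoView(opts?: object): void };
el.scrollIntoView({ behavior: "smooth" });
const callCountWhileStubbed = calls.length;

restore();
const stillStubbed = "scrollIntoView" in FakeElement.prototype;
```

Returning a restore function keeps the stub from leaking into other tests, which matters when several test files share the same environment.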
For services that call external APIs:
// Mock the SDK, verify config (model, tokens), not response content
expect(apiCall.model).toBe('expected-model')
expect(apiCall.max_tokens).toBe(256)
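One way to capture `apiCall` is a recording fake passed in place of the SDK client. This is a sketch under assumptions: `CompletionClient`, `summarizeTask`, and `createRecordingClient` are hypothetical names, and a real SDK's request shape will differ. Only the `'expected-model'` and `256` values come from the assertions above.

```typescript
// Hypothetical request and client shapes; a real SDK's types will differ.
interface CompletionRequest {
  model: string;
  max_tokens: number;
  prompt: string;
}

interface CompletionClient {
  complete(request: CompletionRequest): Promise<{ text: string }>;
}

// Hypothetical service under test: the spec fixes the model and token budget.
async function summarizeTask(client: CompletionClient, title: string): Promise<string> {
  const response = await client.complete({
    model: "expected-model",
    max_tokens: 256,
    prompt: `Summarize: ${title}`,
  });
  return response.text;
}

// Fake client that records each request and returns canned text.
function createRecordingClient(): { client: CompletionClient; requests: CompletionRequest[] } {
  const requests: CompletionRequest[] = [];
  const client: CompletionClient = {
    complete: (request) => {
      requests.push(request);
      return Promise.resolve({ text: "canned response" });
    },
  };
  return { client, requests };
}

const { client, requests } = createRecordingClient();
// The request is recorded synchronously, before the first await suspends.
const pending = summarizeTask(client, "Buy milk");
const apiCall = requests[0];
```

A real test would also await `pending` before finishing. The assertions target `apiCall.model` and `apiCall.max_tokens` only, so the canned response text never affects the result.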
After writing tests, cross-check each test against the spec it validates:
| Spec requirement | Test file | Test name | Aligned? |
|---|---|---|---|
| Done tasks hidden | task-groups.test | excludes done tasks | Yes |
| Max 2 tags displayed | task-card.test | shows first 2 tags | Yes |
If a test contradicts a spec: fix the test (specs are authoritative).
If the code contradicts a spec: flag it to the user; don't silently accept the discrepancy.
After all tests pass, update project documentation:
| Mistake | Fix |
|---|---|
| Testing AI response content | Only test model selection, tools, tokens — not generated text |
| Mocking too deeply | Test the real function; only mock external deps |
| Testing implementation details | Test behavior visible to users, not internal state |
| Missing spec reference | Every test should trace to a spec requirement |
| Forgetting to update docs | Always update docs with new test counts |
| Writing tests without reading specs first | Specs define what's correct — code might be wrong |
| Proposing tests for non-deterministic behavior | If output varies per run, it's not a unit test candidate |
npx claudepluginhub elvinouyang/claude-skill-collection --plugin ai-dev-toolkit