From cosmo-agent-skills
Test-driven workflow: write failing tests first, verify red/green, cover edge cases, and keep the full suite green before merge. Use when implementing features, fixing bugs, or refactoring — before and alongside production code.
Slash command: /cosmo-agent-skills:code-testing
How to test code correctly — test-first when possible, always verify tests catch real failures, and stress edge cases until the suite is green. Complements code-writing (how to implement) and code-reviewer (pre-merge review).
Test commands and fixtures live in the repo's CLAUDE.md (e.g. pytest, npm test).
Core principle: If you did not see the test fail for the right reason, you do not know it tests the right thing.
Default (test-first):
Ask before skipping test-first:
Never skip tests entirely for production paths — at minimum add or extend tests before calling work done.
RED: Write one minimal failing test (one behavior).
↓
VERIFY RED: Run the targeted test; it must fail for the expected reason (not a typo or import error).
↓
GREEN: Write the minimal production code to pass.
↓
VERIFY GREEN: Run the targeted test plus the full suite; all green.
↓
REFACTOR: Clean up (no new behavior); stay green.
↓
REPEAT: Move to the next behavior or edge case.
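One pass through the cycle above might look like the following minimal sketch. The function `clamp` and both test names are hypothetical, chosen only for illustration; the point is the ordering: the test exists first and is seen to fail while `clamp` is still a stub, and only then is the smallest implementation added.

```python
# Hypothetical example: `clamp` is an illustrative name, not from any real repo.

# RED: the test is written before the implementation. While clamp() is still
# a stub (for example `raise NotImplementedError`), this test must fail
# because of the stub, not because of a typo or an import error.
def test_clamp_limits_value_to_upper_bound():
    assert clamp(15, low=0, high=10) == 10


# GREEN: the simplest code that makes the test pass, nothing more.
def clamp(value, low, high):
    return max(low, min(value, high))


# REPEAT: the next behavior gets its own test, again written first.
def test_clamp_limits_value_to_lower_bound():
    assert clamp(-3, low=0, high=10) == 0
```

With pytest, each step would be checked by running only the new test, then the full suite once it passes.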
One test, one behavior. Name states what should happen.
Good: clear name, asserts real behavior, minimal setup, uses real code paths.
Bad: vague name (test1), tests mock call counts instead of outcome, multiple
unrelated assertions, "and" in the name (split tests).
Prefer testing observable behavior (return value, raised error, side effect on real objects) over implementation details.
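To make the contrast concrete, here is a small sketch of a good and a bad test side by side. The function `slugify` is a hypothetical stand-in for real code under test; `unittest.mock` is from the standard library.

```python
from unittest import mock


# Hypothetical code under test (illustrative only).
def slugify(title):
    return "-".join(title.lower().split())


# Good: one behavior, a name that states the expectation, and an assertion
# on the observable return value of the real code path.
def test_slugify_joins_lowercased_words_with_hyphens():
    assert slugify("Hello  World") == "hello-world"


# Bad: vague name, and it only checks that a mock was called. It would keep
# passing even if the real slugify() returned garbage.
def test1():
    fake = mock.Mock()
    fake.slugify("Hello  World")
    fake.slugify.assert_called()
```

The second test exercises no production code at all, so it cannot catch a regression.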
For numerical / research code, also assert: shapes, finiteness, known limits, and
regressions against a reference value when available.
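As a sketch of what those numerical assertions can look like, the example below uses a hypothetical pure-Python `softmax` so that it runs with the standard library alone. For array code, the analogous checks would be on `.shape` and with `numpy.isfinite`.

```python
import math


# Hypothetical function under test (illustrative, not from any real repo).
def softmax(xs):
    shift = max(xs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def test_softmax_preserves_length():
    # Shape check: output has one entry per input.
    assert len(softmax([1.0, 2.0, 3.0])) == 3


def test_softmax_is_finite_for_large_inputs():
    # Finiteness check: no NaN or Inf even when inputs are large.
    assert all(math.isfinite(p) for p in softmax([1000.0, 1001.0, 1002.0]))


def test_softmax_of_equal_inputs_is_uniform():
    # Known limit: identical inputs must give 1/n each.
    out = softmax([0.5, 0.5, 0.5, 0.5])
    assert all(math.isclose(p, 0.25, rel_tol=1e-12) for p in out)


def test_softmax_matches_reference_value():
    # Reference derived analytically: softmax([0, ln 3]) = [1/4, 3/4].
    out = softmax([0.0, math.log(3.0)])
    assert math.isclose(out[0], 0.25, rel_tol=1e-12)
    assert math.isclose(out[1], 0.75, rel_tol=1e-12)
```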
Run the single test (or smallest file):
pytest path/to/test_module.py::test_name -q
# or project equivalent from CLAUDE.md
Confirm:
Skipping VERIFY RED is the most common way to ship useless tests.
Write the simplest code that passes. No extra features, config knobs, or refactors outside scope (code-writing §3). Complexity is allowed when tests or requirements prove the simple version fails — document why.
If other tests break, fix before moving on.
Only after green: rename, dedupe, extract helpers. No new behavior without a new RED test.
After the happy path, add tests for the behavior your change touches:
| Category | Examples |
|---|---|
| Empty / minimal | zero length, None where allowed, single element |
| Boundaries | min/max, off-by-one, saturation |
| Invalid input | wrong type, out of range, malformed config |
| Failure modes | I/O error, missing file, timeout |
| Numerics | NaN, Inf, denormal, dtype/shape mismatch |
Loop: add test → verify red (if bug exists) or green (regression guard) → fix → full suite green. code-reviewer expects this coverage for changed code.
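A few of the categories above can be sketched for a hypothetical `mean` function as follows. The `raises` helper is an assumption made so the example runs without pytest; in a pytest suite, `pytest.raises` is the usual way to express the same check.

```python
import math


# Hypothetical function under test (illustrative only).
def mean(values):
    if not values:
        raise ValueError("mean() of an empty sequence")
    return sum(values) / len(values)


def raises(exc_type, func, *args):
    """Tiny stand-in for pytest.raises so this sketch needs no third-party package."""
    try:
        func(*args)
    except exc_type:
        return True
    return False


# Empty / minimal
def test_mean_of_single_element_is_that_element():
    assert mean([4.0]) == 4.0


def test_mean_rejects_empty_input():
    assert raises(ValueError, mean, [])


# Invalid input
def test_mean_rejects_non_numeric_input():
    assert raises(TypeError, mean, ["a", "b"])


# Numerics
def test_mean_propagates_nan():
    assert math.isnan(mean([1.0, float("nan")]))
```

Each test still covers one behavior, so a failure names the exact edge case that broke.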
Do not fix bugs without a regression test unless the user explicitly waives it.
| Quality | Good | Bad |
|---|---|---|
| Scope | One behavior per test | Kitchen-sink test |
| Name | Describes expected behavior | test_foo, test_works |
| Subject | Real code path | Mock interaction only |
| Proof | Saw it fail, then pass | Written after code; passed first run |
| Edges | Explicit cases listed | Happy path only |
mock.assert_called() without checking outcome
Ideal: test-first. When code already exists:
Tests-after that pass on first run do not prove they catch regressions — add a deliberate break or mutation check when unsure.
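One way to picture a deliberate-break check is the sketch below. All names are hypothetical, and the test body takes the implementation as a parameter only so the demo fits in one file; in a real repo you would temporarily edit the production code, run the test, confirm it goes red, and revert.

```python
# Hypothetical production code, with a test that was written after it.
def is_adult(age):
    return age >= 18


def check_is_adult(impl):
    """The test body, parameterized over the implementation for this demo."""
    assert impl(18) is True
    assert impl(17) is False


# Deliberate break: an off-by-one mutant of the implementation.
def is_adult_mutant(age):
    return age > 18


def fails_against(check, impl):
    """Return True if the check fails (as it should) against a broken impl."""
    try:
        check(impl)
    except AssertionError:
        return True
    return False


check_is_adult(is_adult)  # passes on the real code
caught = fails_against(check_is_adult, is_adult_mutant)  # should be True
```

If `caught` were `False`, the test would not be exercising the boundary and would need a stronger assertion.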
| Problem | Try |
|---|---|
| Don't know how to test | Write desired API/assertion first; ask user; simplify interface |
| Test setup huge | Extract fixtures; simplify design |
| Must mock everything | Reduce coupling; inject dependencies |
| Numerical test unstable | Tight tolerances with justification; fixed seeds; reference values |
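Two of the remedies in the table, injecting dependencies and fixing seeds, can be sketched as below. The functions `is_expired` and `noisy_mean` are hypothetical; the idea is that the clock and the random generator are passed in, so the tests need no mocks and give the same result on every run.

```python
import datetime as dt
import random


# Hypothetical function: the current time is injected rather than read
# inside the function, so tests do not have to patch the clock.
def is_expired(expires_at, now=None):
    if now is None:
        now = dt.datetime.now(dt.timezone.utc)
    return now >= expires_at


def test_token_is_expired_at_its_expiry_time():
    expiry = dt.datetime(2024, 1, 1, tzinfo=dt.timezone.utc)
    assert is_expired(expiry, now=expiry)


def test_token_is_not_expired_one_second_before_expiry():
    expiry = dt.datetime(2024, 1, 1, tzinfo=dt.timezone.utc)
    assert not is_expired(expiry, now=expiry - dt.timedelta(seconds=1))


# Hypothetical stochastic function: the generator is injected so a test
# can pass a seeded instance.
def noisy_mean(n, rng):
    return sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n


def test_noisy_mean_is_reproducible_with_fixed_seed():
    assert noisy_mean(100, random.Random(0)) == noisy_mean(100, random.Random(0))
```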
| Excuse | Reality |
|---|---|
| "Too simple to test" | Simple code breaks; test is cheap |
| "I'll test after" | Pass-on-first-run proves nothing |
| "I manually tested" | Not repeatable; no regression net |
| "Keep code as reference" | You'll adapt it; that's tests-after |
| "Deleting work is wasteful" | Unverified code is debt |
| Skill | Role |
|---|---|
| code-writing | Surgical impl; simplicity vs needed complexity |
| code-reviewer | Confirms suite green + edge coverage on diff |
| Repo CLAUDE.md | pytest paths, markers (slow), GPU fixtures |
Workflow: code-testing (this skill) while implementing → full green suite → code-reviewer before merge.
npx claudepluginhub licongxu/cosmo-agent-skills --plugin cosmo-agent-skills