Skill

gen-tests

Génère les tests pytest manquants. Fan-out parallèle par fichier. Triage mécanique/sémantique. Améliore le code source si les tests échouent.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/starter:gen-tests

User invocable

Model invocable

Inline context

Default effort

Uses dynamic context injection — preprocesses shell commands at runtime

Tool Access

This skill is limited to the following tools:

Bash(python:*)Bash(uv:*)Bash(poetry:*)Bash(git add:*)Bash(git diff:*)Bash(git ls-files:*)Bash(printf:*)ReadGlobEditMultiEdit

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Three modes (decided from `$ARGUMENTS`):

SKILL.md

220 lines · ~3.1k tokens

Stats

LanguagePython

Stars0

MaintenanceExcellent

Last CommitMay 21, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

/starter:gen-tests

Three modes (decided from $ARGUMENTS):

Argument	Mode	Targets
(none)	Diff mode	`.py` changed in working tree (staged + unstaged + untracked), excluding `tests/*`
One or more paths	Targeted	Each path; directories expanded to `*/.py`
`all`	Full sweep	Every tracked `.py`, excluding `tests/*`. Slow; surfaces pre-existing untested code

Context

Argument: $ARGUMENTS
Runner: !python "${CLAUDE_PLUGIN_ROOT}/tools/resolve_runner.py" 2>/dev/null || echo uv
Changed Python files: !python "${CLAUDE_PLUGIN_ROOT}/tools/list_changed.py" --no-tests 2>/dev/null
All Python files (only used if $ARGUMENTS == "all"): !python "${CLAUDE_PLUGIN_ROOT}/tools/list_changed.py" --all --no-tests 2>/dev/null
Pytest version: !python "${CLAUDE_PLUGIN_ROOT}/tools/resolve_runner.py" --probe pytest 2>/dev/null

Operating mode

Silent. Emit NO narrative text between tool calls — no "Fanning out…", no "Reading files…", no "Delegating to…". The user sees only tool calls and the final summary line. Every intermediate status belongs in the final summary, not as a standalone text block.

Your task

Generate the missing pytest tests for the resolved targets via parallel fan-out (one test-writer subagent per source file, all spawned in a single message), verify they collect+pass, auto-repair mechanical failures, and emit one summary line. Halt only when semantic failures (real assertion mismatches) remain — those need the human's judgment, not another retry.

Guards

pytest: NOT INSTALLED → output pytest not installed. Run /starter:proj-init. Stop.
Resolve target list:
- $ARGUMENTS empty → use Changed Python files. If empty → output No target files. Stop with success.
- $ARGUMENTS == "all" → use All Python files. If empty → output No tracked .py files. Stop with success.
- Otherwise → split tokens; for each token, if it's a directory, glob **/*.py excluding tests/**; if a file, take it directly; if missing, output Target not found: <token>. Stop. Reject any token under tests/: output Target must not be under tests/. Stop.

Discover targets

The model performs target discovery directly using Read and Glob. Build two lists: targets (files needing test generation) and skipped (files filtered out, with reason).

Step 1 — Resolve package_name and import_root from pyproject.toml:

Read pyproject.toml. Extract [project].name (PEP 621) or [tool.poetry].name. Lowercase, replace hyphens with underscores → package_name.
If src/<package_name>/ exists (Glob for src/<package_name>/__init__.py) → import_root = package_name.
Else if <package_name>/__init__.py exists at repo root → import_root = package_name.
Else if src/__init__.py exists or sources live under src/ → import_root = "src".
Else → import_root = null, package_name = null (path-based imports).

Step 2 — Path mirror (deterministic):

Source	Test path
`src/foo/bar.py`	`tests/foo/test_bar.py`
`foo/bar.py` (no `src/`)	`tests/foo/test_bar.py`
`pkg.py` (root)	`tests/test_pkg.py`

Step 3 — Skip filter. Diff mode and all mode apply the filter; targeted mode skips this step (when the user explicitly named files, respect that intent). For each source, Read the file content and apply heuristics in order. The first match wins:

Skip reason	Detection
`fastapi-handler`	Imports `fastapi` AND defines a router/app at module top level (e.g., `app = FastAPI(...)`, `router = APIRouter(...)`)
`streamlit-page`	Imports `streamlit` (typically `import streamlit as st`)
`cli-entrypoint`	Has `if __name__ == "__main__":` AND uses `argparse`, `click`, `typer`, or `sys.argv` directly
`model-only`	Module's only top-level non-underscore symbols are dataclasses (`@dataclass`), Pydantic models (inherit `BaseModel`), or `Enum` subclasses. No standalone functions.
`no-public-symbols`	No top-level `def`, `async def`, or `class` whose name does not start with `_`

Step 4 — Existing-tests check. For each non-skipped source, compute its test_path (Step 2), then Glob for it. If the test file exists, Read it and list its top-level def test_* functions. For each public symbol in the source, check whether at least one test mentions the symbol name. If every public symbol has a corresponding test, skip with reason all-tested.

Step 5 — Build missing_symbols. For each source that survived all filters, list public symbols (top-level def/async def/class not starting with _) that have no existing test. Each entry: {"name": "<symbol>", "is_async": <bool>} (true only for async def symbols).

Step 6 — Assemble targets[]. Each target: {"source_path", "test_path", "missing_symbols"}. Drop targets whose missing_symbols is empty (treat as all-tested).

If targets is empty after this discovery → output All target files already have tests (or were skipped). N skipped: <reason summary>. Stop with success.

Fan-out to test-writer (parallel)

Issue ALL Task calls in a single message — this is the parallelism that delivers the speedup. For N targets ≤ 10, that's N Task tool calls in the same response. For N > 10, split into batches of 10 across consecutive messages (Claude Code's parallel limit is 10 per message).

Each Task call invokes subagent test-writer with this JSON:

{
  "target": {
    "source_path": "...",
    "test_path": "...",
    "missing_symbols": [...]
  },
  "package_name": "<package_name from Step 1>",
  "import_root": "<import_root from Step 1>",
  "runner": "<literal Runner from Context, uv or poetry>"
}

Each subagent returns a single line:

file=<test_path> tests_added=<n> omitted=<n> collection_ok=<true|false>

Aggregate across all subagents:

files = list of every file= value
total_added = sum of tests_added
total_omitted = sum of omitted
collection_ok_all = AND of all collection_ok values

If collection_ok_all == false → output Halted: test collection failed. followed by the file=... line(s) where collection_ok=false. Stop.

Stage

Run git add in its own Bash call — do not chain it with pytest. If chained, pytest's exit code 1 will mask a successful stage and surface as a spurious error.

git add -- <space-separated test paths from files>

Verify

Run pytest in a separate Bash call after staging completes:

<runner> run pytest -q --no-header --tb=short <space-separated test paths from files> 2>&1

Replace <runner> with the literal Runner value from the Context above. Read pytest's output. Build counters passed, failed, errors from the final pytest summary line.

If `failed == 0 && errors == 0`

Output Generated tests for N files in parallel: <comma-list>. All tests pass. Stop with success.

If tests fail — triage, then fix

Step 1 — triage each failure. For each failing test, Read the test file and the pytest traceback. Classify the failure into exactly one of two categories:

Category	Description	Examples
Mechanical	The test cannot execute due to a setup error produced by the test-writer, not a source deficiency	Wrong `mock.patch` target path (`rag_engine.fn` when fn lives in `text_processor`), missing import in test file, wrong fixture name, wrong exception class imported
Semantic	The test runs but the source code does not satisfy the assertion	`AssertionError`, unexpected return value, function raises when it shouldn't (or vice versa), wrong side-effect

Step 2 — fix mechanical failures automatically (no consent needed).

Mechanical failures are test-writer bugs — the test logic is correct but the plumbing is wrong. Fix them directly in the test file:

Wrong mock.patch path: the patch target must be the module where the name is looked up at call time. If rag_engine.py calls setup_advanced_text_processing() which it imported from text_processor, the correct patch is rag_engine.setup_advanced_text_processing (patch where it is used, not where it is defined). Read the source file to confirm the actual import, then correct the patch string.
Missing import in test file: add the missing import / from ... import line.
Wrong fixture name or conftest path: correct to match what pytest expects.

Apply mechanical fixes via Edit/MultiEdit on the test files only. Re-run pytest after fixing all mechanical failures:

<runner> run pytest -q --no-header --tb=short <test paths> 2>&1

If all tests now pass → stage both test files and any modified source files → output summary → stop with success.

Step 3 — propose source code improvements for remaining semantic failures.

Tests are the truth. If a semantic failure remains, the original source code has a deficiency — the test expectation is correct. Do NOT change test assertions or test logic.

Read each failing test file and the corresponding source file. Determine what change to the source code would make the test pass. Display all proposed improvements:

<N> semantic test failure(s) — proposing improvements to source code:

  1. <test_file>::<test_name> — <failure description>
     Source: <source_file>:<line>
     → Proposed improvement: <concrete description of the code change>

  2. ...

Step 4 — ask consent once. Call AskUserQuestion with:

header: Source code improvements
question: Do you want me to improve the source code so all tests pass?
multiSelect: false
options:
- label Yes, description Apply all proposed improvements above
- label No, description Skip — tests remain failing

Step 5 — on Yes, apply improvements.

Apply the proposed changes to the source files via Edit/MultiEdit. Then re-run pytest:

<runner> run pytest -q --no-header --tb=short <test paths> 2>&1

If all tests pass → stage test files and modified source files → output summary → stop with success.

If tests still fail → repeat Steps 3–5 (cap at 2 iterations total). If still failing after 2 iterations:

Halted: <n> test(s) still failing after source code improvements. Review the failures above.

Stop with non-zero exit.

Step 5 — on No.

Tests generated but <N> failing. Source code improvements declined.

Stop with success (the tests are written and staged, even if some fail).

Hard rules

Silent. No narrative between tool calls. "Fanning out…", "Reading files…", "Delegating to…" are all forbidden. Only the final summary block is output.
Never invoke test-fixer. Mechanical failures are repaired directly by the parent (Step 2 of the failure triage). Do not spawn a test-fixer subagent.
Stage and Verify in separate Bash calls. Never chain git add and pytest in one command — pytest's exit code 1 will mask a successful stage.
Fan-out parallèle: les N appels Task partent dans un seul message du parent. Sériels = pénalité de 5-10× sur le wall-clock.
Two failure categories, two responses. Mechanical failures (wrong mock path, missing import) are fixed in the test file automatically — they are test-writer bugs, not source deficiencies. Semantic failures (bad assertions, wrong return value) are fixed in the source code, with user consent.
Never weaken a test. Never change test assertions, expected values, or expected exceptions. Never remove or skip a test case. Mechanical fixes touch only plumbing (patch target string, import statement, fixture name) — never the assertion logic.
Consent before modifying source. Always AskUserQuestion before changing the user's original code. The user must see what will change and approve.
Hermetic. Never write gen_tests_*.py, targets.json, parser scripts, or any scratch helper into the user's tree.
Single message per phase. Discover (1 message of Reads/Globs), fan-out (1 message of N Tasks), stage (1 message of git adds), verify (1 message), then improvement loop (1 message per iteration).

gen-tests

Invocation

Tool Access

Context Preview

SKILL.md

gen-tests

Invocation

Tool Access

Context Preview

SKILL.md

/starter:gen-tests

Context

Operating mode

Your task

Guards

Discover targets

Fan-out to test-writer (parallel)

Stage

Verify

If `failed == 0 && errors == 0`

If tests fail — triage, then fix

Hard rules

Similar Skills

/starter:gen-tests

Context

Operating mode

Your task

Guards

Discover targets

Fan-out to test-writer (parallel)

Stage

Verify

If `failed == 0 && errors == 0`

If tests fail — triage, then fix

Hard rules

Similar Skills

gen-tests

Invocation

Tool Access

Context Preview

SKILL.md

gen-tests

Invocation

Tool Access

Context Preview

SKILL.md

/starter:gen-tests

Context

Operating mode

Your task

Guards

Discover targets

Fan-out to test-writer (parallel)

Stage

Verify

If failed == 0 && errors == 0

If tests fail — triage, then fix

Hard rules

Similar Skills

/starter:gen-tests

Context

Operating mode

Your task

Guards

Discover targets

Fan-out to test-writer (parallel)

Stage

Verify

If failed == 0 && errors == 0

If tests fail — triage, then fix

Hard rules

Similar Skills

If `failed == 0 && errors == 0`

If `failed == 0 && errors == 0`