From starter
Generates the missing pytest tests. Parallel fan-out per file. Mechanical/semantic triage. Improves the source code if the tests fail.
Slash command: `/starter:gen-tests`
Three modes (decided from `$ARGUMENTS`):
| Argument | Mode | Targets |
|---|---|---|
| (none) | Diff mode | `*.py` changed in working tree (staged + unstaged + untracked), excluding `tests/**` |
| One or more paths | Targeted | Each path; directories expanded to `**/*.py` |
| `all` | Full sweep | Every tracked `*.py`, excluding `tests/**`. Slow; surfaces pre-existing untested code |
Context:

- Runner: `python "${CLAUDE_PLUGIN_ROOT}/tools/resolve_runner.py" 2>/dev/null || echo uv`
- Changed Python files: `python "${CLAUDE_PLUGIN_ROOT}/tools/list_changed.py" --no-tests 2>/dev/null`
- All Python files: `python "${CLAUDE_PLUGIN_ROOT}/tools/list_changed.py" --all --no-tests 2>/dev/null`
- pytest: `python "${CLAUDE_PLUGIN_ROOT}/tools/resolve_runner.py" --probe pytest 2>/dev/null`

Silent. Emit NO narrative text between tool calls: no "Fanning out…", no "Reading files…", no "Delegating to…". The user sees only tool calls and the final summary line. Every intermediate status belongs in the final summary, not as a standalone text block.
Generate the missing pytest tests for the resolved targets via parallel fan-out (one test-writer subagent per source file, all spawned in a single message), verify they collect+pass, auto-repair mechanical failures, and emit one summary line. Halt only when semantic failures (real assertion mismatches) remain — those need the human's judgment, not another retry.
- pytest: NOT INSTALLED → output `pytest not installed. Run /starter:proj-init.` Stop.
- `$ARGUMENTS` empty → use Changed Python files. If empty → output `No target files.` Stop with success.
- `$ARGUMENTS == "all"` → use All Python files. If empty → output `No tracked .py files.` Stop with success.
- Otherwise → treat each token as a path: if a directory, expand to `**/*.py` excluding `tests/**`; if a file, take it directly; if missing, output `Target not found: <token>.` Stop. Reject any token under `tests/`: output `Target must not be under tests/.` Stop.

The model performs target discovery directly using Read and Glob. Build two lists: `targets` (files needing test generation) and `skipped` (files filtered out, with reason).
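The argument handling above can be sketched in Python. This is an illustrative sketch, not the skill's implementation: `resolve_targets`, `TargetError`, and `under_tests` are hypothetical names, the `changed` and `tracked` lists stand in for the Changed Python files and All Python files context values, and "under `tests/`" is assumed to mean a top-level `tests` directory.

```python
import shlex
from pathlib import Path, PurePosixPath


class TargetError(Exception):
    """Carries the one-line message to output before stopping."""


def under_tests(path: str) -> bool:
    # Assumption: only a top-level tests/ directory counts as "under tests/".
    parts = PurePosixPath(path).parts
    return bool(parts) and parts[0] == "tests"


def resolve_targets(arguments: str, changed: list[str], tracked: list[str],
                    root: Path = Path(".")) -> tuple[str, list[str]]:
    """Return (mode, source files) for a raw $ARGUMENTS string."""
    tokens = shlex.split(arguments)
    if not tokens:
        return "diff", list(changed)
    if tokens == ["all"]:
        return "all", list(tracked)

    files: list[str] = []
    for token in tokens:
        if under_tests(token):
            raise TargetError("Target must not be under tests/.")
        path = root / token
        if path.is_dir():
            # Directories expand to **/*.py, still excluding tests/**.
            for found in sorted(path.rglob("*.py")):
                rel = found.relative_to(root).as_posix()
                if not under_tests(rel):
                    files.append(rel)
        elif path.is_file():
            files.append(PurePosixPath(token).as_posix())
        else:
            raise TargetError(f"Target not found: {token}.")
    return "targeted", files
```

An empty list returned in diff or `all` mode corresponds to the "stop with success" outcomes; a `TargetError` corresponds to the hard stops.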
Step 1 — Resolve package_name and import_root from pyproject.toml:
- Read `pyproject.toml`. Extract `[project].name` (PEP 621) or `[tool.poetry].name`. Lowercase, replace hyphens with underscores → `package_name`.
- If `src/<package_name>/` exists (Glob for `src/<package_name>/__init__.py`) → `import_root = package_name`.
- Else if `<package_name>/__init__.py` exists at repo root → `import_root = package_name`.
- Else if `src/__init__.py` exists or sources live under `src/` → `import_root = "src"`.
- Otherwise → `import_root = null`, `package_name = null` (path-based imports).

Step 2 — Path mirror (deterministic):
| Source | Test path |
|---|---|
| `src/foo/bar.py` | `tests/foo/test_bar.py` |
| `foo/bar.py` (no `src/`) | `tests/foo/test_bar.py` |
| `pkg.py` (root) | `tests/test_pkg.py` |
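A minimal sketch of the name normalization from Step 1 and the path mirror from Step 2, assuming POSIX-style relative paths. The function names `normalize_package_name` and `mirror_test_path` are hypothetical and only illustrate the mapping in the table.

```python
from pathlib import PurePosixPath


def normalize_package_name(raw_name: str) -> str:
    # Lowercase and replace hyphens with underscores, as in Step 1.
    return raw_name.lower().replace("-", "_")


def mirror_test_path(source_path: str) -> str:
    """Map a source file to its test file under tests/."""
    parts = PurePosixPath(source_path).parts
    if parts and parts[0] == "src":
        parts = parts[1:]  # a leading src/ is dropped from the mirror
    *dirs, filename = parts
    return PurePosixPath("tests", *dirs, f"test_{filename}").as_posix()


print(mirror_test_path("src/foo/bar.py"))  # tests/foo/test_bar.py
print(mirror_test_path("pkg.py"))          # tests/test_pkg.py
```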
Step 3 — Skip filter. Diff mode and all mode apply the filter; targeted mode skips this step (when the user explicitly named files, respect that intent). For each source, Read the file content and apply heuristics in order. The first match wins:
| Skip reason | Detection |
|---|---|
| `fastapi-handler` | Imports `fastapi` AND defines a router/app at module top level (e.g., `app = FastAPI(...)`, `router = APIRouter(...)`) |
| `streamlit-page` | Imports `streamlit` (typically `import streamlit as st`) |
| `cli-entrypoint` | Has `if __name__ == "__main__":` AND uses `argparse`, `click`, `typer`, or `sys.argv` directly |
| `model-only` | Module's only top-level non-underscore symbols are dataclasses (`@dataclass`), Pydantic models (inherit `BaseModel`), or `Enum` subclasses. No standalone functions. |
| `no-public-symbols` | No top-level `def`, `async def`, or `class` whose name does not start with `_` |
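The skill leaves these heuristics to the model reading each file, but they could be approximated with the standard `ast` module. The sketch below is an assumption about how one might encode the table, not the skill's own logic: `skip_reason` and its helpers are hypothetical names, only `def`/`class` statements count as symbols, and only direct `BaseModel`/`Enum` bases are recognized. It needs Python 3.9+ for `ast.unparse`.

```python
import ast
from typing import Optional

CLI_MODULES = {"argparse", "click", "typer"}
MODEL_BASES = {"BaseModel", "Enum"}


def _name(node: ast.AST) -> str:
    """Rightmost identifier of a Name, Attribute, or Call node ('' otherwise)."""
    if isinstance(node, ast.Call):
        node = node.func
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return ""


def _imports(tree: ast.Module) -> set[str]:
    """Top-level package names imported anywhere in the module."""
    found: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found


def _is_model(node: ast.AST) -> bool:
    if not isinstance(node, ast.ClassDef):
        return False
    decorated = any(_name(d) == "dataclass" for d in node.decorator_list)
    return decorated or any(_name(b) in MODEL_BASES for b in node.bases)


def skip_reason(source: str) -> Optional[str]:
    """Return the first matching skip reason, or None if the file is a target."""
    tree = ast.parse(source)
    imports = _imports(tree)
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    public = [n for n in tree.body
              if isinstance(n, kinds) and not n.name.startswith("_")]

    if "fastapi" in imports and any(
        isinstance(n, ast.Assign) and _name(n.value) in {"FastAPI", "APIRouter"}
        for n in tree.body
    ):
        return "fastapi-handler"
    if "streamlit" in imports:
        return "streamlit-page"
    has_main = any(
        isinstance(n, ast.If) and ast.unparse(n.test) == "__name__ == '__main__'"
        for n in tree.body
    )
    uses_argv = any(
        isinstance(n, ast.Attribute) and n.attr == "argv" and _name(n.value) == "sys"
        for n in ast.walk(tree)
    )
    if has_main and (imports & CLI_MODULES or uses_argv):
        return "cli-entrypoint"
    if public and all(_is_model(n) for n in public):
        return "model-only"
    if not public:
        return "no-public-symbols"
    return None
```

The checks run in table order, so a Streamlit page that also has a `__main__` guard is reported as `streamlit-page`.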
Step 4 — Existing-tests check. For each non-skipped source, compute its test_path (Step 2), then Glob for it. If the test file exists, Read it and list its top-level def test_* functions. For each public symbol in the source, check whether at least one test mentions the symbol name. If every public symbol has a corresponding test, skip with reason all-tested.
Step 5 — Build missing_symbols. For each source that survived all filters, list public symbols (top-level def/async def/class not starting with _) that have no existing test. Each entry: {"name": "<symbol>", "is_async": <bool>} (true only for async def symbols).
Step 6 — Assemble targets[]. Each target: {"source_path", "test_path", "missing_symbols"}. Drop targets whose missing_symbols is empty (treat as all-tested).
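Steps 4 to 6 could be approximated as follows. This is a sketch under a stated assumption: a test "mentions" a symbol when the symbol's name appears as an identifier inside a top-level `test_*` function. The skill does not define "mentions" more precisely, and the helper names here are hypothetical.

```python
import ast
from typing import Optional


def public_symbols(source: str) -> list[dict]:
    """Top-level def / async def / class entries not starting with '_'."""
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return [
        {"name": node.name, "is_async": isinstance(node, ast.AsyncFunctionDef)}
        for node in ast.parse(source).body
        if isinstance(node, kinds) and not node.name.startswith("_")
    ]


def identifiers_in_tests(test_source: str) -> set[str]:
    """Names and attribute names used inside top-level test_* functions."""
    used: set[str] = set()
    for node in ast.parse(test_source).body:
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name.startswith("test_"):
            for child in ast.walk(node):
                if isinstance(child, ast.Name):
                    used.add(child.id)
                elif isinstance(child, ast.Attribute):
                    used.add(child.attr)
    return used


def build_target(source_path: str, test_path: str, source: str,
                 test_source: Optional[str] = None) -> Optional[dict]:
    """Return a target dict, or None when every public symbol is covered."""
    used = identifiers_in_tests(test_source) if test_source else set()
    missing = [s for s in public_symbols(source) if s["name"] not in used]
    if not missing:
        return None  # treated as all-tested
    return {"source_path": source_path, "test_path": test_path,
            "missing_symbols": missing}
```

Passing `test_source=None` models the case where the test file does not exist yet, so every public symbol is reported as missing.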
If targets is empty after this discovery → output All target files already have tests (or were skipped). N skipped: <reason summary>. Stop with success.
Issue ALL Task calls in a single message — this is the parallelism that delivers the speedup. For N targets ≤ 10, that's N Task tool calls in the same response. For N > 10, split into batches of 10 across consecutive messages (Claude Code's parallel limit is 10 per message).
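The batching rule is plain chunking. A minimal sketch, with `batch_targets` as a hypothetical helper name and the limit of 10 taken from the paragraph above:

```python
def batch_targets(targets: list, limit: int = 10) -> list[list]:
    """Split targets into consecutive batches of at most `limit` items."""
    return [targets[i:i + limit] for i in range(0, len(targets), limit)]


# 23 targets become three messages: 10 + 10 + 3 Task calls.
print([len(batch) for batch in batch_targets(list(range(23)))])  # [10, 10, 3]
```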
Each Task call invokes subagent test-writer with this JSON:
    {
      "target": {
        "source_path": "...",
        "test_path": "...",
        "missing_symbols": [...]
      },
      "package_name": "<package_name from Step 1>",
      "import_root": "<import_root from Step 1>",
      "runner": "<literal Runner from Context, uv or poetry>"
    }
Each subagent returns a single line:
file=<test_path> tests_added=<n> omitted=<n> collection_ok=<true|false>
Aggregate across all subagents:
- `files` = list of every `file=` value
- `total_added` = sum of `tests_added`
- `total_omitted` = sum of `omitted`
- `collection_ok_all` = AND of all `collection_ok` values

If `collection_ok_all == false` → output `Halted: test collection failed.` followed by the `file=...` line(s) where `collection_ok=false`. Stop.
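The aggregation could look like the sketch below. `Aggregate` and `aggregate` are hypothetical names, and the regular expression assumes the result line has exactly the four fields shown above, with no spaces in the test path.

```python
import re
from dataclasses import dataclass, field

RESULT_RE = re.compile(
    r"file=(?P<file>\S+) tests_added=(?P<added>\d+) "
    r"omitted=(?P<omitted>\d+) collection_ok=(?P<ok>true|false)"
)


@dataclass
class Aggregate:
    files: list[str] = field(default_factory=list)
    total_added: int = 0
    total_omitted: int = 0
    failed_lines: list[str] = field(default_factory=list)

    @property
    def collection_ok_all(self) -> bool:
        # AND of all collection_ok values: true when no line reported false.
        return not self.failed_lines


def aggregate(lines: list[str]) -> Aggregate:
    """Fold one result line per subagent into the counters."""
    result = Aggregate()
    for line in lines:
        match = RESULT_RE.fullmatch(line.strip())
        if match is None:
            raise ValueError(f"unparseable subagent result: {line!r}")
        result.files.append(match["file"])
        result.total_added += int(match["added"])
        result.total_omitted += int(match["omitted"])
        if match["ok"] == "false":
            result.failed_lines.append(line.strip())
    return result
```

When `collection_ok_all` is false, `failed_lines` holds the lines to print after the halt message.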
Run git add in its own Bash call — do not chain it with pytest. If chained, pytest's exit code 1 will mask a successful stage and surface as a spurious error.
git add -- <space-separated test paths from files>
Run pytest in a separate Bash call after staging completes:
<runner> run pytest -q --no-header --tb=short <space-separated test paths from files> 2>&1
Replace <runner> with the literal Runner value from the Context above. Read pytest's output. Build counters passed, failed, errors from the final pytest summary line.
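One way to build the counters is a small regular expression over the summary line. This sketch assumes the summary is the last non-empty line of the captured output and has the usual shape, such as `1 failed, 2 passed in 0.12s`; `parse_summary` is a hypothetical helper name.

```python
import re

COUNT_RE = re.compile(r"(\d+) (passed|failed|errors?)\b")


def parse_summary(output: str) -> dict[str, int]:
    """Extract passed / failed / errors from pytest's final summary line."""
    counters = {"passed": 0, "failed": 0, "errors": 0}
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return counters
    for count, label in COUNT_RE.findall(lines[-1]):
        # pytest prints "1 error" or "2 errors"; both map to one counter.
        key = "errors" if label.startswith("error") else label
        counters[key] = int(count)
    return counters


print(parse_summary("..F\n1 failed, 2 passed in 0.12s\n"))
# {'passed': 2, 'failed': 1, 'errors': 0}
```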
If `failed == 0 && errors == 0` → output `Generated tests for N files in parallel: <comma-list>. All tests pass.` Stop with success.
Step 1 — triage each failure. For each failing test, Read the test file and the pytest traceback. Classify the failure into exactly one of two categories:
| Category | Description | Examples |
|---|---|---|
| Mechanical | The test cannot execute due to a setup error produced by the test-writer, not a source deficiency | Wrong mock.patch target path (rag_engine.fn when fn lives in text_processor), missing import in test file, wrong fixture name, wrong exception class imported |
| Semantic | The test runs but the source code does not satisfy the assertion | AssertionError, unexpected return value, function raises when it shouldn't (or vice versa), wrong side-effect |
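The skill classifies by reading the test and the traceback, so no fixed rule set is given. As a rough first pass only, a few traceback markers tend to indicate plumbing errors. The marker list below is an assumption made for illustration, and `classify_failure` is a hypothetical name. It cannot detect every mechanical case (for example a wrong exception class imported), and a source module that itself raises `ImportError` would be misclassified.

```python
import re

# Assumed markers of setup errors in the generated test, not in the source.
MECHANICAL_PATTERNS = [
    re.compile(r"\b(ModuleNotFoundError|ImportError|NameError)\b"),
    re.compile(r"does not have the attribute"),  # mock.patch on a wrong target
    re.compile(r"fixture '[^']+' not found"),    # unknown pytest fixture
]


def classify_failure(traceback_text: str) -> str:
    """Return 'mechanical' or 'semantic' for one failing test's traceback."""
    if any(p.search(traceback_text) for p in MECHANICAL_PATTERNS):
        return "mechanical"
    return "semantic"
```

Anything that does not match a marker falls through to `semantic`, which is the category that requires consent before any source change.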
Step 2 — fix mechanical failures automatically (no consent needed).
Mechanical failures are test-writer bugs — the test logic is correct but the plumbing is wrong. Fix them directly in the test file:
- `mock.patch` path: the patch target must be the module where the name is looked up at call time. If `rag_engine.py` calls `setup_advanced_text_processing()`, which it imported from `text_processor`, the correct patch is `rag_engine.setup_advanced_text_processing` (patch where it is used, not where it is defined). Read the source file to confirm the actual import, then correct the patch string.
- Missing or wrong import: correct the `import` / `from ... import` line.

Apply mechanical fixes via Edit/MultiEdit on the test files only. Re-run pytest after fixing all mechanical failures:
<runner> run pytest -q --no-header --tb=short <test paths> 2>&1
If all tests now pass → stage the test files and any modified source files → output summary → stop with success.
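The "patch where it is used" rule from Step 2 can be demonstrated with two throwaway in-memory modules. The module and function names mirror the example above; building them with `types.ModuleType` and `exec` is only a device to keep the sketch in one file, and `build` and `build_with_patch` are hypothetical helpers.

```python
import sys
import types
from unittest import mock

# Stand-in for text_processor.py, where the function is defined.
text_processor = types.ModuleType("text_processor")
exec(
    "def setup_advanced_text_processing():\n"
    "    return 'real'\n",
    text_processor.__dict__,
)
sys.modules["text_processor"] = text_processor

# Stand-in for rag_engine.py, which imports the name and calls it.
rag_engine = types.ModuleType("rag_engine")
exec(
    "from text_processor import setup_advanced_text_processing\n"
    "def build():\n"
    "    return setup_advanced_text_processing()\n",
    rag_engine.__dict__,
)
sys.modules["rag_engine"] = rag_engine


def build_with_patch(target: str) -> str:
    """Call rag_engine.build() while `target` is patched."""
    with mock.patch(target, return_value="mocked"):
        return rag_engine.build()


# Patching the name in the module that uses it takes effect.
print(build_with_patch("rag_engine.setup_advanced_text_processing"))      # mocked
# Patching the defining module leaves rag_engine's own binding untouched.
print(build_with_patch("text_processor.setup_advanced_text_processing"))  # real
```

The second call returns the real value because `from text_processor import ...` copied the function reference into `rag_engine` at import time.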
Step 3 — propose source code improvements for remaining semantic failures.
Tests are the truth. If a semantic failure remains, the original source code has a deficiency — the test expectation is correct. Do NOT change test assertions or test logic.
Read each failing test file and the corresponding source file. Determine what change to the source code would make the test pass. Display all proposed improvements:
<N> semantic test failure(s) — proposing improvements to source code:
1. <test_file>::<test_name> — <failure description>
Source: <source_file>:<line>
→ Proposed improvement: <concrete description of the code change>
2. ...
Step 4 — ask consent once. Call AskUserQuestion with:
- header: `Source code improvements`
- question: `Do you want me to improve the source code so all tests pass?`
- multiSelect: `false`
- option `Yes`, description `Apply all proposed improvements above`
- option `No`, description `Skip — tests remain failing`

Step 5 — on Yes, apply improvements.
Apply the proposed changes to the source files via Edit/MultiEdit. Then re-run pytest:
<runner> run pytest -q --no-header --tb=short <test paths> 2>&1
If all tests pass → stage test files and modified source files → output summary → stop with success.
If tests still fail → repeat Steps 3–5 (cap at 2 iterations total). If still failing after 2 iterations:
Halted: <n> test(s) still failing after source code improvements. Review the failures above.
Stop with non-zero exit.
Step 5 — on No.
Tests generated but <N> failing. Source code improvements declined.
Stop with success (the tests are written and staged, even if some fail).
- No `test-fixer`. Mechanical failures are repaired directly by the parent (Step 2 of the failure triage). Do not spawn a `test-fixer` subagent.
- Never chain `git add` and pytest in one command: pytest's exit code 1 will mask a successful stage.
- All `Task` calls go out in a single message from the parent. Serial calls cost a 5-10× wall-clock penalty.
- Always call `AskUserQuestion` before changing the user's original code. The user must see what will change and approve.
- Never write `gen_tests_*.py`, `targets.json`, parser scripts, or any scratch helper into the user's tree.