From code-factory
Use when you have a plan directory from decomposing-specs and need to execute it — loads phases on demand, dispatches phase-elaborator for sketched phases, runs implementers with size-scaled phase reviews, and validates the full spec at the end
Slash command: `/code-factory:executing-plans`
Orchestrate execution of a multi-file plan produced by `decomposing-specs`. The plan lives at `docs/plans/<topic>/` with `plan.md`, `standards.md`, and `phases/NN-<name>.md` files. Phase 1 and Verification are pre-elaborated; phases 2..N-1 start as sketches and are elaborated just-in-time. The orchestrator never writes code — it only coordinates.
Efficiency posture: subagent dispatch is expensive. Load only what you need (plan.md + the active phase file), batch consecutive cohesive tasks, scale phase-boundary review to phase size, and gate re-reviews on severity. Reviewer cycles can outnumber implementer cycles if you're not careful.
State-machine diagram: flow.md. Open it once for orientation if you need it.
Read only these two files into context at startup:
- docs/plans/<topic>/plan.md: phases table, summaries, coverage matrix
- docs/plans/<topic>/standards.md: shared codebase context

Do not read all phase files at startup. Each phase file is loaded only when its turn comes.
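As a rough sketch of this loading rule, the helpers below compute which paths are read at startup and which single phase file is resolved later. The function names and the two-digit zero-padding of `NN` are assumptions made for illustration; the skill only specifies the directory layout.

```python
from pathlib import Path

PLANS_ROOT = Path("docs/plans")  # layout taken from the text above


def startup_files(topic: str) -> list[Path]:
    """Return the only two files read into context at startup."""
    plan_dir = PLANS_ROOT / topic
    return [plan_dir / "plan.md", plan_dir / "standards.md"]


def phase_file(topic: str, number: int, name: str) -> Path:
    """Path of one phase file, resolved only when that phase starts.

    Assumption: NN is a zero-padded two-digit phase number.
    """
    return PLANS_ROOT / topic / "phases" / f"{number:02d}-{name}.md"
```

Keeping the phase path behind its own function mirrors the rule that phase files are never part of the startup read.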
Extract from plan.md:
For each phase in order:
If the phase status is sketch, dispatch the phase-elaborator agent:
Spec path: <from plan.md>
Plan path: docs/plans/<topic>/plan.md
Standards path: docs/plans/<topic>/standards.md
Phase file path: docs/plans/<topic>/phases/NN-<name>.md
Prior phase summary: <one paragraph: what phase N-1 actually built — files created/modified, key decisions, any drift from its sketch>
Repo root: <pwd>
The elaborator overwrites the sketch in-place and returns a short summary. If it reports drift (anticipated files turned out wrong, EARS coverage shift, task count changed), update plan.md's phase row + coverage matrix accordingly. If the drift is structural (new dependency between phases, scope change), pause and surface to the user.
If the phase is already elaborated (Phase 1 or Verification), skip this step.
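A minimal sketch of how an orchestrator might assemble the elaborator prompt from the fields listed above. The function names and argument shapes are hypothetical; only the field labels and the `sketch` status come from the skill text.

```python
def needs_elaboration(status: str) -> bool:
    """Only phases whose status is 'sketch' are sent to the phase-elaborator."""
    return status == "sketch"


def elaborator_prompt(topic: str, spec_path: str, phase_filename: str,
                      prior_summary: str, repo_root: str) -> str:
    """Build the prompt body using the field labels from the skill text."""
    plan_dir = f"docs/plans/{topic}"
    lines = [
        f"Spec path: {spec_path}",
        f"Plan path: {plan_dir}/plan.md",
        f"Standards path: {plan_dir}/standards.md",
        f"Phase file path: {plan_dir}/phases/{phase_filename}",
        f"Prior phase summary: {prior_summary}",
        f"Repo root: {repo_root}",
    ]
    return "\n".join(lines)
```

The prompt passes paths rather than file contents, so the elaborator loads what it needs in its own context.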
Read docs/plans/<topic>/phases/NN-<name>.md into context now. This is the only phase file in your context — older phase files were dropped after their phase completed (they're on disk if needed).
Walk tasks top to bottom within the phase. For each task (or batch — see below), dispatch the implementer agent with:
- standards.md (or hand the path; small files are fine to inline). Implementers must cite standards.md for shared context but only act on the deltas in the task.

Default to batching when consecutive sequential tasks within a phase share context. One implementer running 2-4 cohesive tasks is dramatically cheaper than 2-4 separate dispatches because the context loads once.
Batch consecutive tasks N..N+k when ALL of these hold:
- No [P] markers between them
- Their Files: lists overlap by ≥50%, OR they share the same pattern-to-mirror citation
- None is flagged **Risk:** high

The implementer reports per-task DONE/BLOCKED in the result. If any task in the batch returns BLOCKED or NEEDS_CONTEXT, the implementer reports which task and what it completed; resume the rest as a smaller batch or individual dispatch.
Do NOT batch when tasks touch disjoint subsystems, an intermediate task is high-risk, or a task is flagged **Risk:** high.
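The batching conditions can be sketched as a predicate over a run of consecutive tasks. This is one possible reading, not the skill's implementation: the `Task` fields are hypothetical, the 2-4 size bound is taken from the batching guidance, and "overlap" is assumed to mean shared files divided by the smaller of the two file lists, which the text does not define.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    files: frozenset
    parallel: bool = False          # task carries a [P] marker
    high_risk: bool = False         # task is flagged **Risk:** high
    pattern: Optional[str] = None   # pattern-to-mirror citation, if any


def overlap(a: frozenset, b: frozenset) -> float:
    """Assumed metric: shared files relative to the smaller file list."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def can_batch(tasks: list) -> bool:
    """True when a run of consecutive tasks meets every batching condition."""
    if not 2 <= len(tasks) <= 4:
        return False
    if any(t.parallel or t.high_risk for t in tasks):
        return False
    for prev, cur in zip(tasks, tasks[1:]):
        same_pattern = prev.pattern is not None and prev.pattern == cur.pattern
        if overlap(prev.files, cur.files) < 0.5 and not same_pattern:
            return False
    return True
```

Checking adjacent pairs rather than every pair is a simplification; a stricter orchestrator could require the overlap to hold across the whole batch.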
Tasks marked [P] within a phase have no intra-phase dependencies and may be dispatched concurrently:
- Identify the [P] tasks in the phase
- Bundle small [P] tasks before parallelizing. If two [P] tasks are each <1 hour and touch disjoint files, hand both to a single implementer to run sequentially. Reserve concurrent dispatch for [P] tasks ≥1 hour each or that benefit from isolation.
- Non-[P] tasks execute sequentially (or batched) after all parallel tasks complete

If any parallel task fails or blocks, handle individually and don't block the others. Concurrent dispatch is advisory; sequential is the safe fallback.
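The bundling rule for a pair of [P] tasks can be illustrated as follows. The function name, the hour estimates as inputs, and the return labels are hypothetical; the "benefit from isolation" case is a judgment call and is not modeled here.

```python
def dispatch_mode(hours_a: float, files_a: set, hours_b: float, files_b: set) -> str:
    """Decide how to run two [P] tasks, following the thresholds in the text."""
    both_small = hours_a < 1 and hours_b < 1
    both_large = hours_a >= 1 and hours_b >= 1
    disjoint = files_a.isdisjoint(files_b)
    if both_small and disjoint:
        return "bundle"        # one implementer runs both sequentially
    if both_large:
        return "concurrent"    # separate implementers in parallel
    return "sequential"        # safe fallback for every other combination
```

Mixed cases, such as one small and one large task, fall through to the sequential fallback because the text gives no explicit rule for them.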
| Status | Response |
|---|---|
| DONE | Proceed to next task or phase review |
| DONE_WITH_CONCERNS | Read concerns. Correctness/scope issues → dispatch fix. Observations → note and proceed. |
| NEEDS_CONTEXT | Provide missing context (often a row from standards.md), re-dispatch same task |
| BLOCKED | Context problem → provide more context. Too hard → break task down. Plan wrong → generate fix task or re-elaborate phase. |
Track any Ephemeral Tests entries reported by implementers. If the report lists anything other than None, record the file path, test name, originating task, and whether durable behavioral coverage already exists. Do not treat the presence of ephemeral tests as a task failure during the phase, but do not lose the list — final validation must clean them up.
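One way to keep the ephemeral-test list from getting lost is a small ledger with the four recorded fields. The class, the function, and the shape of the implementer report (a list of dicts with `file`, `test`, and an optional `durable_coverage` key) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EphemeralTest:
    file_path: str
    test_name: str
    originating_task: str
    durable_coverage_exists: bool


def record_ephemeral(ledger: list, task: str, reported: Optional[list]) -> None:
    """Append every reported ephemeral test; a report of None adds nothing."""
    for entry in reported or []:
        ledger.append(EphemeralTest(
            file_path=entry["file"],
            test_name=entry["test"],
            originating_task=task,
            # Assumed default: no durable coverage unless the report says so.
            durable_coverage_exists=entry.get("durable_coverage", False),
        ))
```

The ledger is only consulted at final validation, so recording an entry never changes the task's status during the phase.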
If an implementer fails the same task 3 times, stop execution and report to the user.
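The status table and the three-failure limit might be combined into a single decision function, sketched below. The `reason` values and the returned action labels are hypothetical names; escalating a BLOCKED result with an unrecognized reason to the user is an assumption, not something the skill states.

```python
MAX_FAILURES = 3  # stop after the same task fails three times


def next_action(status: str, reason: str = "", failures: int = 0) -> str:
    """Map an implementer result to the orchestrator's next move."""
    if failures >= MAX_FAILURES:
        return "stop_and_report_to_user"
    if status == "DONE":
        return "proceed"
    if status == "DONE_WITH_CONCERNS":
        # Correctness or scope concerns need a fix; observations are noted.
        if reason in {"correctness", "scope"}:
            return "dispatch_fix"
        return "note_and_proceed"
    if status == "NEEDS_CONTEXT":
        return "provide_context_and_redispatch"
    if status == "BLOCKED":
        blocked_actions = {
            "context": "provide_context_and_redispatch",
            "too_hard": "break_task_down",
            "plan_wrong": "fix_task_or_re_elaborate",
        }
        # Assumed fallback: unknown causes are surfaced to the user.
        return blocked_actions.get(reason, "stop_and_report_to_user")
    raise ValueError(f"unknown status: {status}")
```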
After all tasks in the phase complete, scale review depth to phase size and risk.
When the current phase will run Tier B or Tier C review and the next phase exists with status sketch, dispatch the next phase's phase-elaborator at the same time as the phase-boundary reviewers. This is the only permitted speculative elaboration: at most one upcoming phase, only while review agents are already running, and never while current-phase implementation is still in progress.
Use the normal elaborator prompt from Step 2a, with the current phase summary marked as pending review:
Prior phase summary: <one paragraph: what phase N actually built before review — files created/modified, key decisions, any known concerns>
Review status: Phase N review is in progress; this elaboration may need a small follow-up adjustment if review fixes materially change files, APIs, dependencies, or EARS coverage.
Do not read the next phase file into the orchestrator's context yet. Let the elaborator overwrite the sketch in-place and report drift in the background while the current phase review continues.
After review and any review-triggered fixes complete:
- If review fixes materially changed files, APIs, dependencies, or EARS coverage, send a follow-up to the phase-elaborator for a quick adjustment before Step 2b reads the next phase file.
- Update plan.md so the phase row has status elaborated and the final task count from the elaborator. If the material changes alter phase boundaries or the coverage matrix, update those rows too. Structural scope changes still pause for the user.

If the current phase is Tier A, skip this optimization; there is no review window to hide the elaboration behind, so the normal Step 2a just-in-time elaboration is cheaper and simpler.
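The gate for this speculative elaboration reduces to three conditions, sketched here. The function name and the single-letter tier labels as strings are illustrative assumptions.

```python
from typing import Optional


def should_elaborate_ahead(current_tier: str, next_status: Optional[str],
                           implementation_in_progress: bool) -> bool:
    """True only when the next phase may be elaborated during review.

    next_status is None when there is no next phase.
    """
    if implementation_in_progress:
        return False                   # never while implementers are running
    if current_tier not in {"B", "C"}:
        return False                   # Tier A has no review window to use
    return next_status == "sketch"     # only sketched phases need elaboration
```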
| Phase shape | Review tier |
|---|---|
| 1-2 tasks, no **Risk:** high flag | Tier A (defer). Skip phase-boundary review entirely. The final spec-reviewer pass at Step 3 covers correctness; design/security/test concerns fold into the next phase's review or the final pass. |
| 1-2 tasks BUT flagged **Risk:** high | Tier B (focused). Dispatch only the reviewers relevant to the risk (e.g., security-reviewer + correctness-reviewer for an auth boundary). Skip the full 4-agent suite. |
| 3+ tasks, normal risk | Tier C (full suite: 4 reviewers in parallel + test-coverage-reviewer). |
| 3+ tasks, includes **Risk:** high | Tier C with extra weight on the relevant specialized reviewer. |
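The tier table can be expressed as a small function of task count and risk. The function name and the string labels are illustrative; the thresholds are the ones in the table.

```python
def review_tier(task_count: int, has_high_risk: bool) -> str:
    """Pick the phase-boundary review tier from phase size and risk."""
    if task_count < 1:
        raise ValueError("a phase needs at least one task")
    if task_count >= 3:
        return "C"                          # full suite, regardless of risk
    return "B" if has_high_risk else "A"    # 1-2 tasks: focused or deferred
```

For a high-risk phase with 3+ tasks the tier stays C; the extra weight on the specialized reviewer is a dispatch detail that this sketch does not model.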
Dispatch all 4 specialized reviewers in parallel, each with the files changed during the phase, the phase file (for plan alignment), and the standards file (for convention reference):
- correctness-reviewer: plan alignment, logic, completeness, edge cases
- design-reviewer: patterns, naming, reuse, deduplication, complexity
- security-reviewer: vulnerabilities, input validation, auth, secrets
- test-quality-reviewer: assertion quality, test design, edge cases, anti-patterns

Wait for all 4. Aggregate and deduplicate findings: when multiple reviewers flag the same file:line, merge into one finding noting which reviewers reported it.
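The aggregation step could look like the sketch below, which merges findings that share a file and line. The finding shape (dicts with `file`, `line`, `reviewer`, `message`) is an assumed format, since the skill does not define what reviewers return.

```python
def merge_findings(findings: list) -> list:
    """Merge findings on the same file:line, keeping every reporting reviewer."""
    merged = {}
    for finding in findings:
        key = (finding["file"], finding["line"])
        entry = merged.setdefault(key, {
            "file": finding["file"],
            "line": finding["line"],
            "reviewers": [],
            "messages": [],
        })
        if finding["reviewer"] not in entry["reviewers"]:
            entry["reviewers"].append(finding["reviewer"])
        entry["messages"].append(finding["message"])
    # Dicts preserve insertion order, so findings stay in first-seen order.
    return list(merged.values())
```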
SendMessage for re-review. Only re-run reviewers that failed.

After code review passes, dispatch test-coverage-reviewer with:
PASS → proceed to next phase. FAIL → dispatch implementer to write missing tests, continue the same coverage reviewer for re-check. Max 2 fix cycles.
Skip Stage 2 in Tier A. The final spec-reviewer at Step 3 covers requirement-to-test mapping for deferred phases.
Spec compliance is not checked at phase boundaries — only at Step 3 (final validation).
After phase review passes, the phase file's contents are no longer needed in context. The next phase's elaboration + reading replaces it. (You don't have to actively unload — just don't carry phase N's task text forward.)
After all phases complete, run three-stage final validation.
Before running final CI, remove or rewrite any tests marked or reported as ephemeral during implementation.
Dispatch an implementer with:
- Ephemeral Tests list from task reports
- Test durability: ephemeral
- plan.md

The cleanup implementer must:

- durable unless they are mislabeled implementation-detail assertions, in which case rewrite them as behavioral tests

If no implementer reported ephemeral tests, still do a lightweight scan of changed test files for obvious scaffolding tests: file-existence checks, symbol-name checks, helper-call-only assertions, module-structure assertions, empty stubs, and tautological mock assertions. If found, dispatch the cleanup implementer with those candidates.
If ephemeral tests remain after 2 cleanup attempts, stop final validation and surface the remaining file/test names plus the reason they could not be removed. Do not run final CI/spec compliance while known ephemeral tests remain.
Run the project's full test suite, linter, formatter, and typechecker (commands from standards.md). If anything fails, dispatch implementer to fix. Repeat until green or 3 attempts, then report to user.
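The run-fix-repeat loop can be sketched with injected callables, so no particular test runner or linter is assumed. `checks` maps a check name to a function returning True on success, and `fix` stands in for dispatching an implementer; both are hypothetical interfaces.

```python
from typing import Callable


def run_final_ci(checks: dict, fix: Callable, max_attempts: int = 3) -> list:
    """Run every check up to max_attempts times.

    Returns the names of checks still failing (an empty list means green).
    """
    failing = []
    for attempt in range(1, max_attempts + 1):
        failing = [name for name, check in checks.items() if not check()]
        if not failing:
            return []
        if attempt < max_attempts:
            fix(failing)  # stand-in for dispatching an implementer
    return failing  # non-empty: report these to the user
```

With three attempts the fixer runs at most twice, because a failure on the last attempt goes to the user rather than back to an implementer.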
Different from phase reviews. Phase reviews check "did we build what the plan said?" This checks "did we satisfy the original spec?"
Dispatch spec-reviewer with:
It validates:
Deviation tolerance: implementations that deviate from the plan are acceptable if the EARS requirement is satisfied and no unrelated functionality breaks.
If gaps are found:
Max 3 remediation cycles. If gaps remain, escalate to auto-debugger.
Dispatch auto-debugger as a last resort. Provide:
Critical: auto-debugger gets fresh context — do NOT continue an existing agent or include prior remediation history. Phase-boundary reviewers reuse context for focused re-checks; auto-debugger is the opposite. Fresh context avoids retry loops.
| Verdict | Interactive (user present) | Autonomous (coder-task) |
|---|---|---|
| RETRY_TASK | Dispatch new implementer with fix plan + current diff, re-run final validation (one attempt) | Same |
| BLOCK_TASK | Mark task blocked with root cause, continue with remaining work, report blocked items at end | Same |
| NEEDS_HUMAN | Stop, report root-cause analysis, wait for guidance | Do NOT stop. Post root-cause analysis + specific questions to the GitHub issue. Treat affected tasks as BLOCK_TASK. Continue with remaining unblocked work. Note the gap in the PR description. |
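The verdict table differs between modes only for NEEDS_HUMAN, which the sketch below makes explicit. The function name and the action labels are hypothetical shorthand for the steps in the table.

```python
def handle_verdict(verdict: str, autonomous: bool) -> list:
    """Return the ordered actions for an auto-debugger verdict."""
    if verdict == "RETRY_TASK":
        return ["dispatch_new_implementer", "rerun_final_validation_once"]
    if verdict == "BLOCK_TASK":
        return ["mark_blocked", "continue_remaining_work", "report_blocked_at_end"]
    if verdict == "NEEDS_HUMAN":
        if autonomous:
            # Autonomous runs never stop: ask on the issue and keep going.
            return [
                "post_questions_to_github_issue",
                "mark_blocked",
                "continue_remaining_work",
                "note_gap_in_pr_description",
            ]
        return ["stop", "report_root_cause", "wait_for_guidance"]
    raise ValueError(f"unknown verdict: {verdict}")
```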
| Mistake | Fix |
|---|---|
| Reading all phase files at startup | Load only plan.md and standards.md initially. Phase files load one at a time when their turn comes. |
| Executing a sketched phase without elaborating | If plan.md shows status sketch for the upcoming phase, dispatch phase-elaborator first. |
| Pre-elaborating future phases speculatively | Only elaborate one phase ahead during Tier B/C phase review. If review fixes materially change the prior phase, send a follow-up adjustment to the elaborator before reading the next phase. |
| Ignoring elaborator drift reports | If phase-elaborator reports the sketch was wrong, update plan.md's phase row + coverage matrix before continuing. |
| Orchestrator writes code itself | Only dispatch agents — never write code |
| One implementer dispatch per task when consecutive tasks share context | Batch consecutive cohesive tasks (≤4 hrs, overlapping files) into one implementer call |
| Running the full 4-agent suite on a 1-task phase | Use Tier A (defer); the final spec-reviewer covers correctness anyway |
| Re-dispatching reviewers for MINOR findings | Severity-gate re-review to CRITICAL/MAJOR only |
| Two [P] tasks each <1 hr dispatched as two parallel agents | Bundle into one implementer running sequentially |
| Re-running all reviewers when only some failed | Only re-run reviewers that returned ISSUES |
| Running spec-reviewer at a phase boundary | Spec compliance is checked only during final validation |
| Pasting the plan directory path instead of task text into the implementer prompt | Inline the task block + relevant standards rows. Implementers don't read plan files. |
| Ignoring DONE_WITH_CONCERNS | Read concerns before deciding to proceed |
| Retrying a blocked implementer without changes | Change something: more context, smaller task, or re-elaborate the phase |
| Treating ephemeral TDD tests as final coverage | Track them from implementer reports and remove or replace them before final CI |
| Final review checks plan compliance, not spec | Final review must check EARS requirements from the spec |
| Including prior attempts in auto-debugger prompt | Auto-debugger must get fresh context — spec, failures, code only |
| Blocking all parallel tasks when one fails | Handle parallel task failures individually after all resolve |
npx claudepluginhub xmtplabs/code-factory --plugin code-factory