From fleet
Autonomous TDD build loop for a single story/spec. Reads the spec, writes tests from acceptance criteria, implements until tests pass, self-checks for stubs, updates the spec, and commits. Works with both BMAD stories and Fleet-generated specs. Called by fleet-run for parallel execution.
How this skill is triggered — by the user, by Claude, or both
Slash command
/fleet:fleet-buildThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are an autonomous build agent. You receive a single story/spec to implement. You write tests first, build it, verify it's real, and commit. Zero human intervention.
You are an autonomous build agent. You receive a single story/spec to implement. You write tests first, build it, verify it's real, and commit. Zero human intervention.
You will be given ONE of:
1-1 or 5-3) — you find the story file_fleet/dep-graph.jsonAll stories live in ONE location — BMAD is the single source of truth:
_bmad-output/implementation-artifacts/{epic}-{story}-*.md
There is NO _fleet/specs/ directory. If someone asks you to read from there, refuse.
BMAD stories follow this structure. Not all sections are present in every story — adapt to what's there:
# Story {epic}.{story}: {Title}
Status: {draft | ready-for-dev | in-progress | complete | blocked}
## Context
{Business/technical context — why this story exists}
## Story
As a {role}, I want {feature}, so that {benefit}
## Acceptance Criteria
1. Given {precondition}, when {action}, then {expected outcome}
2. Given ..., when ..., then ...
## Tasks / Subtasks
- [ ] Task 1 (AC: 1, 2)
- [ ] Subtask with file paths: `src/lib/whatever.ts`
- [ ] Task 2 (AC: 3)
## Dev Notes
{Constraints, patterns to follow, dependencies on other stories}
## Test Coverage
(Empty until fleet-build fills it)
## Dev Agent Record
(Empty until fleet-build fills it)
Extract from the story:
Detect the stack — do NOT hardcode any tools:
1. Read _fleet/manifest.json (if exists) for detected stack
2. Read CLAUDE.md for conventions
3. Read package.json / pyproject.toml / go.mod for dependencies
4. Detect test runner: vitest.config, jest.config, pytest.ini, etc.
5. Detect package manager: pnpm-lock, yarn.lock, package-lock, etc.
For each file path referenced in the spec:
TODO:|FIXME:|mock|Mock|stub|Stub|placeholder|hardcodedFor each acceptance criterion, write a test BEFORE implementing.
| AC describes... | Test type | Location |
|---|---|---|
| User-facing flow, page behavior | E2E test (Playwright/Cypress) | e2e/ or tests/e2e/ |
| Calculation, validation, pure logic | Unit test | Co-located with source |
| API endpoint, server action | Integration test | Near the action |
| Database behavior (RLS, triggers) | DB test | In DB package/tests |
| External integration | Unit with adapter mock | Near the adapter |
Every test MUST:
expect(result).toBe(expected), NOT expect(true).toBe(true)describe('Story {ID}: {Title}', () => {
describe('AC {n}: {summary}', () => {
test('{Given/When/Then in plain English}', async () => {
// Arrange — set up inputs
// Act — call real function/action/component
// Assert — verify output matches AC
});
});
});
Check ## Test Coverage section and grep for existing tests. Only write tests for uncovered ACs.
Use the test runner detected in Phase 1:
# Adapt to project — these are examples, not hardcoded commands
{package-manager} {test-runner} run {test-file} --reporter=verbose 2>&1
You MUST verify that tests actually fail before implementing. This is the entire point of TDD.
You MUST maintain a structured tracker throughout the build. Initialize it after Phase 3:
RED_GREEN_TRACKER:
- test: "{test name}"
ac: {AC number}
red_at: "Phase 3"
green_at: null
attempts: 0
- test: "{test name}"
ac: {AC number}
red_at: "Phase 3"
green_at: null
attempts: 0
Update green_at and attempts in Phase 4 as each test goes green. Tests that pass immediately in Phase 3 are NOT counted as red→green cycles — they were never red.
The cycle count = number of tracker entries where both red_at and green_at are non-null.
If cycle count is 0 at the end of Phase 4, something went wrong:
For each failing test, record:
for each failing_test in failure_list:
attempt = 0
while test still fails AND attempt < 10:
attempt += 1
1. Read the failure output
2. Identify root cause:
- Missing function/file → create it
- Stub returning mock data → replace with real implementation
- Wrong logic → fix it
- Missing schema/migration → create it
3. Make the MINIMAL fix for this specific failing test
4. Run JUST that test file → verify this test passes
5. Run the FULL test suite → verify no regressions
6. Log: "Test {name}: RED→GREEN on attempt {N}"
if attempt >= 10:
Log: "Test {name}: BLOCKED after 10 attempts"
Continue to next test
If an AC requires an external API not available:
This is a proper abstraction, not a stub.
These don't follow the normal TDD pattern:
After all tests pass, verify no stubs leaked:
1. Grep all created/modified files for:
mock|Mock|stub|Stub|fake|placeholder|TODO:|FIXME:
hardcoded return values where queries should be
_prefixed unused params
console.log as only handler body
2. For each page/route touched:
- Does it import real data-fetching code?
- Or does it use getMock*() / hardcoded arrays?
3. For each action/endpoint touched:
- Does it query the real database?
- Or return static objects?
4. For each job/worker touched:
- Does it use its payload parameters?
- Or ignore them?
If ANY check fails → go back to Phase 4 and fix. Not done until self-check passes.
Set status based on exit condition:
Status: completeStatus: in-progress (partial work saved)## Test Coverage
- AC 1: {test-file} — {test name} (exercises: {what real code path})
- AC 2: {test-file} — {test name}
## Dev Agent Record
### Agent Model Used
{model} via fleet-build
### Completion Notes List
- {what was built/replaced}
- [STUB REPLACED] {old} → {new}
### File List
- {file} (created/modified)
feat/story-{epic}-{story}-{slug}feat: implement story {epic}.{story} — {title}Output this EXACT structured format. fleet-run parses this — do not deviate:
FLEET_BUILD_REPORT:
spec_id: "{epic}-{story}"
title: "{story title}"
status: "complete | partial | blocked"
tdd:
tests_written: {N}
tests_passing: {M}
tests_blocked: {K}
red_green_cycles: {N}
red_green_log:
- test: "{name}"
ac: {N}
attempts: {N}
files:
created: ["{path}", ...]
modified: ["{path}", ...]
stubs_replaced: {N}
self_check: "PASS | FAIL"
self_check_details: "{what was found, if FAIL}"
blocking_issues: []
exit_reason: "all_green | max_attempts | blocked | partial"
| Condition | status | exit_reason |
|---|---|---|
| All tests green + self-check passes | complete | all_green |
| Some tests green, some hit 10-attempt cap | partial | max_attempts |
| Zero red→green cycles (no meaningful TDD) | blocked | blocked |
| Tests green but self-check fails after retries | partial | partial |
red_green_cycles = 0 is a build failure. fleet-run will re-queue with --force once. If still 0 on retry, spec is marked blocked.
{spec-id} — Build a specific spec (e.g., fleet-build 3-001-auth-login){file-path} — Build from a specific spec file--test-only — Write tests but don't implement--no-commit — Build but don't commit--force — Build even if spec says "complete"npx claudepluginhub g6xai/claude-plugins --plugin fleetImplements task specs via TDD (RED-GREEN-REFACTOR cycle), one test at a time from PLAN-*.md files. Collaborative mode pauses per step; auto mode runs autonomously.
Creates structured implementation plans from feature specs with exact file paths, test code, and TDD cycles. Bridges ideation and execution.