From claude-harness-forge
Autonomous build loop with Karpathy ratcheting, GAN evaluator, browser console capture, UI standards review, 8-gate ratchet, session chaining, and cross-project learnings. Iterates story groups until all features pass or stopping criteria met.
/claude-harness-forge:auto [--mode full|lean|solo|turbo] [--group GROUP_ID]
Autonomous build loop implementing Karpathy's ratcheting pattern with GAN-style generator-evaluator separation, agent teams for parallel execution, sprint contracts for verifiable done-criteria, self-healing with failure-driven learning, and session chaining for multi-context-window builds.
```text
/auto
/auto --mode lean
/auto --mode solo
/auto --group D
```
--mode controls which ratchet gates are enforced. Default: full. Options: full, lean, solo, turbo.

--group resumes or targets a specific dependency group. If omitted, /auto picks the next unfinished group from the dependency graph.

Before /auto can run, the following must exist:

- specs/stories/ — approved story files with acceptance criteria.
- specs/design/ — approved architecture artifacts including api-contracts.md and component-map.md.
- .claude/program.md — project constraints and conventions.
- features.json — feature tracking file (created by /spec).
- specs/stories/dependency-graph.md — group ordering and dependencies.
- claude-progress.txt — session tracking file (created by /build phase 4).

If any prerequisite is missing, stop and report what is absent. Do not proceed with partial context.
After verifying the prerequisites above, validate that the evaluation pipeline can actually run. Read project-manifest.json and check:
- evaluation.api_base_url — must be non-null (unless project is a CLI/library with no API).
- evaluation.health_check — must be non-null (unless project is a CLI/library).
- evaluation.ui_base_url — must be non-null if the project has a frontend (check stack.frontend is non-null).
- verification.mode — must be one of docker, local, or stub.
- verification.dev_bootstrap — must be non-null. This is the command used to start the app before evaluation.

If any required field is null, halt with this message:
```text
BLOCKED: Evaluation pipeline not ready. The following manifest fields are null:
  - evaluation.api_base_url (required — set by /architect)
  - verification.dev_bootstrap (required — set by /architect)
Run /architect to populate these fields before starting the build loop.
Without these, the evaluator cannot run and all stories will be marked
"done" without verification — defeating the purpose of the ratchet.
```
Do NOT proceed with the build loop if evaluation readiness fails. Building 50 stories that can never be E2E-tested is wasted effort.
After evaluation readiness passes, attempt to start the app stack and verify it works before writing any feature code:
1. Run the verification.dev_bootstrap command.
2. Poll evaluation.api_base_url + evaluation.health_check with retry (5 attempts, exponential backoff).
3. If healthy: run verification.dev_teardown and proceed to the build loop.
4. If unhealthy: halt with "BLOCKED: Local dev stack failed health check. Fix infrastructure before building features." Include the error output.

This catches infrastructure problems (missing Docker images, broken database configs, port conflicts) before any stories are implemented.
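The readiness poll could look like the sketch below. It assumes evaluation.health_check is a URL path such as /health and treats any 2xx response as healthy. The 1-second starting delay is also an assumption, because the text only fixes 5 attempts with exponential backoff. The probe and sleep functions are injectable, so the loop can be tested without a running stack.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if GET url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):  # HTTPError, refused, timeout
        return False


def wait_for_health(
    api_base_url: str,
    health_check: str,
    attempts: int = 5,
    base_delay: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll api_base_url + health_check, doubling the delay after each failure."""
    url = api_base_url.rstrip("/") + "/" + health_check.lstrip("/")
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, 8s with the defaults
    return False
```

A False result corresponds to the BLOCKED halt. A True result leads to teardown and then the build loop.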
Critical rule: /auto orchestrates but NEVER implements code directly.
/auto is the orchestrator. It reads state, makes decisions, spawns agents, and manages the loop.

- Code is written by the generator (via /implement or direct agent spawn).
- Verification is done by the evaluator (via /evaluate or direct agent spawn).
- /auto never writes application code, tests, or configuration files itself.

At the start of EVERY iteration — including the first — read these files in order:
1. .claude/program.md — Constraints may have changed mid-run. Re-read every iteration. Never cache.
2. .claude/state/learned-rules.md — Accumulated project rules. Inject verbatim into ALL agent prompts spawned this iteration.
3. claude-progress.txt — Read the LAST session block (the block after the final === Session marker). Extract: current_group, groups_completed, groups_remaining, last_commit, next_action.
4. features.json — Current pass/fail state for all features. Determines what work remains.
5. specs/stories/dependency-graph.md — Pick the next unfinished group. A group is "unfinished" if any of its stories' features are not passing in features.json. Respect dependency ordering: do not start a group whose upstream dependencies have failing features.

If claude-progress.txt indicates a current_group that is not yet complete, resume that group. Otherwise, select the next unfinished group in dependency order.
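Step 3 can be done with a small parser. This sketch assumes each session block is a run of key: value lines, as in the claude-progress.txt session template, and that list fields are written as [A, B, C]. Both helper names are hypothetical.

```python
def parse_last_session(progress_text: str) -> dict[str, str]:
    """Return key/value pairs from the last '=== Session' block (empty if none)."""
    start = progress_text.rfind("=== Session")
    if start == -1:
        return {}
    fields: dict[str, str] = {}
    for line in progress_text[start:].splitlines()[1:]:  # skip the marker line
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def parse_list(value: str) -> list[str]:
    """Turn '[A, B, C]' into ['A', 'B', 'C'] and '[]' into []."""
    inner = value.strip().strip("[]").strip()
    return [item.strip() for item in inner.split(",")] if inner else []
```

Splitting on the first colon keeps values such as ISO timestamps intact.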
Run /status to display current project state before resuming the build loop.
Sprint contracts define the verifiable done-criteria for a group. Two-step propose-approve process using generator and evaluator agents.
Spawn generator as a subagent with this prompt:
Read stories [list IDs for this group], specs/design/api-contracts.md, specs/design/component-map.md, specs/test_artefacts/test-cases.md (for test cases mapped to this group's stories), and specs/test_artefacts/traceability-matrix.md (for BRD traceability). Propose a sprint contract for group {ID}. Include: setup (test fixtures), api_checks, playwright_checks, design_checks, architecture_checks, teardown, features list. Each check must trace to a test case ID from test-cases.md. Write the contract to sprint-contracts/{group}.json.

The setup array must create all test data needed for checks to be meaningful (users, seed records, config). The teardown array must clean up test data. Without fixtures, admin pages show "access denied" and data pages show empty states — the evaluator sees rendered HTML and marks PASS but verifies nothing.

API checks must verify behavioral correctness, not just liveness. A 200 response with {"error": "Failed to connect"} or {"data": []} when data should exist is a FAIL. Include expect.body_contains or expect.body_schema assertions for every check.
The generator produces a draft contract based on the story acceptance criteria and the architecture design.
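The behavioral rule (a 200 with an error body, or an empty list where data should exist, is a FAIL) can be written as a small classifier. This sketch makes several assumptions. expect.status defaults to 200. expect.body_contains is a list of substrings. The non_empty flag is hypothetical and stands in for "data should exist". It does not implement expect.body_schema.

```python
import json
from typing import Any


def check_api_response(status: int, body_text: str, expect: dict[str, Any]) -> tuple[str, str]:
    """Classify one api_check result as ("PASS" | "FAIL", reason)."""
    if status != expect.get("status", 200):
        return "FAIL", f"status {status}"
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        return "FAIL", "body is not JSON"
    # Reachable but reporting an error: a behavioral failure, not a pass.
    if isinstance(body, dict) and "error" in body:
        return "FAIL", f"error body: {body['error']}"
    for needle in expect.get("body_contains", []):
        if needle not in body_text:
            return "FAIL", f"missing {needle!r}"
    # When data should exist, an empty collection is a failure.
    if expect.get("non_empty") and isinstance(body, dict):
        data = body.get("data")
        if isinstance(data, list) and not data:
            return "FAIL", "empty data"
    return "PASS", "ok"
```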
Spawn evaluator as a subagent with this prompt:
Read the proposed sprint contract at sprint-contracts/{group}.json. Review each check against the story acceptance criteria and API contracts. Add any missing checks. Remove any checks that do not trace to an acceptance criterion. Write the final contract to the same path.
Rules:
Spawn the generator agent to create and manage a Claude Code agent team for the current group.
Before spawning teammates, the generator analyzes the component map:
- Build a micro-DAG of teammate dependencies from the Produces: / Consumes: entries in the component map.

Log the micro-DAG to iteration-log.md.
If no cross-dependencies exist, all teammates spawn in parallel (legacy behavior).
| Phase | Who | Starts When | Must Do |
|---|---|---|---|
| 1 | Teammates with no upstream deps | Immediately | Implement + commit typed interface contracts |
| 2 | Teammates consuming Phase 1 outputs | All Phase 1 teammates complete | Code against committed interface contracts |
| 3 | Integrators for shared files | All Phase 2 teammates complete | Collect declared additions, write to shared files |
Max 5 concurrent teammates per phase. Batch in groups of 5 if more.
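One way to derive the phases from Produces: / Consumes: entries is longest-path layering of the dependency DAG, followed by batching each phase in groups of 5. In this sketch the two input dictionaries are a hypothetical encoding of the component map. It does not model the Phase 3 integrator step. A consumer whose artifact has no producer is treated as unblocked.

```python
from collections import defaultdict


def plan_phases(
    produces: dict[str, set[str]],
    consumes: dict[str, set[str]],
    max_parallel: int = 5,
) -> list[list[list[str]]]:
    """Return phases, each a list of batches of at most max_parallel teammates."""
    producer = {artifact: t for t, arts in produces.items() for artifact in arts}
    teammates = sorted(set(produces) | set(consumes))
    deps = {
        t: {producer[a] for a in consumes.get(t, ()) if a in producer and producer[a] != t}
        for t in teammates
    }
    phase: dict[str, int] = {}

    def depth(t: str, visiting: set[str]) -> int:
        # A teammate starts one phase after its latest upstream producer.
        if t in phase:
            return phase[t]
        if t in visiting:
            raise ValueError(f"dependency cycle involving {t}")
        visiting.add(t)
        phase[t] = 1 + max((depth(d, visiting) for d in deps[t]), default=0)
        visiting.discard(t)
        return phase[t]

    for t in teammates:
        depth(t, set())

    by_phase: dict[int, list[str]] = defaultdict(list)
    for t in teammates:
        by_phase[phase[t]].append(t)
    return [
        [members[i:i + max_parallel] for i in range(0, len(members), max_parallel)]
        for _, members in sorted(by_phase.items())
    ]
```

A cycle raises an error instead of producing a schedule, because no phase ordering can satisfy it.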
Every teammate receives:
- Story files (specs/stories/story-NNN.md)
- Component map (specs/design/component-map.md)
- Learned rules (.claude/state/learned-rules.md — inject verbatim)
- Code-gen skill (.claude/skills/code-gen/SKILL.md)
- API integration patterns (.claude/skills/code-gen/references/api-integration-patterns.md)

In Solo mode, the generator works alone sequentially. No team spawning, no phases. Read stories in dependency order and implement one at a time.
| Role | Model | Rationale |
|---|---|---|
| /auto orchestrator | Opus | Judgment, architectural decisions |
| Evaluator | Opus | Skeptical verification |
| Design critic | Opus | Subjective visual judgment |
| Generator lead | Sonnet | Coordination, lower cost |
| Generator teammates | Sonnet | Mechanical implementation |
| Security reviewer | Sonnet | Pattern matching |
Read execution.model_routing from project-manifest.json at the start of every iteration:

```json
"model_routing": {
  "strategy": "cloud-only | hybrid | local-only",
  "reasoning_agents": { "model": "...", "provider": "...", "base_url": "..." },
  "code_gen_agents": { "model": "...", "provider": "...", "base_url": "..." },
  "local_model": { "name": "...", "runtime": "...", "startup_command": "..." }
}
```
How to apply:
- Route code-gen agents to the base_url in code_gen_agents using OpenAI-compatible API format.
- Pass the configured base_url and model name when spawning every agent.

When strategy is hybrid or local-only, and the provider is openai-compatible:
1. Check that the endpoint responds: curl -s {base_url}/models | jq .
2. If it does not respond and local_model.startup_command exists, start it and wait for health.
3. Record the active routing in the claude-progress.txt session block.

Fallback: If the local model is unreachable after 3 retries, log a warning and ask the human whether to fall back to cloud or abort.
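Here is a sketch of per-agent routing with the 3-retry fallback. The split of roles between reasoning_agents and code_gen_agents (REASONING_ROLES) is an assumption based on the model table. The 2-second pause between retries is arbitrary. The probe requests {base_url}/models, the OpenAI-compatible model-listing endpoint. A return value of None means "ask the human". Starting local_model.startup_command is left out.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Assumption: the roles the model table routes to Opus count as "reasoning".
REASONING_ROLES = {"orchestrator", "evaluator", "design_critic"}


def models_endpoint_up(base_url: str) -> bool:
    """GET {base_url}/models, the OpenAI-compatible model listing."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/models", timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def resolve_agent_endpoint(
    routing: dict,
    role: str,
    probe: Callable[[str], bool] = models_endpoint_up,
    retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Pick the routing section for a role; None means 'ask the human'."""
    section = "reasoning_agents" if role in REASONING_ROLES else "code_gen_agents"
    cfg = routing[section]
    local = routing["strategy"] in ("hybrid", "local-only")
    if not (local and cfg.get("provider") == "openai-compatible"):
        return cfg  # no local endpoint to health-check
    for attempt in range(retries):
        if probe(cfg["base_url"]):
            return cfg
        if attempt < retries - 1:
            sleep(2.0)
    return None  # log a warning, then ask: fall back to cloud or abort?
```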
After the agent team completes, run the ratchet gate. The ratchet is monotonic: progress never regresses.
Every gate produces one of three results:
| State | Meaning | Progression |
|---|---|---|
| PASS | Gate executed and all checks succeeded | Allowed |
| FAIL | Gate executed and one or more checks failed | Blocked — enters self-healing loop |
| NOT_RUN | Gate could not execute due to missing prerequisites | Blocked — halt with actionable error |
Only PASS allows progression. NOT_RUN is treated like FAIL for progression purposes, but with a different response: instead of entering the self-healing loop, halt immediately and report what prerequisite is missing. The self-healing loop cannot fix missing infrastructure or configuration.
A gate returns NOT_RUN when:
- project-manifest.json evaluation URLs are null
- verification.dev_bootstrap is null or the bootstrap command has not been run

Log gate states in claude-progress.txt using the three-state format: gate_5: PASS, gate_5: FAIL (api_check POST /users), or gate_5: NOT_RUN (sprint contract missing).
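The three-state progression rule can be sketched in code. The names are hypothetical. When a NOT_RUN and a FAIL occur in the same pass, this sketch halts, which is one reading of "halt immediately".

```python
from enum import Enum
from typing import Optional


class GateState(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    NOT_RUN = "NOT_RUN"


def next_step(results: dict[int, GateState]) -> str:
    """Decide what the orchestrator does after a ratchet pass."""
    states = set(results.values())
    if GateState.NOT_RUN in states:
        return "halt"       # missing prerequisite: self-healing cannot fix it
    if GateState.FAIL in states:
        return "self_heal"
    return "progress"       # only an all-PASS run moves the ratchet forward


def format_gate(number: int, state: GateState, detail: Optional[str] = None) -> str:
    """Render a claude-progress.txt line such as 'gate_5: FAIL (api_check POST /users)'."""
    line = f"gate_{number}: {state.value}"
    return f"{line} ({detail})" if detail else line
```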
Twelve sub-gates, mode-dependent:
| Gate | Full | Lean | Solo | Turbo | Condition |
|---|---|---|---|---|---|
| 1. Unit tests (pytest, vitest) | Yes | Yes | Yes | Per commit | Always |
| 2. Lint + types (ruff, mypy, tsc) | Yes | Yes | Yes | Per commit | Always |
| 3. Coverage >= baseline | Yes | Yes | Yes | Per commit | Always |
| 4. Architecture (files exist, schema validation) | Yes | Yes | No | End only | Always |
| 5. Evaluator (API + Playwright + Browser Console) | Yes | Yes | No | End only | Always |
| 6. Code reviewer (static quality + story traceability) | Yes | Yes | No | End only | Always |
| 7. UI standards review (SaaS/enterprise conformance) | Yes | No | No | End only | UI projects |
| 8. Security reviewer (OWASP web + agentic top 10) | Yes | No | No | End only | Always |
| 9. Mutation testing (mutmut/Stryker) | Yes | Yes | No | End only | Always |
| 10. Compliance reviewer (bias, fairness, PII) | Yes | No | No | End only | ML projects only |
| 11. Spec gaming detection | Yes | Yes | Yes | Per commit | Always |
| 12. Smoke launch (real data) | Yes | Yes | Yes | Per commit | Always |
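The table can be transcribed as data so the orchestrator can look up which gates apply. In this sketch, needs_compliance stands in for Gate 10's ML/agentic condition. "run" marks a gate that Full, Lean, or Solo mode executes, and Turbo uses its per_commit / end_only schedule.

```python
# Transcription of the gate table above.
GATE_TABLE = {
    "full": {g: "run" for g in range(1, 13)},
    "lean": {g: "run" for g in (1, 2, 3, 4, 5, 6, 9, 11, 12)},
    "solo": {g: "run" for g in (1, 2, 3, 11, 12)},
    "turbo": {
        **{g: "per_commit" for g in (1, 2, 3, 11, 12)},
        **{g: "end_only" for g in range(4, 11)},
    },
}


def applicable_gates(mode: str, has_ui: bool, needs_compliance: bool) -> dict[int, str]:
    """Gates for a mode after applying the Condition column."""
    gates = dict(GATE_TABLE[mode])
    if not has_ui:
        gates.pop(7, None)   # UI standards review: UI projects only
    if not needs_compliance:
        gates.pop(10, None)  # compliance review: ML/agentic projects only
    return gates
```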
Gate 9 (mutation testing) runs after Gate 3 (coverage). It uses mutmut (Python) or Stryker (TypeScript) to inject small bugs and verify that the tests catch them.
```bash
# Python
mutmut run --paths-to-mutate=src/ --tests-dir=tests/ --runner="pytest -x -q"
# TypeScript
npx stryker run
```
Ratchet the mutation score: if previous score was 72%, new code must maintain or exceed 72%. Read baseline from .claude/state/mutation-baseline.txt (created on first run if missing).
If mutation score drops: FAIL. The generator must add tests that catch the surviving mutants.
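A sketch of the mutation ratchet. It assumes the baseline file holds a single number such as 72 or 72.0, and it tolerates a trailing %. Raising the stored baseline after a pass is this sketch's reading of "ratchet". The text only says the score must not drop.

```python
from pathlib import Path


def mutation_gate(score: float, baseline_file: Path) -> bool:
    """PASS if score >= stored baseline; the first run just records the score."""
    if not baseline_file.exists():
        baseline_file.parent.mkdir(parents=True, exist_ok=True)
        baseline_file.write_text(f"{score}\n")
        return True
    baseline = float(baseline_file.read_text().strip().rstrip("%"))
    if score < baseline:
        return False  # FAIL: generator must add tests for surviving mutants
    baseline_file.write_text(f"{score}\n")  # baseline only ever moves up
    return True
```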
Gate 10 (compliance review): skip if project-manifest.json → ai_native.type is neither ml nor agentic AND compliance.model_card_required is false.
Spawn the compliance-reviewer agent. It checks:
If any BLOCK findings: FAIL. Generator must fix before proceeding.
Gate 11 (spec gaming detection): run in ALL modes — this is the anti-gaming gate. Check after every commit:

- Tautological assertions: expect(true).toBe(true), assert True, expect(x).toBe(x) patterns. FAIL if found.
- Weakened assertions (e.g., toBe(42) → toBeTruthy()): WARN.

This gate exists because research shows frontier models actively game specifications (METR 2025, Anthropic reward hacking research). It cannot be disabled.
Gate 12 (smoke launch): run in ALL modes after every group. This gate verifies the application actually starts and runs with real production data. It exists because tests using synthetic fixtures routinely pass while the app crashes on launch (see learnings/failure-patterns/common-failures.md, pattern F1).
For web apps (api_base_url set in manifest):
- Confirm all services are running: docker compose ps
- Confirm the health endpoint responds: curl -sf {api_base_url}/health

For CLI apps / libraries / non-web projects (no api_base_url):
Two sub-checks (both required):
12a. Headless smoke launch:
- Import check: python3 -c "import {module}; print('OK')" or equivalent.
- Run the main loop against real data:

```bash
python3 -c "
from {module} import MainClass, DataLoader
data = DataLoader.from_file()  # REAL file, not test fixture
app = MainClass(data)
for _ in range(100):
    app.update()
print('Smoke launch: OK')
"
```
12b. PTY-based E2E (for interactive CLI apps):
If the app has a terminal UI (curses, prompt_toolkit, rich, etc.), run PTY-based E2E tests that launch the actual app in a pseudo-terminal, send keystrokes, and verify rendered output. See .claude/skills/test-patterns/SKILL.md for the PTY testing pattern.
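Here is a minimal, Unix-only sketch of a PTY-driven check that uses only the standard library (os.openpty, select, subprocess). It is not the pattern from the test-patterns skill, which may differ. It shows the core idea: type into a real terminal and assert on what was rendered. The captured output includes the terminal's echo of the keystrokes and any escape sequences, so assertions should look for substrings.

```python
import os
import select
import subprocess
import time


def run_in_pty(argv: list[str], keystrokes: bytes, timeout: float = 10.0) -> str:
    """Launch argv on a pseudo-terminal, type keystrokes, return what it rendered."""
    master, slave = os.openpty()
    proc = subprocess.Popen(argv, stdin=slave, stdout=slave, stderr=slave, close_fds=True)
    os.close(slave)  # the child keeps its own copy of the slave end
    chunks: list[bytes] = []
    try:
        os.write(master, keystrokes)
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            ready, _, _ = select.select([master], [], [], 0.1)
            if ready:
                try:
                    data = os.read(master, 4096)
                except OSError:  # Linux raises EIO once the child closes the pty
                    break
                if not data:     # EOF on other platforms
                    break
                chunks.append(data)
            elif proc.poll() is not None:
                break
    finally:
        if proc.poll() is None:
            proc.kill()  # app hung past the timeout
        proc.wait()
        os.close(master)
    return b"".join(chunks).decode(errors="replace")
```

For example, run_in_pty([sys.executable, "-c", "print('Hi ' + input())"], b"Ada\n") returns output containing Hi Ada.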
Required scenarios:
For all project types:
Turbo mode is intended for builds using Opus 4.6+, where the model can sustain coherence across long tasks.

Use when: Model is highly capable AND project is well-specified AND you trust the generator to self-correct.

Do NOT use when: External API integrations, complex multi-service architecture, or first time using the harness.
Skip gates 4-8 (architecture, evaluator, code reviewer, UI standards, security) for commits that contain only documentation or lint fixes.

Detection: If git diff --name-only shows only .md files, or if the commit message starts with fix: lint or docs:, skip gates 4-8. Gates 1-3 (tests + lint + coverage) always run.
This prevents the expensive evaluator from blocking trivial housekeeping changes.
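The detection rule reduces to a small predicate. This sketch assumes changed_files already holds the output of git diff --name-only. The exact diff range is not specified above, so the sketch leaves it to the caller.

```python
def skip_expensive_gates(changed_files: list[str], commit_message: str) -> bool:
    """True if gates 4-8 can be skipped. Gates 1-3 always run regardless."""
    docs_only = bool(changed_files) and all(f.endswith(".md") for f in changed_files)
    housekeeping = commit_message.startswith(("fix: lint", "docs:"))
    return docs_only or housekeeping
```

An empty change list does not count as docs-only, so an unexpected empty diff still runs every gate.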
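Gate 1 (unit tests):

```bash
# Subshells keep the working directory unchanged even when a test run fails.
(cd backend && uv run pytest -x -q)
(cd frontend && npm test)
```

Both must pass with zero failures. The -x flag stops pytest at the first failure for fast feedback.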
Gate 2 (lint + types):

```bash
# Backend
uv run ruff check . && uv run mypy src/
# Frontend
npm run lint && npm run typecheck
```

All four commands must exit with code 0.
Gate 3 (coverage):

```bash
uv run pytest --cov=src --cov-report=term-missing -q | grep "^TOTAL" | awk '{print $NF}'
```
Compare the result with .claude/state/coverage-baseline.txt. The new coverage percentage must be greater than or equal to the baseline AND >= 80% (hard floor). If it drops below either threshold, the gate FAILS — even if all tests pass.
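The comparison can be sketched as follows. It assumes the report's TOTAL line ends in a percentage, as it does in pytest-cov's terminal output, and it applies both the baseline and the 80% hard floor.

```python
import re


def parse_total_coverage(report: str) -> float:
    """Extract the percentage from the 'TOTAL ... 82%' line of a coverage report."""
    for line in report.splitlines():
        if line.startswith("TOTAL"):
            match = re.search(r"(\d+(?:\.\d+)?)%\s*$", line)
            if match:
                return float(match.group(1))
    raise ValueError("no TOTAL line in coverage report")


def coverage_gate(current: float, baseline: float, floor: float = 80.0) -> bool:
    """PASS only if coverage is >= the stored baseline AND >= the hard floor."""
    return current >= baseline and current >= floor
```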
Coverage policy (ref: "AI is forcing us to write good code" by Steve Krenzel):
Gate 4 (architecture): spawn evaluator to verify architecture_checks from the sprint contract:

- Every file listed in files_must_exist must be present on disk.
- Schema validation against specs/design/api-contracts.schema.json, if specified.

Gate 5 (evaluator): spawn evaluator with the full sprint contract and --skip-lifecycle (the orchestrator manages the app lifecycle via SECTION 7). The evaluator runs three layers plus browser health monitoring:
Layer 1 — API Checks:
- Run all api_checks against the live Docker stack.
- On failure, read docker compose logs backend --tail=50 for stack traces.

Layer 2 — Playwright Checks:

- Run all playwright_checks against the running UI.

Layer 2.5 — Browser Console Health (runs during Layer 2): During every Playwright check, the evaluator captures browser telemetry:

- console.error entries → FAIL (with full error text and source file:line)
- Failed network requests (4xx/5xx) not listed in expected_errors → FAIL
- console.warn entries → WARN (logged, non-blocking)

Browser errors are captured using the richest available tool:

- browser_console_messages and browser_network_requests provide full console output and network activity without writing test files; browser_take_screenshot captures visual state for UI review.
- read_console_messages and read_network_requests provide real-browser capture with full stack traces.
- As a fallback, page.on('console') and page.on('pageerror') handlers wired into E2E test files.

The evaluator auto-detects which tools are available at the start of each evaluation pass.
Browser errors produce structured failure JSON with layer: "browser_console" or layer: "network" and error_type: "console_error" or "network_error", enabling targeted self-healing (generator fixes the exact file:line from the error).
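A sketch of turning captured telemetry into the structured failures described above. The event dictionaries are a hypothetical normalized shape, not the output format of any of the tools listed. The level value "warning" mirrors how Playwright reports console.warn. expected_statuses is a simplifying assumption about what expected_errors contains.

```python
def classify_browser_events(
    events: list[dict], expected_statuses: set[int]
) -> tuple[list[dict], list[str]]:
    """Split captured telemetry into structured failures (FAIL) and warnings (WARN)."""
    failures: list[dict] = []
    warnings: list[str] = []
    for ev in events:
        kind = ev.get("kind")
        if kind == "console" and ev.get("level") == "error":
            failures.append({
                "layer": "browser_console",
                "error_type": "console_error",
                "message": ev.get("text", ""),
                "location": ev.get("location"),  # source file:line for targeted fixes
            })
        elif kind == "console" and ev.get("level") == "warning":
            warnings.append(ev.get("text", ""))  # logged, non-blocking
        elif kind == "network":
            status = ev.get("status", 0)
            if status >= 400 and status not in expected_statuses:
                failures.append({
                    "layer": "network",
                    "error_type": "network_error",
                    "message": f"{status} {ev.get('url', '')}",
                })
    return failures, warnings
```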
The evaluator writes its report to specs/reviews/evaluator-report.md.
Gate 6 (code reviewer): spawn the code-reviewer agent to run a static analysis pass on all files changed in this group. Checks:
The code-reviewer writes its report to specs/reviews/code-review.md.
Eval validation: If the code-reviewer's rules or learned-rules have changed since the last eval run, auto-run the eval samples (.claude/evals/) to verify the reviewer still catches known violations.
FAIL if any BLOCK-level finding exists (architecture violation, security issue, missing story traceability). WARN for advisory findings (function length, missing docstrings).
Gate 7 (UI standards review): skip in Lean, Solo, and Turbo (per-group) modes.
Spawn ui-standards-reviewer agent for every frontend page in the current group:
- Read calibration-profile.json for project type and UI standards config.
- Check each page against the UI standards checklist (.claude/skills/evaluate-patterns/references/ui-standards-checklist.md).

This is a single pass, not an iterative loop. If checks fail, the fix instructions are sent to the generator via the normal self-healing loop (max 3 attempts). No scoring, no plateau detection, no originality judgment.
Gate 8 (security review): skip in Lean, Solo, and Turbo (per-group) modes.
Spawn security-reviewer agent to scan all code changed in this group for OWASP top 10 vulnerabilities:
FAIL if any critical vulnerability found. Fix instructions go to generator via self-healing.
When all gates pass, execute these steps in order:

1. Commit: git add -A && git commit -m "feat: implement group {group}"
2. Update features.json: set passes: true for all features in this group's sprint contract.
3. Run /status to update and display specs/status.md with current project health.
Do not immediately revert. Attempt targeted self-healing first.
Attempt 1-3:
Diagnose: Read the evaluator report (specs/reviews/evaluator-report.md) for specific failure details. Identify the exact check that failed and the error output.
Classify the failure into one of these 14 categories:
| Category | Signal | Auto-Fix Strategy |
|---|---|---|
| Lint/format | ruff/eslint error output | ruff check --fix && ruff format |
| Type error | mypy/tsc error with file:line | Fix the type annotation at the specified location |
| Test failure | pytest/vitest assertion error | Fix the production code, NOT the test |
| Import error | ImportError / ModuleNotFoundError | Fix the import path or __init__.py |
| Coverage drop | Coverage % below baseline | Add tests for the specific uncovered lines |
| API check fail | HTTP 500/404/wrong schema | Read docker compose logs backend --tail=50, identify root cause from stack trace, fix service/router |
| Playwright fail | Element not found / assertion error | Read the selector, fix the component |
| Console error | console.error or unhandled rejection during Playwright | Read browser error with source file:line, fix the component (null check, error boundary, loading state) |
| Network error | Frontend fetch returns unexpected 4xx/5xx | Fix the API call URL, error handling, or backend endpoint |
| Infrastructure fail | Container exit code / service won't start / health check timeout | Read docker compose logs or process stderr, fix config or deps |
| Setup fail | Test fixture creation failed (API returned error during setup) | Fix the endpoint or seed script that the setup action calls |
| Behavioral fail | 200 response with error body, empty list when data expected | Fix the feature logic — the endpoint is reachable but not functioning correctly |
| Architecture drift | Schema mismatch / missing file | Read the schema, fix the response or create the file |
| UI standards fail | Conformance check failed | Apply the fix instruction from ui-standards-reviewer (e.g., change color to #767676, add min-height: 44px) |
Spawn generator to apply the targeted fix. The generator prompt must include:
- The structured failure report from specs/reviews/eval-failures-NNN.json (see evaluator agent for schema).
- prior_attempts: On attempt 2, include attempt 1's fix description and result. On attempt 3, include both. This prevents the generator from re-trying the same fix.

Error type to fix strategy mapping:
| error_type | Strategy |
|---|---|
| lint_format | Run auto-fix tools (ruff check --fix, eslint --fix) |
| type_error | Fix annotation at file:line from stack trace |
| import_error | Check module path, fix import statement |
| key_error | Check data shape at source — log incoming data, fix accessor |
| timeout | Check if service is started, increase timeout, add retry |
| connection_refused | Verify service URL in config, check port mapping |
| validation_error | Compare request/response against schema, fix model |
| assertion_error | Read test assertion, compare expected vs actual, fix logic |
| console_error | Read browser error source file:line, add null check / error boundary / loading state |
| network_error | Fix frontend fetch URL or error handling, or fix backend endpoint returning unexpected status |
| ui_standards_fail | Apply specific fix from ui-standards-reviewer (color, spacing, touch target, empty state) |
| api_transient | Retry evaluator check once (code may be correct, API was flaky). If retry passes, do not count as a self-heal attempt. |
| api_permanent | Fix wrapper error handling or request format |
Re-run the failed gate (not all gates — just the one that failed).
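The attempt loop can be sketched with two injected callables. run_gate re-runs only the failed gate and returns None on PASS. apply_fix stands in for spawning the generator and returns a fix summary. Both are hypothetical. The free retry for api_transient follows the mapping table.

```python
from typing import Callable, Optional

Failure = dict  # structured failure, e.g. {"error_type": "type_error", ...}


def self_heal(
    failure: Failure,
    run_gate: Callable[[], Optional[Failure]],
    apply_fix: Callable[[Failure, list[dict]], str],
    max_attempts: int = 3,
) -> tuple[bool, list[dict]]:
    """Return (healed, prior_attempts)."""
    prior_attempts: list[dict] = []
    if failure.get("error_type") == "api_transient":
        retry = run_gate()  # one free retry: the API may just have been flaky
        if retry is None:
            return True, prior_attempts
        failure = retry
    for attempt in range(1, max_attempts + 1):
        fix = apply_fix(failure, list(prior_attempts))  # generator sees earlier tries
        result = run_gate()
        prior_attempts.append({
            "attempt": attempt,
            "fix": fix,
            "result": "PASS" if result is None else result.get("error_type"),
        })
        if result is None:
            return True, prior_attempts
        failure = result
    return False, prior_attempts  # 3rd failure: hard stop for this group
```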
3rd failure — hard stop for this group:
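- Revert this group's changes, scoped to the files it owns in component-map.md: git checkout -- {file1} {file2} ... (a bare git checkout -- . would also discard unrelated work).
- Log the failure to .claude/state/failures.md with group ID, failure category, and all three attempt summaries.
- Record the blocked stories in claude-progress.txt.

/auto is responsible for starting and stopping the application. The evaluator does NOT manage the app lifecycle.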
Read verification.mode from project-manifest.json. Default: docker.
Docker mode (default). Startup:

1. Run verification.dev_bootstrap from project-manifest.json (e.g., docker compose -f docker-compose.dev.yml up -d).
2. Wait for evaluation.api_base_url + evaluation.health_check to respond.

Between groups:
```bash
# Re-run bootstrap to pick up new code
{verification.dev_bootstrap}
```

Wait for health check before handing off to evaluator.

Teardown:

```bash
{verification.dev_teardown}
```

Error context: docker compose logs --tail=50 {service_name}
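Local mode (verification.mode: local). Startup:

- Run each command in verification.local.start_commands from the manifest as a background process.
- Redirect each process's output to .claude/state/process-{name}.log.

Between groups: Kill and restart processes (re-run start commands).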
Teardown: Kill all background processes started by the orchestrator.
Error Context: Read from .claude/state/process-{name}.log
Stub mode (verification.mode: stub). Startup:

- Start a mock server generated from verification.stub.schema_source in the manifest.

Between groups: Regenerate mock server if schema has been amended (check specs/design/amendments/).
Teardown: Kill mock server process.
Error Context: Stub mismatch reports — when a request doesn't match any endpoint in the schema, log the requested path and method.
Stub mode limitations: Layer 1 checks validate request/response shapes but cannot verify business logic. Layer 2 (Playwright) skipped unless a separate frontend URL is configured.
When using the --worktree flag, each worktree gets its own app instance (configured in project-manifest.json).

After each agent team completes (before the ratchet gate):
1. Check specs/design/amendments/ for new files that were not present at the start of this iteration.
2. Apply each amendment to the affected design artifacts (api-contracts.md, component-map.md, schema files).
3. Commit: git add specs/design/ && git commit -m "refactor: update api-contracts for {change description}"

Amendments are a signal that the implementation discovered a design gap. They must be incorporated before evaluation, not deferred.
When an amendment is detected and processed:
1. Read specs/brd/changelog.md to get the current version.
2. Append a new entry:

```markdown
## v{N} — {date}
- **Change:** {amendment description}
- **Reason:** Auto-detected during implementation — design gap discovered by generator/evaluator
- **Impact:** {affected artifacts}
- **Cascade:** design done | implement in-progress
```
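A sketch of building that entry. It assumes N is the highest existing ## vN heading plus one, which is one reading of "get the current version". The separator in the heading is written as the escape \u2014 so that it matches the template exactly.

```python
import re
from datetime import date


def next_changelog_entry(changelog: str, change: str, impact: str, today: date) -> str:
    """Build the entry appended to specs/brd/changelog.md for one amendment."""
    versions = [int(v) for v in re.findall(r"^## v(\d+)\b", changelog, re.MULTILINE)]
    n = max(versions, default=0) + 1
    return (
        f"## v{n} \u2014 {today.isoformat()}\n"
        f"- **Change:** {change}\n"
        "- **Reason:** Auto-detected during implementation \u2014 "
        "design gap discovered by generator/evaluator\n"
        f"- **Impact:** {impact}\n"
        "- **Cascade:** design done | implement in-progress\n"
    )
```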
Read calibration-profile.json for project type and UI standards config. Fall back to SaaS defaults if file does not exist.
The calibration-profile.json file contains a simple feature-flag set (no weighted scoring):
```json
{
  "project_type": "saas",
  "ui_standards": {
    "responsive_required": true,
    "mobile_breakpoint": 375,
    "desktop_breakpoint": 1280,
    "wcag_level": "AA",
    "min_touch_target": 44,
    "spacing_grid": 8,
    "empty_states_required": true,
    "error_pages_required": true
  }
}
```
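A sketch of loading the profile with the SaaS fallback. It assumes the example above doubles as the SaaS defaults and that missing keys are filled in individually, which is a design choice this sketch makes. The screenshot_widths helper is a hypothetical reading of when the mobile width is captured.

```python
import copy
import json
from pathlib import Path

# Assumption: the example profile above is also the SaaS default.
SAAS_DEFAULTS = {
    "project_type": "saas",
    "ui_standards": {
        "responsive_required": True,
        "mobile_breakpoint": 375,
        "desktop_breakpoint": 1280,
        "wcag_level": "AA",
        "min_touch_target": 44,
        "spacing_grid": 8,
        "empty_states_required": True,
        "error_pages_required": True,
    },
}


def load_calibration(path: Path) -> dict:
    """Read calibration-profile.json, filling any missing keys from SAAS_DEFAULTS."""
    profile = copy.deepcopy(SAAS_DEFAULTS)  # never mutate the shared defaults
    if path.exists():
        data = json.loads(path.read_text())
        profile["project_type"] = data.get("project_type", profile["project_type"])
        profile["ui_standards"].update(data.get("ui_standards", {}))
    return profile


def screenshot_widths(profile: dict) -> list[int]:
    """Widths to capture per page: mobile + desktop only when responsive is required."""
    ui = profile["ui_standards"]
    if ui["responsive_required"]:
        return [ui["mobile_breakpoint"], ui["desktop_breakpoint"]]
    return [ui["desktop_breakpoint"]]
```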
For each frontend page in the current group:
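1. Capture screenshots at the mobile_breakpoint and desktop_breakpoint widths (if responsive_required) using Playwright.
2. Spawn ui-standards-reviewer with screenshots + calibration profile.
3. The reviewer checks each item in .claude/skills/evaluate-patterns/references/ui-standards-checklist.md, filtered by project type.

The ui-standards-reviewer replaces the earlier design-critic GAN loop:

| Aspect | Old (design-critic GAN) | New (ui-standards-reviewer) |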
|---|---|---|
| Model | Opus | Sonnet |
| Execution | Multi-iteration GAN loop (up to 10 rounds) | Single pass |
| Scoring | 4 weighted criteria, numeric scores, thresholds | Binary PASS/FAIL per checklist item |
| Originality | Scored and optimized for | Not evaluated |
| Plateau detection | Yes (forced pivots on score stagnation) | N/A |
| Fix handling | Critique sent back for re-scoring | Fix instructions sent to generator via self-healing |
| Cost per group | $8-15 (multiple Opus evaluations) | $0.50-1.50 (single Sonnet pass) |
claude-progress.txt is the memory bridge between context windows. Each iteration appends a new session block.
```text
=== Session {N} ===
date: {ISO 8601}
mode: {full|lean|solo|turbo}
groups_completed: [A, B, C]
groups_remaining: [D, E, F]
current_group: D (extraction)
current_stories: [E4-S1, E4-S2]
sprint_contract: sprint-contracts/group-D.json
last_commit: {hash} "{message}"
features_passing: 47 / 203
coverage: 82%
learned_rules: 6
blocked_stories: none
next_action: Run evaluator against group D
```
- next_action is critical. This field tells a fresh context window exactly what to do first. Be specific: "Run evaluator against group D" is good. "Continue" is not.
- Populate blocked_stories if any stories failed 3 consecutive self-heal attempts. Format: [E4-S3 (import error), E5-S1 (docker fail)].

Stopping criteria use OR logic with priority (check in order):
Hard stop: An architecture violation that self-healing cannot fix, OR the total iteration count exceeds 50. Stop the entire /auto run. Report status and hand off to the user.
Escalate (per-story): A story fails 3 consecutive self-heal iterations. Mark it BLOCKED. Log to failures.md. Extract learned rule. Skip to the next group. Do NOT stop the entire run.
Coverage gate: Coverage drops below the baseline AFTER a successful commit. This overrides the pass — revert the commit (git revert HEAD --no-edit), log the regression, and re-enter self-healing for coverage.
Success: All features in features.json have passes: true AND coverage >= baseline threshold. Print:
```text
=== BUILD COMPLETE ===
Features passing: {N}/{N}
Coverage: {X}%
Groups completed: [list]
Blocked stories: [list or "none"]
Learned rules: {count}
Total iterations: {count}
```
Then:
1. Tear down the stack: docker compose down -v
2. Run /architect --post-build — fills in verdict, patterns, and recommendations in the stack decision record. Updates integration notes for any external APIs used.
3. Generate README.md for the built application (see below).
4. Commit: git add README.md && git commit -m "docs: add README with architecture, setup, and API reference"
5. Print the /context-budget --summary output.

After the build completes, generate a README.md that describes the GENERATED APP (not the harness).
Read these files for content:
- specs/brd/brd.md — project description
- specs/design/architecture.md — system architecture
- specs/design/api-contracts.md or api-contracts.schema.json — API surface
- specs/design/component-map.md — module structure
- project-manifest.json — tech stack
- init.sh — setup steps
- docker-compose.yml (if exists) — services
- .env.example (if exists) — required environment variables

Required sections: Project description, Architecture (diagram/layers), Tech Stack (table), Prerequisites, Quick Start (copy-paste commands), API Endpoints (table), Project Structure (directory tree), Running Tests, Environment Variables (table from .env.example), Development notes.
Rules:
- Do not mention /auto, agents, or the GAN loop. This is a developer README for the app.
- The Environment Variables table must match .env.example exactly.

Learned rules are the harness's long-term memory. They prevent the same mistake from recurring across iterations and context windows.
Extract a new rule when the same error type (by category from SECTION 6) appears 2 or more times in .claude/state/failures.md. Check after every failure entry.
Append to .claude/state/learned-rules.md:
```markdown
## Rule {N}: {descriptive title}
- **Source:** Group {group}, Story {story}, Iteration {iter}
- **Pattern:** {what went wrong — the repeated error signature}
- **Rule:** {the concrete instruction to prevent recurrence}
- **Applied in:** {list of agents/skills that must follow this rule}
```
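A sketch of the extraction trigger and the template rendering. It assumes the category of each failures.md entry has already been parsed into a list, since that file's layout is not specified here. It also assumes already_covered tracks the categories that have a rule. Using the error_type names from the self-healing table as category strings is an assumption.

```python
from collections import Counter


def categories_needing_rules(
    failure_categories: list[str], already_covered: set[str], threshold: int = 2
) -> list[str]:
    """Categories seen at least `threshold` times with no rule yet, in first-seen order."""
    counts = Counter(failure_categories)
    return [c for c, n in counts.items() if n >= threshold and c not in already_covered]


def format_rule(n: int, title: str, group: str, story: str, iteration: int,
                pattern: str, rule: str, applied_in: list[str]) -> str:
    """Render one entry in the learned-rules.md template."""
    return (
        f"## Rule {n}: {title}\n"
        f"- **Source:** Group {group}, Story {story}, Iteration {iteration}\n"
        f"- **Pattern:** {pattern}\n"
        f"- **Rule:** {rule}\n"
        f"- **Applied in:** {', '.join(applied_in)}\n"
    )
```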
- If learned-rules.md does not exist yet, create it with the header "# Learned Rules", a blank line, and the line "Rules extracted from failure patterns during autonomous build."
- Re-read program.md each iteration: Constraints can change mid-run (e.g., a human updates program.md while /auto is running). Always re-read at the start of every iteration.
- Scope reverts: git checkout -- . reverts everything. After the 3rd failure, only the current group's files should be reverted. Use the file ownership list from component-map.md to scope the revert: git checkout -- {file1} {file2} ...
- Check failures.md for recurring patterns BEFORE spawning the generator. If the same error has appeared before, inject the relevant learned rule into the generator prompt proactively.
- Always run /architect --post-build to fill in verdict and patterns. Skipping this breaks the cross-project knowledge loop.

After all groups pass and the build is complete:
- If findings_reporting.enabled is set in the manifest, prompt: "Build complete. Report findings to the forge? Run /report-findings to review and submit."
- If specs/brd/changelog.md has entries beyond v1, display: "This build processed {N} requirement changes (v1 → v{M}). See specs/brd/changelog.md for the full history."
- Run /status to display the final project dashboard.
npx claudepluginhub rlpatrao/claude_harness_forge