From dev-tools
Use when user asks to "set up a harness", "initialize agent infrastructure", "bootstrap AGENTS.md", "하네스 초기화", "에이전트가 자꾸 실수해요", "Claude Code 리포지토리 설정", or repo has no AGENTS.md/docs/ structure. Also for "validate harness", "harness audit", "하네스 점검". Repo-scoped — does NOT modify ~/.claude/CLAUDE.md.
How this skill is triggered — by the user, by Claude, or both
Slash command
/dev-tools:harness-initThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Set up complete harness for repo so Claude Code (and other AI agents) do reliable, consistent work. Harness = full environment of scaffolding, constraints, feedback loops, docs surrounding agent.
examples/agents-md-example.mdreferences/agent-teams-onboarding.mdreferences/architecture-template.mdreferences/backlog-template.mdreferences/claudeignore-template.mdreferences/competing-hypotheses-playbook.mdreferences/conventions-template.mdreferences/coordination-patterns.mdreferences/delegation-template.mdreferences/enforcement-template.mdreferences/eval-criteria-template.mdreferences/golden-principles-guide.mdreferences/handoff-template.mdreferences/harness-evolution.mdreferences/harness-invariants.mdreferences/maintenance.mdreferences/maturity-levels.mdreferences/orchestrator-template.mdreferences/power-user-settings.mdreferences/runbook-template.mdSet up complete harness for repo so Claude Code (and other AI agents) do reliable, consistent work. Harness = full environment of scaffolding, constraints, feedback loops, docs surrounding agent.
Three sources inform harness design:
Key insight: if agent struggles, that's harness defect, not agent defect. Fix environment, not prompt.
Simplification principle: Find simplest solution, increase complexity only when needed. Every harness component encodes assumption about what model can't do alone — start minimal, add scaffolding only on concrete failures. Harness built for weakest model slows stronger model.
Trigger conditions are in the frontmatter description above.
Before starting, gather project info. Scan repo first (Step 1: read existing files, git log, package.json). Ask user only for answers that code doesn't reveal (e.g., team size, sprint cadence, external integrations).
Work through steps in order. Each produces concrete artifacts.
Before acting, determine mode AND maturity level.
Default toward orchestration infrastructure. When unsure whether a repo "needs" Step 4b/4c (agents + orchestrator), build them. Empirical failure mode: repos initialized without orchestrator + agents never auto-delegate downstream — the router (Step 7b) has nothing to route to, and the model does everything inline. The override cost of unused role files is ~50 lines on disk; the cost of skipping is a permanently inline workflow that floods the main context. Skip only for genuinely trivial repos (single script, docs-only, one-file library).
Mode selection:
| Condition | Mode | Action |
|---|---|---|
No AGENTS.md, no docs/, no .claude/agents/ | New setup | Run Steps 1–10 |
| Existing harness, user adds agent/skill/area | Extend | Run only affected steps (see matrix below) |
| User asks "harness 점검", "validate", "audit" | Audit | Run scripts/validate-harness.sh, report maturity level, stop |
Maturity assessment (for New setup and Extend mode):
Run scripts/validate-harness.sh against the target repo (if it exists). Classify as Level 1 / 2 / 3 per references/maturity-levels.md. Report the current level and which level to target.
Extend mode — step selection matrix:
| Change type | Steps to run |
|---|---|
| Add agent role | 4b (new role file) → 4c (update orchestrator) → 9 (validate) |
| Add/modify skill | 4c (skill update) → 9 |
| Add new domain with orchestrator | 4b + 4c + 4d → 9 |
| Architecture change | Affected docs → 4b (impacted roles) → 4c → 9 |
Report mode + current maturity level + target level before proceeding.
Before creating anything, understand what exists.
Scan the repo for:
- README.md, CLAUDE.md, AGENTS.md (existing agent config)
- docs/ directory (existing documentation)
- Build/CI config (package.json, Cargo.toml, pom.xml, Makefile, etc.)
- Lint config (.eslintrc, checkstyle, rustfmt, etc.)
- Test infrastructure (test directories, test config)
- Source structure (how code is organized)
- Git history (commit message patterns, branch strategy)
Record findings — these shape every artifact created downstream. If existing AGENTS.md or docs/ exist, read them, decide what to keep vs. replace.
Golden principles = 3-7 invariants that, if violated, cause most damage. Must be:
Read references/golden-principles-guide.md for examples across tech stacks.
Delegation is golden principle candidate. Agents overestimate understanding, skip delegation when "merely recommended." If project uses sub-agents, include delegation discipline principle with objective, measurable triggers — not subjective ones like "unfamiliar module." See "Delegation Discipline" section in references/golden-principles-guide.md.
Ask user: "What rules, if broken, cause most pain in this codebase?" Answer seeds golden principles.
Add the Agent Integrity Principle universally — include it in every project's golden principles (see references/golden-principles-guide.md → "Agent Integrity Principle"). Prevents agents fabricating values not directly observed. Mark unverified values as [unknown — read {source} to verify] instead of guessing.
AGENTS.md is map, not encyclopedia (target ≤100 lines; hard warn at >200 — keeps harness within agent context window). Must fit in agent's context window without crowding actual work.
See examples/agents-md-example.md for complete reference.
Required sections: ## Docs Index, ## Golden Principles, ## Delegation, ## Token Economy, ## Working with Existing Code, ## Language Policy, ## Maintenance. Full structure in examples/agents-md-example.md.
Three embedded blocks mandatory in AGENTS.md — copy verbatim from examples/agents-md-example.md (do not paraphrase): ## Maintenance edit policy, ## Token Economy rules, context-anxiety note.
What NOT to put in AGENTS.md: workflow details, delegation details, evaluation criteria, architecture deep dives, API references. These belong in docs/.
Create these files. Each read on demand, not loaded every session. Each template file is self-describing — read before writing doc.
| File | Purpose | Template |
|---|---|---|
docs/architecture.md | Project structure, layer rules, module boundaries, dependency directions | references/architecture-template.md |
docs/conventions.md | Naming, code style, framework rules agents frequently get wrong (don't duplicate linter) | references/conventions-template.md |
docs/workflows.md | Six standard workflows (plan/code/draft/constrain/sweep/explore) with delegation gates embedded | references/workflows-template.md |
docs/delegation.md | Pattern-selection flowchart, Spawn Prompt Contract, Effort Tier, routing table, per-role model | references/delegation-template.md (+ coordination-patterns.md) |
docs/eval-criteria.md | Generator-Evaluator separation, Sprint Contract, calibration methodology | references/eval-criteria-template.md |
docs/runbook.md | Build/test/deploy commands, failure modes, env setup | references/runbook-template.md |
Non-negotiable for docs/delegation.md: triggers in routing table must be objective and measurable — never subjective conditions ("unfamiliar module") agent can rationalize away.
Complete Step 4a first. If this is a multi-agent project: also complete Step 4b after Step 4a.
Required so scripts/reconcile-harness.py has valid files to operate on. Without these, reconciliation has nothing to process and warns about missing schema.
Create at repo root:
backlog.md — queue of work not yet in flight. Copy minimal template from references/backlog-template.md. Empty sections fine.tasks.md — DO NOT create at init time. Exists only during active sprint. Include template path (references/tasks-template.md) as reference in docs/workflows.md so first sprint starter knows schema.Both files follow Reconciliation Contract documented in references/harness-invariants.md.
Default: create the starter pack (implementer, explorer, qa-verifier, product-evaluator) for any repo with >1 source file. The cost of an unused role file is ~50 lines on disk; the cost of skipping is that the router in Step 7b has nothing to route to, and the agent does everything inline (the user's reported failure mode). Empirically, harness-init that skipped this step produced repos where "auto-delegation never fires" — because there were no delegation targets to fire at.
Skip only if the repo is genuinely trivial: a single-script tool, a docs-only repo, or a one-file library with no meaningful module boundaries. When unsure, create the starter pack — the override cost is low.
Create .claude/agents/{role}.md for each recurring role. Claude Code reuses these for both subagent spawns and Agent Teams teammates — define once, use both ways.
Read references/teammate-role-template.md for full schema and starter pack (implementer, explorer, qa-verifier, product-evaluator).
Team communication protocol: For every role that will participate in TeamCreate-based orchestration, add the ## Team Communication Protocol section (template in references/teammate-role-template.md). This section specifies which agents to receive from/send to, task update calls, and _workspace/ artifact path. Without it, inter-agent coordination degrades to guessing.
Also write references/handoff-template.md-style handoff-{feature}.md schema reference into docs/workflows.md for multi-session work. Handoff files are deferred Spawn Prompt Contracts.
Default: create at least one orchestrator skill for the repo's primary work domain (e.g., code-orchestrator for an app repo, release-orchestrator for a library, review-orchestrator for a docs repo). The threshold is not "≥2 agents collaborating" — it is "the user will repeatedly invoke this kind of work." An orchestrator is the named target that the UserPromptSubmit router (Step 7b) routes prompts to. Without one, Step 7b has nothing to install and the agent has nothing to invoke.
If a domain genuinely needs ≥2 coordinating agents, prefer Template A (team) or C (hybrid). For single-agent domains, use Template B with one sub-agent — it is still worth creating because the orchestrator gives the router a target and the model an explicit "spawn the agent, do not inline" instruction.
Create at:
.claude/skills/{domain}-orchestrator/SKILL.md
Read references/orchestrator-template.md and choose one of:
The orchestrator must include:
_workspace/campaign-state.json; resume from next_entry_point if presentreferences/delegation-template.md → Data Transfer Protocols)_workspace/ naming: {phase:02d}_{agent}_{artifact}.{ext} + campaign-state.jsonreferences/orchestrator-template.md → Task Claim Protocol)After creation, register in CLAUDE.md: add ## Harness: {Domain} pointer block with trigger rule and change history table. Keep CLAUDE.md thin — pointer + history only (no agent list, no skill list).
Directive description mandatory. The skill's description: field is the primary auto-invocation mechanism — Claude reads it on every prompt. Anthropic's skill-creator docs report directive descriptions ("ALWAYS invoke when X — do NOT inline-execute") improved auto-invocation on 5 of 6 public skills vs descriptive phrasing ("Triggers on X"). Use the template in references/orchestrator-template.md → "Description writing rule". Pair with Step 7b's router for highest reliability.
Skip only if: the repo is genuinely trivial (single-script tool, docs-only repo) — the same bar as Step 4b. The historical "single-agent project → skip" rule was wrong: it produced exactly the user-reported failure where init completes but downstream work never delegates. An orchestrator with one sub-agent target is still a net win over no orchestrator.
If Step 4c created an orchestrator, add a _workspace/ section to docs/runbook.md:
## _workspace/ Convention
Intermediate artifacts live under `_workspace/` at repo root.
Naming: `{phase:02d}_{agent}_{artifact}.{ext}`
Preserve across sessions — enables partial re-run without full restart.
Remove only on explicit "reset" request.
This makes the convention discoverable to future sessions and new contributors.
Copy scripts/sweep.sh into target project's tools/ directory, adapt # ADAPT: sections. Read references/sweep-template.md for ecosystem-specific adaptation guidance.
Sweep script performs five checks: lint scan, doc drift, golden principle violations, harness freshness, finding report. Includes periodic load-bearing assessment — stress-testing whether each harness component still compensates for real model limitation. See references/sweep-template.md → "Load-Bearing Assessment".
Trigger policy required — sweep deliberately NOT part of session-start sync loop (too heavy for every session). Pick one, document in docs/runbook.md:
bash tools/sweep.sh between features.claude/settings.json hook with staleness guard (e.g., skip if tools/.sweep-stamp <7 days old)CronCreate scheduleWhichever chosen, record in references/harness-invariants.md → "Sweep Trigger Policy" so future sessions know cadence.
If project has linters, improve error messages for agent consumption:
Before (human-oriented):
ERROR: Line 42 — violation of rule X
After (agent-readable):
ERROR: Line 42 — violation of rule X
FIX: {what to change and how}
REF: {which doc or config file explains this rule}
Each error message becomes micro-instruction telling agent exactly how to fix issue.
Build multi-layer enforcement chain so golden principles are mechanically guaranteed. Read references/enforcement-template.md for detailed templates per layer.
Four layers (defense in depth):
.claude/settings.json) — Catch violations at edit timeTwo Layer 1 extensions (add when appropriate):
references/enforcement-template.md → "Circuit Breaker".references/enforcement-template.md → "Consent Gates".Match enforcement depth to maturity level target: Level 1 → no hooks required; Level 2 → Layer 1 + 3; Level 3 → all layers + circuit breaker. Read references/enforcement-template.md for templates and Agent Teams hook wiring.
Why this step exists. The AGENTS.md delegation table and orchestrator skill descriptions only fire if Claude voluntarily reads them and chooses to delegate. Official Anthropic docs say auto-invocation is description-driven; field testing (Scott Spence, "Claude Code Skills Don't Auto-Activate (a workaround)", 2025-11-06) reports it works ~50% of the time even with well-written descriptions. The result: init produces a beautiful delegation harness, then the agent does the work inline anyway.
Fix. Two mechanical mechanisms that make delegation non-optional:
UserPromptSubmit trigger router — pattern-matches each prompt, emits an explicit Use Skill(X) or Spawn Agent(subagent_type=X) instruction when a registered phrase matches. Explicit instructions outperform description discovery. Read references/trigger-router-template.md and install:
.claude/hooks/trigger-router.sh.claude/trigger-routes.json (one route per orchestrator skill + per high-leverage agent role)UserPromptSubmit hook to .claude/settings.jsonPreToolUse delegation gate (critical-path repos only) — blocks Edit|Write on critical paths (auth/billing/migrations) unless a delegation evidence file exists in _workspace/. Forces the orchestrator/explorer to run first. Read references/enforcement-template.md → "Delegation Gate (Layer 1 Extension)" and install:
.claude/hooks/delegation-gate.shPreToolUse matcher to .claude/settings.jsonDefault: install — matches the Step 4b/4c default-on policy. With those two defaults on, the router has targets to fire at; skipping 7b leaves the routing surface empty.
Skip only when both Step 4b and Step 4c were skipped (i.e., truly trivial single-script / docs-only / one-file library repos). For everything else, install — even with one orchestrator and one agent role, the router earns its keep.
Validation: test each route after creation:
echo '{"prompt": "<sample trigger phrase>", "session_id": "test"}' | bash .claude/hooks/trigger-router.sh
# Expected: "INSTRUCTION (auto-delegation router): Use Skill(...) ..."
This is the single highest-leverage step for projects where orchestrators exist but are not being invoked. Without it, the rest of the harness is documentation; with it, delegation has a mechanical discovery path (router) and a mechanical block (gate).
Three items at repo root. All mechanical wins — "create once, benefits every session."
CLAUDE.md (pointer)@AGENTS.md
Keeps loading chain clean: Claude loads CLAUDE.md → AGENTS.md (map) → docs/ (on demand). If drifts, repair with scripts/sync-claude-md.sh.
.claudeignore (scan exclusions)Prevents token burn on vendored deps, build outputs, generated artifacts. Compose from references/claudeignore-template.md (Common + language sections) based on Step 1 stack analysis.
.agents/skills symlinkTooling looks up project-local skills via .agents/ while files live under .claude/skills/. Create once at init; repair with scripts/symlink-guard.sh if broken:
bash ${CLAUDE_PLUGIN_ROOT}/skills/harness-init/scripts/symlink-guard.sh
If skills repo is not cloned or CLAUDE_PLUGIN_ROOT is unset, run directly:
mkdir -p .agents && ln -s ../.claude/skills .agents/skills
Accepted forms (POSIX symlink or Windows text-file fallback) documented in references/harness-invariants.md → File Layout Invariants.
If Step 4c created a team-mode orchestrator, complete Agent Teams setup:
Read references/agent-teams-onboarding.md for tooling prerequisites and environment check.
Add adversarial debugging playbook as on-demand workflow: references/competing-hypotheses-playbook.md. Maps to debate workflow in docs/workflows.md.
Skip entirely if: Step 4c chose Template B (sub-agent only). Agent Teams carries 3–5× token cost — don't enable it without an orchestrator that uses TeamCreate.
Token cost note: Team mode is not free. The orchestrator template enforces the decision gate (Q2 in Pattern Selection) so teams only activate when mid-flight coordination genuinely pays off.
Run scripts/validate-harness.sh against target project to verify all artifacts complete and consistent. If validation exits non-zero, halt immediately. Show the full validation report to the user. Do NOT auto-fix — user must review and decide. Re-run validation after user addresses issues.
Script checks:
AGENTS.md, CLAUDE.md, docs/*, backlog.md)references/harness-invariants.md)CLAUDE.md is exactly @AGENTS.md.agents/skills points to ../.claude/skillsbacklog.md schema (checkbox items under ## headings)## Maintenance section contains edit-policy rulesClean validate run at Level 2+ means enforcement is active and drift is mechanically prevented.
Manual checklist for items script cannot verify:
docs/ files don't duplicate each otherdocs/runbook.mdAfter setup, show the user all four of the following — this is the exit criterion for Step 10:
bash tools/sweep.sh).## Maintenance rules embedded in AGENTS.md; emphasize: only add when all 4 conditions are met.After setup, Level 3 enforcement mechanically prevents drift (hooks + CI). Unexpected violations after a clean init → treat as enforcement gap, not operator error — trace to the missing hook or CI check and fix the template.
After a harness is in use, it should evolve based on feedback. Trigger evolution when:
Read references/harness-evolution.md for feedback → fix target mapping and change history protocol. Record every change in the orchestrator's CLAUDE.md pointer block.
With Level 3 enforcement active, no manual sync routine is needed — hooks and CI prevent drift mechanically.
Regular actions:
| When | Action |
|---|---|
| Periodically or on "harness 점검" | bash scripts/validate-harness.sh — check maturity level |
| Sprint tasks complete | python scripts/reconcile-harness.py — sync tasks.md → backlog.md |
| Feedback from harness usage | Read references/harness-evolution.md |
Scripts (utilities, run from repo root):
| Script | Purpose |
|---|---|
scripts/validate-harness.sh | Full structural validation + maturity level report |
scripts/reconcile-harness.py | Sync completed tasks.md items into backlog.md |
scripts/sweep.sh | Archive stale content (copy and adapt per project in Step 5) |
scripts/sync-claude-md.sh | Repair CLAUDE.md → @AGENTS.md (if manually broken) |
scripts/symlink-guard.sh | Repair .agents/skills symlink (if manually broken) |
scripts/check-context-size.sh | Warn if AGENTS.md > 200 lines |
The last three scripts are repair tools, not routine ops. At Level 3, they should rarely be needed. The SessionStart hook (dev-tools:harness-maintenance) runs sync-claude-md (CLAUDE.md pointer check), symlink-guard (.agents/skills symlink check), and check-context-size (AGENTS.md size check) daily as a lightweight safety net; at Level 3 it should always be silent.
All references/*.md files cited inline at point of use — consult there. Files optional / surfaced on request:
references/orchestrator-template.md — 3-mode orchestrator templates (team/sub-agent/hybrid), _workspace/ convention, CLAUDE.md pointer block, directive-description rule. Read at Step 4c.references/trigger-router-template.md — UserPromptSubmit hook that maps prompt phrases → explicit Use Skill(X) / Spawn Agent(X) instructions. Lifts skill auto-invocation from the ~50% baseline (Scott Spence 2025-11-06) toward deterministic on matched prompts. Read at Step 7b.references/harness-evolution.md — Feedback-driven evolution: signal → fix target mapping, change history protocol. Read when harness needs evolution.references/maturity-levels.md — 3-level progression (Basic/Verified/Enforced), checklist per level, upgrade path. Read at Step 0 for existing repos.references/power-user-settings.md — Optional env vars (AUTOCOMPACT threshold, extended thinking) and output-style customization. Informational; surface to user after Step 10 if asked.examples/agents-md-example.md — Complete AGENTS.md for Next.js SaaS project with all three mandatory embedded blocksProvides behavioral guidelines to reduce common LLM coding mistakes, focusing on simplicity, surgical changes, assumption surfacing, and verifiable success criteria.
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub kadragon/agent-toolkit --plugin dev-tools