Skill

harness

Configures a harness. A meta-skill that defines specialized agents and generates the skills they will use. (1) Used when requesting 'build a harness' or 'set up a harness', (2) when requesting 'harness design' or 'harness engineering', (3) when building a harness-based automation system for a new domain/project, (4) when reorganizing or expanding a harness configuration, (5) when requesting operations/maintenance of an existing harness such as 'check harness', 'audit harness', 'harness status', or 'sync agents/skills'.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/harness:harness

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

A meta-skill that configures a harness tailored to the domain/project, defines the roles of each agent, and generates the skills they will use.

Supporting Files

references/agent-design-patterns.md
references/orchestrator-template.md
references/qa-agent-guide.md
references/skill-testing-guide.md
references/skill-writing-guide.md
references/team-examples.md

SKILL.md

452 lines · ~6.9k tokens (exceeds 5k compaction limit)

Stats

Language: HTML

Stars: 0

Maintenance: Excellent

Last Commit: Jun 6, 2026

Harness — Agent Team & Skill Architect

A meta-skill that configures a harness tailored to the domain/project, defines the roles of each agent, and generates the skills they will use.

Core Principles:

Generate agent definitions (.claude/agents/) and skills (.claude/skills/).
Use Agent Teams as the default execution mode.
Register a harness pointer in CLAUDE.md: record only minimal pointers (trigger rules + change history) so that the orchestrator skill triggers in a new session.
A harness is an evolving system, not a static fixture: reflect feedback after each run and continuously update agents, skills, and CLAUDE.md.

Workflow

Phase 0: Status Audit

When the harness skill is triggered, the first step is to check the current status of the existing harness.

Read project/.claude/agents/, project/.claude/skills/, and project/CLAUDE.md

Determine execution mode based on the current state:

New Construction: No agent/skill directories or they are empty → Run all phases starting from Phase 1
Existing Expansion: Existing harness is present and there is a request to add a new agent/skill → Run only the required phases according to the Phase Selection Matrix below
Operations/Maintenance: Request for auditing, modifying, or syncing the existing harness → Skip to Phase 7-5 Operations/Maintenance workflow

Phase Selection Matrix for Existing Expansion:

| Change Type | Phase 1 | Phase 2 | Phase 3 | Phase 4 | Phase 5 | Phase 6 |
|:---|:---|:---|:---|:---|:---|:---|
| Add Agent | Skip (Use Phase 0 results) | Placement decision only | Required (incl. 3-0) | If dedicated skill is needed (incl. 4-0) | Modify orchestrator | Required |
| Add/Modify Skill | Skip | Skip | Skip | Required (incl. 4-0) | If connection changes | Required |
| Change Architecture | Skip | Required | Affected agents only (incl. 3-0) | Affected skills only (incl. 4-0) | Required | Required |

Cross-reference the existing agent/skill list with the CLAUDE.md records to detect any drift
Summarize the audit results for the user and get confirmation on the execution plan
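
The mode decision in this phase can be sketched as a small function. This is a minimal illustration, not part of the skill itself: the `request_kind` labels ("build", "add", "maintain") are hypothetical, and treating any non-maintenance request against an existing harness as an expansion is an assumption.

```python
from pathlib import Path


def has_entries(directory: Path) -> bool:
    """Return True if the directory exists and contains at least one entry."""
    return directory.is_dir() and any(directory.iterdir())


def detect_mode(project_root: Path, request_kind: str) -> str:
    """Map the Phase 0 findings to one of the three execution modes.

    request_kind is a hypothetical label for the user's intent:
    "build", "add", or "maintain".
    """
    claude_dir = project_root / ".claude"
    harness_exists = has_entries(claude_dir / "agents") or has_entries(claude_dir / "skills")
    if not harness_exists:
        # No agent/skill directories, or they are empty.
        return "new_construction"
    if request_kind == "maintain":
        return "operations_maintenance"
    # Assumption: any other request against an existing harness is an expansion.
    return "existing_expansion"
```

A call such as `detect_mode(Path("."), "add")` would then pick the branch before any phase is run.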

Phase 1: Domain Analysis

Identify the domain/project from the user's request
Identify core task types (generation, verification, editing, analysis, etc.)
Analyze conflicts/duplications with existing agents/skills based on Phase 0 audit results
Explore the project codebase — understand the tech stack, data models, and key modules
Detect user expertise — identify technical skill levels from dialogue context clues (terminology used, question level) and adjust communication tone accordingly. Do not use terms like "assertion" or "JSON schema" without explanation for less experienced users.

Phase 2: Team Architecture Design

2-1. Select Execution Mode

Agent Teams is the default execution mode. Always evaluate Agent Teams first when 2 or more agents collaborate. Team members self-coordinate using direct communication (SendMessage) and a shared task list (TaskCreate), which improves output quality by letting them share findings, debate conflicts, and fill gaps.

| Mode | When to Use | Characteristics |
|:---|:---|:---|
| Agent Teams (Default) | Collaboration of 2+ agents, real-time coordination/feedback exchange needed, mutual reference to intermediate outputs | Self-coordinate via `TeamCreate` + `SendMessage` + `TaskCreate` |
| Sub-agents (Alternative) | Single agent task, returning results to the main agent is sufficient, team communication overhead is excessive | Direct invocation of the `Agent` tool, parallelized with `run_in_background` |
| Hybrid | Different characteristics per Phase, e.g., parallel collection (Sub) → consensus-based integration (Team) | Mixed configuration of Team/Sub mode per Phase |

Decision Order:

Check if the system can be designed using Agent Teams first — Default for 2 or more agents
Select Sub-agents only when team communication is structurally unnecessary (just returning results) and team overhead outweighs benefits
Consider Hybrid if Phase characteristics vary significantly — specify the execution mode of each Phase in the orchestrator
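
The decision order above can be read as a default with two overrides. The sketch below assumes the four judgments have already been made and are passed in as plain values; the function and parameter names are illustrative only.

```python
def select_execution_mode(agent_count: int,
                          needs_team_communication: bool,
                          team_overhead_outweighs_benefit: bool,
                          phases_differ: bool) -> str:
    """Apply the decision order: Agent Teams first, then Sub-agents, then Hybrid."""
    # Step 1: Agent Teams is the default.
    mode = "agent-teams"
    # Step 2: fall back to Sub-agents for a single agent, or when team
    # communication is unnecessary and its overhead outweighs the benefit.
    if agent_count < 2 or (not needs_team_communication and team_overhead_outweighs_benefit):
        mode = "sub-agents"
    # Step 3: prefer Hybrid when Phase characteristics vary significantly
    # (only meaningful with two or more agents).
    if phases_differ and agent_count >= 2:
        mode = "hybrid"
    return mode
```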

For detailed comparison and decision trees by pattern, see "Execution Mode" in references/agent-design-patterns.md.

2-2. Select Architecture Pattern

Decompose tasks into specialized areas
Determine agent team structure (for architecture patterns, see references/agent-design-patterns.md)
- Pipeline: Sequential dependent tasks
- Fan-out/Fan-in: Parallel independent tasks
- Expert Pool: Context-dependent selective invocation
- Producer-Reviewer: Generation followed by quality review
- Supervisor: Central agent managing state and dynamic distribution
- Hierarchical Delegation: Top-down recursive delegation

2-3. Agent Separation Criteria

Decide based on four axes: expertise, parallelism, context size, and reusability. For details, see "Agent Separation Criteria" in references/agent-design-patterns.md. Duplication and reuse reviews for existing agents are covered in Phase 3-0.

Phase 3: Agent Definition Generation

3-0. Existing Agent Duplication Review

Before creating a new agent, check for duplication with existing agents in project/.claude/agents/. Re-building harnesses repeatedly can lead to redundant agents accumulating under different names.

See "Agent Reuse Design" in references/agent-design-patterns.md for duplication classification and reuse design.

All agents must be defined as project/.claude/agents/{name}.md files. Direct prompt definitions inside the Agent tool are strictly prohibited. Reasons:

Agent definitions must exist as files to be reusable in future sessions
Team communication protocols must be explicitly stated to guarantee collaboration quality
The core value of a harness is the separation of agents (who) and skills (how)

If using built-in types (general-purpose, Explore, Plan), still create the agent definition file. Specify the built-in type in the subagent_type parameter of the Agent tool, and include roles, principles, and protocols in the agent definition file.

Model Setting: All agents must use model: "opus". Always specify the model: "opus" parameter when invoking the Agent tool. The quality of a harness is directly tied to the reasoning ability of its agents, and opus guarantees the highest quality.

Team Reorganization: Only one agent team can be active per session, but the team can be dismantled and reorganized between Phases. If different specialist combinations are needed per Phase (e.g., in a Pipeline pattern), save intermediate outputs as files, delete the previous team, and create a new one.

Define each agent in project/.claude/agents/{name}.md. Required sections: Core Role, Operating Principles, Input/Output Protocols, Error Handling, Collaboration. In Agent Teams mode, add a ## Team Communication Protocol section to specify message recipients/senders and task request scopes.

See "Agent Definition Structure" in references/agent-design-patterns.md + references/team-examples.md for definition templates and actual files.
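
A quick way to check a draft agent definition against the required sections is sketched below. It assumes each section appears as a markdown heading with exactly the title listed above; the actual templates in the reference files may word them differently.

```python
REQUIRED_SECTIONS = [
    "Core Role",
    "Operating Principles",
    "Input/Output Protocols",
    "Error Handling",
    "Collaboration",
]
TEAM_SECTION = "Team Communication Protocol"


def missing_sections(definition_text: str, team_mode: bool = True) -> list[str]:
    """Return the required section titles that have no matching heading."""
    headings = {
        line.lstrip("#").strip()
        for line in definition_text.splitlines()
        if line.startswith("#")
    }
    required = REQUIRED_SECTIONS + ([TEAM_SECTION] if team_mode else [])
    return [title for title in required if title not in headings]
```

An empty result means the file has every expected heading; it says nothing about the quality of what is written under them.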

Mandatory requirements when including a QA agent:

The QA agent must use the general-purpose type (Explore is read-only and cannot execute verification scripts)
The core of QA is not "checking existence" but "interface cross-comparison" — reading API responses and front-end hooks simultaneously to compare shapes
QA should be run incrementally immediately after each module is completed, rather than a single run after overall completion
See references/qa-agent-guide.md for the detailed guide

Phase 4: Skill Generation

Create the skills to be used by each agent at project/.claude/skills/{name}/SKILL.md. See references/skill-writing-guide.md for a detailed writing guide.

4-0. Existing Skill Duplication Review

Before creating a new skill, check for duplication with existing skills in project/.claude/skills/. Re-building harnesses repeatedly can lead to redundant skills accumulating under different names.

See "Skill Reuse Design" in references/skill-writing-guide.md for duplication classification and generalization patterns.

4-1. Skill Structure

skill-name/
├── SKILL.md (Required)
│   ├── YAML frontmatter (name, description required)
│   └── Markdown body
└── Bundled Resources (Optional)
    ├── scripts/    - Execution code for repetitive/deterministic tasks
    ├── references/ - Reference documents loaded conditionally
    └── assets/     - Files used in output (templates, images, etc.)
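
The frontmatter requirement (name and description) can be checked with a few lines of code. This sketch uses a deliberately naive parser that only understands single-line `key: value` pairs between `---` delimiters; a real YAML parser would be needed for multi-line or quoted values.

```python
def read_frontmatter(skill_md: str) -> dict[str, str]:
    """Parse top-level `key: value` pairs from a leading `---` block."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        # Skip indented continuation lines and lines without a colon.
        if sep and key.strip() and not line.startswith((" ", "\t")):
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as missing frontmatter


def missing_required_fields(skill_md: str) -> list[str]:
    """Return which of the required fields are absent or empty."""
    fields = read_frontmatter(skill_md)
    return [name for name in ("name", "description") if not fields.get(name)]
```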

4-2. Description Writing — Active Trigger Indication

The description is the skill's sole trigger mechanism. Since Claude tends to be conservative about triggering skills, write descriptions actively ("pushy").

Bad Example: "A skill that processes PDF documents"
Good Example: "Performs all PDF tasks including reading PDF files, extracting text/tables, merging, splitting, rotating, watermarking, encrypting/decrypting, and OCR. Must use this skill whenever a .pdf file is mentioned or a PDF output is requested."

Key: Describe both what the skill does and the specific trigger situations, distinguishing it from similar cases that should not trigger it.

4-3. Body Writing Principles

| Principle | Description |
|:---|:---|
| Explain the Why | Instead of coercive instructions like "ALWAYS/NEVER", explain the reasoning. When LLMs understand the reasons, they make correct decisions even in edge cases. |
| Keep it Lean | The context window is a shared resource. Aim to keep the SKILL.md body under 500 lines. Delete non-essential details or move them to references/. |
| Generalize | Explain the underlying principles to cover diverse inputs rather than writing narrow rules tailored only to specific examples. Avoid overfitting. |
| Bundle Repetitive Code | If agents write identical scripts repeatedly during test runs, bundle them in `scripts/` beforehand. |
| Use Imperative Style | Use an imperative/directive tone (e.g., "Do this", "Run that" instead of "You can do this"). |

4-4. Progressive Disclosure

Skills manage context using a 3-tier loading system:

| Tier | Loading Timing | Size Target |
|:---|:---|:---|
| Metadata (name + description) | Always present in context | ~100 words |
| SKILL.md Body | When the skill triggers | <500 lines |
| references/ | On-demand only | Unlimited (scripts can execute without loading) |

Size Management Rules:

If SKILL.md approaches 500 lines, extract details to references/ and leave a pointer in the body explaining "when to read this file"
Reference files over 300 lines must include a Table of Contents (ToC) at the top
If there are domain/framework variations, separate them into domain subdirectories under references/ and load only relevant files

cloud-deploy/
├── SKILL.md (Workflow + Selection Guide)
└── references/
    ├── aws.md    ← Load only when AWS is selected
    ├── gcp.md
    └── azure.md
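
The first two size management rules lend themselves to a simple lint. In the sketch below the line counts and ToC flags are passed in directly, and the 50-line warning margin for "approaches 500 lines" is an assumption; the source does not define how close is too close.

```python
SKILL_BODY_LIMIT = 500          # target from the tier table
REFERENCE_TOC_THRESHOLD = 300   # reference files longer than this need a ToC


def size_warnings(skill_body_lines: int,
                  references: dict[str, tuple[int, bool]],
                  margin: int = 50) -> list[str]:
    """Return warnings for an oversized SKILL.md and ToC-less long references.

    references maps a file name to (line_count, has_toc).
    margin is a hypothetical buffer for "approaching" the limit.
    """
    warnings = []
    if skill_body_lines >= SKILL_BODY_LIMIT - margin:
        warnings.append("SKILL.md: move details to references/")
    for name, (line_count, has_toc) in sorted(references.items()):
        if line_count > REFERENCE_TOC_THRESHOLD and not has_toc:
            warnings.append(f"{name}: add a table of contents")
    return warnings
```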

4-5. Skill-Agent Connection Principles

1 Agent ↔ 1~N Skills (1:1 or 1:many)
Shared skills across multiple agents are allowed
Skills define "how to do it", while agents define "who does it"

See references/skill-writing-guide.md for detailed writing patterns, examples, and data schema standards.

Phase 5: Integration and Orchestration

An orchestrator is a special type of skill that coordinates the overall team by wiring individual agents and skills into a single workflow. While individual skills created in Phase 4 define "what and how each agent does", the orchestrator defines "who collaborates when and in what order". See references/orchestrator-template.md for templates.

Modifying Orchestrator during Existing Expansion: When expanding an existing harness instead of building a new one, modify the existing orchestrator instead of creating a new one. Reflect the new agent in the team composition, task allocation, and data flow, and add trigger keywords related to the new agent in the description.

The orchestrator pattern varies based on the execution mode selected in Phase 2-1:

5-0. Orchestrator Patterns (by Mode)

Agent Teams Pattern (Default): The orchestrator configures the team via TeamCreate and assigns tasks via TaskCreate. Team members coordinate using SendMessage to communicate directly. The leader (orchestrator) monitors progress and aggregates results.

[Orchestrator/Leader]
    ├── TeamCreate(team_name, members)
    ├── TaskCreate(tasks with dependencies)
    ├── Team members self-coordinate (SendMessage)
    ├── Collect and aggregate results
    └── Clean up team

Sub-agents Pattern (Alternative): The orchestrator invokes sub-agents directly using the Agent tool. Parallel execution is managed with run_in_background: true, and results are returned only to the main orchestrator. Used when team communication is unnecessary and overhead reduction is preferred.

[Orchestrator]
    ├── Agent(agent-1, run_in_background=true)
    ├── Agent(agent-2, run_in_background=true)
    ├── Wait and collect results
    └── Generate integrated output

Hybrid Pattern: Mix different modes per Phase. Common combinations:

Parallel collection (Sub) → Consensus integration (Team): Collect independent data in parallel using sub-agents in Phase 2 → Create a team in Phase 3 for discussion and consensus-based integration
Team generation (Team) → Verification (Sub): Team generates drafts in Phase 2 → A single sub-agent conducts independent validation in Phase 3
Team reorganization between Phases: TeamDelete and new TeamCreate for each Phase, with sub-agent invocations in between

If Hybrid is chosen, state the execution mode at the top of each Phase section in the orchestrator (e.g., **Execution Mode:** Agent Teams).

5-1. Data Passing Protocol

Specify data passing methods between agents in the orchestrator:

| Strategy | Method | Applicable Mode | Suitable Cases |
|:---|:---|:---|:---|
| Message-based | Direct communication via `SendMessage` | Team | Real-time coordination, feedback exchange, lightweight state transfer |
| Task-based | Share task status via `TaskCreate`/`TaskUpdate` | Team | Progress tracking, dependency management, task requests |
| File-based | Read/write files at agreed paths | Team + Sub | Large data volumes, structured outputs, audit trails required |
| Return-based | Return messages from the `Agent` tool | Sub | Orchestrator directly collecting sub-agent results |

Recommended Combination (Team mode): Task-based (coordination) + File-based (artifacts) + Message-based (real-time communication)
Recommended Combination (Sub mode): Return-based (result collection) + File-based (large artifacts)
Hybrid: Apply matching combinations based on the execution mode of each Phase.

Rules for File-based passing:

Create a _workspace/ folder under the working directory to store intermediate outputs
Filename convention: {phase}_{agent}_{artifact}.{ext} (e.g., 01_analyst_requirements.md)
Write only final outputs to the user-specified path, and preserve intermediate files in _workspace/ (for post-verification and audit trails)
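
The filename convention can be captured in a one-line helper so that every agent builds paths the same way. The two-digit zero padding of the phase number is inferred from the `01_analyst_requirements.md` example and is otherwise an assumption.

```python
from pathlib import Path


def workspace_path(workdir: Path, phase: int, agent: str, artifact: str, ext: str) -> Path:
    """Build {phase}_{agent}_{artifact}.{ext} under the _workspace/ folder."""
    # Assumption: phase numbers are zero-padded to two digits, as in the example.
    return workdir / "_workspace" / f"{phase:02d}_{agent}_{artifact}.{ext}"
```

For example, `workspace_path(Path("."), 1, "analyst", "requirements", "md")` yields `_workspace/01_analyst_requirements.md`.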

5-2. Error Handling

Include error handling policies in the orchestrator. Core principles: Retry once; if it fails again, proceed without the result (clarifying the omission in the report); do not delete conflicting data, but write it alongside its source.

See "Error Handling" in references/orchestrator-template.md for a strategy table by error type and implementation details.
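
The "retry once, then proceed without the result" policy might look like the following in code. This is a generic sketch: `task` stands in for whatever invokes an agent, and the `omissions` list is a hypothetical place to collect what the final report must flag as missing.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_one_retry(task: Callable[[], T], label: str, omissions: list[str]) -> Optional[T]:
    """Run task, retry once on failure, and record the label if both attempts fail."""
    for attempt in range(2):  # first try plus exactly one retry
        try:
            return task()
        except Exception:
            if attempt == 1:
                # Second failure: proceed without the result, but note the omission.
                omissions.append(label)
    return None
```

The orchestrator can then list everything in `omissions` in its report instead of silently dropping it.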

5-3. Team Size Guidelines

| Task Scale | Recommended Team Size | Tasks per Member |
|:---|:---|:---|
| Small scale (5~10 tasks) | 2~3 members | 3~5 tasks |
| Medium scale (10~20 tasks) | 3~5 members | 4~6 tasks |
| Large scale (20+ tasks) | 5~7 members | 4~5 tasks |

Larger teams increase coordination overhead. A focused team of 3 is better than a distracted team of 5.

5-4. Registering CLAUDE.md Harness Pointer

After completing the harness configuration, register a minimal pointer in the project's CLAUDE.md. Since CLAUDE.md is loaded in every new session, recording the harness's presence and trigger rules is sufficient for the orchestrator skill to handle the rest.

CLAUDE.md Template:

## Harness: {Domain Name}

**Goal:** {One-line core goal of the harness}

**Trigger:** Use the `{orchestrator-skill-name}` skill for any `{domain}`-related task requests. Simple questions can be answered directly.

**Change History:**
| Date | Change Details | Target | Reason |
|:---|:---|:---|:---|
| {YYYY-MM-DD} | Initial Configuration | All | - |

Do not include in CLAUDE.md: Agent lists, skill lists, directory structures, or detailed execution rules. Reason: These are managed in the orchestrator skill, .claude/agents/, and .claude/skills/, so including them in CLAUDE.md creates redundancy. Directory structures can be checked directly in the file system. CLAUDE.md only contains the pointer (trigger rules) + change history.

5-5. Post-Task Support

The orchestrator must handle not only initial execution but also subsequent follow-up tasks. Guarantee the following three aspects:

1. Include follow-up keywords in the orchestrator description: Initial creation keywords alone will not trigger follow-up requests. Make sure to include the following follow-up expressions in the description:

"re-run", "run again", "update", "modify", "improve", "refine"
"only the {sub-task} part of {domain}"
"based on previous results", "improve results"

2. Add context check step in Phase 1 of the orchestrator: Check the existence of existing artifacts at the start of the workflow to determine the execution mode:

_workspace/ exists + user requests partial modifications → Partial Re-run (re-invoke only the affected agent)
_workspace/ exists + user provides new input → New Run (move the existing _workspace to _workspace_prev/)
_workspace/ does not exist → Initial Run

3. Include re-invocation instructions in agent definitions: Explicitly state "actions when prior artifacts exist" in each agent .md file:

If previous result files exist, read them and apply improvements to them instead of starting over
Modify only the affected parts when user feedback is provided

See "Phase 0: Context Check" in references/orchestrator-template.md for the orchestrator template.
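
The context check from item 2 above reduces to a three-way branch. In this sketch the `request` labels are hypothetical names for the two kinds of user input, and archiving the old workspace is left to the caller.

```python
def decide_run_mode(workspace_exists: bool, request: str) -> str:
    """Choose the run mode from the presence of _workspace/ and the request type.

    request is a hypothetical label: "partial_modification" or "new_input".
    """
    if not workspace_exists:
        return "initial_run"
    if request == "partial_modification":
        # Re-invoke only the affected agent.
        return "partial_rerun"
    # New input: the caller should first move _workspace/ to _workspace_prev/.
    return "new_run"
```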

Phase 6: Validation and Testing

Validate the generated harness. See references/skill-testing-guide.md for detailed testing methodologies.

6-1. Structure Verification

Verify that all agent files are in their correct locations
Validate skill frontmatter (name, description required)
Check reference consistency between agents
Ensure no command files were generated

6-2. Mode-Specific Verification

Agent Teams: Check communication paths between members, task dependencies, and team size adequacy
Sub-agents: Check input/output connections for each agent, run_in_background settings, and return-value collection logic
Hybrid: Verify that each Phase's execution mode is specified in the orchestrator, and data passing is not broken at Phase transitions (e.g., when transitioning from Team → Sub, verify that the team's output connects to the sub-agent's input)

6-3. Skill Execution Testing

Perform actual execution tests for each generated skill:

Write Test Prompts — Write 2~3 realistic test prompts for each skill. Use concrete, natural sentences that a real user would input.
With-skill vs Without-skill Comparison — If possible, run executions with and without the skill in parallel to verify the value added by the skill. Spawn two agents:
- With-skill: Performs the task after reading the skill
- Without-skill (Baseline): Performs the same task without the skill
Evaluate Results — Evaluate output quality qualitatively (user review) + quantitatively (assertion-based). Use assertions for objectively verifiable outputs (files created, data extracted, etc.), and rely on user feedback for subjective qualities (writing style, design).
Iterative Improvement Loop — If issues are found during testing:
Bundle Repetitive Patterns — If agents write identical code repeatedly during test runs (e.g., generating the same helper script in every test), bundle that code in scripts/ beforehand.

6-4. Trigger Verification

Verify that each skill's description triggers correctly:

Should-trigger Queries (8~10) — Diverse expressions of the target intent (formal/casual, explicit/implicit)
Should-NOT-trigger Queries (8~10) — "Near-miss" queries with similar keywords where another tool or skill is more suitable

What makes a good near-miss: obviously unrelated queries like "write a fibonacci function" have no test value. Focus on ambiguous boundary queries like "extract the charts from this excel file into PNGs" (xlsx skill vs image conversion).

Confirm that there are no trigger conflicts with existing skills at this stage.
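
Results from the two query sets can be tallied as missed triggers and false alarms. In the sketch below `triggers` is a hypothetical stand-in for "did the skill load for this query"; in practice that answer comes from observing real runs, not from a keyword test.

```python
from typing import Callable


def trigger_report(should_trigger: list[str],
                   should_not_trigger: list[str],
                   triggers: Callable[[str], bool]) -> dict[str, list[str]]:
    """Summarize where observed triggering disagrees with the expectation."""
    return {
        # Target-intent queries that failed to load the skill.
        "missed": [q for q in should_trigger if not triggers(q)],
        # Near-miss queries that loaded the skill when they should not have.
        "false_alarms": [q for q in should_not_trigger if triggers(q)],
    }
```

A non-empty "missed" list suggests expanding the description; a non-empty "false_alarms" list suggests sharpening the boundary against neighboring skills.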

6-5. Dry-run Testing

Review if the Phase sequence in the orchestrator skill is logical
Confirm there are no dead links in the data passing path
Check if all agent inputs match outputs from previous Phases
Verify that fallback paths for each error scenario are executable
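
The data-path checks in this dry run can be partly automated if the orchestrator's Phases are written down as data. The structure below (a list of dicts with `name`, `inputs`, and `outputs`) is a hypothetical representation chosen for the sketch, not a format the harness defines.

```python
def find_dead_links(phases: list[dict]) -> list[str]:
    """Report inputs that no earlier Phase produces.

    Each phase is a dict with "name", "inputs", and "outputs" keys
    (a hypothetical layout used only for this check).
    """
    available: set[str] = set()
    problems = []
    for phase in phases:
        for needed in phase["inputs"]:
            if needed not in available:
                problems.append(f"{phase['name']}: missing input {needed}")
        # Outputs become available to every later Phase.
        available.update(phase["outputs"])
    return problems
```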

6-6. Test Scenario Creation

Add a ## Test Scenarios section to the orchestrator skill
Describe at least 1 normal flow + 1 error flow

Phase 7: Harness Evolution

A harness is not a static artifact created once and forgotten. It is a system that continuously evolves based on user feedback.

7-1. Collect Post-Execution Feedback

Request feedback from the user upon completion of each harness execution:

"Are there any parts of the result that need improvement?"
"Would you like to change the agent team composition or workflow?"

Proceed if there is no feedback. Do not force it, but always provide the opportunity.

7-2. Feedback Application Path

Map different feedback types to their target modifications:

| Feedback Type | Modification Target | Example |
|:---|:---|:---|
| Output Quality | Affected Agent's Skill | "Analysis is too shallow" → Add depth criteria to skill |
| Agent Roles | Agent Definition `.md` | "Security review is also needed" → Add a new agent |
| Workflow Sequence | Orchestrator Skill | "Verification should come first" → Change Phase sequence |
| Team Composition | Orchestrator + Agents | "These two can be merged" → Merge agents |
| Missing Trigger | Skill Description | "Does not work with this expression" → Expand description |

7-3. Change History

Record all modifications in the CLAUDE.md Change History table (identical to the table in Phase 5-4):

**Change History:**
| Date | Change Details | Target | Reason |
|:---|:---|:---|:---|
| 2026-04-05 | Initial Configuration | All | - |
| 2026-04-07 | Add QA Agent | agents/qa.md | Feedback on lack of output validation |
| 2026-04-10 | Add Tone Guide | skills/content-creator | "Too formal" feedback |

Track the direction of the harness's evolution through this history to prevent regressions.
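
Formatting history rows with a helper keeps the table consistent across sessions. This is a small sketch; the function name is illustrative, and escaping pipe characters inside cells is an added precaution rather than something the template requires.

```python
from datetime import date


def history_row(day: date, change: str, target: str, reason: str) -> str:
    """Format one Change History row in the table layout used above."""
    cells = [day.isoformat(), change, target, reason or "-"]
    # Escape pipes so free-text cells cannot break the table structure.
    return "| " + " | ".join(cell.replace("|", "\\|") for cell in cells) + " |"
```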

7-4. Evolution Triggers

Propose evolution not only when the user explicitly requests "modify harness," but also when:

The same type of feedback is repeated 2 or more times
Agents repeatedly fail in a specific pattern
The user is observed bypassing the orchestrator to perform tasks manually
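
The first trigger (the same type of feedback repeated 2 or more times) is easy to detect if feedback is logged by type. The sketch assumes such a log exists as a list of type labels, which the harness itself does not prescribe.

```python
from collections import Counter


def repeated_feedback(feedback_types: list[str], threshold: int = 2) -> list[str]:
    """Return feedback types seen at least `threshold` times, sorted by name."""
    counts = Counter(feedback_types)
    return sorted(kind for kind, seen in counts.items() if seen >= threshold)
```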

7-5. Operations/Maintenance Workflow

Perform audits, modifications, and syncing of existing harnesses systematically. Enter this workflow when branching to "Operations/Maintenance" in Phase 0.

Step 1: Status Audit

Compare the list of files in .claude/agents/ with the agent configuration in the orchestrator skill → Generate a mismatch list
Compare the list of directories in .claude/skills/ with the skill configuration in the orchestrator skill → Generate a mismatch list
Report the audit results to the user
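
Each mismatch list in this step is a pair of set differences. The sketch below takes the two name sets as inputs; how the names are extracted from the orchestrator skill is left open, since that depends on how the orchestrator is written.

```python
def mismatches(on_disk: set[str], in_orchestrator: set[str]) -> dict[str, list[str]]:
    """Compare names found on disk with names the orchestrator refers to."""
    return {
        # Present as files/directories but never mentioned by the orchestrator.
        "unreferenced": sorted(on_disk - in_orchestrator),
        # Mentioned by the orchestrator but absent from disk.
        "missing": sorted(in_orchestrator - on_disk),
    }
```

Run it once for agents and once for skills, and report both results to the user.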

Step 2: Incremental Additions/Modifications

Perform adding, modifying, or deleting of agents/skills based on the user's request
Make changes one at a time, and run Step 3 (update the CLAUDE.md Change History) immediately after each change

Step 3: Update CLAUDE.md Change History

Record the date, change details, target, and reason in the Change History table

Step 4: Verify Changes

Verify the structure of modified agents/skills (based on Phase 6-1)
Verify triggers if the changes affect triggers (based on Phase 6-4)
For large changes (changing architecture, adding/deleting 3+ agents), run Phase 6-3 (Execution Test) and Phase 6-5 (Dry-run)
Perform a final check of the alignment between CLAUDE.md and actual files

Output Checklist

Verify upon completion:

References

Harness patterns: references/agent-design-patterns.md
Existing harness examples (with full file contents): references/team-examples.md
Orchestrator templates: references/orchestrator-template.md
Skill Writing Guide: references/skill-writing-guide.md — Writing patterns, examples, and data schema standards
Skill Testing Guide: references/skill-testing-guide.md — Testing/evaluation/iterative improvement methodologies
QA Agent Guide: references/qa-agent-guide.md — Reference when including a QA agent in a build harness. Covers integration coherence verification, boundary bug patterns, and QA agent definition templates. Based on 7 real-world bug cases.
