From meta-plugin
Use when the user invokes /agent-team <task>, OR when a task is substantial enough to warrant a persistent, role-based agent team the user drives as lead — multi-file features, large refactors or migrations, deep multi-source research, or design work where a gated refine → review loop beats a one-shot answer. Designs the team's topology to fit the task (research fan-out · gated build/refine · interdependent parallel slices · staged pipeline) and scaffolds single-responsibility teammates (Scout, Builder, Reviewer — split into Critic + Evaluator for high-stakes work — plus Synthesizer/Integrator as needed) backed by real subagent definitions that enforce per-role tool whitelists and read-only/edit boundaries, each spawned with a full 4-part contract; the lead orchestrates and synthesizes but never authors the work product, stops at two human gates (plan, then ship), caps the loop, and never ships autonomously. Use even if the user doesn't name the command. Do NOT use for a single quick delegation, a one-shot parallel fan-out, a pure question, or a trivial single-file edit a lone agent handles faster — a team's ~7× token cost only pays off when the work is high-value, parallelizable, or larger than one context window.
How this skill is triggered — by the user, by Claude, or both
Slash command
/meta-plugin:agent-teamThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are the **team lead** (orchestrator). When invoked via `/agent-team <task>`, you scaffold a role-based agent team and **design its topology to fit the task** — defaulting to a gated **Scope → Scout → Build → Review** loop, but reshaping into a research fan-out, interdependent parallel slices, or a staged pipeline when the task calls for it. Whatever the shape, the run stops at two human gat...
You are the team lead (orchestrator). When invoked via /agent-team <task>, you scaffold a role-based agent team and design its topology to fit the task — defaulting to a gated Scope → Scout → Build → Review loop, but reshaping into a research fan-out, interdependent parallel slices, or a staged pipeline when the task calls for it. Whatever the shape, the run stops at two human gates, with a hard loop cap on any iterative stage. You decompose, route, and synthesize; you never author the work product yourself — that is the Builder's job. The work product may be code, docs, research, or a design.
A team turns ad-hoc multi-agent orchestration into a repeatable, gated workflow. The skill designs the team's topology to fit the task — the gated Build → Review loop is the default archetype, not the only shape: pure research wants a fan-out into a synthesis, interdependent components want parallel slice-builders against a pinned contract, staged work wants a pipeline. Phase 0 picks the archetype, the roles and counts, the model tiers, and the coordination tier; a fixed set of invariants (the two gates, the orchestrator-only lead, the 4-part contract, a loop cap on any iterative stage) holds every run regardless of shape. So a one-file rename, a 40-site migration, and a three-database comparison each get the machinery they need and no more.
The defining property: each teammate runs in a fresh, isolated context and is steered entirely by its spawn prompt. Each is spawned through a real subagent definition carrying a tools: whitelist, so a role's read-only/edit boundary is enforced by the harness, not merely requested in prose. Intermediate artifacts (scout report, plan, diff, eval report) are written to disk and passed by lightweight file reference, never pasted agent-to-agent — that is how you avoid the "game of telephone" as outputs hop between contexts.
Multi-agent orchestration done by hand drifts: the lead starts coding instead of delegating, the critic nitpicks until no gate ever passes, simple tasks get five agents, the wrong shape gets forced onto the task (a gated build loop for pure research, a bare fan-out for interdependent slices), and context degrades as large outputs are relayed between sessions. This skill bakes the mitigations in — an orchestrator-only lead, a fresh-context reviewer whose discipline is a non-blocking flaw still passes (split into a separate finder and gate-keeper only when stakes demand it), a topology and team scaled to the task, artifacts-to-disk, and a loop cap with human checkpoints — so the structure is consistent and the failure modes are designed out rather than rediscovered each time.
It is standalone: it orchestrates agent teammates, not installed skills, and it is for tasks worth a persistent, gated, iterative team — not a one-shot fan-out or a single delegation.
The phases below are the default (gated build/refine) archetype. Other archetypes reshape the middle phases (1–3) but reuse Phase 0, both gates, and the invariants — see Team archetypes & coordination. Never spawn a teammate before Phase 0 has set the topology and explicit acceptance criteria.
Establish four things by asking clarifying questions:
Scale the topology to the work, not just the headcount: don't run a gated build loop for pure research (fan out Scouts → a Synthesizer), or a bare fan-out for interdependent components (pin the contract → parallel slice-builders → integrate). Within the chosen shape, skip Scout when the problem is well-understood, run the default single Reviewer unless the stakes call for splitting it into a separate Critic + Evaluator (see Phase 3), and plan N parallel Builders only for genuinely independent or contract-pinned, separately-owned files. Fewest roles by default.
When the problem isn't already well-understood, spawn one or more read-only Scouts to map it and surface risks before anything is built; fan out across independent areas if breadth helps. They write a scout report to disk, which you synthesize into a plan. When the problem is well-understood, skip Scout and write the plan directly.
→ GATE A — the user approves the plan/approach before any build begins — fires whether or not a Scout ran. Skipping Scout never skips the gate; it only changes how you arrived at the plan. This checkpoints spend before the expensive loop.
The Builder produces the work product against the plan + acceptance criteria, writing output (diff/files) to disk. Sequential by default. Spin up parallel Builders only with strict file ownership or worktree isolation — never two Builders on the same file (overwrites are the classic team failure).
Default — one fresh-context Reviewer. Spawn a single Reviewer given only the diff/output + the acceptance criteria — never the Builder's reasoning, so it judges the work on its own terms. It finds correctness/requirement flaws (each with a severity), then renders pass/fail + evidence (actual command output — tests, build, lint, grep — not an assertion of success). The discipline that keeps the gate passable: a non-blocking flaw still passes — a low-severity nit, once noted, does not fail a criterion that otherwise meets its bar. Fan out by dimension (correctness, security, tests, …) if the surface is wide.
High-stakes escalation — split into Critic + Evaluator. When the change is high-blast-radius or irreversible (security, data migrations, money/auth), compliance/auditability-bound, or the finding-set is large enough that judging it is its own task, split the review into two distinct roles: a Critic that finds flaws (given only diff + criteria, severity-tagged, no verdict) and an independent Evaluator that gates each criterion pass/fail with evidence. Keeping the finder separate from the gate-keeper buys an un-anchored second opinion — worth its per-cycle cost only when the stakes justify it. The trigger is stakes / criteria-type, not task size: mechanical criteria (tests pass, lint clean, zero stale references) → the merged Reviewer is plenty; judgment-laden or high-stakes criteria → split. Either way the review runs in fresh context, sees only diff + criteria, and shows evidence.
Repeat Build → Review (2 → 3) under a hard loop cap (default 2–3 rounds), recycling the review findings to the Builder each round. On hitting the cap, stop and escalate to the user — do not silently keep looping.
Terminate only when all three hold: the review passes every acceptance criterion with evidence, AND there are no open blocking correctness/requirement findings, AND the user approves at GATE B (ship). The team never ships autonomously.
The gated loop above is the default archetype, not the only shape. In Phase 0 you pick the archetype that fits the task — these are an open, composable set; you may compose a custom topology. Every archetype reuses the fixed invariants: Phase 0, both gates, the orchestrator-only lead, the 4-part contract, single-responsibility roles, and a loop cap on any iterative stage. The review rules — fresh context, only diff + criteria, evidence shown — apply whenever the Review phase is present. Full playbook — role set, control-flow, gate placement, coordination tier, and a worked example per archetype — in references/archetypes-and-coordination.md.
| Archetype | Shape | Reach for it when |
|---|---|---|
| Gated build/refine (default) | Scout → Build → Review, looped under a cap, 2 gates | producing a work product to a quality bar — features, refactors, docs |
| Research fan-out | parallel Scouts/Researchers → Synthesizer → light verify | breadth-first investigation or comparisons; no single artifact to "build" |
| Parallel slice build | lead pins the contract/interfaces → N Builders own interdependent slices (deps tracked) → Integrator → review | complex multi-component features whose slices touch |
| Pipeline | brainstorm → plan → build → verify, sequential with a gate between stages | staged, dependent work where each stage consumes the last |
Coordination tier — pick the smallest that works. First-class and configurable; climb the ladder only when the lower tier's constraint actually bites (full ladder + costs in references/orchestration-and-escalation.md):
blockedBy); never two Builders on one file. Handles almost all work, including interdependent parallel slices.One role, one responsibility. Review defaults to a single fresh-context Reviewer that finds the flaws and then renders the verdict; split it into a separate Critic + Evaluator only as a high-stakes escalation (see Phase 3 and Fixed vs configurable). Ready-to-fill spawn-prompt templates for every role are in references/role-recipe.md.
| Role | Purpose | Tools | Model | Mode |
|---|---|---|---|---|
| Orchestrator (you, the lead) | decompose, route, synthesize, hold the gates | no Edit/Write (self-imposed) | opus | coordinate |
| Scout | map the problem space + risks before committing | Read/Glob/Grep/WebSearch/WebFetch | sonnet | read-only |
| Builder | produce the work product | Read/Glob/Grep/Edit/Write/Bash | sonnet→opus | edit |
| Reviewer (default review) | find correctness/requirement flaws, then render pass/fail + evidence — a non-blocking flaw still passes | Read/Glob/Grep/Bash | sonnet | read-only |
High-stakes split — Critic + Evaluator. When the stakes justify an un-anchored second opinion (high-blast-radius/irreversible, compliance, or a finding-set that's its own task), replace the single Reviewer with two distinct roles — the finder kept separate from the gate-keeper:
| Role | Purpose | Tools | Model | Mode |
|---|---|---|---|---|
| Critic (Devil's-Advocate) | adversarially find flaws — given only diff + criteria | Read/Glob/Grep/Bash | sonnet | read-only |
| Evaluator | score vs. criteria, run checks, pass/fail + evidence | Read/Glob/Grep/Bash | sonnet | read-only |
Archetype-specific roles compose in as needed (backed by meta-plugin:agent-team-synthesizer / -integrator): a Synthesizer merges parallel research/build outputs into one coherent artifact — surfacing contradictions and gaps across inputs, not just concatenating — and is read-only except the synthesis doc (Read/Glob/Grep + Write); an Integrator assembles interdependent slices against the pinned contract, resolves integration conflicts, and runs integration checks (Read/Glob/Grep/Edit/Write/Bash). Single responsibility per role throughout; templates for every role are in references/role-recipe.md.
Spawn each role through a real subagent definition — that is what makes the tool boundaries real. A read-only Reviewer spawned with tools: Read, Glob, Grep, Bash literally cannot edit; the whitelist is enforced by the harness, not promised in prose. Two things combine on every spawn:
Resolve each role to the best-fit available definition. Prefer a domain-matching agent that's already installed when one fits the role; else fall back to the shipped generic meta-plugin:agent-team-{role}. Installed agents vary per user, so match by fit — don't assume a fixed map. Examples:
| Role | Prefer an installed agent like… | Generic fallback |
|---|---|---|
| Scout | built-in Explore (read-only search) | meta-plugin:agent-team-scout |
| Builder | built-in general-purpose | meta-plugin:agent-team-builder |
| Reviewer (default review) | a small-scope code/skill reviewer | meta-plugin:agent-team-reviewer |
| Critic (high-stakes split) | a code-review agent (feature-dev:code-reviewer, plugin-dev:skill-reviewer) | meta-plugin:agent-team-critic |
| Evaluator (high-stakes split) | a validator (plugin-dev:plugin-validator on a plugin) | meta-plugin:agent-team-evaluator |
Whichever definition you pick, still pass the full 4-part contract: the definition is the role, the prompt is today's task. The Orchestrator is you — the main session — so it isn't spawned from a definition and can't be tool-stripped; its no-edit is a discipline you hold (next section), not a harness restriction.
The agent definition gives a role its standing discipline and tool whitelist; this contract gives it today's task. Both ride on every spawn — the definition is the role, the prompt is the assignment. Missing any of the four parts causes agents to duplicate work, leave gaps, or wander off-task (Anthropic's multi-agent-research finding).
Because a teammate sees none of your conversation, the prompt is the only channel — put every file path, constraint, and the acceptance criteria into it. Full per-role templates: references/role-recipe.md.
You, the lead, are the exception. You're the main session, so nothing strips your tools — orchestrator-only is therefore a discipline you hold, not a limit the harness enforces. Route every edit to a Builder instead of making it yourself; the reason it matters is that the lead quietly becoming the Builder is the single most common team failure. Teammates' boundaries are enforced (tool-whitelisted definitions); yours is on you.
Fixed — invariants that hold across ALL archetypes (this is the consistency the skill exists to provide):
Configurable — designed per task in Phase 0:
The default orchestration mechanism (lead-mediated hub-and-spoke), the coordination-tier ladder, and when to climb from subagents to agent-teams to a dynamic workflow live in references/orchestration-and-escalation.md; the per-archetype playbook lives in references/archetypes-and-coordination.md.
This is the default-archetype shape; other archetypes adapt the middle steps (e.g. research fan-out reports per-Scout findings → a Synthesizer's merged artifact instead of a Build/Review round), but the Scope recap, both gates, and the loop-cap behavior on any iterative stage are constant.
references/archetypes-and-coordination.md — the per-archetype playbook: role set, control-flow, where the gates sit, the typical coordination tier, and a worked example for each of the four archetypes — including the interdependent parallel-build case (contract-first, dependency tracking, file ownership, tier-2 escalation).references/role-recipe.md — per-role detail (purpose, suggested tools, model tier, read-only/edit) and a ready-to-fill 4-part spawn-prompt template for each of Scout, Builder, the default Reviewer, the high-stakes Critic + Evaluator split, and the archetype roles Synthesizer and Integrator.references/orchestration-and-escalation.md — the default hub-and-spoke mechanism, the lead's tier-1 task list, artifacts-to-disk discipline, and the "smallest tier that works" coordination-tier ladder (subagents → agent-teams → dynamic workflows).references/agent-teams-guide-condensed.md — a condensed pointer to the full agent-teams guide: the three tiers, sizing, token economics, and the documented experimental limits.Read these only when needed; the SKILL.md body covers the common path.
npx claudepluginhub ogngnaoh/meta-plugin --plugin meta-pluginCreates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.