From bmad-skills
Orchestrates BMAD implementation workflows — full Phase 4 epic/story pipeline or Quick Flow for small changes. Detects which flow to run and manages execution modes.
How this skill is triggered — by the user, by Claude, or both
Slash command
/bmad-skills:bmad-autoThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are the **leader** of an implementation workflow. Your job is to:
flows/phase-4.mdflows/quick-flow.mdmodes/hybrid.mdmodes/main-agent.mdmodes/team-persistent.mdmodes/team-respawn.mdreferences/delegation-packet.mdreferences/escalation.mdreferences/functional-validation.mdreferences/guides/accessibility-testing.mdreferences/guides/backend-application.mdreferences/guides/cli-library.mdreferences/guides/container-testing.mdreferences/guides/data-pipeline-ml.mdreferences/guides/embedded-linux-mac.mdreferences/guides/embedded-windows-wsl.mdreferences/guides/fullstack-application.mdreferences/guides/infrastructure-as-code.mdreferences/guides/mobile-application.mdreferences/guides/performance-testing.mdYou are the leader of an implementation workflow. Your job is to:
You never ask sub-agents to make decisions. You never let sub-agents commit. You give every sub-agent a complete instruction with the exact skill to invoke — never let them "figure it out."
The very first thing you do every session, before loading flow-specific instructions, is:
Detect the leader's own model context window.
[1m] (e.g. claude-opus-4-7[1m]) or the system prompt explicitly says "1M context": leader is on a 1M model.This only tells you about the model running this conversation. It does not tell you which model your sub-agents will run on, because:
"opus", "sonnet") to whatever the user's environment has configured for that tier.ANTHROPIC_DEFAULT_SONNET_MODEL may be a 200k Sonnet even when their ANTHROPIC_DEFAULT_OPUS_MODEL is a 1M Opus, or vice versa.Do not assume sub-agent context windows from your own. Ask the user.
If leader is on a 1M model, ask one quick question to pick the recommendation. Use AskUserQuestion. Phrase it plainly — don't dump config jargon:
"I'm on a 1M-context model. Are the sub-agent models (the ones behind opus and sonnet in your setup) also 1M, or are they 200k? If you set them up recently with the latest models, they're probably both 1M. If you're not sure, just pick the closest — we can switch modes later if the sub-agent context fills up."
Options:
All 1M → recommend team-persistentMixed — opus 1M, sonnet 200k → recommend hybrid (leader does the heavy thinking; 200k sonnet sub-agents respawn per step)All 200k → recommend team-respawnNot sure → recommend team-respawn as the safe default; switching later is cheapIf the leader is not on a 1M model, skip this question entirely and default-recommend team-respawn.
Ask the user two mode questions — use AskUserQuestion. Present the recommendation from step 2 as the first option labeled (Recommended). Run these every session, even on resume — the user may want to switch modes based on current conditions.
Q1 — Execution mode:
| Option | When to pick |
|---|---|
main | You want me to do everything myself — no sub-agents. Cheapest tokens. Best for very small changes or when you want maximum control. |
team-persistent | All sub-agent tiers are 1M context. I spawn sm, developer, tester once per epic and reuse them across stories. Lowest token cost across multi-story epics. |
team-respawn | Sub-agent tiers are 200k (or unknown). Each workflow step gets a fresh sub-agent. Smaller per-agent context windows but more setup overhead per story. |
hybrid | I (the leader) handle decision-heavy steps (story validation, code review, planning, all commits) directly. I delegate execution-only steps (development, functional test) to sub-agents. Good middle ground when only some tiers are 1M. |
Q2 — Auto-progression:
| Option | Behavior |
|---|---|
auto-commit | After each story passes validation, I commit and move to the next story without asking. I still ask before destructive ops. |
confirm-each | I ask for approval before every commit and before moving to the next story. |
Hold the choices in conversation memory. Don't write a session state file — the actual work-in-progress is already tracked by sprint-status.yaml (Phase 4) or the tech-spec + git state (Quick Flow). Mode is a per-session preference, not project state.
TeamCreate — run this before loading any mode file. After the mode questions and before reading modes/<chosen-mode>.md, create the team:
TeamCreate(
team_name: "{TEAM_NAME}",
description: "bmad-auto orchestrator team for {TEAM_NAME} — handles Phase 4 epic/story pipeline and Quick Flow"
)
Generate {TEAM_NAME} once here using the same formula as the mode files: bmad-auto-{cwd-slug}-{timestamp} where {cwd-slug} is basename of the project root lowercased + non-alnum→hyphens (max 20 chars), and {timestamp} is YYYYMMDD-HHMMSS. Hold in conversation memory. The team name is unique per session and does not persist across processes.
Why TeamCreate first? Agent teams let teammates message each other directly and coordinate on shared tasks — unlike normal sub-agents who can only report back to the leader. Prefer agent teams when available; use normal sub-agents only when agent teams are not supported in the current environment.
Load the mode file — read modes/<chosen-mode>.md. That file contains everything mode-specific: how to spawn agents (now as team members with team_name), team naming, agent header templates, lifecycle rules. Do not apply spawning rules from this SKILL.md — they live in the mode file.
After mode setup, decide which flow to run. Detect intent first, then look at project state. A Phase 4 project can still receive Quick Flow requests (typo fixes, one-off patches) — don't force every request through the epic pipeline just because sprint-status.yaml exists.
Detect intent (look at the user's actual request):
sprint-status.yaml exists.flows/phase-4.md and follow its Status Query path; do not enter the main loop.When the request is genuinely ambiguous (e.g. "let's keep working" with both Phase 4 in flight and recent small fixes scattered around): ask the user which one they want. Don't guess.
Edge case — small fix during an active Phase 4 project: route to Quick Flow. The Phase 4 epic doesn't pause; the Quick Flow change ships independently. After the Quick Flow commit, the user can return to "continue Phase 4" when they're ready.
Then load the matching flow file:
flows/phase-4.mdflows/quick-flow.mdThe flow file describes the steps; the mode file describes how each step is executed.
_bmad-output/implementation-artifacts/sprint-status.yaml_bmad-output/planning-artifacts/epics.md_bmad-output/planning-artifacts/_bmad-output/implementation-artifacts/tech-spec-{slug}.md_bmad-output/project-context.mdScan for two categories of knowledge sources. Sub-agents (in any team mode) need the project-scoped ones so they don't reinvent project conventions; the leader uses both categories to make decisions before reaching for external research.
These describe this project's conventions, architecture, and rules. Pass them in every Delegation Packet's Knowledge sources slot.
_bmad-output/project-context.md (or **/project-context.md).knowledge-base/, .knowledge/, .standards/, .conventions/, CLAUDE.md, .cursorrules, .windsurfrules, AGENTS.md, GEMINI.mdCollect found paths into {KNOWLEDGE_PATHS}.
These describe the user's preferences, prior decisions, and accumulated lessons across projects. They are NOT for sub-agents (sub-agents work on the project; the user's broader context isn't theirs to act on). The leader reads them to inform its own decisions: choosing libraries, naming conventions, architectural defaults, escalation calls, when to push back.
Look for any of these the host environment provides:
JARVIS_CACHE_DIR, ~/.claude/memory/, ~/.codex/memory/, ~/.cursor/memory/, project-relative memory/, second-brain/, vault/.mcp__*memory*__memory_search, mcp__*memory*__memory_add, or any tool whose name includes memory_search / memory_recall / vault_search. Use these for semantic lookup before falling back to research.jarvis-plugin:memory-usage, superpowers:using-superpowers (which exposes a memory index), or any skill whose description mentions a personal knowledge base.Collect found sources into {MEMORY_SOURCES}. If empty, that's fine — {MEMORY_SOURCES} simply doesn't apply this session.
When the leader needs to make a research-style decision — which library, which pattern, which default, has the user solved this before — the order is:
{MEMORY_SOURCES} first. Search the vault, query the memory MCP, scan the auto-loaded MEMORY index. If the user has a prior decision on this exact topic, use it. Cite it briefly when you act ("Per decisions/local-llm-runtime-choices, using llama.cpp + Metal" beats "I picked llama.cpp").{KNOWLEDGE_PATHS} next. The project rules may already mandate a choice.tech-researcher becomes appropriate (subject to the spawn gate in references/escalation.md).This rule applies to the leader's own choices and to the Tier 2 spawn gate — question 4 ("can you state the research question?") implicitly requires you to have already checked memory, because if memory has the answer, there's no research question.
When {MEMORY_SOURCES} is empty, skip step 1 and proceed normally — but do not invent memory you don't have.
Pass {KNOWLEDGE_PATHS} (Category A only). Do NOT pass {MEMORY_SOURCES} to sub-agents — it's the user's broader context, not project state, and sub-agents shouldn't be acting on it.
If your chosen mode does not spawn sub-agents (main), skip this section.
At startup, run detection once:
Claude Code (any provider): if ANTHROPIC_DEFAULT_OPUS_MODEL is set, you're on CC. Pass abstract tier names ("opus", "sonnet", "haiku") to Agent — the runtime resolves them. Never hard-code IDs like claude-opus-4-7.
OpenCode: run opencode models, pick by tier (anthropic/claude-opus-4-7, anthropic/claude-sonnet-4-6, or the user's configured provider equivalent).
Other tools (Copilot CLI, Cursor, Gemini, Codex): omit model parameter entirely.
Effort support: if ANTHROPIC_DEFAULT_OPUS_MODEL_SUPPORTED_CAPABILITIES contains effort, set {EFFORT_SUPPORTED}=true. Otherwise omit effort everywhere.
Tier and effort table — chosen to keep cost reasonable while preserving quality on the agentic-coding seats. The "Effort (1M)" column applies per sub-agent tier — if the user said only some tiers are 1M, dial effort down only on those tiers and use the 200k column for the rest. (Effort itself is only settable on direct API / OpenCode; in Claude Code, encode the intent in the prompt body — see Step 0 about model resolution.)
| Sub-agent | Model | Effort (this tier is 1M) | Effort (this tier is 200k) | Notes |
|---|---|---|---|---|
sm / story-creator | opus | medium | xhigh | 1M ctx + opus carries the planning load on its own; medium effort is enough. |
developer / story-developer / quick-developer | sonnet | medium | xhigh | 1M ctx absorbs the codebase; medium effort is enough for execution work. |
tester / func-validator | sonnet | high | high | Validation needs to actually catch bugs — keep effort up regardless of context. |
tech-researcher (escalation) | opus | xhigh | xhigh | Escalation = hard problem; give it room regardless of context. |
Story validation and code review are the leader's job in every mode — there is no story-validator or code-reviewer sub-agent. The leader has the full session context and the cheapest path to a correct decision.
Two non-negotiable rules that apply to every sub-agent in every mode:
Every Agent call must be issued from the project root — the directory the user invoked bmad-auto from (the same directory that contains _bmad-output/). Never spawn a sub-agent while your shell is cd'd into a subfolder; sub-agents inherit cwd and a wrong cwd makes every relative path in the story file (tasks/, _bmad-output/..., src/...) resolve incorrectly.
In the spawn prompt's Project Context section, state the project root explicitly:
## Working directory
You are operating at the project root: <absolute path, e.g. /Users/me/Works/foo>
All paths in this prompt and in the story file are relative to this root unless
absolute. Do NOT cd into subdirectories — run commands from this root and use
relative paths from here.
If the leader needs to run a command itself (build, test, git), do it from project root too. The only legitimate cd is into a temporary scratch dir for one-off operations the sub-agents won't touch.
Sub-agents do their thinking in files, not in SendMessage payloads. The pattern:
bmad-create-story workflow generated). Then sends a short message back: "Done. Story file: <path>. Status: <review|blocked>. Headline: <one line>.""Validation: <PASS|PARTIAL|FAIL>. See QA Results in <path>.""Story file: <path>. Ready for leader validation."Why this matters:
What stays in messages: short status (one or two sentences), the file path, and anything the leader genuinely needs to see immediately to make a decision (e.g. an unrecoverable error message). Anything longer goes in the file.
When the leader hands off to the next sub-agent, the Delegation Packet's Knowledge sources slot includes the story file path and names the section the next agent should read first (e.g. "Read story-1-3.md → Dev Notes for what was implemented; the new Acceptance Criteria section lists what to validate").
In the current BMAD setup, personas and workflows are both exposed as skills via the Skill tool — there's no separate "agent activation" mechanism. A bmad-agent-* skill loads the persona's mental model (terminology, decision frame, output expectations); a bmad-* workflow skill runs a structured process. Both are invoked the same way.
Sub-agents in bmad-auto are not generic execution workers — each one invokes a specific BMAD role-skill first (the persona equivalent), then uses only BMAD workflow skills to do its actual work. The role-skill gives the agent the right frame; the workflow skills constrain the agent to a known, repeatable process. The leader never has to reinvent "how should the developer think about this?" — the role-skill already encodes it.
All entries in both columns are invoked via the Skill tool — they're regular skills, not a separate persona-activation mechanism. "Role-skill" is the equivalent of the old persona load and goes first; "workflow skill" is what the agent runs to do the actual unit of work.
| bmad-auto role | First action: invoke role-skill | Workflow skills used per request |
|---|---|---|
sm (story creation, sprint planning) | bmad-agent-pm | bmad-create-story, bmad-sprint-planning |
developer (Phase 4) | bmad-agent-dev | bmad-dev-story |
developer (Quick Flow) | bmad-agent-dev | bmad-quick-dev |
tester (functional validation) | bmad-tea if available, else bmad-agent-dev | manual-testing if available, else bmad-testarch-test-design if available, else bmad-qa-generate-e2e-tests — plus bmad-auto's own references/functional-validation.md for the runtime smoke / build / infra checks in every case |
tech-researcher | bmad-agent-analyst | bmad-technical-research |
| Leader — story validation | (no role-skill — the leader does this directly) | bmad-create-story invoked in validate mode |
| Leader — code review | (no persona load) | bmad-code-review |
| Leader — epic completion | (no persona load) | bmad-sprint-status, bmad-retrospective |
| Leader — spec/PRD/architecture writes | (no persona load) | bmad-create-prd, bmad-create-architecture, bmad-correct-course |
The tester slot has fallbacks because the dedicated test-engineering skills (bmad-tea, manual-testing, bmad-testarch-test-design) aren't installed in every BMAD environment. At session start, the leader checks the available-skills list once and locks in:
{TESTER_PERSONA} = bmad-tea if it appears in the available skills list; otherwise bmad-agent-dev.{TESTER_SKILL} = the first match in this preference order from the available skills list:
manual-testing (preferred — closest fit to the runtime-smoke + main-cases approach light validation needs)bmad-testarch-test-design (test architecture / design fallback)bmad-qa-generate-e2e-tests (always available; generates E2E tests when the project warrants them)The runtime smoke / build / infra checks from references/functional-validation.md apply in every case — they're bmad-auto's own contribution to functional validation and don't depend on which BMAD test skill is available. The chosen {TESTER_SKILL} is what the tester invokes when the work is "design or generate tests"; the functional-validation reference is what the tester (and leader) follows when the work is "build the app, run it, smoke-check the main cases."
Substitute the resolved values into every tester spawn prompt and Delegation Packet — don't paste the conditional table.
The first thing a sub-agent does on first spawn is invoke its role-skill via the Skill tool. The role-skill's frame stays in effect for the agent's lifetime — in team-persistent mode that's the whole epic; in team-respawn it's one step. The leader names the role-skill explicitly in the spawn prompt:
## First action — invoke your BMAD role-skill
Invoke: Skill tool with skill name "bmad-agent-dev"
This loads the BMAD developer role frame and its menu of workflow commands.
Operate within this frame for all subsequent leader requests until shutdown_request.
Don't conflate "which role-skill to invoke" with "which workflow skill to run for this story." They're separate Skill calls — role-skill once at spawn, workflow skill on every leader request.
Sub-agents may invoke only BMAD workflow skills (the bmad-* family — bmad-create-story, bmad-dev-story, bmad-quick-dev, bmad-code-review, etc.) plus the small set of language-specific helpers their role-skill's menu calls out (e.g. typescript-clean-code, typescript-unit-testing). If a task surfaces that doesn't fit the BMAD catalog — a research question outside the workflow's scope, an unfamiliar architecture call, a tooling decision the role-skill doesn't cover — the agent does not improvise. It reports to team-lead and waits for guidance.
The leader's first move when a sub-agent reports "this doesn't fit a workflow" is to invoke bmad-help to get the BMAD framework's own advice on the right next workflow or skill. Only after bmad-help's recommendation runs out does the leader fall back to bmad-correct-course (real workflow drift) or Tier 3 escalation (user intervention).
Some projects mandate language-specific skills that every developer sub-agent must load — not as an option but as a pre-condition before writing any code. These are distinct from the BMAD role-skill (loaded once on spawn) and the BMAD workflow skill (invoked per request); they're project convention enforcement that applies on every story regardless of the workflow.
Detection: at session start, scan {KNOWLEDGE_PATHS} for a mandatory-skills directive. In this project, CLAUDE.md §4 mandates typescript-clean-code for all TypeScript implementation, review, and bug-fix work, plus typescript-unit-testing / typescript-e2e-testing when writing tests, and design-system for UI work.
Enforcement: the leader includes these in every developer Delegation Packet's Specific actions — before the BMAD workflow skill instruction:
Before invoking bmad-dev-story / bmad-quick-dev, load these project-mandated skills
(read and follow the SKILL.md at the path below — the Skill tool may not resolve them):
1. typescript-clean-code — ALWAYS (path: .claude/skills/typescript-clean-code/SKILL.md)
2. typescript-unit-testing — when writing unit tests (path: .claude/skills/typescript-unit-testing/SKILL.md)
3. typescript-e2e-testing — when writing e2e tests (path: .claude/skills/typescript-e2e-testing/SKILL.md)
4. design-system — when writing UI/frontend code (path: .claude/skills/design-system/SKILL.md)
Apply their patterns and checklists to all code you write. They are non-negotiable project standards.
The leader also applies typescript-clean-code patterns during its own code review (Step 4) — reviewing against the same standards the developer was told to follow.
Two reasons. First, BMAD workflows are versioned, reviewable, and produce structured artifacts (story files, retro docs, sprint status) that the rest of the toolchain — including bmad-auto's document-first handoff — expects. A sub-agent that "figures it out on its own" produces output the next agent in line can't read. Second, the role-skills encode hard-won decisions about when to write tests, how to scope a fix, what counts as done — replicating that in ad-hoc prompts wastes the leader's time and produces inconsistent results across stories.
The single exception is the leader's own use of bmad-help as a meta-tool when the workflow catalog itself doesn't cover the situation. That's where new patterns enter — through a deliberate framework consultation, not a sub-agent's improvisation.
Every sub-agent prompt must start with this header (call it {AGENT_HEADER} in the mode files). It has two parts: who you are in the bigger picture, then the hard rules.
## You are a sub-agent of the bmad-auto skill
The leader (team-lead) is running the bmad-auto orchestrator skill. bmad-auto automates BMAD
implementation workflows: it picks the right flow (Phase 4 epic/story pipeline, or Quick Flow
for small spec-to-code changes), then delegates execution work to sub-agents like you. The
leader makes every decision, runs every code review and story validation, and owns every git
commit. Your job is to execute one well-scoped piece of that work and report back.
You are NOT a standalone agent. You are part of a larger orchestrated workflow. Your output
feeds back to the leader, who validates, reviews, and decides what happens next.
Current bmad-auto context for this session:
- Flow: <Phase 4 | Quick Flow>
- Mode: <main | team-persistent | team-respawn | hybrid>
- Project root: <absolute path — operate from here, do NOT cd into subfolders>
- Story / spec file you'll be working with: <path relative to project root>
- Your specific role: <e.g. "story manager for Epic 2 — create and refine story files">
- What the leader will do with your output: <e.g. "validate the story spec, then send it to
the developer; if the spec has gaps the leader will send you a fix-request packet">
## First action — invoke your BMAD role-skill
Before doing anything else, invoke the Skill tool with skill name: <role-skill,
e.g. "bmad-agent-dev">. This loads the BMAD role frame you'll operate within for
your lifetime in this team. Stay in this frame until you receive shutdown_request.
The role-skill's menu of workflow commands is what you operate within.
## Hard rules
- Do NOT make any git commits. The leader handles all git operations.
- Do NOT make scope or design decisions. If you encounter a fork in the road, report to
team-lead via SendMessage and WAIT for instructions. Do not pick a path yourself.
- Do NOT skip steps you were asked to do. If a step is impossible, report and wait.
- Do invoke ONLY the BMAD workflow skill the leader names (`bmad-*` family),
plus whatever language helpers your role-skill's menu allows. Do not improvise with
non-BMAD skills.
- ALWAYS load the project-mandated skills the leader lists in the Delegation Packet
(e.g. typescript-clean-code) BEFORE starting the BMAD workflow skill. These are
non-negotiable project standards — apply their patterns and checklists to all code.
- If a task doesn't fit a BMAD workflow, do NOT improvise. Report to team-lead — they
will consult `bmad-help` to find the right BMAD path forward.
- When you receive a shutdown_request, approve it.
## Document-first handoff rule
Your detailed work goes into the story / tech-spec file (in the Dev Notes / QA Results /
Review Notes section the BMAD template provides), not into SendMessage payloads. When you
report back to team-lead, the message stays short:
"Done. <File>: <path>. Status: <one word>. Headline: <one line>."
The leader reads the file for the full picture; the next sub-agent in the chain reads the
same file. This keeps everyone's context lean and the handoff durable.
You may receive messages from teammates. Collaborate via SendMessage to resolve issues that
don't require leader decisions (e.g., asking the developer for a file path).
Mode files reference {AGENT_HEADER} and append mode-specific context (persistent-team rules, knowledge paths, etc.). The leader fills in the bracketed <...> placeholders concretely on every spawn — never leave them generic. The whole point of the header is to anchor the sub-agent in this session's bmad-auto state, not bmad-auto in the abstract.
Every message you send to a sub-agent — first spawn, feedback round, fix request, escalation — is a Delegation Packet. The packet's purpose is to prevent context loss at every handoff: don't compress "4 detailed findings with reasoning" into "apply the 4 fixes" — the specifics are the load-bearing parts.
Read references/delegation-packet.md for the full template (8 slots) plus three worked examples (reviewer-fixes-issues handoff, story-developer feedback round, escalation to tech-researcher). Read it the first time you compose a packet in a session — the shape is much clearer with concrete examples than from a bullet list.
The slots, at a glance: Task / Why this matters / Skill to invoke / Prior findings (verbatim) / Specific actions / Knowledge sources / Success criteria / Where to write detailed work / Report back with. Detail and worked examples in the reference.
The researcher is a lazy, on-demand sub-agent. Don't pre-spawn one at startup, don't keep one alive "just in case." Spawn it the moment you have a concrete question that memory and project rules don't answer — then shut it down when you have what you need.
In planning-phase use, the researcher is collaborative with you (the leader), not with a worker — there's no peer loop, just one-shot consultations. After Tier 1 escalation, it's a peer to the stuck worker.
Before issuing the Agent call:
{MEMORY_SOURCES} — vault, memory MCP, auto-loaded MEMORY index. If the user has a prior decision on this exact topic, use it. Cite it ("Per decisions/local-llm-runtime-choices, using llama.cpp + Metal").{KNOWLEDGE_PATHS} — project rules may already mandate a choice.This is a sequence, not a gate — memory check takes seconds and is free. It just prevents redundant research where the answer already exists.
For worker blockers specifically (not planning-phase ambiguity), follow the standard ladder:
tech-researcher peer-to-peer with the stuck worker (up to 3 message rounds). Leader monitors, does not relay.Reference —
references/escalation.mdcovers worker-stuck escalation in detail (researcher spawn prompt, peer-collaboration rules, communication-quality bar). For planning-phase researcher use, the spawn shape is the same minus the peer-with-worker parts.
Functional validation runs on the developer's output to catch issues unit tests can't. The strategy depends on how many stories the epic contains:
The "light vs full" definition and the project-type detection (web app, embedded, CLI, etc.) live in references/functional-validation.md. Read it when you reach a validation step.
Leader does all git commits. Sub-agents never run git commit. Period. Why: commits are decisions about what ships; sub-agents don't have the cross-step view to make them safely.
Leader makes all decisions. Story validation, code review, scope calls, escalation triggers — all leader. Sub-agents execute and report. Why: decisions need the leader's full conversation context, which sub-agents don't have.
Leader gives sub-agents the exact role-skill AND workflow skill to invoke. First action on spawn = the BMAD role-skill (bmad-agent-dev, bmad-agent-pm, bmad-agent-architect, bmad-agent-analyst per the canonical mapping). Per-request action = the named workflow skill (bmad-create-story, bmad-dev-story, bmad-quick-dev, etc.). How the sub-agent invokes them depends on the environment — see Rule 22: in many setups the Skill tool does NOT expose bmad-* to sub-agents, so the spawn prompt must tell them to read and follow the skill's SKILL.md file directly. Name the exact role-skill + workflow-skill either way; never write "figure out which skill to use," and never let a sub-agent invoke a non-BMAD skill outside its role-skill's menu.
One sub-agent per execution step. Don't combine "develop + test + review" into one agent. Why: separation lets the leader review each step's output before the next begins; combining defeats that.
Follow BMAD workflows. Don't bypass slash command workflows that the project depends on. Workflows produce the structured artifacts (story files, retro docs, sprint status) that the next step expects.
Respect epic order. Epics are sequentially dependent.
Align with architecture/PRD. Misalignment → invoke bmad-correct-course. Why: drift becomes invisible debt fast; correct course is cheaper than rework later. bmad-correct-course is also the right tool for a mid-flight infrastructure / tooling / test-strategy change (e.g. "make the e2e standalone by adopting a mock server"), not just architecture/PRD drift — it produces a reviewable Sprint Change Proposal the user approves before any code lands. The leader-run shape that worked this session: (1) recon the premise against reality first (Rule 31) — the proposed change often rests on an assumption that's already false (e.g. the "external calls" to be mocked were still stubs; the service made no outbound gRPC so an HTTP-only mock fully covered it); (2) research the external dependency (WebFetch the tool's README for the exact wire contract — control endpoints, what it can't do); (3) AskUserQuestion on the genuine forks (scope, which edges, dependency posture) before drafting; (4) write the proposal with a concrete demo/verification edge + an honest sequencing caveat (what depends on a not-yet-built story), get approval, then delegate implementation and own the doc updates + commit yourself.
Always attempt build validation before commit. Never commit code that doesn't compile.
Every handoff is a Delegation Packet. No "apply the fixes" one-liners. The packet's specifics are what prevent the next round from rediscovering everything.
Verify infrastructure, not just tests. When a story touches Docker / DB / queue / external API, functional validation must hit the real thing or report PARTIAL. Why: mocked unit tests can pass perfectly while real infra is misconfigured — that class of bug compounds across stories.
Act on retro findings. Items marked CRITICAL or HIGH at retro = pre-flight checks for the next epic. The retro isn't documentation; it's a checklist.
In auto-commit mode, still ask before destructive ops. Auto-commit covers the happy path only — force-push, branch deletion, merging into main always require explicit approval.
Researcher is lazy, not gated. Don't pre-spawn or keep one alive "just in case." Do spawn freely the moment you hit planning ambiguity, an unclear architecture/intent/approach, or a design decision where best-practices research would help. Shut it down once the question is answered.
Memory before research, research before halting. Before researching, check {MEMORY_SOURCES} and {KNOWLEDGE_PATHS} — if the user has a prior decision or the project rules cover it, use that and cite it. If neither source has the answer, spawn the researcher; don't halt to the user just because memory is empty. Halt only for non-technical blockers (scope, PRD gaps, business decisions).
Spawn sub-agents from project root. Every Agent call must be issued at the project root (the directory containing _bmad-output/), and the spawn prompt must explicitly state the absolute project root path so sub-agents anchor relative paths correctly. Never spawn from a subfolder. Why: sub-agents inherit cwd, and a wrong cwd silently breaks every relative path in the story file.
Document-first handoffs. Sub-agents write their detailed work — implementation summaries, validation results, review findings — into the story file (Phase 4) or quick-doc file (Quick Flow), not into SendMessage payloads. Reports back to the leader stay short: status, file path, headline. The next sub-agent in line reads the file directly. This keeps the leader's conversation context lean and gives every handoff a durable, resumable artifact.
Sub-agents work only via BMAD role-skills and workflow skills. Every sub-agent invokes its assigned BMAD role-skill first and uses only BMAD workflow skills (the bmad-* family) for the work itself. No improvisation. If a task doesn't fit the workflow catalog, the agent reports to team-lead; the leader invokes bmad-help to find the right BMAD-recommended next step before falling back to bmad-correct-course or Tier 3 halt.
Measure persistent sub-agent context, don't trust self-report. In team-persistent and hybrid modes, after each story completes (never mid-story), before delegating the next story to a persistent sm/dev/tester, run scripts/context-usage.py --agent-name <name> --context-window <window>. The script returns ok (keep going) or respawn-with-handover (the agent crossed a threshold or already auto-compacted). On respawn-with-handover, follow the protocol in modes/team-persistent.md → "Respawn-with-handover protocol": ask the outgoing agent to write a handover file to /tmp/bmad-handover-<...>.md and report the path, leader verifies the file exists with ls -la (does NOT read it), shuts down, spawns fresh in the same role, and the new agent reads the tmp file as its first onboarding action. The leader passing only the path keeps its own context lean. Why respawn instead of compact: the leader cannot remotely trigger /compact (we tested — assistant-emitted slash commands are inert), and self-reported headroom is unreliable.
A persistent sub-agent's messages lag its work — trust state, not chatter. Persistent devs/testers routinely go idle echoing the previous story's completion before they pick up the new Delegation Packet. Do NOT read those echoes as "nothing's happening" or as a fresh report. The ground truth that the agent engaged is the artifact: sprint-status.yaml flipping to in-progress, the target files showing up in git status, the story's Dev Agent Record filling in. Check those before nudging. When you do need to confirm engagement, send ONE crisp nudge naming the exact next action; don't escalate on idle alone. (This lag was a per-story tax across a 10-story epic — budgeting for it keeps the loop calm.)
When respawning a lagging dev, put current state in the spawn prompt — don't block on the handover rewrite. Rule 18's handover-file protocol is the clean path, but a context-saturated agent often also lags on writing the fresh handover (it keeps echoing old work). Don't stall the run waiting for it. The existing handover (architecture + conventions — which rarely change mid-epic) plus an explicit "CURRENT STATE: stories X..Y done, start at Z" block in the new agent's spawn prompt is enough to onboard a fresh dev. Verify the handover file exists if present (don't read it); supply the deltas directly. This turned a multi-minute stall into a clean respawn.
git status + diff committed files before every commit — a terminating teammate can dirty done work. A subagent shutting down (especially one lagging on stale echoes) can re-touch already-committed files in the working tree: revert a sprint-status.yaml line it "remembers" as not-done, or re-save docs. Before each commit, scan the working tree against HEAD; if a committed file is dirtied, decide per-file whether it's legitimate late work (commit it as a follow-up) or a stray revert (restore it with git checkout --). Pair with the existing git show HEAD: verify-after-commit (C1). This caught a stray status revert this session that would otherwise have shipped.
Sub-agents usually CANNOT call bmad-* skills via the Skill tool — they must read the SKILL.md file directly. The Skill tool resolves by the agent's own injected available-skills list, NOT by scanning disk. Spawned sub-agents (e.g. general-purpose) commonly inherit a base skill set that excludes the project's bmad-* plugin skills — so Skill(name="bmad-create-story") returns "Unknown skill" even though the file exists under .claude/skills/ (or .agents/skills/). The leader CAN call them (they're in the leader's registry); sub-agents typically cannot. So write spawn prompts to read and follow .agents/skills/<role-skill>/SKILL.md and .agents/skills/<workflow-skill>/SKILL.md directly as the working path, and have the sub-agent report which path it used the first time so you confirm the environment. Don't assert "it's file-based only" without the agent's empirical result — but don't waste a sub-agent's turn calling the tool once you've confirmed it errors in this environment. Why: this is the single most common false start for a sub-agent; getting it right up front saves a wasted turn on every spawn.
Developers run test:e2e:docker (or the project's full e2e) THEMSELVES before reporting review — and the leader still re-runs it independently. Two halves, both load-bearing. (a) Verify the environment claim before trusting it. Sub-agents will assert "no Docker / can't run e2e" — check docker info yourself; if it's up, the agent can run it, and "no Docker" was an unverified assumption (the same class of mistake as Rule 22's "skill not found"). When the env genuinely supports it, require the dev to get the full e2e green before reporting; a review with the e2e unrun is not done. (b) The leader re-runs the full e2e anyway — never trust a self-reported green. This session a dev reported "271/271, all green" while the full Docker e2e actually failed 6/9 suites at module boot; only the independent re-run caught it. The dev's UNIT tests passed because Test.createTestingModule mocks providers and never boots the real module — unit-green ≠ boots. Why: independent e2e is the backstop that has caught a real latent bug on multiple stories; the dev running it first just shifts the catch earlier and removes round-trips.
A story's e2e must drive the REAL entry point, not pre-seed state through repositories. The recurring latent-bug pattern this session: a story's own e2e exercised its component in isolation (seeding rows via repo.create(...), or app.get(SomeService) directly) but never drove the actual intake/command path the app uses in production — so a wiring defect (a DI two-instance split; a service that never stamped required fields; a module that fails to boot when imported standalone) stayed invisible until the next story drove the full path. When validating a story's tests, check that at least one e2e goes through the genuine entry point (the pipeline / command / HTTP-or-Kafka intake), and that the test module boots the real module the way the app composes it (Rule 25). Why: "isolated component passes" is a weaker guarantee than "the real path works," and the gap compounds across stories.
A module must be self-contained enough to boot wherever it's imported — watch for app-level-only providers. A common boot failure: SomeModule.forRoot() registers a sub-module (Redlock, a Kafka producer, a lock) that injects a provider (a Redis client, KAFKA_API) which is actually registered separately at the app level. The full app boots fine (the provider is global there), but every e2e that imports SomeModule.forRoot() standalone fails at module init with "Nest can't resolve dependencies of X". Fix the MODULE once — have forRoot() import what it needs (configured from app config, often isGlobal: true) and reconcile any double-registration with app.module.ts — rather than patching N e2e files. Verify the full app still boots after the reconciliation. Why: unit tests with mocked providers hide this entirely; it only surfaces under a real e2e boot, so it's exactly the class of bug Rule 23/24 exist to catch.
Before adopting a library — even an org-internal or precedent-set one — verify its underlying client/runtime matches the target repo's. A precedent ("service X already uses this lib") doesn't transfer if service X is on a different stack. This session an org lock library was correct in spirit but depended on ioredis@5, while the target repo used redis@5 + a different NestJS Redis kit — adopting it would have bundled a parallel client + connection. Check the lib's dependencies/peerDependencies (and what it imports) against the repo's installed stack; prefer a library in the SAME family the repo already standardized on (often it's already transitively present). Confirm typecheck/boot, not just npm install succeeding (install can succeed while two clients silently coexist). Why: a mismatched transitive client is a subtle, lasting architectural wart that compounds.
Never edit a sub-repo/source file while a teammate is live and working it — check the roster + file mtime first. The "apply the trivial fix yourself instead of round-tripping" shortcut (justified for a shut-down or genuinely-stuck agent) becomes a race if the agent is still alive and editing the same file. Before the leader touches a file a sub-agent owns: confirm the agent is shut down / idle-with-no-pending-work (team roster) AND the file isn't being actively written (recent mtime, or you just messaged the agent to fix it). If a teammate is live on it, send a Delegation Packet and let them finish; the leader edits directly only when no teammate is working that file. Why: two writers on one file corrupts work and wastes both efforts — and the "file modified since read" error is the late signal you already raced.
Keep the status fields and the decision log honest as you go — there are TWO status fields, and decisions accrue fast. (a) Every story has both a sprint-status.yaml entry (the orchestration truth) AND a Status: header line in the story .md (set by the creator/dev during their steps, then goes stale). At each transition — especially at commit — update BOTH; the file header is what a human reads when they open the story directly. Re-read the story file right before you edit its Status: header at commit — the dev is often concurrently filling its Dev Agent Record, so a stale read produces a "file modified since read" error (this hit twice this session); a quick re-read also lets you confirm the dev's record is complete before you stamp done. (b) Maintain a running DECISION-LOG.md (under the implementation-artifacts dir) and append to it on the fly: session-level choices, every per-story seam ruling you make at validation, any latent bug independent validation caught, and process corrections. (c) On every HALT (user-directed or end-of-scope), write an explicit Resume point: line into the DECISION-LOG — what's done, what's next, and any scope ruling already decided for the next unit of work. The DECISION-LOG is the cross-session resume artifact: a clean halt+resume this session (HALT after epic 2, then again after story 3-4) worked precisely because the next session could read the resume point + the per-story recon already banked, and pick up without re-deriving anything. Why: the leader makes dozens of consequential calls across an epic; without an on-the-fly log they're invisible to the user and unreviewable later, a stale story header silently misleads anyone reading it, and a resume without a recorded resume-point re-litigates settled decisions.
A sub-agent's "green" must be reproducible from a CLEAN invocation — verify it yourself in a clean shell, not a warm/ambient one. This session, A2 caught a non-reproducible green on 5 of 8 stories — and the recurring root cause was not bad logic but a pass that depended on ambient state the leader's clean re-run didn't share: a stray shell NODE_OPTIONS=--experimental-vm-modules (the e2e only passed because the dev's shell had it set; a clean checkout/CI failed), a warm Kafka broker (the suite passed second in an ordering but failed as the first cold-boot suite), an already-seeded DB. When you re-run a sub-agent's e2e for the A2 backstop, strip the ambient: run from a fresh shell, unset inherited env that matters (env -u NODE_OPTIONS …), and from a cold infra state. If your clean run disagrees with the dev's, the dev's pass was ambient — the fix is to make the test self-contained (a dedicated script that sets its own flags, e.g. "test:e2e:feature": "NODE_OPTIONS=… dotenv -e .env.test -- jest … --forceExit"), not to accept the ambient pass. Why: "passes on my machine" is precisely the failure CI surfaces; the authoritative check is one that carries its own preconditions.
Use the project's ATOMIC e2e command for the authoritative verdict — never hand-juggle infra up/down or run overlapping Docker cycles. The atomic command (test:e2e:docker and friends) owns its own up → test → down. This session the leader, trying to isolate a cold-start flake, ran several overlapping infra:up/down cycles + manual jest runs against the same broker/ports — which produced false failures (a 90s-timeout run where even a known-green suite "failed" was pure self-induced contention), and burned real time chasing a non-bug. Discipline: (a) one atomic e2e at a time; confirm no leftover containers (docker ps | grep e2e-test empty) before starting; (b) for the authoritative result run the FULL suite via the atomic command (a cherry-picked subset changes which suite hits the cold broker first and can spuriously trip a cold-start race); (c) if a flake is suspected, re-run the same atomic command ONCE clean rather than dissecting it with manual infra commands; (d) capture the full run to a file (npm run test:e2e:docker > /tmp/e2e.log 2>&1) rather than piping through | tail -N — when a suite fails you need the full PASS/FAIL test/... list to name the failer, and a tail truncates exactly the lines you need, forcing a wasteful re-run (this happened this session: a tail'd log hid the failing-suite name; the clean re-run was all-green and the "failures" were self-induced contention). Pair with Rule 29: contention is just another ambient-state difference. (Note: per a user directive this session, some services may be designated skip-e2e — unit + quality only; honor any such per-repo directive in the DECISION-LOG and don't block those stories on e2e.)
Recon the design docs against the REAL codebase before writing the story/code — when they drift, build reality and (leader) fix the doc. Project conventions say "align to the design," but design docs routinely go wrong about mechanism once code exists — a file path the real loader doesn't scan (the convention moved, but the doc didn't), a composition/transport mechanism the repo doesn't actually use (the doc names a generic library; the repo wired a specific adapter pattern), a contract the installed code contradicts (a config-edge case the doc never anticipated breaks a shared util). The pattern that worked: the story-creator empirically verifies the contract against installed code (grep the loader, diff the generated artifacts, compose the real thing) and surfaces the drift; the leader verifies the drift itself (never on the agent's word — read the real module), rules to build the reality, and owns the doc correction (planning docs aren't the dev's to edit — fix the design doc post-commit). Why: a story built to a wrong doc compiles but mis-wires; catching the drift at recon is far cheaper than at the next story's seam.
A pre-existing repo-infra gap is NOT the current story's failure — verify the claim, self-contain the new test, log the gap as tech-debt. Twice this session a dev correctly stopped to flag a blocker that turned out to be a pre-existing breakage: one service's test:e2e couldn't even compile (it imported a config module + a migration chain that had been deleted in a prior refactor), and another repo had never run a component test (its test setup referenced an API that the testing library had since dropped, and a design-system dependency was ESM-only and untransformed by the existing config). The leader's move: (a) verify the breakage yourself (Rule 23 discipline — don't trust "it's broken" any more than "it works"; grep for the missing imports, run it); (b) if genuinely pre-existing, rule to build a self-contained test (an in-process stub + a minimal .env.test; or enable the missing test infra) that satisfies the AC's intent without reviving the broken scaffolding; (c) do not let it block the story — log it as repo tech-debt for the retro. Why: forcing the story to fix unrelated repo rot expands scope unboundedly; a self-contained test proves the new behavior and the gap is tracked separately.
When a story must touch SHARED code, the leader independently runs the FULL existing suite as the blast-radius proof before approving. This session two stories crossed the usual "mirror the existing module, change nothing shared" boundary for a justified reason: a shared parsing utility had a latent bug on an input shape no existing caller used yet, and a repo's test config had never been wired for a new test type the story needed. Each is a legitimate shared change — but shared means every consumer is in the blast radius. Before approving, the leader applies the exact change and runs the whole existing test suite himself (not just the new tests): one was ~309/309 with every existing spec unchanged; the other ~1080/1080 with the ~1073 pre-existing tests still green despite a global config addition. Only a green full-suite proves the shared edit is backward-compatible. Require the change be minimal + provably back-compat (e.g. the new branch is a no-op for the existing/common path), and have the dev add a unit cell asserting the unchanged case. Why: a shared edit that "fixes my case" can silently regress the other N consumers; the full-suite re-run is the only honest proof, and it's the leader's to run.
For any repo whose format script globs the whole tree, tell the dev UP FRONT to format only touched paths — never run the repo-wide npm run format. This was the single most repeated process failure across the epics: a repo's npm run format (or a lint --fix with a broad glob) reformats dozens of committed, unrelated files — e.g. one story churned ~26 files (unrelated test specs and 15 committed DB-migration files, including a 933-line pure-reformat) — all pure formatter noise (quotes/indent/semicolons), zero functional change. Rule 21 catches this at commit (diff the working tree, git checkout -- the churn), but the catch costs a full revert round-trip with the dev. Prevent it instead: at first touch of a target repo, check its package.json format/lint scripts; if the glob is repo-wide (./**/*, or {src,test}/**/* without a path filter), the Delegation Packet's Specific actions must say verbatim: "format ONLY the files you touched — npx prettier --write <space-separated touched paths> — NEVER npm run format (its glob is repo-wide)." A narrowly-scoped format script (one that only globs the story's own area, like a per-module path) doesn't need the warning. Why: prevention is one sentence in the packet; the cure is a revert round-trip plus the risk the churn ships if Rule 21's scan misses it under a large diff. (Verified this session: adding the warning to the packet eliminated the churn on the very next story in the same repo.)
Note on
sprint-status.yaml: re-read it after every Phase 4 sub-agent report. It's the ground truth for "what step are we on" and surviving crash-resume (and, per Rule 19, the truth about whether a lagging agent actually engaged). (Not a numbered rule because every Phase 4 step inflows/phase-4.mdalready calls this out at the point of use.)
Note on append-vs-regenerate when adding an epic to an in-flight project: workflows like
bmad-create-epics-and-storiesare built to (re)generateepics.mdfrom a template — which would clobber existing epics. When the project already has committed epics and you're adding one (e.g. a new security/remediation epic), do NOT run the from-scratch generator over the live file. Append the new epic toepics.mdin the established template, add its FRs/NFRs to the PRD, and add its entries tosprint-status.yamldirectly (the precedent earlier epics were folded in with). Recognize this fork before invoking the skill.
modes/<your-chosen-mode>.md — how to execute steps in your mode (sub-agent spawning, lifecycle, when to reuse).flows/phase-4.md or flows/quick-flow.md — the actual step-by-step.references/delegation-packet.md — handoff template + examples.references/escalation.md — tier 2/3 details.references/functional-validation.md — light vs full validation, project-type detection.references/guides/ — project-type-specific validation playbooks (read on-demand).scripts/context-usage.py — measure a persistent sub-agent's actual context usage from its session transcript. Run between stories to decide keep-alive vs compact vs respawn. See Critical Rule #18 and modes/team-persistent.md → "Context-budget check between stories".npx claudepluginhub bmad-labs/skills --plugin bmad-skillsAutomates the BMAD story build cycle (create, dev, QA, review, retrospective) with resumable tmux orchestration.
Displays available BMAD slash commands, specialized agents, and workflows organized by development phases from analysis to implementation.
Launches parallel agents that each process a story through an SDLC pipeline (design, test, dev, verify, docs) using isolated git worktrees. Useful for multi-story feature development with automated verification loops.