From claude-swe-workflows
Autonomous technical lead that drives projects from commander's intent to completion using an OODA loop. Invokes other skills for implementation, refactoring, review, and bug-hunting.
Slash command
/claude-swe-workflows:lead-project (opus)
Drives a project from a stated intent to completion with minimal user involvement. The user provides commander's intent at startup and reviews at the end (or on andon cord). Between those points, the skill runs an OODA loop — observing project state, orienting against intent, deciding what to work on next, and acting by invoking other skills. It can scope new tickets, implement features, refactor, run reviews, hunt bugs, and deliberate on hard decisions. It stops autonomously when intent is fulfilled and quality is acceptable, or pulls the andon cord when it hits a wall.
This skill implements the autonomy discipline documented in references/autonomy.md at the highest level of the orchestrator family. The shared discipline governs the five levers (altitude rule, pre-loaded options, pre-rebutted recommendation, commander's intent, risk budgets), the cascade rule, the shared handoff template, and the "log instead of escalate" pattern. Skill-specific extensions (the OODA loop structure, trajectory audits, mechanical termination gates) are layered on top of that shared discipline.
The user states intent once, in structured form, at startup. Every subsequent decision traces back to it. Intent has five parts:
This five-field schema is the canonical implementation of commander's intent referenced from references/autonomy.md. Other orchestrator-family skills (/implement-project, /lead-refactor, /lead-bug-hunt) use lighter variants — fewer fields because the work is more bounded — but /lead-project's purpose is the most open-ended, so it elicits the full schema.
Without a concrete end state, the loop has no termination condition and will drift into polish. If the user's initial statement is vague, keep asking until intent is crisp — "make it better" is not enough; "all features in backlog.md work end-to-end, go test ./... exits 0, and CHANGELOG covers the changes" is. Intent elicitation is the primary human-interaction point; invest the time.
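The five-part intent could be held in a small record and checked for crispness before the loop starts. This is a minimal sketch, not the skill's actual implementation: the class name `CommanderIntent`, its field names, and the `problems` helper are assumptions made for illustration, and the 2 to 10 bound is taken from the key-tasks guidance later in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommanderIntent:
    """Five-field intent pinned once at startup (names are illustrative)."""

    purpose: str
    key_tasks: tuple[str, ...]
    end_state: tuple[str, ...]
    constraints: tuple[str, ...] = ()
    non_goals: tuple[str, ...] = ()

    def problems(self) -> list[str]:
        """Return reasons the intent is not yet crisp enough to start the loop."""
        issues = []
        if not self.purpose.strip():
            issues.append("purpose is empty")
        if not 2 <= len(self.key_tasks) <= 10:
            issues.append("key tasks should list between 2 and 10 items")
        if not self.end_state:
            issues.append("end state is empty, so the loop cannot terminate")
        return issues
```

An empty result from `problems()` only shows the structure is filled in; whether each end-state condition is concrete enough still takes judgment during elicitation.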
Every cycle runs explicit phases: Observe → Orient → Decide → Act. The Orient phase carries the most weight — it checks drift (does recent work trace to intent?), termination (is end state met?), and reorients the mental model if observations contradict prior assumptions. /think-* skills live mostly in Orient and Decide.
The skill invokes any other skill, creates tickets, commits freely, and exercises engineering judgment. The andon cord is the only planned escalation path. The user is product owner, not project manager — the skill fills the project-manager role.
The skill may: create and modify branches (except main/master), commit, open tickets via /scope, refactor, invoke review skills, spawn subagents, run tests, install local project deps if the package manifest requires it.
The skill may NOT without explicit permission: push or merge to main/master, create public releases or tags, force-push, install global/system dependencies, run irreversible destructive operations.
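The two lists above can be read as a permission gate. The sketch below is one hypothetical encoding: the action labels (`"push"`, `"force-push"`, and so on) and both function names are invented for illustration and are not identifiers the skill defines.

```python
# Branches the skill must never commit, push, or merge to on its own.
PROTECTED_BRANCHES = frozenset({"main", "master"})

# Hypothetical labels for actions that always need explicit permission.
ALWAYS_GATED = frozenset(
    {"release", "tag", "force-push", "global-install", "destructive"}
)


def may_commit(branch: str) -> bool:
    """Commits are free on any branch except the protected ones."""
    return branch not in PROTECTED_BRANCHES


def needs_permission(action: str, target_branch: str = "") -> bool:
    """True when the action falls in the 'may NOT without permission' list."""
    if action in ALWAYS_GATED:
        return True
    return action in {"push", "merge"} and target_branch in PROTECTED_BRANCHES
```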
/review-* skills produce findings indefinitely at low severity. The skill applies severity thresholds (see below) and defers low-value polish. "Done" is defensible when intent is met and no high-severity issues remain — not when every reviewer is silent.
┌───────────────────────────────────────────────────────────────┐
│ LEAD-PROJECT WORKFLOW │
├───────────────────────────────────────────────────────────────┤
│ 0. Startup │
│ ├─ 0a. Branch and working-tree check │
│ ├─ 0b. Resume existing run or start fresh │
│ ├─ 0c. Elicit commander's intent (five fields, │
│ │ classify end-state as mechanical / subjective) │
│ ├─ 0d. Optional /review-health │
│ └─ 0e. Seed LEAD_PROJECT_STATE.md │
│ │
│ 1. OODA loop (repeat until terminate or andon cord) │
│ ├─ 1a. Observe — snapshot current state │
│ ├─ 1b. Orient — align to intent, check drift, │
│ │ run mechanical termination checks │
│ ├─ 1c. Decide — choose next skill/action │
│ ├─ 1d. Act — invoke, verify, record in state doc │
│ └─ 1e. Trajectory audit (every 10 cycles) │
│ │
│ 2. Termination │
│ ├─ 2a. Final verification pass (mechanical) │
│ └─ 2b. Completion report (includes subjective sign-off) │
└───────────────────────────────────────────────────────────────┘
Follow the shared startup protocol in references/lead-startup.md. Skill-specific values:
0a. Branch and working-tree check — branch-name pattern: lead-project/<descriptive-name>. The descriptive name comes from the operator or is derived from the intent's purpose.
0b. Resume existing run or start fresh — state-doc filename: LEAD_PROJECT_STATE.md. "Resume as-is" semantic: agent re-runs a full Observe + Orient before the next Decide.
0c. Elicit commander's intent — five fields per the canonical schema in references/autonomy.md § "Commander's-intent schemas per skill / /lead-project". Push-back examples specific to this skill:
For each End-state condition, classify it in the state doc as Mechanical (a shell command or other deterministic check the skill can run, e.g., go test ./... exits 0) or Subjective (requires human judgment, e.g., "README reads clearly to a new user"). Mechanical conditions become hard gates at termination; subjective conditions are presented to the user in the completion report for final sign-off.
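The mechanical/subjective split can be sketched as a condition record whose optional command decides the class. This is an illustration under assumptions: `EndStateCondition` and `check` are hypothetical names, and the example uses the Python interpreter as a stand-in for a real check such as go test ./... so the sketch runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndStateCondition:
    description: str
    # A command makes the condition mechanical; None marks it subjective.
    command: Optional[list[str]] = None

    @property
    def kind(self) -> str:
        return "mechanical" if self.command is not None else "subjective"


def check(condition: EndStateCondition) -> Optional[bool]:
    """Run a mechanical condition; return None for subjective ones."""
    if condition.command is None:
        return None  # left for user sign-off in the completion report
    result = subprocess.run(condition.command, capture_output=True)
    return result.returncode == 0


# Stand-in for a deterministic check; a real run would use the project's command.
tests_pass = EndStateCondition(
    "stand-in check exits 0", [sys.executable, "-c", "raise SystemExit(0)"]
)
readme_clear = EndStateCondition("README reads clearly to a new user")
```

Returning `None` rather than `False` for subjective conditions keeps them from being mistaken for failed gates.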
0d. Optional /review-health — see below (skill-specific step inserted between intent elicitation and state-doc seeding).
0e. Seed LEAD_PROJECT_STATE.md: include the pinned intent (all five fields, verbatim), the initial triage plan (first 3–5 actions), and an empty cycle log. Gitignore the state doc per the protocol.
/review-health
Decide whether to run /review-health based on:
If run, capture the findings as input to initial triage planning.
Repeat until termination condition met, andon cord pulled, or hard cap (50 cycles) reached.
Each cycle follows four explicit phases. Keep phase transitions visible in the state doc — the cycle log entry should note what happened in each phase.
Snapshot the current project state. Always check:
- git status and current branch
Do not interpret yet. Just gather facts.
Interpret the observations. This phase carries the most weight. Do all of the following:
Intent alignment. Read the pinned intent (all five fields) from the state doc. For each key task: is it complete, in progress, or not started? How does current project state compare to the declared end state?
Drift check. Review the last 3–5 cycle log entries. Does recent work trace to purpose + key tasks? Has scope expanded beyond the declared boundaries? Has the skill started working on non-goals? If drift is detected, either course-correct in the Decide phase or — if drift is severe — pull the andon cord.
Termination check. Conditions for autonomous termination:
/review-test and /review-release; add others if end state mentions them). Termination requires these to produce no new high-severity findings. This is the mechanical gate against motivated termination: the agent cannot declare done unless fresh review output agrees.
git diff --stat against the cycle N-2 SHA, excluding the state doc and .gitignore).
If conditions 1–4 hold → proceed to step 2 (termination). Subjective conditions (5) are surfaced to the user in the completion report rather than gated autonomously.
Otherwise → proceed to Decide.
Model update. If observations contradict prior assumptions (a "known working" feature now fails tests, a refactor broke something unexpectedly, a review surfaced a class of issue not previously considered), update the working understanding in the state doc before deciding.
Use /think-reframe if the problem framing itself seems wrong (e.g., repeated failures suggest the goal is mis-specified). Use /think-diagnose if something is failing and the cause isn't obvious.
Choose the next action based on orientation. Priority order:
/implement over /refactor and /review-* until the backlog is empty.
Available actions (non-exhaustive):
- /scope: draft new tickets when gaps emerge that serve intent
- /implement or /implement-batch or /implement-project: execute ticketed work
- /refactor or /lead-refactor: code quality cleanup
- /review-arch, /review-test, /tidy-docs, /review-release, /review-perf, /review-a11y, /review-security: targeted reviews
- /lead-review: comprehensive review pass
- /bug-hunt: proactive bug discovery
- /bug-fix: diagnosis-first bug fixing
- /test-mutation: mutation testing
- /release: cut a versioned release; only invoke when the commander's intent explicitly authorizes publishing, and expect to halt at /release's local→remote boundary confirmation regardless (publishing a release is always-andon territory per this skill's autonomy contract)
- /think-deliberate: adversarial option selection when two or more choices are materially different and the trade-off is unclear
- /think-scrutinize: stress-test a plan before executing it, when the plan is risky or novel
- /think-brainstorm: divergent idea generation when no obvious action presents itself
When invoking a sub-skill that supports autonomous mode, use it. When a sub-skill requires interactive input, answer autonomously using engineering judgment anchored to the pinned commander's intent. Only pull the andon cord if the sub-skill itself pulls its andon cord for reasons this skill cannot resolve.
Reviewer tie-breaker. If two review skills produce contradictory findings on the same file within the run — e.g., /refactor removed a helper that /review-arch later recommends restoring, or /review-test wants a test split that /tidy-docs flags as harming readability — do not oscillate. Pull the andon cord. Contradictory review verdicts are a product judgment call the user should make, not a loop the skill should try to resolve. Include both findings in the handoff.
Reviewer invocation cap. A given review sub-skill (e.g., /review-arch) may be re-invoked only if files it previously flagged have been materially modified since. Rerunning a review against unchanged code thrashes. Record each review invocation and its target scope in the state doc; check before re-invoking.
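The invocation cap reduces to a set intersection between the files a review previously flagged and the files changed since. The sketch below assumes the review invocation log has already been parsed into a mapping; `may_reinvoke` and its parameters are hypothetical names, and whether a change counts as "material" is left to the caller.

```python
def may_reinvoke(
    review: str,
    flagged_by_review: dict[str, set[str]],
    modified_since: set[str],
) -> bool:
    """Allow a repeat review only if a file it previously flagged has changed."""
    if review not in flagged_by_review:
        return True  # a first invocation is not a re-invocation
    return bool(flagged_by_review[review] & modified_since)


# Example log state: /review-arch flagged two files at an earlier cycle.
log = {"/review-arch": {"pkg/foo/a.go", "pkg/foo/b.go"}}
```

With this log, a later change to pkg/foo/a.go would permit another /review-arch run, while a change only to unrelated files would not.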
Record the chosen action and rationale (one or two sentences) in the cycle log before proceeding.
Execute the chosen action. After it completes:
Return to step 1a for the next cycle.
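The cycle structure described above can be sketched as a bounded loop over four callables. This is an illustrative skeleton only: `run_loop`, the string outcomes, and the shape of the callables are assumptions, and the source does not specify what happens at the hard cap beyond the loop stopping.

```python
from typing import Any, Callable

HARD_CAP = 50  # the hard cap on cycles stated for the loop


def run_loop(
    observe: Callable[[], Any],
    orient: Callable[[Any], str],
    decide: Callable[[Any, str], Any],
    act: Callable[[Any], None],
    hard_cap: int = HARD_CAP,
) -> tuple[str, int]:
    """Run Observe, Orient, Decide, Act until termination, andon, or the cap."""
    for cycle in range(1, hard_cap + 1):
        facts = observe()            # gather facts, no interpretation
        orientation = orient(facts)  # drift and termination checks live here
        if orientation in ("terminate", "andon"):
            return orientation, cycle
        action = decide(facts, orientation)
        act(action)                  # one skill invocation per cycle
    return "hard-cap", hard_cap
```

Keeping the exit checks inside Orient mirrors the text: Decide and Act only run when the cycle has neither met its end state nor hit a wall.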
Every 10 cycles (cycles 10, 20, 30, 40), perform a trajectory audit at the start of the cycle, before Observe. This is an internal self-check — not user-facing unless it triggers the andon cord.
The audit asks: are we converging toward intent, or drifting/thrashing?
Audit inputs (read directly, not from cycle log narrative):
- git log --oneline <branch-start>..HEAD (actual commits made)
- git diff --stat <branch-start>..HEAD (actual changes)
Audit questions:
Audit verdict:
Record the audit verdict in the state doc under a ## Trajectory Audits section. Two consecutive Diverging verdicts, or any Thrashing verdict, pulls the cord.
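The audit schedule and the cord rule are both mechanical, so they can be sketched directly. The function names and lowercase verdict strings here are assumptions for illustration; the rule itself (two consecutive Diverging verdicts, or any Thrashing verdict) is the one stated above.

```python
AUDIT_INTERVAL = 10
HARD_CAP = 50


def is_audit_cycle(cycle: int) -> bool:
    """Audits run at the start of cycles 10, 20, 30, and 40."""
    return 0 < cycle < HARD_CAP and cycle % AUDIT_INTERVAL == 0


def should_pull_cord(verdicts: list[str]) -> bool:
    """Apply the cord rule to verdicts listed in audit order."""
    if "thrashing" in verdicts:
        return True
    # Compare each verdict with the one that immediately follows it.
    return any(a == b == "diverging" for a, b in zip(verdicts, verdicts[1:]))
```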
Before declaring done, perform one last verification. This is partly redundant with the Orient-phase termination check — that's intentional. Redundancy here catches race conditions between "check passed in Orient" and "report generated in 2b."
If any check fails, treat as a blocker — return to the loop and fix it.
Produce a completion report for the user. The report is ordered by review priority — the sections most likely to need user scrutiny come first.
Evidence format. For every claim of a completed key task or end-state condition, provide both:
Narrative without an artifact is not evidence. An artifact without narrative is unreadable. Both together let the user skim at narrative level and drill into the artifact when a claim looks suspicious.
## Lead-Project Complete
### Commander's intent
[Verbatim, all five fields]
### Outcome
[One-paragraph summary of whether intent was fulfilled and how. State plainly if
any mechanical end-state condition failed or any subjective condition needs user sign-off.]
### Top things to scrutinize
[Three to five items the user should look at first. Pick the items where the
skill's judgment is most likely to be wrong or where the stakes are highest:
- commits touching sensitive or high-risk code,
- findings the skill downgraded or deferred,
- decisions made via /think-deliberate,
- places where subjective end-state conditions require sign-off.
Each item: one sentence + artifact (SHA, file:line, or state-doc section).]
### End-state verification
**Mechanical conditions:**
- [✓] <condition 1> — <narrative> — `<command>` exit 0 at SHA <short>
- [✓] <condition 2> — <narrative> — `<command>` exit 0 at SHA <short>
**Subjective conditions (awaiting user sign-off):**
- [?] <condition 3> — <narrative of what was done toward it> — see SHA <short>, file:line
### Key tasks status
- [✓] <task 1> — <narrative> — SHA <short> / ticket #N
- [✓] <task 2> — <narrative> — SHA <short>
- [~] <task 3> — partial, see Deferred
- ...
### Deferred items
[Findings and opportunities the skill chose not to address.
Grouped by severity and type. Each item: one-line description,
rationale for deferral, pointer to where it's tracked.]
- [medium | /review-test | cycle 14] <description> — deferred because <reason> — tracked in state doc section X
- [low | /tidy-docs | cycle 22] <description> — deferred because <reason>
### Constraint/non-goal adherence
[Confirm no violations. If any close calls occurred, name them with commit SHAs
so the user can verify.]
### Recommendations
[Suggested next steps: "ready to merge," "consider follow-up iteration for deferred items,"
"one open design question remains (see handoff)," etc.]
### Changes summary
- Branch: <branch name> (SHA <short>)
- Base: <base branch> (SHA <short>)
- Commits on branch: N
- Net lines: +X/-Y
- Tickets created: N (list IDs)
- Tickets closed: N (list IDs)
### Run metadata
- Cycles: N of 50
- Actions taken: [grouped: "2x /implement, 3x /refactor, 1x /review-arch, 1x /review-test"]
- Trajectory audits: [verdicts at each audit point]
- Duration (wall-clock, approximate)
The user decides whether to merge to main, run another iteration, or pause.
One or two sentences stating why this iteration exists. The underlying motivation, not the tactical outcome.
Good: "Ship v1.0 of the MCP server to external users." Weak: "Work on the MCP server."
Non-negotiable outcomes that must be true at the end. Written as state, not activity. Listable — between 2 and 10 items.
Good:
v1.0 are closed."
Weak:
Concrete, observable conditions defining completion. Pragmatically: what could the skill check to prove it is done? These become the termination conditions.
Good:
go test ./... exits 0."
go build produces a binary that responds to --help."
Weak:
Hard limits. The skill must not violate these during the loop.
Examples:
auth."
Explicit out-of-scope. Things the skill should leave alone even if it sees opportunity.
Examples:
Reviewers produce findings indefinitely. The skill applies thresholds:
| Severity | Handling |
|---|---|
| Critical/High | Must address before termination. Blocks the termination check. |
| Medium | Address if the fix is bounded (small, obvious, localized). Otherwise defer with a note in the state doc. Does not block termination. |
| Low/Info | Defer by default with a note. Does not block termination. |
The skill records every deferral with rationale in the state doc so the completion report can present them transparently.
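The threshold table maps cleanly onto a small decision function. In this sketch the function names and the `fix_is_bounded` flag are hypothetical; judging whether a medium fix is "small, obvious, localized" remains the skill's call and is passed in rather than computed.

```python
def triage(severity: str, fix_is_bounded: bool = False) -> str:
    """Map a review finding to 'address' or 'defer' per the threshold table."""
    level = severity.strip().lower()
    if level in {"critical", "high"}:
        return "address"
    if level == "medium":
        return "address" if fix_is_bounded else "defer"
    if level in {"low", "info"}:
        return "defer"
    raise ValueError(f"unknown severity: {severity!r}")


def blocks_termination(severity: str) -> bool:
    """Only Critical and High findings block the termination check."""
    return severity.strip().lower() in {"critical", "high"}
```

Every `"defer"` result would still need a rationale recorded in the state doc, which this sketch does not model.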
Follow the shared handoff template and per-skill extension protocol in references/autonomy.md § "Shared handoff template" and § "Per-skill handoff extensions". Skill-specific values:
## Andon Cord — /lead-project — Cycle N (the cycle number is load-bearing for this skill).
/think-* skills invoked and their verdicts, and alternative approaches considered. This is a /lead-project extension; sub-skill orchestrators typically don't have enough decision history to populate it.
Mechanical end-state conditions: <K of M passing>
Pending key tasks: <summary>
Cycle log pointer: see LEAD_PROJECT_STATE.md cycles N-3 through N
/think-deliberate or /think-reframe before escalating.
Pull the cord when:
/think-deliberate cannot resolve from information available in the repo.
LEAD_PROJECT_STATE.md
Maintained at the repo root. Gitignored. Survives across invocations so the skill can resume.
Structure:
# Lead-Project State
Started: <timestamp>
Branch: <branch-name>
Branch SHA at startup: <short SHA>
Base branch: <main-branch>
Base SHA at startup: <short SHA>
Last cycle HEAD: <short SHA>
Current phase: <startup | ooda-cycle | termination | andon-cord>
Cycle: N
Status: <active | paused-on-andon | complete>
## Commander's Intent
### Purpose
<verbatim>
### Key tasks
- <task 1>
- <task 2>
### End state
**Mechanical** (shell-runnable, gate termination):
- <condition 1> — `<command>`
- <condition 2> — `<command>`
**Subjective** (user sign-off at completion):
- <condition 3>
### Constraints
- <constraint 1>
### Non-goals
- <non-goal 1>
## Orientation
### Current understanding
<agent's working mental model — what's true about the project,
updated when observations contradict prior assumptions>
### Drift status
<OK | drifting — why>
## Triage plan
Initial plan set at startup, updated as orientation evolves.
1. <planned action>
2. <planned action>
3. <planned action>
## Cycle log
### Cycle N — <timestamp> — HEAD <short SHA>
- Observe: <one-line summary of state>
- Orient: <one-line summary of interpretation — drift status, termination status, mechanical-condition pass/fail counts>
- Decide: <action chosen + one-line rationale>
- Act: <skill invoked, outcome, changes + commit SHAs>
### Cycle N+1 — <timestamp> — HEAD <short SHA>
...
## Trajectory audits
Internal convergence/divergence/thrashing verdicts at cycles 10, 20, 30, 40.
- Cycle 10: <converging | diverging | thrashing> — <brief rationale>
- Cycle 20: ...
## Review invocation log
Track review sub-skill invocations to enforce the "only re-invoke if flagged files changed" rule.
- /review-arch @ cycle 7 — scope: repo-wide — flagged files: [list]
- /review-test @ cycle 12 — scope: pkg/foo — flagged files: [list]
## Deferred items
Findings and opportunities the skill chose not to address, with rationale.
- [medium | /review-test] <description> — deferred because <reason>
- [low | /tidy-docs] <description> — deferred because <reason>
## Open questions
Questions the skill is tracking but has resolved autonomously for now
(to surface in completion report).
- <question>
Update at every OODA phase transition. The state doc is the persistent Orientation — losing it means losing the agent's memory.
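One way to keep phase transitions visible is to build each cycle-log entry from exactly four phase summaries. The helper below is a hypothetical sketch that renders only the four bullet lines of an entry; the entry heading (cycle number, timestamp, HEAD SHA) is left to the caller.

```python
def cycle_entry_lines(observe: str, orient: str, decide: str, act: str) -> str:
    """Render the four phase bullets of one cycle-log entry."""
    phases = {
        "Observe": observe,
        "Orient": orient,
        "Decide": decide,
        "Act": act,
    }
    # Dicts preserve insertion order, so bullets come out in phase order.
    return "\n".join(f"- {name}: {summary}" for name, summary in phases.items())
```

Requiring all four arguments means a cycle cannot be logged with a phase silently missing.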
.gitignore
Ensure LEAD_PROJECT_STATE.md is ignored. Commit the .gitignore change on the working branch at startup if needed.
Sequential execution. One cycle at a time, one skill invocation per cycle. No parallel cycles.
Context discipline. The skill is a thin coordinator. It delegates all implementation to sub-skills. It maintains only summary-level state in its context; LEAD_PROJECT_STATE.md holds durable memory.
Sub-skill invocation. Invoke via the Skill tool with autonomous overrides where supported. When a sub-skill requires interactive input, answer using engineering judgment anchored to the pinned intent.
Do NOT abort for:
Pull the andon cord for:
Abort the entire workflow for:
Relationship to /implement-project:
/implement-project is a once-through pipeline over a known set of tickets. /lead-project is an open-ended loop that decides what to work on next. /lead-project may invoke /implement-project when it has a coherent batch of tickets to execute.
Relationship to /scope and /scope-project:
/scope and /scope-project create tickets. /lead-project may invoke /scope when it identifies a gap worth ticketing. /scope-project is typically run by the user before /lead-project to establish the initial backlog.
Relationship to /lead-review:
/lead-review is a comprehensive review pass. /lead-project may invoke it near the end of a run to validate end-state conditions, or invoke individual /review-* skills earlier when specific concerns arise.
Relationship to /think-* skills:
- /think-reframe: in Orient, when the problem framing seems wrong
- /think-diagnose: in Orient, when a failure cause is unclear
- /think-deliberate: in Decide, when options are materially different
- /think-scrutinize: in Decide, when a plan is risky or novel
- /think-brainstorm: in Decide, when no obvious next action presents itself
/think-reflect is intentionally NOT invoked in the loop; it is calibrated for human consumption after the fact. The user may invoke it themselves on the completion report.
Hierarchy:
/lead-project
├── (startup)
│ └── /review-health (optional)
├── (per cycle, any of:)
│ ├── /scope
│ ├── /implement | /implement-batch | /implement-project
│ ├── /refactor | /lead-refactor
│ ├── /review-arch | /review-test | /tidy-docs | /review-release
│ ├── /review-perf | /review-a11y | /review-security
│ ├── /lead-review
│ ├── /bug-hunt | /bug-fix
│ ├── /test-mutation
│ └── /think-reframe | /think-diagnose | /think-deliberate
│ | /think-scrutinize | /think-brainstorm
└── (termination)
└── Completion report
npx claudepluginhub chrisallenlane/claude-swe-workflows --plugin claude-swe-workflows