Skill

AgentCouncil Autopilot

You are running the AgentCouncil autopilot pipeline — council-governed autonomous software delivery. You follow proven workflow recipes to plan and build. An independent agent reviews your work at every stage transition. No single agent's judgment goes unchecked.

Slash command: /agentcouncil:autopilot

SKILL.md: 440 lines · ~6.6k tokens (exceeds 5k compaction limit)

Stats

Language: Python
Stars: 2
Maintenance: Excellent
Last commit: Jun 18, 2026

AgentCouncil Autopilot

Intent: $ARGUMENTS

Pipeline:

spec_prep → REVIEW_LOOP → plan → REVIEW_LOOP → build → REVIEW_LOOP → verify → CHALLENGE? → ship

Gate Backend Selection

Parse optional backend arguments from $ARGUMENTS before Step 0:

backend=<profile> selects the outside reviewer backend for every review_loop gate.
challenge_backend=<profile> selects the outside attacker backend for the challenge gate.
review_depth=<legacy|fast|balanced|deep> selects review-loop speed/depth. Default is legacy for 0.3.x compatibility. Use balanced for opt-in faster gates. fast and balanced are single-pass review gates: revise the artifact yourself, then re-run the gate with prior_review_context.
lead_review_model=<model> selects only the internal lead-review subprocess used by review_loop; it does not change the active host session that writes specs, plans, or code.
If challenge_backend is omitted, use backend.
If both are omitted, omit the backend parameter and let AgentCouncil use the configured default profile.

Model choice is via named profiles in .agentcouncil.json or ~/.agentcouncil.json. Example: /autopilot backend=openrouter-gpt challenge_backend=bedrock-sonnet Add audit logging. Non-gate stages run on the active host model; /autopilot does not create separate implementation subagents or choose a different model for spec, plan, build, verify, or ship.

Record these as REVIEW_BACKEND, CHALLENGE_BACKEND, REVIEW_DEPTH, and LEAD_REVIEW_MODEL. If review_depth is omitted, set REVIEW_DEPTH="legacy". If lead_review_model is omitted, leave LEAD_REVIEW_MODEL unset.
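The argument handling above could be sketched roughly as follows. This is only an illustration: the function name `parse_gate_arguments` and the assumption that options arrive as whitespace-separated `key=value` tokens mixed into the intent text are not part of the skill itself.

```python
VALID_DEPTHS = {"legacy", "fast", "balanced", "deep"}
KNOWN_KEYS = {"backend", "challenge_backend", "review_depth", "lead_review_model"}


def parse_gate_arguments(arguments: str) -> dict:
    """Split recognised key=value gate options from the free-text intent.

    Assumption: options are whitespace-delimited tokens; everything else is intent.
    """
    options = {}
    intent_words = []
    for token in arguments.split():
        key, sep, value = token.partition("=")
        if sep and key in KNOWN_KEYS and value:
            options[key] = value
        else:
            intent_words.append(token)

    depth = options.get("review_depth", "legacy")  # default is legacy
    if depth not in VALID_DEPTHS:
        raise ValueError(f"unknown review_depth: {depth!r}")

    review_backend = options.get("backend")  # None means: use the configured default profile
    return {
        "REVIEW_BACKEND": review_backend,
        # challenge_backend falls back to backend when omitted
        "CHALLENGE_BACKEND": options.get("challenge_backend", review_backend),
        "REVIEW_DEPTH": depth,
        "LEAD_REVIEW_MODEL": options.get("lead_review_model"),  # None means unset
        "intent": " ".join(intent_words),
    }
```

A value of `None` for either backend corresponds to omitting the backend parameter so that the configured default profile applies.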

Protocol — follow these steps exactly

Mandatory resume guard: read durable protocol state first

Before Step 0, check for docs/autopilot/active-run.json.

If it exists and active is not false:

Read the referenced state_path.
If it contains run_id, call mcp__agentcouncil__autopilot_status.
If state_path is missing but run_id is present, call autopilot_status and treat completed/failed runs as stale.
Follow next_required_action before doing anything else.
If required_tool is review_loop or challenge, run that gate now. Do not continue implementation, verification, or shipping first.
If blocking_reason is present, stop and report the blocker.

This guard is mandatory after context compaction or a resumed host session. The project-local state is authoritative for the next protocol obligation.
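As a rough illustration of the guard's decision order, the sketch below maps the marker file and the status payload to a next move. The function name, the returned labels, and the `status` field holding `completed`/`failed` are assumptions made for the example; only `active`, `next_required_action`, `required_tool`, and `blocking_reason` come from the text above. It also simplifies by treating any completed or failed run as stale.

```python
from typing import Optional


def resume_action(active_run: Optional[dict], status: Optional[dict]) -> str:
    """Decide what to do before Step 0.

    active_run: parsed docs/autopilot/active-run.json, or None if the file is absent.
    status: result of autopilot_status, or None if it has not been fetched yet.
    """
    # No marker, or the marker is explicitly inactive: begin a fresh run.
    if active_run is None or active_run.get("active") is False:
        return "start_fresh"
    if status is None:
        return "call_autopilot_status"
    # Assumed field name: "status" on the autopilot_status payload.
    if status.get("status") in {"completed", "failed"}:
        return "start_fresh"
    # A recorded blocker stops the run before anything else happens.
    if status.get("blocking_reason"):
        return "stop_and_report"
    # A pending gate must run before implementation, verification, or shipping.
    if status.get("required_tool") in {"review_loop", "challenge"}:
        return f"run_gate:{status['required_tool']}"
    return "follow_next_required_action"
```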

Step 0: Set escalation level and read existing conventions

Send the user one message containing both of the following. Do not proceed until you have the answer to item (1).

1. Escalation level — ask:

"How should I handle unknowns during this run?

minimal: interrupt only for critical blockers — security risks, potential data loss, or scope changes that could be destructive

normal (default): interrupt when the wrong assumption would require significant rework of the spec or plan

verbose: ask about anything uncertain before proceeding

Reply with minimal, normal, or verbose (or just press Enter for normal)."

Record the answer as ESCALATION_LEVEL. Default to normal if the user presses Enter or gives no answer.

2. Read existing project conventions — before writing the spec, read these files if they exist (they bound your spec and test strategy):

pyproject.toml — test runner ([tool.pytest.ini_options]), lint config ([tool.ruff] or [tool.mypy]), build commands
pytest.ini or setup.cfg — alternate pytest config
.ruff.toml — alternate ruff config
Makefile — test, lint, build targets

Note: (a) the test command to use in build steps, (b) the lint/type-check command if configured, (c) where tests live.

Critical unknowns always escalate regardless of ESCALATION_LEVEL: security risks, destructive scope (deleting data, breaking APIs), or requirements that contradict each other. For all other unknowns, apply the level the user set.
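The convention scan in item 2 might look something like this minimal sketch. The helper name `scan_conventions` and the returned dictionary shape are hypothetical, and the substring match on TOML table headers is deliberately crude; a real TOML parser would be more robust.

```python
from pathlib import Path

CONVENTION_FILES = ["pyproject.toml", "pytest.ini", "setup.cfg", ".ruff.toml", "Makefile"]
PYPROJECT_TABLES = {
    "pytest": "[tool.pytest.ini_options]",
    "ruff": "[tool.ruff]",
    "mypy": "[tool.mypy]",
}


def scan_conventions(project_root: str) -> dict:
    """Report which convention files exist and which tools pyproject.toml configures."""
    root = Path(project_root)
    present = [name for name in CONVENTION_FILES if (root / name).is_file()]

    tools = []
    pyproject = root / "pyproject.toml"
    if pyproject.is_file():
        text = pyproject.read_text(encoding="utf-8")
        # Plain substring check on table headers (illustrative only).
        tools = [tool for tool, header in PYPROJECT_TABLES.items() if header in text]

    return {"files": present, "pyproject_tools": tools}
```

The result only says where to look; the test command, lint command, and test location still have to be read from the files it lists.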

Step 1: Understand the intent

Read the user's intent. Before writing the spec, list the technical assumptions you are making — about the tech stack, framework, auth model, data storage, deployment target, and any conventions you read in Step 0. Present them as:

ASSUMPTIONS:
1. [assumption]
2. [assumption]
→ Correct me now or I'll proceed with these.

If the intent is genuinely vague (target unclear, scope undefined), ask 1-2 clarifying questions in the same message as your assumptions. Batch everything — one message.

Step 2: Build the spec

From the intent, construct:

spec_id: Short kebab-case identifier (e.g., add-backtester, fix-auth-timeout)
title: One-line title
objective: 1-2 sentence description
requirements: List of specific things that must be built/changed
acceptance_criteria: List of verifiable conditions (e.g., "tests pass", "file contains X")
target_files: Files likely created or modified (paths with auth/, migrations/, infra/, deploy/, permissions/ trigger tier 3)
testing_strategy: Test framework (from Step 0), test locations, expected test types for this change (unit/integration/e2e). Example: "pytest, tests/ dir, unit tests for logic, one integration test for the API endpoint."
behavioral_boundaries:
- Always: actions you will always take (e.g., "run tests before commit", "validate all user inputs")
- Ask first: actions that need approval (e.g., "schema changes", "adding new dependencies")
- Never: prohibited actions (e.g., "modify files outside target_files without documenting", "skip failing tests")
tier: 1 (low-risk), 2 (standard, default), or 3 (sensitive)

Display the spec, then proceed immediately to validation. Do not wait for user confirmation — autopilot is autonomous.
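One way to hold the spec fields in memory before display is a small dataclass, sketched below. The class name `Spec`, the `problems` helper, and the specific checks are illustrative assumptions; the actual validation is whatever autopilot_prepare performs.

```python
import re
from dataclasses import dataclass, field

KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


@dataclass
class Spec:
    spec_id: str
    title: str
    objective: str
    requirements: list[str]
    acceptance_criteria: list[str]
    target_files: list[str] = field(default_factory=list)
    testing_strategy: str = ""
    # Expected keys: "always", "ask_first", "never" (naming is an assumption).
    behavioral_boundaries: dict = field(default_factory=dict)
    tier: int = 2  # 1 low-risk, 2 standard (default), 3 sensitive

    def problems(self) -> list[str]:
        """Return a list of obvious local issues; empty means nothing was spotted."""
        issues = []
        if not KEBAB_CASE.match(self.spec_id):
            issues.append("spec_id must be kebab-case")
        if not self.requirements:
            issues.append("requirements must not be empty")
        if not self.acceptance_criteria:
            issues.append("acceptance_criteria must not be empty")
        if self.tier not in (1, 2, 3):
            issues.append("tier must be 1, 2, or 3")
        return issues
```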

Step 3: Validate and register the run

Write the spec to disk before registering. Create the file docs/autopilot/specs/{spec_id}.md with the full spec content formatted as markdown. This persists the spec for future reference independent of this conversation.

Call mcp__agentcouncil__autopilot_prepare with all spec fields, plus escalation_level=ESCALATION_LEVEL, review_backend=REVIEW_BACKEND if set, and challenge_backend=CHALLENGE_BACKEND if set.

Save the returned run_id and tier. Display:

Spec validated. Run: {run_id}
Tier: {tier} ({reason})

Immediately call mcp__agentcouncil__autopilot_checkpoint:

run_id: the returned run id
protocol_step: "awaiting_spec_review"
stage: "spec_prep"
stage_status: "gated"
next_required_action: "Run the spec review gate before planning."
required_tool: "review_loop"
artifact_refs: {"spec": "docs/autopilot/specs/{spec_id}.md"}

Immediately call mcp__agentcouncil__autopilot_context_pack:

run_id: the returned run id
stage: "spec_prep"
changed_files: target_files
artifact_refs: {"spec": "docs/autopilot/specs/{spec_id}.md"}
refresh_policy: "force"

Save the returned summary as REVIEW_CONTEXT. This is a compact, sanitized context pack that reviewers must use before broad repository exploration. If context generation fails, continue to the review gate only if you still have the full artifact text, target files or changed files, test command hints, and backend workspace access or embedded diff. Otherwise checkpoint protocol_step="blocked_on_context_pack" with the missing inputs and stop.
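The fallback rule for a failed context pack can be read as a checklist. The sketch below returns whatever is missing, under the assumption that each input can be represented as a simple string, list, or flag; the function and parameter names are hypothetical.

```python
def missing_review_inputs(
    artifact_text: str,
    files: list[str],
    test_command_hints: list[str],
    has_workspace_access: bool,
    embedded_diff: str = "",
) -> list[str]:
    """List the inputs still needed to review without a context pack."""
    missing = []
    if not artifact_text.strip():
        missing.append("full artifact text")
    if not files:  # target files or changed files
        missing.append("target files or changed files")
    if not test_command_hints:
        missing.append("test command hints")
    if not (has_workspace_access or embedded_diff.strip()):
        missing.append("backend workspace access or embedded diff")
    return missing
```

An empty result means the review gate can still run; a non-empty one is what would be recorded alongside protocol_step="blocked_on_context_pack".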

Step 4: Gate — review the spec

Call mcp__agentcouncil__review_loop to get independent review of the spec:

artifact: The full spec text (requirements, acceptance criteria, target files, constraints)
artifact_type: "plan"
review_objective: "Review this spec for completeness, feasibility, and risk before planning begins"
focus_areas: ["requirements clarity", "acceptance criteria testability", "testing strategy completeness", "behavioral boundaries defined", "scope boundaries", "missing edge cases"]
backend: REVIEW_BACKEND if set
review_context: REVIEW_CONTEXT if set
review_depth: REVIEW_DEPTH
lead_review_model: LEAD_REVIEW_MODEL if set

Handle the gate decision from final_verdict:

pass → call autopilot_checkpoint with protocol_step="spec_review_passed", stage="spec_prep", stage_status="advanced", gate_decision="pass", then proceed to Step 5
revise → call autopilot_checkpoint with protocol_step="paused_for_spec_revision", stage="spec_prep", stage_status="blocked", required_tool="review_loop", and revision_guidance; fix the spec, display changes, and re-run this gate (max 2 revisions, then ask user)
escalate → call autopilot_checkpoint with protocol_step="blocked_on_spec_review", stage="spec_prep", stage_status="blocked", blocking_reason, then display findings, stop, and ask the user how to proceed
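The three verdict branches map to checkpoint fields in a regular way, and the plan and build gates in later steps use the same naming scheme. A minimal sketch of that mapping follows; `gate_checkpoint` and its `label` parameter are hypothetical helpers, not part of the AgentCouncil API.

```python
def gate_checkpoint(stage: str, label: str, final_verdict: str, detail: str = "") -> dict:
    """Build autopilot_checkpoint fields for a review_loop verdict.

    stage: pipeline stage, e.g. "spec_prep"; label: name used in protocol_step, e.g. "spec".
    detail: revision guidance (revise) or blocking reason (escalate).
    """
    payload = {"stage": stage}
    if final_verdict == "pass":
        payload.update(
            protocol_step=f"{label}_review_passed",
            stage_status="advanced",
            gate_decision="pass",
        )
    elif final_verdict == "revise":
        payload.update(
            protocol_step=f"paused_for_{label}_revision",
            stage_status="blocked",
            required_tool="review_loop",
            revision_guidance=detail,
        )
    elif final_verdict == "escalate":
        payload.update(
            protocol_step=f"blocked_on_{label}_review",
            stage_status="blocked",
            blocking_reason=detail,
        )
    else:
        raise ValueError(f"unexpected final_verdict: {final_verdict!r}")
    return payload
```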

Step 5: Plan (follow the plan workflow recipe)

Read agentcouncil/autopilot/workflows/plan/workflow.md — this is the execution recipe.

Read-only mode: Do not write or modify any code during this step. The output is a plan document, not implementation.

Follow the 5-step planning process:

1. Parse the spec completely: all requirements, acceptance criteria, non-goals, research findings. Do not decompose until fully internalized.
2. Identify natural decomposition boundaries: schema before logic, interfaces before implementations, shared utilities before callers.
3. Size and order tasks on the XS/S/M/L scale. XL = split it.
4. Write acceptance probes: every acceptance criterion maps to at least one probe. Specify verification_level, mock_policy, expected_observation.
5. Write execution order and verification strategy.
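Two of the planning checks lend themselves to a quick mechanical sanity pass: ordering tasks by their dependencies and confirming that every acceptance criterion has a probe. The sketch below uses the standard-library `graphlib`; the function names and the `criterion` key on a probe are assumptions for illustration.

```python
from graphlib import TopologicalSorter


def execution_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Order tasks so each one comes after everything it depends on.

    depends_on maps a task id to the ids it depends on.
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


def uncovered_criteria(criteria: list[str], probes: list[dict]) -> list[str]:
    """Return acceptance criteria that no probe claims to verify."""
    covered = {probe["criterion"] for probe in probes}
    return [criterion for criterion in criteria if criterion not in covered]
```

A non-empty result from `uncovered_criteria` would mean the plan is not yet ready for the plan gate.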

Display the plan:

## Plan: {spec_id}

**Verification Strategy:** {narrative}

| Task ID | Title | Complexity | Depends On | Target Files | Verification |
|---------|-------|------------|------------|--------------|--------------|
| task-01 | ...   | small      | —          | ...          | `pytest tests/path/test_foo.py` passes |

| Probe ID | Criterion | Level | Mock Policy | Expected Observation |
|----------|-----------|-------|-------------|----------------------|
| probe-01 | ac-0: ... | unit  | forbidden   | ...                  |

**Execution Order:** task-01, task-02, ...

**Risks and Mitigations:**
| Risk | Severity | Mitigation |
|------|----------|------------|
| [risk from spec gate findings] | high/medium/low | [what you'll do if it materialises] |

Call mcp__agentcouncil__autopilot_checkpoint:

protocol_step: "awaiting_plan_review"
stage: "plan"
stage_status: "gated"
next_required_action: "Run the plan review gate before building."
required_tool: "review_loop"
artifact_refs: include the plan document path, or {"plan": "displayed-in-conversation"} if no file was written

Step 6: Gate — review the plan (MANDATORY)

DO NOT SKIP THIS STEP. The plan must be independently reviewed before any code is written. This is a non-negotiable gate.

Call mcp__agentcouncil__review_loop:

artifact: The full plan text (tasks, probes, execution order, verification strategy)
artifact_type: "plan"
review_objective: "Review this implementation plan for completeness, ordering, risk, and verification coverage"
focus_areas: ["task decomposition", "dependency ordering", "acceptance probe coverage", "scope creep"]
backend: REVIEW_BACKEND if set
review_context: REVIEW_CONTEXT if set
review_depth: REVIEW_DEPTH
lead_review_model: LEAD_REVIEW_MODEL if set

Handle the gate decision:

pass → call autopilot_checkpoint with protocol_step="plan_review_passed", stage="plan", stage_status="advanced", gate_decision="pass", then proceed to Step 7 (do NOT ask the user for confirmation — autopilot is autonomous)
revise → call autopilot_checkpoint with protocol_step="paused_for_plan_revision", stage="plan", stage_status="blocked", required_tool="review_loop", and revision_guidance; read findings, revise the plan, display changes, re-run this gate (max 2 revisions)
escalate → call autopilot_checkpoint with protocol_step="blocked_on_plan_review", stage="plan", stage_status="blocked", blocking_reason; display findings, stop, ask user

If the review_loop tool fails or is unavailable, STOP and tell the user. Do not proceed to build without a reviewed plan.

Step 7: Build (follow the build workflow recipe per task)

Read agentcouncil/autopilot/workflows/build/workflow.md — this is the execution recipe.

For each task in execution_order, follow the increment cycle:

Write test first (RED): Before writing any implementation code, write a test that expresses the expected behavior. Run it — it must FAIL. A test that passes immediately proves nothing.

Implement (GREEN): Write the minimal code to make the failing test pass. Ask: "What is the simplest thing that could work?" Do not over-engineer. Three similar lines of code are better than a premature abstraction.

Confirm (PASS): Run the test. It must pass. If it doesn't, fix the implementation — not the test.

Refactor: With the test green, clean up the implementation without changing behavior. Run tests after any refactor step.

Verify: Check the task's acceptance_criteria are met beyond what the test directly covers.

Commit: Focused commit: {type}({scope}): {description}. Record the SHA.
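To make the RED, GREEN, PASS sequence concrete, here is a toy walk-through in a single script. The `slugify` function is a made-up example, not something from the workflow recipe, and a real run would use the project's test runner rather than calling the test by hand.

```python
# RED: write the test before any implementation exists.
def test_slugify_joins_words_with_hyphens():
    assert slugify("Add Audit Logging") == "add-audit-logging"


try:
    test_slugify_joins_words_with_hyphens()
    red_failed = False  # a test that passes here proves nothing
except NameError:
    red_failed = True  # expected: slugify is not defined yet


# GREEN: the simplest implementation that could make the test pass.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())


# PASS: the same test now runs without raising.
test_slugify_joins_words_with_hyphens()
```

If the final call raised, the fix would go into `slugify`, not into the test.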

Bug-fix tasks — prove-it pattern (REQUIRED): For any task that fixes a bug:

1. Write a test that reproduces the bug. Run it: it must FAIL (confirming the bug exists).
2. Implement the fix.
3. Run the test: it must PASS (confirming the fix works).
4. Run the full test suite: no new failures (regression guard).

A bug fix without a reproduction test is not complete.

Record Evidence: For each task, note:

task_id — the task completed
files_changed — every file touched
test_results — test output summary
verification_notes — how acceptance criteria were checked

After each task's evidence is recorded, call mcp__agentcouncil__autopilot_checkpoint:

protocol_step: "building"
stage: "build"
stage_status: "in_progress"
next_required_action: "Continue the next incomplete build task, or produce BuildArtifact when all tasks are complete."
artifact_refs: include the current evidence location if written, otherwise {"build_evidence": "tracked-in-conversation"}

Build rules (from the recipe):

Rule 0: Simplicity first — after implementing, ask: "Could this be fewer lines? Am I building for hypothetical future requirements?" Write the naive, obviously-correct version first.
Rule 0.5: Scope discipline — touch only task.target_files. If you notice something worth improving outside scope, note it — don't fix it.
Rule 1: The plan is the contract — no silent scope expansion
Rule 2: Tests travel with the code — unit tests for logic, integration tests for API/DB boundaries, E2E tests only for critical user flows. Aim for ~80% unit / ~15% integration / ~5% E2E.
Rule 3: Never commit broken tests
Rule 4: Evidence is not optional
Rule 5: Commit SHAs are the audit trail — each commit should be independently revertable (additive changes before deletions; avoid mixing logic change + format change in one commit)
Rule 6: Safe defaults — new code defaults to conservative behavior (disabled flags, allowlists not blocklists, strict validation). Especially in tier 3 runs.
Rule 7: Feature flags for incomplete slices — if a task lands user-reachable but incomplete behavior, gate it behind a feature flag so the commit is safe to merge.
Rule 8: 100-line limit — if you are about to write more than ~100 lines without running a test, stop and run the tests first.

Regression self-check (every 3 tasks): After completing every third task (task-03, task-06, task-09, ...), before continuing:

Run the full test suite (not just the current task's tests).
Confirm all previously-completed tasks' acceptance criteria still pass.
Confirm the build is clean.

If anything fails, fix it before continuing. This catches regressions while context is fresh, rather than at the final build gate.

After all tasks, display:

## Build Summary

| Task | Commit | Files Changed | Tests |
|------|--------|---------------|-------|
| task-01: {title} | {sha} | {files} | pass/fail |

All tests passing: yes/no
Total files changed: {list}
Commit SHAs: {list}

Then produce a formal BuildArtifact with build_id, plan_id, spec_id, evidence, all_tests_passing, files_changed, and commit_shas.
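A BuildArtifact could be assembled from the per-task evidence along these lines. The field names on `BuildArtifact` follow the list above; the `commit_sha` and `tests_passed` fields on `TaskEvidence` and the `assemble_build_artifact` helper are assumptions added so the example is self-contained.

```python
from dataclasses import dataclass


@dataclass
class TaskEvidence:
    task_id: str
    files_changed: list[str]
    test_results: str
    verification_notes: str
    commit_sha: str      # assumed: one focused commit per task
    tests_passed: bool   # assumed: boolean summary of test_results


@dataclass
class BuildArtifact:
    build_id: str
    plan_id: str
    spec_id: str
    evidence: list[TaskEvidence]
    all_tests_passing: bool
    files_changed: list[str]
    commit_shas: list[str]


def assemble_build_artifact(
    build_id: str, plan_id: str, spec_id: str, evidence: list[TaskEvidence]
) -> BuildArtifact:
    """Aggregate per-task evidence into one artifact for the build gate."""
    files = sorted({path for item in evidence for path in item.files_changed})
    return BuildArtifact(
        build_id=build_id,
        plan_id=plan_id,
        spec_id=spec_id,
        evidence=evidence,
        # An empty build never counts as passing.
        all_tests_passing=bool(evidence) and all(item.tests_passed for item in evidence),
        files_changed=files,
        commit_shas=[item.commit_sha for item in evidence],
    )
```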

Call mcp__agentcouncil__autopilot_checkpoint before Step 8:

protocol_step: "build_complete"
stage: "build"
stage_status: "gated"
next_required_action: "Run the build review gate before verification."
required_tool: "review_loop"
artifact_refs: include the BuildArtifact location, or {"build_artifact": "displayed-in-conversation"} if no file was written

Step 8: Gate — review the build (MANDATORY)

DO NOT SKIP THIS STEP. The build must be independently reviewed before verification. This is a non-negotiable gate.

Call mcp__agentcouncil__review_loop:

artifact: A summary of all code changes. Include: the diff summary, per-task evidence (files_changed, test_results, verification_notes), and the list of commit SHAs.
artifact_type: "code"
review_objective: "Review the implementation for correctness, quality, and spec compliance"
focus_areas: ["correctness", "test coverage", "spec compliance", "code quality", "security"]
backend: REVIEW_BACKEND if set
review_context: REVIEW_CONTEXT if set
review_depth: REVIEW_DEPTH
lead_review_model: LEAD_REVIEW_MODEL if set

Handle the gate decision:

pass → call autopilot_checkpoint with protocol_step="build_review_passed", stage="build", stage_status="advanced", gate_decision="pass", then proceed to Step 9
revise → call autopilot_checkpoint with protocol_step="paused_for_build_revision", stage="build", stage_status="blocked", required_tool="review_loop", and revision_guidance; read findings, fix the issues (follow the increment cycle for fixes), re-run this gate (max 2 revisions)
escalate → call autopilot_checkpoint with protocol_step="blocked_on_build_review", stage="build", stage_status="blocked", blocking_reason; display findings, stop, ask user

If the review_loop tool fails or is unavailable, STOP and tell the user. Do not proceed to verify without a reviewed build.

Step 9: Verify

Run the acceptance probes you defined in the plan. For each probe:

Execute the command_hint if specified
Check the expected_observation
Record pass/fail with evidence

Display:

## Verification

| Probe | Criterion | Level | Status | Evidence |
|-------|-----------|-------|--------|----------|
| probe-01 | ac-0: ... | unit | pass/fail | ... |

Overall: passed/failed

Call mcp__agentcouncil__autopilot_checkpoint:

If verification is in progress: protocol_step="verifying", stage="verify", stage_status="in_progress"
If all probes pass: protocol_step="verify_complete", stage="verify", stage_status="advanced", next_required_action="Run challenge if required, otherwise ship."
If probes fail: protocol_step="paused_for_verify_revision", stage="verify", stage_status="blocked", blocking_reason with failed probes

If any probes fail with retry_recommendation = retry_build, go back to Step 7 with revision guidance (max 2 retries).
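Probe execution might be sketched as below. This assumes, purely for illustration, that `command_hint` is an argument list, that `expected_observation` is a substring to look for in the command output, and that a non-zero exit code is a failure; the real probe schema may differ.

```python
import subprocess
import sys


def run_probe(probe: dict) -> dict:
    """Run one probe's command and compare its output to the expected observation."""
    result = subprocess.run(
        probe["command_hint"],  # assumed: list of arguments, no shell
        capture_output=True,
        text=True,
        timeout=probe.get("timeout", 300),
    )
    output = result.stdout + result.stderr
    passed = result.returncode == 0 and probe["expected_observation"] in output
    return {
        "probe_id": probe["probe_id"],
        "status": "pass" if passed else "fail",
        "evidence": output.strip()[-500:],  # keep the tail of the output as evidence
    }


def overall(results: list[dict]) -> str:
    """Verification passes only if there is at least one probe and all probes pass."""
    ok = bool(results) and all(r["status"] == "pass" for r in results)
    return "passed" if ok else "failed"
```

For example, a probe with `command_hint` set to `[sys.executable, "-m", "pytest", "tests/"]` would run the project's tests with the current interpreter.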

Bug-fix runs — reproduction test check: If this run's intent was a bug fix, verify:

A reproduction test exists in the diff that was specifically written to fail before the fix.
That test passes now.

If no reproduction test exists, do not mark verify as complete. Return to the build step and add it.

Lint and type-check (if configured): If Step 0 detected a lint or type-check command in the project:

Run it now and confirm it passes.
Record the result in the verification output.
If it fails, fix the issues before the verify step completes.

Step 10: Gate — challenge (conditional)

Only run this gate if tier >= 3 OR target_files touch sensitive paths (auth/, migrations/, infra/, deploy/, permissions/).

If the challenge gate should fire, call mcp__agentcouncil__challenge:

artifact: The verification results + build evidence + spec
assumptions: List of assumptions from the spec and plan
success_criteria: The acceptance criteria from the spec
rounds: 2
backend: CHALLENGE_BACKEND if set

Handle the gate decision from artifact.readiness:

ready → call autopilot_checkpoint with protocol_step="challenge_passed", stage="verify", gate_decision="ready", then proceed to Step 11
needs_hardening → call autopilot_checkpoint with protocol_step="paused_for_challenge_hardening", required_tool="challenge", and revision_guidance; read failure_modes where disposition == "must_harden", fix the issues, re-run verify (Step 9), then re-run this gate
not_ready → call autopilot_checkpoint with protocol_step="blocked_on_challenge", blocking_reason; display failure modes, stop, ask user

If challenge is skipped (tier < 3, no sensitive paths), call autopilot_checkpoint with protocol_step="challenge_skipped" and proceed directly to Step 11.
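The firing condition for this gate is simple enough to sketch directly. The helper names are hypothetical, and matching the sensitive names as whole directory segments (so `src/author.py` does not count as `auth/`) is an interpretation, not something the text specifies.

```python
from pathlib import PurePosixPath

SENSITIVE_DIRS = {"auth", "migrations", "infra", "deploy", "permissions"}


def touches_sensitive_path(target_files: list[str]) -> bool:
    """True if any file sits under a directory named like a sensitive area."""
    for path in target_files:
        directories = PurePosixPath(path).parts[:-1]  # drop the file name
        if any(part in SENSITIVE_DIRS for part in directories):
            return True
    return False


def should_run_challenge(tier: int, target_files: list[str]) -> bool:
    """Challenge fires for tier 3 runs or when sensitive paths are touched."""
    return tier >= 3 or touches_sensitive_path(target_files)
```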

Step 11: Ship

Display the final delivery summary:

## Autopilot Complete

**Run:** {run_id}
**Spec:** {spec_id}
**Tier:** {tier}

**Gates passed:**
- Spec review: {pass/revise count}
- Plan review: {pass/revise count}
- Build review: {pass/revise count}
- Challenge: {passed/skipped}

**Delivered:**
- {count} tasks completed
- {count} acceptance criteria verified
- {count} commits: {sha list}

**Files changed:**
- {file list}

After displaying the summary, call `mcp__agentcouncil__autopilot_checkpoint` with `protocol_step="ship_complete"`, `stage="ship"`, `stage_status="advanced"`, and `next_required_action=null`. This marks `docs/autopilot/active-run.json` inactive.

Gate Protocol

Every gate follows the same pattern:

Call the protocol — review_loop for spec/plan/build, challenge for post-verify
Read the verdict — final_verdict for review_loop, readiness for challenge
Act on it:
- advance (pass/ready) → continue to next step
- revise (revise/needs_hardening) → fix issues, re-run the gate (max 2 revisions per gate)
- block (escalate/not_ready) → stop, display findings, ask the user

On revision re-runs, pass prior_review_context. When re-running a review_loop gate after a revision, pass the prior cycle's findings (formatted as a short summary including finding IDs, titles, severities, and your resolution notes) as the prior_review_context parameter. This focuses the reviewer on whether the revision actually resolved prior issues and whether it introduced new ones — instead of re-discovering the same terrain from scratch.

If a gate revision loop exceeds 2 iterations, stop and ask the user — do not loop forever.
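The revision loop with its cap of two revisions can be sketched as a small driver. Here `review` and `revise` are stand-in callables for the review_loop call and for your own fix-up work, and the shape of a finding (`id`, `severity`, `title`, `resolution`) is an assumption made for the example.

```python
from typing import Callable, Optional

MAX_REVISIONS = 2


def summarize_findings(result: dict) -> str:
    """Format prior findings as a short summary for prior_review_context."""
    return "\n".join(
        f"{f['id']} [{f['severity']}] {f['title']}: {f.get('resolution', 'unresolved')}"
        for f in result.get("findings", [])
    )


def run_gate(
    review: Callable[[str, Optional[str]], dict],
    revise: Callable[[str, dict], str],
    artifact: str,
) -> tuple[str, str]:
    """Return (outcome, final_artifact); outcome is advance, block, or ask_user."""
    prior_context = None
    revisions = 0
    while True:
        result = review(artifact, prior_context)
        verdict = result["final_verdict"]
        if verdict == "pass":
            return "advance", artifact
        if verdict == "escalate":
            return "block", artifact
        if verdict != "revise":
            raise ValueError(f"unexpected verdict: {verdict!r}")
        if revisions == MAX_REVISIONS:
            return "ask_user", artifact  # do not loop forever
        artifact = revise(artifact, result)
        prior_context = summarize_findings(result)  # focus the next review
        revisions += 1
```

The first review receives no prior context; every re-run receives the summary of the previous cycle's findings.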

Escalation during the pipeline (consult ESCALATION_LEVEL):

When you encounter an unknown, ambiguity, or unexpected scope question mid-pipeline, apply the level set in Step 0:

minimal: proceed with best judgment and document your assumption inline ("Assuming X — override this by running the command again with Y"). Escalate only for: security risks, potential data loss, or scope changes that could be destructive.
normal: escalate if the wrong assumption would require significant rework of the spec or plan. Proceed autonomously for low-consequence choices (variable names, minor implementation details, stylistic decisions).
verbose: escalate for any uncertain choice. Ask before proceeding.

Critical blockers (security risk, data loss potential, contradictory requirements) always escalate regardless of level.
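The three escalation levels reduce to a short decision function. In this sketch the `rework_risk` labels and the choice to treat an unrecognised answer as normal are assumptions; the text above only fixes the behaviour for minimal, normal, verbose, and an empty reply.

```python
LEVELS = {"minimal", "normal", "verbose"}


def normalize_level(answer: str) -> str:
    """Map the user's reply to an escalation level; empty input means normal."""
    level = answer.strip().lower()
    # Assumption: anything unrecognised also falls back to the default.
    return level if level in LEVELS else "normal"


def should_escalate(level: str, critical: bool, rework_risk: str = "low") -> bool:
    """Decide whether to interrupt the user for an unknown.

    critical: security risk, data loss potential, or contradictory requirements.
    rework_risk: "significant" if a wrong guess would force spec or plan rework.
    """
    if critical:
        return True  # always escalates, regardless of level
    if level == "verbose":
        return True
    if level == "normal":
        return rework_risk == "significant"
    if level == "minimal":
        return False  # proceed and document the assumption inline
    raise ValueError(f"unknown escalation level: {level!r}")
```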

Rules

Display the spec before calling autopilot_prepare — but do not wait for confirmation, proceed autonomously
Display the plan before building — but do not wait for confirmation after the plan gate passes, proceed autonomously
Follow the workflow recipes — read plan/workflow.md and build/workflow.md
The plan is your contract — do not silently expand scope during build
Evidence is mandatory — every task needs files_changed, test_results, verification_notes
Durable state is mandatory — after every major stage and every completed build task, call autopilot_checkpoint
Gates are NEVER optional — every stage transition goes through independent review. Skipping a gate is a protocol violation. If a gate tool is unavailable, STOP — do not proceed without review.
On revise, fix the specific findings — do not start over from scratch
On escalate/not_ready, stop and involve the user — do not override the gate
If the spec is wrong, say so before planning — do not build the wrong thing
