Runs an autonomous session-orchestration loop chaining session-start → plan → wave-executor → session-end with 10 kill switches for safe multi-iteration execution.
Slash command: /session-orchestrator:autopilot (sonnet)
> Skip silently when `persistence: false` in Session Config.
Before any Phase 1 work, run the parallel-aware preamble per skills/_shared/parallel-aware-preamble.md. The preamble detects other active sessions in the worktree-family via findPeers(repoRoot, { mySessionId }), classifies the caller's mode via classifyMode(callerMode) against the exclusivity-matrix, and either:
- PASS_THROUGH (no other session / always-ok mode) → continue to Phase 1
- EXCLUSIVE_BLOCKED → fires Exclusive-Conflict AUQ from skills/_shared/parallel-aware-auq.md
- PROMOTION_OFFER → fires Worktree-Promotion AUQ (via enterWorktree() from scripts/lib/autopilot/worktree-pipeline.mjs; see parallel-aware-auq.md outcome-handling)

On any non-PASS_THROUGH outcome that does not result in immediate exit, append a Deviation to STATE.md via appendDeviationOnDisk(repoRoot, isoTimestamp, message) from scripts/lib/state-md.mjs.
Implementation reference: skills/_shared/parallel-aware-preamble.md § Implementation.
AUQ reference: skills/_shared/parallel-aware-auq.md.
Phase C-1.b complete (2026-04-25, issues #295 + #300). Runtime at scripts/lib/autopilot/kill-switches.mjs:18-32 (the frozen KILL_SWITCHES enum is SSOT) enforces all 10 kill-switches:

- Pre-iteration: max-sessions-reached, max-hours-exceeded, resource-overload, low-confidence-fallback (with iter-1-fallback / iter-2+-exit asymmetry), user-abort, token-budget-exceeded (cumulative tokens ≥ --max-tokens).
- Post-iteration: stall-timeout, which fires when no progress marker appears in autopilot.jsonl within the threshold (default 600s; missing file → no kill).
- Post-session: spiral, failed-wave, carryover-too-high. These read schema-canonical fields off the sessionRunner return shape: agent_summary.{spiral, failed} (numeric counts) and effectiveness.{carryover, planned_issues}. Absent fields → no kill (forward-compatible: a sessionRunner that does not yet emit those fields silently no-ops the post-session gates).

Atomic autopilot.jsonl writer (tmp+rename, schema_version 1) and silent-clamp parseFlags shipped in C-1. autopilot_run_id is passed into sessionRunner via args.autopilotRunId; production callers MUST persist it into the per-iteration sessions.jsonl record (additive optional field, schema_version 1 compatible). See skills/wave-executor/SKILL.md § Return Shape Contract and skills/session-end/SKILL.md § Phase 3.7.
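
The post-session gates described above can be sketched as a single pure function. This is an illustration under stated assumptions, not the code in kill-switches.mjs: the function name `postSessionKillSwitch` and the local `CARRYOVER_THRESHOLD` constant are hypothetical, and the check order simply follows the order in which the switches are listed.

```javascript
// Hypothetical constant; the runtime exports its own DEFAULT_CARRYOVER_THRESHOLD.
const CARRYOVER_THRESHOLD = 0.5;

/**
 * Returns the post-session kill-switch that fires for a session result,
 * or null when none fires. Absent fields never trigger a kill.
 */
function postSessionKillSwitch(sessionResult) {
  const summary = sessionResult.agent_summary ?? {};
  const effectiveness = sessionResult.effectiveness ?? {};

  if (typeof summary.spiral === 'number' && summary.spiral > 0) return 'spiral';
  if (typeof summary.failed === 'number' && summary.failed > 0) return 'failed-wave';

  const { carryover, planned_issues: planned } = effectiveness;
  const haveCounts = typeof carryover === 'number' && typeof planned === 'number';
  // Assumption: planned === 0 is treated as "no ratio available", so no kill.
  if (haveCounts && planned > 0 && carryover / planned > CARRYOVER_THRESHOLD) {
    return 'carryover-too-high';
  }
  return null;
}
```

A sessionRunner that returns only `{ session_id }` passes through every gate, which is the forward-compatible behavior the text describes.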
Autopilot collapses the per-session attention cost when Mode-Selector is confident enough to make routine decisions autonomously. A productive day commonly ships 3–7 sessions; each manual session-start costs the user 10–60 seconds of context-switch attention. When the session is genuinely routine (mechanical refactor, post-merge housekeeping, repeated follow-ups from a planned epic), that attention cost is pure overhead.
/autopilot reads the Mode-Selector recommendation, executes the recommended session if
confidence clears the threshold, then loops — checking kill-switches between iterations.
The user invokes the loop once and walks away; autopilot stops itself when work runs out
or quality degrades.
This is opt-in by design: autopilot never starts itself. The user must run
/autopilot explicitly. Configuration thresholds (--max-sessions, --max-hours,
--confidence-threshold) are CLI flags, not Session Config defaults — the user signals
intent for THIS run, not a standing policy.
```
/autopilot [--max-sessions=N] [--max-hours=H] [--confidence-threshold=0.X] [--dry-run]
```
| Flag | Default | Bounds | Meaning |
|---|---|---|---|
| --max-sessions | 5 | 1..50 | Iteration cap (graceful exit when reached) |
| --max-hours | 4.0 | 0.5..24.0 | Wall-clock budget for entire loop |
| --confidence-threshold | 0.85 | 0.0..1.0 | Minimum selectMode confidence for auto-execute |
| --dry-run | false | n/a | Print planned iterations without executing |
Out-of-range values silently clamp to bounds. --dry-run exits after printing — never
invokes session lifecycle.
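
The silent-clamp behavior can be illustrated with a small parser. This is a sketch, not the shipped parseFlags: the names `FLAG_BOUNDS_SKETCH` and `parseFlagsSketch` are hypothetical, and the handling of unknown or non-numeric flags (ignored, default kept) is an assumption. The defaults and bounds are taken from the table above.

```javascript
// Defaults and bounds copied from the flag table; the object name is hypothetical.
const FLAG_BOUNDS_SKETCH = {
  'max-sessions': { min: 1, max: 50, default: 5, integer: true },
  'max-hours': { min: 0.5, max: 24.0, default: 4.0 },
  'confidence-threshold': { min: 0.0, max: 1.0, default: 0.85 },
};

const clamp = (value, min, max) => Math.min(max, Math.max(min, value));

function parseFlagsSketch(argv) {
  const flags = { 'dry-run': false };
  for (const [name, spec] of Object.entries(FLAG_BOUNDS_SKETCH)) {
    flags[name] = spec.default;
  }
  for (const arg of argv) {
    if (arg === '--dry-run') {
      flags['dry-run'] = true;
      continue;
    }
    const match = /^--([a-z-]+)=(.+)$/.exec(arg);
    // Assumption: unknown flags are ignored rather than rejected.
    if (!match || !Object.hasOwn(FLAG_BOUNDS_SKETCH, match[1])) continue;
    const spec = FLAG_BOUNDS_SKETCH[match[1]];
    const parsed = Number(match[2]);
    // Assumption: a non-numeric value keeps the default.
    if (Number.isNaN(parsed)) continue;
    const value = spec.integer ? Math.trunc(parsed) : parsed;
    flags[match[1]] = clamp(value, spec.min, spec.max); // out of range: clamp, no error
  }
  return flags;
}
```

With this sketch, `--max-sessions=99` yields 50 and `--max-hours=0.1` yields 0.5, with no warning printed.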
```
state := { iterations_completed: 0, started_at: now(), kill_switch: null, sessions: [] }

WHILE state.iterations_completed < max-sessions:
  # Pre-iteration kill-switches (6)
  IF aborted: kill_switch := 'user-abort'; break
  IF state.iterations_completed >= max-sessions:
    kill_switch := 'max-sessions-reached'; break
  IF (now() - state.started_at) > max-hours:
    kill_switch := 'max-hours-exceeded'; break
  IF cumulative_tokens_used >= max-tokens:
    kill_switch := 'token-budget-exceeded'; break
  IF resource_verdict() == 'critical' AND peer_count() > autopilot-peer-abort:
    kill_switch := 'resource-overload'; break

  recommendation := mode-selector.selectMode(<live signals from session-start Phase 7.5>)
  IF recommendation.confidence < confidence-threshold:
    IF state.iterations_completed == 0:
      fallback_to_manual()  # iteration 1: hand off cleanly to manual /session flow
    ELSE:
      kill_switch := 'low-confidence-fallback'  # iteration 2+: exit, let user decide
    break

  cap := resource_adaptive_cap()
  session_result := run_session(mode=recommendation.mode, agents_per_wave_cap=cap)
  state.sessions.append(session_result.session_id)

  # Post-iteration kill-switch (1)
  IF stalled(autopilot.jsonl) >= stall-timeout: kill_switch := 'stall-timeout'; break

  # Post-session kill-switches (3)
  IF session_result.spiral_detected: kill_switch := 'spiral'; break
  IF session_result.failed_waves > 0: kill_switch := 'failed-wave'; break
  IF session_result.carryover_ratio > 0.50: kill_switch := 'carryover-too-high'; break

  state.iterations_completed += 1

write_autopilot_jsonl(state, kill_switch)
print_summary(state, kill_switch)
```
Atomicity rule: iteration boundaries are atomic. A session must complete (/close
including the post-session writes) before the next iteration starts. Autopilot does NOT
abort sessions mid-flight; kill-switches are checked AFTER each session completes.
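
The confidence gate in the loop behaves differently on the first iteration than on later ones. A minimal sketch of that asymmetry follows; the function name `lowConfidenceDecision` and the returned object shape are hypothetical and only serve the illustration.

```javascript
/**
 * Decides what the loop does with a Mode-Selector recommendation.
 * - confidence at or above the threshold: run the session
 * - below the threshold on iteration 1: hand off to the manual /session flow
 * - below the threshold on iteration 2+: exit with low-confidence-fallback
 */
function lowConfidenceDecision(confidence, threshold, iterationsCompleted) {
  if (confidence >= threshold) {
    return { action: 'execute', killSwitch: null };
  }
  if (iterationsCompleted === 0) {
    return { action: 'fallback-to-manual', killSwitch: null };
  }
  return { action: 'exit', killSwitch: 'low-confidence-fallback' };
}
```

Either low-confidence branch ends the loop; only the second records a kill-switch.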
All 10 kill-switches, grouped by check phase (mirrors the KILL_SWITCHES enum in
scripts/lib/autopilot/kill-switches.mjs:18-32):
| Kill-switch | Phase | Trigger | Recovery hint |
|---|---|---|---|
| max-sessions-reached | pre-iteration | iterations_completed >= --max-sessions | Graceful; not an error. |
| max-hours-exceeded | pre-iteration | Wall-clock exceeds --max-hours | Re-run with higher --max-hours or address slow waves. |
| resource-overload | pre-iteration | verdict==critical AND peers > autopilot-peer-abort | Wait for peer sessions to complete or close them. |
| low-confidence-fallback | pre-iteration | confidence < threshold (iteration 2+) | Re-run with lower --confidence-threshold or run next session manually. |
| user-abort | pre-iteration | Ctrl+C / Esc (AbortSignal) | Re-run when ready. |
| token-budget-exceeded | pre-iteration | cumulative_tokens >= --max-tokens (#355) | Re-run with a higher --max-tokens budget or split the work. |
| stall-timeout | post-iteration | No progress marker in autopilot.jsonl within threshold (ADR-364 §3; default 600s) | Inspect the stalled iteration; missing telemetry file is NOT a kill. |
| spiral | post-session | wave-executor spiral detection fires (agent_summary.spiral > 0) | Triage the spiraling wave manually; autopilot will not retry. |
| failed-wave | post-session | Any wave reports agent_summary.failed > 0 | Investigate failure mode (test contract drift, env issue). Re-run after fix. |
| carryover-too-high | post-session | carryover/planned > 0.50 | Last session under-delivered. Reduce scope or split issues before resuming. |
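
The stall-timeout row can be sketched as a pure function over the file contents. The shape of a progress marker is not specified here, so this example assumes each autopilot.jsonl line is a JSON object with an ISO-8601 `timestamp` field; that field name, the function name `isStalled`, and the "no markers yet means no kill" rule are all assumptions.

```javascript
const DEFAULT_STALL_TIMEOUT_SECONDS = 600;

/**
 * @param {string|null} jsonlText file contents, or null when the file is missing
 * @param {number} nowMs current time in epoch milliseconds
 * @param {number} [thresholdSeconds] stall threshold in seconds
 * @returns {boolean} true when the newest marker is at least thresholdSeconds old
 */
function isStalled(jsonlText, nowMs, thresholdSeconds = DEFAULT_STALL_TIMEOUT_SECONDS) {
  if (jsonlText === null) return false; // missing telemetry file is not a kill

  const timestamps = jsonlText
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => Date.parse(JSON.parse(line).timestamp)) // assumed field name
    .filter((ms) => !Number.isNaN(ms));

  if (timestamps.length === 0) return false; // assumption: no markers yet, no kill

  const newest = Math.max(...timestamps);
  return nowMs - newest >= thresholdSeconds * 1000;
}
```

Passing the text and the clock in as arguments keeps the check testable without touching the file system.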
Autopilot does NOT hard-block on peer Claude processes. It adapts agents-per-wave cap
per iteration based on the most-restrictive resource signal.
| Tier | RAM free | Swap | Peers | macOS memory_pressure | cap |
|---|---|---|---|---|---|
| green | ≥ 6 GB | < 1 GB | ≤ 2 | ≥ 30% free | Session Config default |
| warn | 4–6 GB | 1–2 GB | 3–4 | 15–30% free | 4 |
| degraded | 2–4 GB | 2–3 GB | 5–6 | 5–15% free | 2 |
| critical | < 2 GB | > 3 GB | > 6 | < 5% free | 0 (coord-direct) |
Most-restrictive-signal-wins: [ram=8GB, swap=0, peers=7] → critical (peer rule wins).
Defaults are conservative initial estimates. Phase C-3 follow-up calibrates the swap and memory_pressure thresholds against real autopilot-run effectiveness data.
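
The most-restrictive-signal-wins rule can be sketched as follows. This is an illustration only: the function and field names are hypothetical, the thresholds are copied from the tier table, and the treatment of values that sit exactly on a shared boundary (for example swap = 2 GB counted as warn) is an assumption.

```javascript
const TIERS = ['green', 'warn', 'degraded', 'critical'];

// Each classifier returns an index into TIERS (higher means more restrictive).
const ramTier = (gbFree) => (gbFree >= 6 ? 0 : gbFree >= 4 ? 1 : gbFree >= 2 ? 2 : 3);
const swapTier = (gbUsed) => (gbUsed < 1 ? 0 : gbUsed <= 2 ? 1 : gbUsed <= 3 ? 2 : 3);
const peerTier = (peers) => (peers <= 2 ? 0 : peers <= 4 ? 1 : peers <= 6 ? 2 : 3);
const pressureTier = (pctFree) => (pctFree >= 30 ? 0 : pctFree >= 15 ? 1 : pctFree >= 5 ? 2 : 3);

function resourceTier({ ramFreeGb, swapGb, peers, memoryPressureFreePct }) {
  const indices = [ramTier(ramFreeGb), swapTier(swapGb), peerTier(peers)];
  // memory_pressure is a macOS signal; skip it when the probe does not report it.
  if (memoryPressureFreePct !== undefined) {
    indices.push(pressureTier(memoryPressureFreePct));
  }
  return TIERS[Math.max(...indices)]; // the worst single signal decides the tier
}

function agentsPerWaveCap(tier, sessionConfigDefault) {
  const caps = { green: sessionConfigDefault, warn: 4, degraded: 2, critical: 0 };
  return caps[tier];
}

// The example from the text: plenty of RAM, no swap, but 7 peers.
console.log(resourceTier({ ramFreeGb: 8, swapGb: 0, peers: 7 })); // critical
```

A cap of 0 corresponds to the coord-direct row in the table.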
Phase C-1 ships runLoop as a pure controller. Phase C-1.c ships buildLiveSignals as
the canonical signals-assembly helper. This section documents the in-process driver
protocol (Option B from #301): how Claude — running as the coordinator in a chat
session — drives runLoop between manual /session invocations. The headless wrapper
(Option A, scripts/autopilot.mjs CLI spawning claude -p) is reserved for Phase C-5.
runLoop requires four injected dependencies:
| Field | Signature | Source |
|---|---|---|
| `modeSelector` | `() => Promise<{mode, confidence, rationale?}>` | wraps `selectMode(await buildLiveSignals())` |
| `sessionRunner` | `({mode, autopilotRunId}) => Promise<{session_id, agent_summary?, effectiveness?}>` | wraps a `/session <mode>` invocation; reads sessions.jsonl tail to construct return value |
| `resourceEvaluator` | `() => {verdict}` | wraps `evaluate(await probe(), thresholds)` from resource-probe.mjs |
| `peerCounter` | `() => number` | reads `claude_processes_count` from a fresh `probe()` snapshot |
abortSignal is optional (Ctrl+C / Esc → user-abort kill-switch).
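
One way to supply that optional signal from a Node process is an AbortController wired to SIGINT. This is a sketch under assumptions: the helper name `abortOnSigint` is hypothetical, and it covers only Ctrl+C; how Esc is delivered inside a chat session is not shown here.

```javascript
/**
 * Creates an AbortSignal that aborts on the first SIGINT (Ctrl+C).
 * Call dispose() once the loop has finished to remove the listener.
 */
function abortOnSigint() {
  const controller = new AbortController();
  const onSigint = () => controller.abort();
  process.once('SIGINT', onSigint);
  return {
    signal: controller.signal,
    dispose: () => process.removeListener('SIGINT', onSigint),
  };
}

// Intended use (runLoop and its other dependencies are defined elsewhere):
//   const { signal, dispose } = abortOnSigint();
//   const result = await runLoop({ ...flags, modeSelector, sessionRunner,
//     resourceEvaluator, peerCounter, abortSignal: signal });
//   dispose();
```

Because kill-switches are checked only at iteration boundaries, an abort raised mid-session takes effect before the next iteration starts rather than interrupting the running session.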
```javascript
// $PLUGIN_ROOT stands for the plugin install directory; resolve it before running.
import { runLoop, parseFlags } from '$PLUGIN_ROOT/scripts/lib/autopilot.mjs';
import { buildLiveSignals } from '$PLUGIN_ROOT/scripts/lib/build-live-signals.mjs';
import { selectMode } from '$PLUGIN_ROOT/scripts/lib/mode-selector.mjs';
import { probe, evaluate } from '$PLUGIN_ROOT/scripts/lib/resource-probe.mjs';

// `thresholds` and `readSessionsJsonlTail` are supplied by the caller and are
// not defined in this snippet.

const flags = parseFlags(process.argv.slice(2));

// probe() is async, so the synchronous callbacks below read a cached snapshot
// that is refreshed after every session.
let latestSnapshot = await probe();

const modeSelector = async () => {
  // Each iteration rebuilds signals from current disk state. STATE.md will be
  // freshly idle-reset by the previous /close, sessions.jsonl will have the
  // new tail entry, etc. This is the contract: live signals every iteration.
  const signals = await buildLiveSignals({ backlogLimit: 50 });
  return selectMode(signals);
};

// Evaluates the cached snapshot against the caller's thresholds.
const resourceEvaluator = () => evaluate(latestSnapshot, thresholds);

// Synchronous-friendly count from the most recent probe snapshot.
const peerCounter = () => latestSnapshot.claude_processes_count ?? 0;

const sessionRunner = async ({ mode, autopilotRunId }) => {
  // The coordinator (Claude) invokes /session <mode> manually here. After the
  // session completes (/close runs, sessions.jsonl appended), this function
  // reads the tail entry and projects it into the runLoop return-shape.
  const tail = readSessionsJsonlTail(1); // last line, normalized

  // Refresh the snapshot so the next pre-iteration checks see current load.
  latestSnapshot = await probe();

  return {
    session_id: tail.session_id,
    agent_summary: tail.agent_summary, // {complete, partial, failed, spiral}
    effectiveness: tail.effectiveness, // {planned_issues, carryover, completion_rate, ...}
  };
};

const result = await runLoop({
  ...flags,
  modeSelector,
  sessionRunner,
  resourceEvaluator,
  peerCounter,
});
```
The in-process driver has Claude (the coordinator) call /session <mode> between
runLoop iterations, with runLoop orchestrating the kill-switches. Trade-offs:
buildLiveSignals
against real Phase 7.5 swap before headless complexity. Each iteration carries
inter-session memory through STATE.md / sessions.jsonl / learnings.

autopilot_run_id Propagation

When runLoop invokes sessionRunner({mode, autopilotRunId}), the per-iteration
sessions.jsonl record MUST carry autopilot_run_id: <id>. session-end Phase 3.7
writes this field. Manual sessions write null or omit it — readers treat both
identically per the v1 schema additive convention. See
skills/session-end/session-metrics-write.md.
A live /autopilot invocation against this wiring produces a non-zero confidence
recommendation when at least one signal source is populated (state-md rec fields,
sessions.jsonl tail, learnings, or backlog). Confidence at 0.0 with all four sources
populated is a Mode-Selector heuristic bug (file as [Mode-Selector v1.x quirk] issue),
not an autopilot bug.
One record per /autopilot invocation, written to .orchestrator/metrics/autopilot.jsonl
via atomic tmp + rename. See docs/prd/2026-04-25-autopilot-loop.md § Output for the
full schema.
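
The tmp + rename pattern can be sketched with Node's synchronous fs API. This is not the shipped writeAutopilotJsonl: the function name `appendRecordAtomically`, the temp-file naming, and the record fields in the demo are hypothetical, and the full schema lives in the PRD. The demo writes to a throwaway directory under the OS temp dir.

```javascript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { dirname, join } from 'node:path';

/**
 * Appends one JSON record as a line by writing the full new contents to a
 * temp file in the same directory, then renaming it over the target. Readers
 * see either the old file or the new one, never a half-written line.
 */
function appendRecordAtomically(jsonlPath, record) {
  mkdirSync(dirname(jsonlPath), { recursive: true });
  const existing = existsSync(jsonlPath) ? readFileSync(jsonlPath, 'utf8') : '';
  const separator = existing === '' || existing.endsWith('\n') ? '' : '\n';
  const line = JSON.stringify({ schema_version: 1, ...record });
  const tmpPath = `${jsonlPath}.${process.pid}.tmp`; // hypothetical naming scheme
  writeFileSync(tmpPath, existing + separator + line + '\n', 'utf8');
  renameSync(tmpPath, jsonlPath); // same-directory rename replaces the target in one step
}

// Demo: two invocations produce two lines.
const demoDir = mkdtempSync(join(tmpdir(), 'autopilot-sketch-'));
const demoPath = join(demoDir, 'metrics', 'autopilot.jsonl');
appendRecordAtomically(demoPath, { autopilot_run_id: 'run-1', kill_switch: null });
appendRecordAtomically(demoPath, { autopilot_run_id: 'run-2', kill_switch: 'max-sessions-reached' });
const demoLines = readFileSync(demoPath, 'utf8').trim().split('\n');
console.log(demoLines.length);
```

Keeping the temp file in the same directory as the target matters, because a rename across file systems is not atomic.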
Each iteration's sessions.jsonl entry gets an additional optional field
autopilot_run_id (string or null) so retros can join across the two files without
schema changes.
Manual sessions write autopilot_run_id: null (or omit the field — both treated
identically by readers per the v1 schema additive convention).
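
A reader that joins the two files only needs to group sessions.jsonl records by autopilot_run_id. The sketch below assumes the records are already parsed into objects; the function name `groupSessionsByRun` is hypothetical.

```javascript
/**
 * Groups session ids by autopilot_run_id. Records with a null or missing
 * autopilot_run_id are manual sessions and are skipped.
 * @param {Array<{session_id: string, autopilot_run_id?: string|null}>} sessionRecords
 * @returns {Map<string, string[]>}
 */
function groupSessionsByRun(sessionRecords) {
  const byRun = new Map();
  for (const record of sessionRecords) {
    const runId = record.autopilot_run_id ?? null; // null and omitted are equivalent
    if (runId === null) continue;
    if (!byRun.has(runId)) byRun.set(runId, []);
    byRun.get(runId).push(record.session_id);
  }
  return byRun;
}
```

Using `??` collapses the null and omitted cases into one branch, so the reader needs no special handling for older records.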
- mode-selector.mjs::selectMode: sole source of mode + confidence per iteration. Autopilot does not implement its own mode logic; v1.x quirks affect autopilot exactly as they affect manual session-start.
- resource-probe.mjs::probe + evaluate: extended in Phase C-2 with swap and memory_pressure signals. Existing consumers (manual session-start, wave-executor) benefit from the new signals automatically.
- session-start / session-plan / wave-executor / session-end: invoked unmodified. Autopilot is a controller around the existing session lifecycle, not a replacement.
- session-registry.mjs: peer-count signal source. Autopilot reads but does not write to the registry beyond the standard hook.
- mode-selector-accuracy: autopilot iterations write accuracy learnings exactly like manual sessions (Phase B-4 contract). The chosen field reflects autopilot's auto-execute decision, which equals recommendation.mode when confidence ≥ threshold.

- […] /close defaults. /close already pushes to origin; autopilot does not add PR creation, merge, or force-push behavior.
- autopilot.mjs is the sole writer of autopilot.jsonl. Other skills must not append to or rewrite this file.
- […] selectMode output, does not re-rank alternatives, does not patch confidence values.
- […] /autopilot from inside a running session. The skill is a top-level command; nested invocation is undefined behavior.
- […] autopilot.jsonl schema additively without bumping schema_version. Readers MUST treat unknown fields as a forward-compat signal, not as corruption.
- […] selectMode to force-run a specific mode. If you want to run a specific mode, use /session [mode] manually; that is autopilot's fallback path.
- […] --confidence-threshold below 0.5 in production. The Mode-Selector fallback table treats < 0.5 as suggestion-only; autopilot at that threshold becomes a random-walk over modes.
- […] scripts/lib/autopilot.mjs. The skill documents the contract; the runtime enforces it.

The autopilot block in Session Config (CLAUDE.md / AGENTS.md) accepts the following fields. All fields are optional; omitting a field applies the documented default.
```yaml
autopilot:
  bg-isolation: worktree   # worktree | none (default: worktree); see #431
```

Type: worktree | none. Default: worktree.

Controls whether autopilot --multi-story creates a per-story git worktree before spawning sub-sessions.

- worktree (default): Each story pipeline receives its own isolated git worktree via EnterWorktree. Parallel writes are safe because every agent edits a private working copy. Cost: disk space proportional to the number of concurrent stories plus the latency of worktree creation at story-start.
- none (opt-in): No worktrees are created. Sub-sessions spawn directly in the main working tree. Useful for monorepos where worktree creation is impractical due to large node_modules, sparse-checkout setups, or build caches that must be shared. Requires file-scope discipline: when max-stories > 1, every story must edit a disjoint set of files. If two stories touch the same file simultaneously, edits will collide silently. To enforce acknowledgement of this discipline, autopilot-multi requires --deconflict-paths=<glob> whenever bg-isolation: none AND max-stories > 1; omitting the flag is a hard error (exit 1). See .claude/rules/parallel-sessions.md PSA-001/002/003.
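
The hard-error rule for bg-isolation: none can be sketched as a small validation step. The function name `validateIsolation`, its option names, and the returned object shape are hypothetical; only the rule itself (exit 1 when bg-isolation is none, max-stories > 1, and --deconflict-paths is missing) comes from the text above.

```javascript
/**
 * Validates the isolation settings before any sub-session is spawned.
 * @returns {{ok: boolean, exitCode: number, error?: string}}
 */
function validateIsolation({ bgIsolation = 'worktree', maxStories = 1, deconflictPaths } = {}) {
  const needsDeconflict = bgIsolation === 'none' && maxStories > 1;
  if (needsDeconflict && !deconflictPaths) {
    return {
      ok: false,
      exitCode: 1,
      error: 'bg-isolation: none with max-stories > 1 requires --deconflict-paths=<glob>',
    };
  }
  return { ok: true, exitCode: 0 };
}
```

With the default worktree isolation, or with a single story, the flag is not required and validation passes.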
Operator-awareness note: CC 2.1.133 silently flipped worktree.baseRef default from head to origin/<default>, breaking users who relied on unpushed commits being included in their worktree base. The same class of upstream change can affect bg-isolation semantics in a future CC release. Treat CC changelog entries related to worktree or --bg session behaviour as requiring a re-read of this section before upgrading.
- docs/prd/2026-04-25-autopilot-loop.md
- scripts/lib/autopilot.mjs: exports runLoop, parseFlags, writeAutopilotJsonl, KILL_SWITCHES, FLAG_BOUNDS, SCHEMA_VERSION, DEFAULT_PEER_ABORT_THRESHOLD, DEFAULT_JSONL_PATH, DEFAULT_CARRYOVER_THRESHOLD
- tests/lib/autopilot.test.mjs
- commands/autopilot.md
- skills/mode-selector/SKILL.md
- scripts/lib/resource-probe.mjs
- scripts/lib/session-registry.mjs
- skills/wave-executor/SKILL.md § Return Shape Contract
- skills/session-end/session-metrics-write.md
- docs/prd/2026-04-24-state-md-recommendations-contract.md
- docs/prd/2026-04-25-mode-selector.md

- --confidence-threshold=auto: let autopilot self-tune from accumulated mode-selector-accuracy learnings? Requires ≥ 20 accuracy learnings before useful.
- autopilot-active: true field: should other Claude sessions detect via the session-registry and refuse to start during an autopilot run? Dogfooding will inform.
- failed-wave granularity: distinguish "agent failed but was retried successfully" from "wave ended with un-recovered failures"? Requires wave-executor schema audit.

Install: npx claudepluginhub kanevry/session-orchestrator --plugin session-orchestrator