From multiclaude
Use at the start of any substantive task (implementation, review, research, analysis, heavy reasoning) to route work to Codex and AGY before spending Claude Code's own tokens. Claude acts as orchestrator - dispatch, verify with mechanical gates, synthesize.
How this skill is triggered — by the user, by Claude, or both
Slash command
/multiclaude:orchestrateThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Spend the wallet with headroom. Claude Code's own 5-hour quota is the scarce,
Spend the wallet with headroom. Claude Code's own 5-hour quota is the scarce, shared resource; AGY and Codex run on separate, independently-paid quotas that normally sit idle. Default to delegating substantive work to them and reserve Claude's own quota. Claude's job: classify, dispatch, verify cheaply, synthesize, spot-fix.
What runs on whose quota — read this first. The Agent tool, every Workflow
agent(), and every subagent run a Claude driver on an Anthropic model — your own quota (by default the inherited main-loop model).agentType/subagent_typeswap only the system prompt and tools, not the provider — a custom agent's own definition may pin a different Anthropic model, but nothing here ever moves execution onto AGY / Codex. Work lands on AGY / Codex quota ONLY when something actually executes theagy/codexCLI via Bash. A fan-out of N "rescue" subagents is N Claude drivers = N× your own quota, not offload (observed: a 15-wayagy:agy-rescueworkflow burned ~800k of own Opus quota and offloaded nothing). Real offload = a Bash-only node whose prompt is command-shaped and that carries noschemaand no Read/Edit tools (any of those makes the driver do the work itself). Mechanics in §7.
Two directional rules sharpen this:
§0 is mechanical — a script does it, you read the result. Don't run availability, tier-resolution, or health commands by hand and interpret them; that's exactly the AI work this plugin pushes into code.
Run the probe once, read its block:
node "${CLAUDE_PLUGIN_ROOT}/scripts/probe.mjs"
(Pure Node — runs the same on Linux, macOS, and Windows; no bash/python3.)
It emits one labeled block covering, deterministically:
orchestrate: OFF if the project CLAUDE.md has orchestrate: off. If OFF: tell the user once, work normally, skip §1–§8.codex / agy present or MISSING (with the exact install
command), and whether the superpowers / claude-mem plugins are enabled.agy models (names drift — the script matches Claude.*Opus
etc. so you never hardcode). Use the printed names verbatim. A tier shown
as none carries its step-down (Opus→Sonnet→Gemini-high→Gemini-medium).Add --smoke to also round-trip each CLI (reply with exactly: OK) for a
HEALTHY/DEGRADED verdict — install ≠ authenticated ≠ working, and a CLI can
pass command -v yet hang on a real call (observed: a real --print hung
despite agy models listing tiers). The round-trip costs a little external
quota, so it's opt-in; run node "${CLAUDE_PLUGIN_ROOT}/scripts/probe.mjs" --smoke once when you want to catch auth/backend breakage before depending on
a CLI.
Act on the probe block (this part is judgment — yours):
node "${CLAUDE_PLUGIN_ROOT}/scripts/setup.mjs" apply (idempotent — merges the
marketplace + enabledPlugins entries from the repo's setup/settings.json).none → use the step-down the probe noted. DEGRADED (with
--smoke) → route around that CLI this session. (This within-AGY step-down
is distinct from §5 quota re-routing.)Wallet headroom snapshot (proactive). The multiclaude hooks inject a
compact headroom line automatically — at session start and before every
Task/Workflow dispatch ([multiclaude wallets @ …] CODEX … | CLAUDE … | AGY …). Normally you run nothing: read that line from context and let it bias
routing — spend the wallet with headroom (§2) and don't start work a wallet
can't finish (§5). It carries:
The snapshot is cached and refreshed in the background, so it can lag a few
minutes and the first of a session may read CLAUDE refreshing until the
ccusage value lands — never block on it. If the line is absent (hooks
disabled / hooks/hooks.json not loaded) run it by hand; the full readout is
always at /multiclaude:quota:
node "${CLAUDE_PLUGIN_ROOT}/scripts/usage-snapshot.mjs"
Default action for any substantive task: delegate. Keep a task local ONLY when one of these holds:
If none hold, delegate — even when unsure.
Two task families:
File surface & mechanical checkability are verification aids, not gates. For edit tasks they decide how cheaply you verify (clean file list + passing gates → skim only, per §3). Their ABSENCE never bars delegation — it just means you verify by reading the diff or judging the output (degraded, note it). Non-edit tasks never have them.
Self-check floor. Before running a substantive task on your OWN quota, ask: would Codex or AGY handle this? If yes, route it out. AGY is the most-missed target, and its Claude tier the easiest to forget — heavy reasoning feels like your own job. Heavy-reasoning work that belongs on AGY Claude / Gemini-high: multi-step planning, tricky-bug analysis, architecture trade-offs, synthesizing large material, deep code review. If several substantive tasks have passed and AGY or Codex stayed idle, you're under-delegating — fix it on the next task.
| Task type | Agent | Model |
|---|---|---|
| Code implementation, refactors, test writing | Codex | Codex default |
| Code review, research, analysis, docs | AGY | Gemini-medium tier |
| Hard review/research, architecture analysis | AGY | Gemini-high tier |
| Heavy reasoning (would use own Sonnet) | AGY | Sonnet tier |
| Hardest reasoning (would use own Opus) | AGY | Opus tier |
Dispatch mechanics:
Codex: use the Agent tool with subagent_type: "codex:codex-rescue"
(preferred — shared runtime), or codex exec via Bash for fully scripted
runs. Codex runs in the configured bypass mode and edits files directly. The
Agent tool returns Codex's result inline — no polling.
AGY — always dispatch for an INLINE result; never poll a job. Two paths, both hand the result back in the same turn:
Default tier, bounded research / review / analysis → the Agent tool
with subagent_type: "agy:agy-rescue". It runs AGY in the foreground and
returns the output directly — no job id, no status file, no polling. Use
this whenever you don't need a specific model tier.
Specific tier / edits / long timeout → the agy --print CLI in a
backgrounded Bash (run_in_background: true); the harness notifies you
on completion and you read the captured stdout (don't hand-roll a waiter —
§8). This is the ONLY way to
select a tier (--model) or to edit (--dangerously-skip-permissions) —
the agy:agy-rescue subagent always runs AGY's default tier.
Write the prompt to a temp file (via the Write tool, so no shell parses
it). Always wrap the call in an OS-level timeout — --print-timeout
is NOT a reliable backstop (observed: a --print-timeout 30m job hung 59
minutes at near-zero CPU; timeout(1) is the only cap that actually
fired). Two passing forms by prompt size:
Small prompt (≲100 KB) — the prompt is the value of --print
(alias --prompt), read back inside double quotes (NOT a trailing
positional):
timeout 600 agy --print="$(cat /tmp/agy_task.md)" \
--model "<resolved tier name>" \
--dangerously-skip-permissions # edit tasks only — omit for read-only
Large / grounded prompt (>~100 KB) — pass it on stdin (--print
with no value reads stdin). argv caps a single argument at ~128 KB
(MAX_ARG_STRLEN), so --print="$(cat bigfile)" dies with Argument list too long and never runs — and an inlined-code prompt (§"Ground the
prompt") routinely exceeds that:
cat /tmp/agy_task.md | timeout 600 agy --print \
--model "<resolved tier name>" \
--dangerously-skip-permissions # edit tasks only — omit for read-only
Check the CLI's own exit code — never chain agy …; echo done. A
trailing echo's exit 0 masks the CLI's failure (observed: an Argument list too long that died was reported as success by a following echo).
⚠️ Do NOT use the AGY MCP tools (
agy_rescue/agy_review/agy_adversarial_review) or--background+agy_status/agy_resultpolling. The plugin writes each job's per-job<id>.jsonexactly once asstatus: "queued"and never updates it — only the separatestate.jsonindex advances — so a waiter that polls the job file (or a hand-rolledsleep+catloop) hangs onqueuedforever, even after the job has finished and written its result. The two inline paths above sidestep the broken state tracker entirely. (Observed: a completed review left a poller stuck ~14 min.) The AGY plugin may also auto-start an MCP server (agy-mcp-server.mjs) so its tools show up as available — ignore them regardless; the two inline CLI paths above are the only supported AGY routes.
⚠️ Prompt-passing failure modes (CLI path):
agy --print --print-timeout 30m … "<prompt>"(prompt as a trailing positional) makes--print-timeoutas its value — AGY acts on the literal text "--print-timeout" and ignores your real prompt. And--print="$(cat file)"without the surrounding double quotes word-splits a multi-line prompt. Always use the quoted--print="$(cat <file>)"form for small prompts, or the stdin form above for large ones.
Every delegation prompt must contain: the task spec, the expected file list, project conventions that matter (test command, lint command, style notes), and the instruction to run the project's tests before finishing.
Ground the prompt, and forbid fabrication. Delegated agents — AGY especially, when it fans out to its own subagents — will sometimes fabricate file contents rather than admit a read failed, and the result reads as confident, plausible, and wrong. Two defenses, both required:
Run in order after a delegated edit task returns:
git diff --stat — touches only the expected file set (± clearly
justified additions like new test files).All pass: skim git diff --stat plus changed hunks in critical files
only. Accept. NO deep read.
Any fail: read the failing output + relevant hunks. Then: (a) spot-fix if
small; (b) dispatch rework to the OTHER agent with a failure summary; (c)
after 2 failed reworks, take over directly.
Projects without tests/typecheck/lint: diff-scope check + actually read the diff (degraded, more expensive — note it).
Non-edit delegations (review/research): the gates above are for edit tasks. A review or research result has no mechanical gate — accept it by judging completeness and usefulness yourself. A weak result is simply re-requested (optionally with a sharper prompt or a different tier); it does NOT enter the edit-rework loop in §6, which applies only to failed edit tasks.
Verify code-fact claims before trusting them. When a research/review deliverable asserts specific code facts — file contents, function signatures, schema columns, enum values, route names — do NOT accept them on read. Spot-check the load-bearing claims against the real files (cheap and targeted: a couple of Reads/greps). If any were fabricated, discard the deliverable and re-dispatch with the real facts embedded in the prompt (see "Ground the prompt" in §2). This is the one gate non-edit work does get, because a confidently-wrong spec is worse than none. (Observed: AGY's subagents invented a whole data model and DAL signatures after silent read failures; only a diff against the real files caught it.)
git checkout . && git clean -fd (safe — tree was clean at
dispatch), THEN send the rework.isolation: 'worktree'). Two agents writing one working tree at once breaks
this protocol — serialize them instead. Non-edit tasks have no writes and
parallelize freely (§7).Prefer the wallet with the most headroom and keep your own Claude block in reserve. Headroom biases routing; it does not override task-fit (§2) — a near-cap wallet is deprioritized, not banned, and a tiny task that obviously fits is still fine.
Exhaustion is also detected via quota/rate-limit errors from a call. On one:
A quota failure does NOT consume a rework hop (no work product was produced).
Session exhaustion marks: after a quota error, mark that agent/tier exhausted for the REST OF THE SESSION and skip it immediately for later tasks. Tell the user once per session which agents are marked. Fresh sessions re-probe naturally.
Codex fails gates → AGY reworks (failure summary in prompt). AGY fails gates → Codex reworks (failure summary in prompt). Hard limit: 2 rework attempts total, then Claude takes over. Rework prompts MUST summarize what the previous attempt got wrong.
Delegation is only fast if independent tasks run concurrently. Default to dispatching in parallel; serialize ONLY when one task's output feeds the next.
agy:agy-rescue Agent calls (and/or
Codex agents, and/or backgrounded agy --print Bash jobs) in a SINGLE turn,
then collect the inline results. A review across N modules or several
independent research questions should all be in flight at once, not one after
another. (No agy_status polling — inline dispatch means each result arrives
as its agent returns; bound any genuine external poll — §8.)agy --print / codex exec Bash jobs over
a Workflow of rescue agents. The Bash jobs shell out to external quota directly
with none of the N-drivers trap; a Workflow of *-rescue agents spends N
Claude drivers on YOUR quota unless every node is hand-built to shell out (see
the next paragraph) — needless risk when you only want parallel CLI calls.isolation: 'worktree', and — crucially — the
harness tracks each agent's completion, so there is no waiter to hand-roll and
none of the §8 hung-poller / self-matching-pgrep failure class can occur. But
a Workflow does not offload by itself — each agent() runs a Claude driver
on your own quota exactly like the Agent tool (see the top-of-file box and the
driver-thin rule below); setting agentType alone offloads nothing. To put
workflow work on external quota each node must shell out: a Bash-only
agentType: 'codex:codex-rescue' / 'agy:agy-rescue' (or a plain Bash codex exec / agy --print --model "<tier>") node with a command-shaped prompt and
no schema. Do NOT reach AGY through its MCP tools inside a workflow (the
broken poller, §2). Workflows spend many agents and tokens; use them for
genuinely parallel, substantial work, not trivial pairs.Keep offload drivers thin — a *-rescue agent is a Claude driver, not the
external model. Every Agent / Workflow agent() runs an Anthropic model as its
agent loop; it spends the external quota ONLY if that loop actually shells out to
the CLI. Three things make a driver do the work itself instead of delegating —
never put them on an offload node:
schema — forces a StructuredOutput tool call, so the driver composes
the answer itself;agy --print --model "<tier>" "<task>"".Use the Bash-only agentType: 'codex:codex-rescue' / 'agy:agy-rescue' (or a
plain Bash codex exec / agy --print node) with a command-shaped prompt and no
schema; the work then lands on Codex/AGY quota.
❌ ANTI-PATTERN (Observed: ~800k tokens of own quota burned):
parallel(GROUPS.map(g => agent(prompt, {agentType: 'agy:agy-rescue', schema})))spawned 15 main-model drivers that read the files themselves — not 15 AGY processes. Never pair a*-rescueagentType with aschema+ read tools.
Verify the quota, not just the result. Offload failures are silent — the
result still comes back, just billed to the wrong wallet. After an offload
dispatch, confirm it landed externally: for CLI jobs, that a real process did
the work (ps aux | grep -E '[a]gy|[c]odex' while it runs); for Workflow /
Agent jobs, open /workflows — agents showing your main model with high
token counts mean you are NOT offloading. Treat unexpectedly-high local spend as
a delegation failure, fix the node (command-shaped prompt, no schema, Bash-only),
and re-run before continuing.
Acceptance gates (§3) and the one-writer protocol (§4) still apply to every delegated change, parallel or not: gate each result and land it as its own commit.
Backgrounded Bash jobs and tracked agents re-invoke you automatically when they
finish — so the default is to not poll at all. Dispatch
(run_in_background: true, or an Agent / Workflow call), end your turn, and act
on the completion notification. A hand-rolled wait loop is usually pure waste,
and it is where the runaway-waiter failures come from.
When you genuinely must poll something the harness can't notify you about (an external queue, a status that lives only behind a tool call), every wait MUST be:
pgrep -f "<task-id>" matches the
waiter's own command line (which contains that id), so it never sees the
worker exit and spins forever. Capture the worker PID and watch that.agy_status / per-job <id>.json trap
in §2 is exactly this — which is why both AGY paths there are inline, not
polled).Canonical bounded, PID-based wait (only when notifications aren't available):
# $WORKER_PID = the PID to wait on (NOT a -f substring of it)
deadline=$(( SECONDS + 1800 )) # 30-min hard cap
while kill -0 "$WORKER_PID" 2>/dev/null; do
[ "$SECONDS" -ge "$deadline" ] && { echo "wait: deadline hit for $WORKER_PID" >&2; break; }
sleep 5
done
⚠️ Observed failure: a
pgrep -f "<task-id>"waiter self-matched its own command line and spun onsleep 5for an hour — the worker had exited long before. If a waiter ever outlives its job,kill -9it (a self-matchingpgreploop won't die on a plainkill) and use the notification path next time.
Hung vs. working. A healthy agentic job accrues CPU as it makes tool calls; a stalled model/auth call does not — so compare accumulated CPU to elapsed time:
ps -o pid,etime,time,stat -p <PID> # or /proc/<PID>/stat fields 14–15 (utime/stime)
Near-zero accumulated CPU + long elapsed + zero output = hung — kill it
(kill -9) and re-dispatch single-shot (inline the content; see §2). This is
how the 59-minute agy --print hang was diagnosed: 0.41 CPU-seconds against ~59
minutes elapsed.
Recovering a killed / partial Workflow. Completed agent() outputs survive
a TaskStop — they're already written to the run's transcript dir. When you
only need the finished subset, extract their StructuredOutput inputs with jq
instead of resumeFromRunId (cheaper, no straggler re-runs on your own quota):
jq -c '.. | objects
| select(.type? == "tool_use" and .name? == "StructuredOutput") | .input' \
agent-*.jsonl
(Recovered 13/15 structured maps this way after a kill — no re-run needed.)
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub somecodecat/multiclaude --plugin multiclaude