Drive a Codex sub-agent from Claude Code over a herdr pane. ALWAYS invoke when delegating to a Codex sub-agent ("have codex do X", "run this in the background", "start this task", "send this payload", "continue this session") or to Pi/Claude/OpenCode/Hermes, running agents in parallel, spawning a side pane, waiting for an agent to finish, reading another agent's output, approving its prompts, or when the user mentions herdr (pane, agent, send, wait, run, split) or HERDR_PANE_ID is set. For Codex, drive everything through ONE tool — scripts/codex.py (start/watch/send/reply/await/status/end/sessions). The recommended flow is event-driven: `start --no-wait` to spawn, then arm the Monitor tool with `codex.py watch` to STREAM one JSON verdict per state change (question, plan, completion); react with `reply`; it auto-approves permission gates, auto-engages plan mode when the task mentions a plan, and self-closes on verified success. (A one-shot mode — background a blocking verb, read its single verdict — still works.) Python encodes the spawn-readiness race, verified send, full-width capture, marker+verification completion, never-truncated plans, session continuity across pane-slot shifts, structured pause reasons (question vs plan-menu vs blocked widget), and cleanup. References cover the herdr substrate (namespaces, parallel fleets, raw IPC, traps) beyond a single Codex.
How this skill is triggered — by the user, by Claude, or both
Slash command
/claude-code-herdr-plugin:claude-to-codexThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You can spawn **Codex** (also Pi/Claude/OpenCode/Hermes) into a **herdr** side pane and drive it from Bash. For Codex, you need **one tool: `scripts/codex.py`** — it absorbs every sharp edge (spawn timing, lost sends, "finished vs paused?", plan truncation, pane renumbering, cleanup) and returns **structured JSON verdicts**. Two ways to drive it:
references/agent-vs-pane.mdreferences/architecture.mdreferences/cli-and-ipc-reference.mdreferences/codex-and-agents.mdreferences/events-and-subscribe.mdreferences/fake-and-custom-agents.mdreferences/herdr-cli.mdreferences/multi-agent-patterns.mdreferences/pane-lifecycle.mdreferences/permission-handling.mdreferences/pitfalls-and-traps.mdreferences/reading-output.mdreferences/scripting-patterns.mdreferences/sending-input.mdreferences/status-model.mdreferences/waiting-and-async.mdscripts/_core.pyscripts/codex.pyscripts/herdr_client/LICENSEscripts/herdr_client/NOTICEYou can spawn Codex (also Pi/Claude/OpenCode/Hermes) into a herdr side pane and drive it from Bash. For Codex, you need one tool: scripts/codex.py — it absorbs every sharp edge (spawn timing, lost sends, "finished vs paused?", plan truncation, pane renumbering, cleanup) and returns structured JSON verdicts. Two ways to drive it:
start --no-wait to spawn, then arm the Monitor tool with the returned result.monitor command (codex.py watch). It streams one JSON verdict per state change — a question, a plan, completion — as notifications. You react with reply --no-wait; it auto-approves permission gates and self-closes the pane on verified success. See "The recommended loop" below.start/send/reply/await, run_in_background: true). It blocks until Codex settles, prints one JSON verdict, and exits — which auto-notifies you. Read result.state, run result.next_action.command, repeat until completed, then end.Either way you never scrape a screen, poll status, or sequence raw sends. Everything beyond one Codex (fleets, other agents, raw herdr) lives in references/.
The user delegates to Codex or another agent ("have codex do X", "let pi handle it", "run it in another pane", "in the background", "send this payload", "continue this session", "wait for it to finish"), wants two+ agents at once, types herdr …, or you're inside a herdr pane (HERDR_ENV=1 / HERDR_PANE_ID set).
herdr status must show running. If not, tell the user — you can't start it from Bash (it needs a TUI). Default socket: ~/.config/herdr/herdr.sock. Inside a herdr pane, HERDR_ENV=1 and HERDR_PANE_ID (e.g. p_5) are set.
codex.pySKILL_DIR = ${CLAUDE_PLUGIN_ROOT}/skills/claude-to-codex (the plugin sets CLAUDE_PLUGIN_ROOT when running). Use the absolute path if the expansion doesn't resolve in your shell.
| Verb | Does |
|---|---|
start --task "<p>" --slug <name> | Spawn Codex (pane/tab/space per --in) + inject task (auto marker + "ask, don't guess") + wait + analyze → returns a session id + verdict. --no-wait returns immediately with a ready-to-arm result.monitor watch command instead. |
watch --session <id> | Stream one JSON verdict per state change (JSONL) until the task completes/exits — arm it with the Monitor tool. Auto-approves permission gates, surfaces plans/questions as events, self-closes on verified success. |
send --session <id> --message "<p>" | Follow-up to a live session + wait + analyze. --no-wait to fire-and-let-watch-report. |
reply --session <id> (--text "…" | --choice N | --approve | --reject) | Answer a question / pick an option / approve / reject + wait + analyze. --no-wait to fire-and-let-watch-report. |
await --session <id> | Re-enter the wait, no new input (e.g. after a longer task) |
status --session <id> | One-shot snapshot, no wait |
end --session <id> | Close the pane + delete state (cleanup) |
sessions | List live sessions; prune dead ones |
Minimum call: codex.py start --task "<plain-language task>" --slug <safe-name> — Python injects the completion contract; you supply no markers or prompt scaffolding. --slug is required: it names the pane/tab/workspace/worktree deterministically across spawns and is the multi-agent contract.
Flags: --slug <safe-name> (required) · --in pane|tab|space (default pane; env CODEX_IN) · --worktree (orthogonal; new git worktree on codex/<slug>, env CODEX_WORKTREE=1) · --keep (skip teardown on end, env CODEX_KEEP=1) · --keep-worktree (env CODEX_KEEP_WORKTREE=1) · --plan / --no-plan (force / disable plan mode; auto-engaged when the task says "plan") · --expect <path> (repeatable) · --timeout <sec> (default 600) · --cwd <dir> · --marker <STR> · --no-wait (return immediately with a result.monitor watch command). watch flags: --no-auto-approve (surface gates), --no-close (keep pane on success), --timeout <sec> (watch lifetime, default 1800). send/reply: --no-wait (fire, let an armed watch report).
--in)All three modes spawn unfocused — the human's view never shifts. Pick by intent:
--in | When | Naming | Footprint |
|---|---|---|---|
pane (default) | Quick side-task; cheapest; fleet up to ~10 panes in one tab is fine (references/pane-lifecycle.md:188). | Pane labeled <slug> via pane.rename. | Splits caller's tab; no new tab/workspace. |
tab | Long-running task the human will want to revisit visually. | Tab labeled <caller-space>-<caller-tab>-<slug>. | New unfocused tab in caller's workspace. |
space | Different repo / risky / fully isolated context. | Workspace labeled <caller-tab-name>; inner tab labeled <slug>. | New unfocused workspace. |
Width caveat for pane: agent.start --split halves the focused pane. With one pane in the caller's tab, Codex gets ~half-width (clean parsing). 4+ shared panes shrink each below the ~28-col threshold where Codex plans/menus ellipsize — there is no auto-fallback; pick --in tab if you need full width with many concurrent spawns.
--worktree is orthogonal. Add it when the task changes code that shouldn't pollute the caller's working tree: creates <repo>/.worktrees/codex-<slug> on a fresh codex/<slug> branch from HEAD, then passes that path as cwd to the chosen --in mode. On end, the worktree is removed only if the branch is fully merged into the caller branch AND the working tree is clean; otherwise it is kept and the verdict reports worktree.{kept, ahead, dirty, reason}. --keep-worktree forces keep regardless.
Drive Codex event-driven: spawn without waiting, then arm the Monitor tool with codex.py watch. Each state change (a question, a plan, completion) streams to you as one notification — you never poll or re-issue a wait.
# 1. Spawn + send the task; returns immediately with result.monitor (the watch command).
python3 ${SKILL_DIR}/scripts/codex.py start --no-wait \
--task "Build /tmp/site/index.html for a dentist; ask me 2-5 questions first." \
--slug dentist --in tab --expect /tmp/site/index.html
result.monitor (it carries command + timeout_ms + persistent). Each stdout line the watch prints becomes one notification — a JSON verdict.awaiting_clarification) → codex.py reply --session <id> --text "<answer>" --no-waitawaiting_approval) → present result.plan to the user, then reply --session <id> --approve --no-wait (or --reject, or --text "<revisions>")auto_approved event; nothing to do)--expect artifact present) the watch emits completed then ended/auto_closed and closes the pane itself — the session is done.The watch is read-only (except auto-approving gates and the final close), so a reply/status runs fine while it's live (the lock only guards spawns). Pass --no-close to keep the pane open, --no-auto-approve to surface gates instead.
When you'd rather drive turn-by-turn (or can't use Monitor), background a blocking verb and read its single verdict:
# Background it (run_in_background: true) so the harness notifies you on exit.
python3 ${SKILL_DIR}/scripts/codex.py start \
--task "Refactor src/foo.py to remove duplication; keep behavior identical." --slug fix-foo --expect src/foo.py
# Read the verdict, run result.next_action.command, repeat until state: completed, then:
python3 ${SKILL_DIR}/scripts/codex.py end --session <id>
Between launch and notification you're free to do other work. Always background the blocking verbs (start/send/reply/await) — codex.py blocks server-side until Codex settles, then prints the envelope and exits, which is what triggers the auto-notification.
{ "ok": true, "schema_version": "v1", "command": "start", "session": "cdx-3f9a",
"result": {
"state": "completed|awaiting_clarification|awaiting_approval|permission_gate|working|no_signal|exited",
// watch also streams two info events: "auto_approved" (a gate it approved) and "ended" (self-closed)
"reason": "marker_verified|marker_unverified|artifacts_present|reported_done|free_text_question|multiple_choice|plan_approval|permission_request|working|timeout|no_signal|pane_gone|auto_closed",
"summary": "<=200 chars", "plan": "<full plan text — NEVER truncated — when present>",
"questions": ["…"], "options": [{"key":"1","label":"…","recommended":true}],
"marker_found": true, "artifacts": [{"path":"…","exists":true,"bytes":4561}],
"transcript_tail": "<cleaned final message — chrome/noise stripped, bounded>",
"next_action": {"intent":"answer|approve|choose|verify|wait|start|nothing","command":"…","why":"…"}
}, "error": null }
Act on next_action, per state/reason:
state / reason | What it means | What to do |
|---|---|---|
completed / marker_verified | Marker printed and every --expect artifact exists | Done. end. |
completed / marker_unverified | Marker printed but a promised file is missing | Verify — read transcript_tail; maybe send a fix. |
completed / artifacts_present | No marker, but expected files exist | Confirm content; usually done. |
completed / reported_done | No marker/--expect, but the agent's last message reports it finished | Verify — read transcript_tail; pass --expect next time for a hard check. |
awaiting_clarification / free_text_question | Codex asked a question | reply --text "<answer>" (questions in result.questions). |
awaiting_clarification / multiple_choice | A pick-one menu, or the blocked "Question N/N … submit" widget | reply --choice N (keys in result.options). |
awaiting_approval / plan_approval | Plan-mode approval menu | reply --approve (or --reject). Plan in result.plan. |
permission_gate / permission_request | Tool/command permission prompt | Read transcript_tail, then reply --approve / --reject. |
working / timeout | Didn't settle within --timeout (still running) | await --session <id> (optionally larger --timeout). |
no_signal / no_signal | Turn ended with no marker/question/menu, no resume | Read transcript_tail; may be done w/o a marker, or stuck. status to recheck. |
exited / pane_gone | The pane/process is gone | start a fresh task (no resume in v1). |
auto_approved / permission_request | (watch only) A permission gate was auto-approved for you | Nothing — the watch keeps streaming. |
ended / auto_closed | (watch only) Verified complete; pane closed + state deleted | Done — the watch exited. |
Clarifications can chain — Codex may ask several in a row (e.g. a free-text question, then a multiple-choice widget); each reply can surface the next one. That's expected, not a swallowed answer — keep following next_action until completed (or awaiting_approval).
Exit codes: 0 = valid verdict (any state — read result.state); 2 = usage; 3 = herdr env (HERDR_DOWN, retryable); 4 = session/pane not found/dead; 5 = internal. Stdout is always the single envelope; diagnostics go to stderr.
codex.py handles for you (each a live-verified failure mode)--slug (required) drives per-mode labeling (--in pane|tab|space); --worktree materializes a fresh git worktree on codex/<slug> and end auto-removes it only when the branch is merged and the tree is clean.--expect files are checked on disk.no_signal.session id.transcript_tail is the agent's final message with chrome/MCP-noise/diff-log stripped, capped by lines and chars (no bloat).end closes the pane (and its dedicated tab) and deletes state; sessions prunes dead ones.watch emits one JSON verdict per real state change (de-duped by content signature) for the Monitor tool, so you react to questions/plans/completion as they happen, never polling./plan automatically (override --no-plan); the plan is captured in full and surfaced for approve/revise.watch auto-approves Codex's (rare, YOLO) permission gates and closes the pane on verified success (marker + --expect), so a clean run finishes hands-free.watch never blocks a concurrent reply/status, and parallel sessions don't serialize.Plan mode engages automatically when the task mentions a plan (the word plan/plans/planning, case-insensitive) — or force it with --plan, disable with --no-plan. Codex proposes a plan and the verdict is awaiting_approval/plan_approval with the full plan in result.plan and the menu in result.options. reply --approve to implement, --reject to keep planning, --choice N for another path, or --text "<revisions>" to refine. Under watch the plan arrives as a streamed event (present it to the user, then approve/revise); under the one-shot flow, await after approving rides the build to completed.
For parallel fleets, other agents (Pi/Claude/OpenCode/Hermes), or custom tooling, drop to raw herdr — you compose the herdr commands yourself. Start with references/herdr-cli.md (the CLI surface: workspaces / tabs / panes / read / split / wait, from inside a pane); the full substrate (agent-vs-pane namespace, send-keys vocabulary, status model, events/subscribe, pane lifecycle, hidden IPC, traps) is the rest of references/.
codex-and-agents.md — everything verified live about driving Codex: the why behind every verdict (event→state mapping, spawn-readiness, agent.read recent vs visible, full-width capture, single-line sends, the three pause shapes, idle-blips, YOLO permissions). Read before reasoning about a Codex verdict.scripting-patterns.md — codex.py + _core.py internals and the verb/envelope/exit-code contract in depth.herdr-cli.md — controlling herdr itself from inside a pane (workspaces/tabs/panes/read/split/wait) — the generic CLI surface and the gateway beyond one Codex. Confirms HERDR_ENV=1; links the socket API.architecture · agent-vs-pane · status-model · waiting-and-async · sending-input · reading-output · events-and-subscribe · permission-handling · pane-lifecycle · multi-agent-patterns · fake-and-custom-agents · cli-and-ipc-reference · pitfalls-and-traps. Open the one that matches your problem; pitfalls-and-traps.md is the diagnostic ladder.codex.py first. Prefer the Monitor-driven flow (start --no-wait → arm the Monitor tool with result.monitor's watch command → react with reply --no-wait); otherwise background the blocking verbs (that's what gives you the one-shot notification).watch event is a background notification, not a user reply — act on it (answer/approve), don't treat it as the human answering you.--slug (required). Pick --in pane for quick side-tasks (default), --in tab for visually-revisited work, --in space for fully isolated runs. Add --worktree when the task changes code and you want git isolation.result.next_action, not a screen scrape. idle/done ≠ "complete" — trust state/reason.session id across calls (never a captured pane_id — slots renumber). Always end when finished.agent attach from Bash; never run two events.subscribe on one socket; never rename a pane to a reserved type name (pi/claude/codex/opencode/hermes) — see pitfalls-and-traps.md.herdr status shows the server down, tell the user — you can't start it from Bash.Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub yigitkonur/claude-code-herdr-plugin --plugin claude-code-herdr-plugin