Skill

claude-to-codex

Drive a Codex sub-agent from Claude Code over a herdr pane. ALWAYS invoke when delegating to a Codex sub-agent ("have codex do X", "run this in the background", "start this task", "send this payload", "continue this session") or to Pi/Claude/OpenCode/Hermes, running agents in parallel, spawning a side pane, waiting for an agent to finish, reading another agent's output, approving its prompts, or when the user mentions herdr (pane, agent, send, wait, run, split) or HERDR_PANE_ID is set. For Codex, drive everything through ONE tool — scripts/codex.py (start/watch/send/reply/await/status/end/sessions). The recommended flow is event-driven: `start --no-wait` to spawn, then arm the Monitor tool with `codex.py watch` to STREAM one JSON verdict per state change (question, plan, completion); react with `reply`; it auto-approves permission gates, auto-engages plan mode when the task mentions a plan, and self-closes on verified success. (A one-shot mode — background a blocking verb, read its single verdict — still works.) Python encodes the spawn-readiness race, verified send, full-width capture, marker+verification completion, never-truncated plans, session continuity across pane-slot shifts, structured pause reasons (question vs plan-menu vs blocked widget), and cleanup. References cover the herdr substrate (namespaces, parallel fleets, raw IPC, traps) beyond a single Codex.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/claude-code-herdr-plugin:claude-to-codex

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You can spawn **Codex** (also Pi/Claude/OpenCode/Hermes) into a **herdr** side pane and drive it from Bash. For Codex, you need **one tool: `scripts/codex.py`** — it absorbs every sharp edge (spawn timing, lost sends, "finished vs paused?", plan truncation, pane renumbering, cleanup) and returns **structured JSON verdicts**. Two ways to drive it:

Supporting Files

SKILL.md

169 lines · ~4.8k tokens

Stats

LanguagePython

Stars0

MaintenanceExcellent

Last CommitJun 2, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

herdr — drive a Codex sub-agent from Claude Code

What this is

You can spawn Codex (also Pi/Claude/OpenCode/Hermes) into a herdr side pane and drive it from Bash. For Codex, you need one tool: scripts/codex.py — it absorbs every sharp edge (spawn timing, lost sends, "finished vs paused?", plan truncation, pane renumbering, cleanup) and returns structured JSON verdicts. Two ways to drive it:

Event-driven (recommended): start --no-wait to spawn, then arm the Monitor tool with the returned result.monitor command (codex.py watch). It streams one JSON verdict per state change — a question, a plan, completion — as notifications. You react with reply --no-wait; it auto-approves permission gates and self-closes the pane on verified success. See "The recommended loop" below.
One-shot: background a blocking verb (start/send/reply/await, run_in_background: true). It blocks until Codex settles, prints one JSON verdict, and exits — which auto-notifies you. Read result.state, run result.next_action.command, repeat until completed, then end.

Either way you never scrape a screen, poll status, or sequence raw sends. Everything beyond one Codex (fleets, other agents, raw herdr) lives in references/.

Invoke this skill when

The user delegates to Codex or another agent ("have codex do X", "let pi handle it", "run it in another pane", "in the background", "send this payload", "continue this session", "wait for it to finish"), wants two+ agents at once, types herdr …, or you're inside a herdr pane (HERDR_ENV=1 / HERDR_PANE_ID set).

First, verify herdr

herdr status must show running. If not, tell the user — you can't start it from Bash (it needs a TUI). Default socket: ~/.config/herdr/herdr.sock. Inside a herdr pane, HERDR_ENV=1 and HERDR_PANE_ID (e.g. p_5) are set.

The one tool — `codex.py`

SKILL_DIR = ${CLAUDE_PLUGIN_ROOT}/skills/claude-to-codex (the plugin sets CLAUDE_PLUGIN_ROOT when running). Use the absolute path if the expansion doesn't resolve in your shell.

Verb	Does
`start --task "<p>" --slug <name>`	Spawn Codex (pane/tab/space per `--in`) + inject task (auto marker + "ask, don't guess") + wait + analyze → returns a `session` id + verdict. `--no-wait` returns immediately with a ready-to-arm `result.monitor` watch command instead.
`watch --session <id>`	Stream one JSON verdict per state change (JSONL) until the task completes/exits — arm it with the Monitor tool. Auto-approves permission gates, surfaces plans/questions as events, self-closes on verified success.
`send --session <id> --message "<p>"`	Follow-up to a live session + wait + analyze. `--no-wait` to fire-and-let-watch-report.
`reply --session <id> (--text "…" \| --choice N \| --approve \| --reject)`	Answer a question / pick an option / approve / reject + wait + analyze. `--no-wait` to fire-and-let-watch-report.
`await --session <id>`	Re-enter the wait, no new input (e.g. after a longer task)
`status --session <id>`	One-shot snapshot, no wait
`end --session <id>`	Close the pane + delete state (cleanup)
`sessions`	List live sessions; prune dead ones

Minimum call: codex.py start --task "<plain-language task>" --slug <safe-name> — Python injects the completion contract; you supply no markers or prompt scaffolding. --slug is required: it names the pane/tab/workspace/worktree deterministically across spawns and is the multi-agent contract. Flags: --slug <safe-name> (required) · --in pane|tab|space (default pane; env CODEX_IN) · --worktree (orthogonal; new git worktree on codex/<slug>, env CODEX_WORKTREE=1) · --keep (skip teardown on end, env CODEX_KEEP=1) · --keep-worktree (env CODEX_KEEP_WORKTREE=1) · --plan / --no-plan (force / disable plan mode; auto-engaged when the task says "plan") · --expect <path> (repeatable) · --timeout <sec> (default 600) · --cwd <dir> · --marker <STR> · --no-wait (return immediately with a result.monitor watch command). watch flags: --no-auto-approve (surface gates), --no-close (keep pane on success), --timeout <sec> (watch lifetime, default 1800). send/reply: --no-wait (fire, let an armed watch report).

Mode selection (`--in`)

All three modes spawn unfocused — the human's view never shifts. Pick by intent:

`--in`	When	Naming	Footprint
`pane` (default)	Quick side-task; cheapest; fleet up to ~10 panes in one tab is fine (`references/pane-lifecycle.md:188`).	Pane labeled `<slug>` via `pane.rename`.	Splits caller's tab; no new tab/workspace.
`tab`	Long-running task the human will want to revisit visually.	Tab labeled `<caller-space>-<caller-tab>-<slug>`.	New unfocused tab in caller's workspace.
`space`	Different repo / risky / fully isolated context.	Workspace labeled `<caller-tab-name>`; inner tab labeled `<slug>`.	New unfocused workspace.

Width caveat for pane: agent.start --split halves the focused pane. With one pane in the caller's tab, Codex gets ~half-width (clean parsing). 4+ shared panes shrink each below the ~28-col threshold where Codex plans/menus ellipsize — there is no auto-fallback; pick --in tab if you need full width with many concurrent spawns.

--worktree is orthogonal. Add it when the task changes code that shouldn't pollute the caller's working tree: creates <repo>/.worktrees/codex-<slug> on a fresh codex/<slug> branch from HEAD, then passes that path as cwd to the chosen --in mode. On end, the worktree is removed only if the branch is fully merged into the caller branch AND the working tree is clean; otherwise it is kept and the verdict reports worktree.{kept, ahead, dirty, reason}. --keep-worktree forces keep regardless.

The recommended loop — stream events with the Monitor tool

Drive Codex event-driven: spawn without waiting, then arm the Monitor tool with codex.py watch. Each state change (a question, a plan, completion) streams to you as one notification — you never poll or re-issue a wait.

# 1. Spawn + send the task; returns immediately with result.monitor (the watch command).
python3 ${SKILL_DIR}/scripts/codex.py start --no-wait \
  --task "Build /tmp/site/index.html for a dentist; ask me 2-5 questions first." \
  --slug dentist --in tab --expect /tmp/site/index.html

Arm the Monitor tool with result.monitor (it carries command + timeout_ms + persistent). Each stdout line the watch prints becomes one notification — a JSON verdict.
React to each streamed event with a short side command, and let the watch report the next state:
- a question (awaiting_clarification) → codex.py reply --session <id> --text "<answer>" --no-wait
- a plan (awaiting_approval) → present result.plan to the user, then reply --session <id> --approve --no-wait (or --reject, or --text "<revisions>")
- a permission gate → auto-approved for you (an auto_approved event; nothing to do)
On verified success (marker printed AND every --expect artifact present) the watch emits completed then ended/auto_closed and closes the pane itself — the session is done.

The watch is read-only (except auto-approving gates and the final close), so a reply/status runs fine while it's live (the lock only guards spawns). Pass --no-close to keep the pane open, --no-auto-approve to surface gates instead.

One-shot alternative — background a blocking verb

When you'd rather drive turn-by-turn (or can't use Monitor), background a blocking verb and read its single verdict:

# Background it (run_in_background: true) so the harness notifies you on exit.
python3 ${SKILL_DIR}/scripts/codex.py start \
  --task "Refactor src/foo.py to remove duplication; keep behavior identical." --slug fix-foo --expect src/foo.py
# Read the verdict, run result.next_action.command, repeat until state: completed, then:
python3 ${SKILL_DIR}/scripts/codex.py end --session <id>

Between launch and notification you're free to do other work. Always background the blocking verbs (start/send/reply/await) — codex.py blocks server-side until Codex settles, then prints the envelope and exits, which is what triggers the auto-notification.

The verdict — one JSON envelope (stdout only)

{ "ok": true, "schema_version": "v1", "command": "start", "session": "cdx-3f9a",
  "result": {
    "state":  "completed|awaiting_clarification|awaiting_approval|permission_gate|working|no_signal|exited",
    //         watch also streams two info events: "auto_approved" (a gate it approved) and "ended" (self-closed)
    "reason": "marker_verified|marker_unverified|artifacts_present|reported_done|free_text_question|multiple_choice|plan_approval|permission_request|working|timeout|no_signal|pane_gone|auto_closed",
    "summary": "<=200 chars", "plan": "<full plan text — NEVER truncated — when present>",
    "questions": ["…"], "options": [{"key":"1","label":"…","recommended":true}],
    "marker_found": true, "artifacts": [{"path":"…","exists":true,"bytes":4561}],
    "transcript_tail": "<cleaned final message — chrome/noise stripped, bounded>",
    "next_action": {"intent":"answer|approve|choose|verify|wait|start|nothing","command":"…","why":"…"}
  }, "error": null }

Act on next_action, per state/reason:

`state` / `reason`	What it means	What to do
`completed` / `marker_verified`	Marker printed and every `--expect` artifact exists	Done. `end`.
`completed` / `marker_unverified`	Marker printed but a promised file is missing	Verify — read `transcript_tail`; maybe `send` a fix.
`completed` / `artifacts_present`	No marker, but expected files exist	Confirm content; usually done.
`completed` / `reported_done`	No marker/`--expect`, but the agent's last message reports it finished	Verify — read `transcript_tail`; pass `--expect` next time for a hard check.
`awaiting_clarification` / `free_text_question`	Codex asked a question	`reply --text "<answer>"` (questions in `result.questions`).
`awaiting_clarification` / `multiple_choice`	A pick-one menu, or the blocked "Question N/N … submit" widget	`reply --choice N` (keys in `result.options`).
`awaiting_approval` / `plan_approval`	Plan-mode approval menu	`reply --approve` (or `--reject`). Plan in `result.plan`.
`permission_gate` / `permission_request`	Tool/command permission prompt	Read `transcript_tail`, then `reply --approve` / `--reject`.
`working` / `timeout`	Didn't settle within `--timeout` (still running)	`await --session <id>` (optionally larger `--timeout`).
`no_signal` / `no_signal`	Turn ended with no marker/question/menu, no resume	Read `transcript_tail`; may be done w/o a marker, or stuck. `status` to recheck.
`exited` / `pane_gone`	The pane/process is gone	`start` a fresh task (no resume in v1).
`auto_approved` / `permission_request`	(watch only) A permission gate was auto-approved for you	Nothing — the watch keeps streaming.
`ended` / `auto_closed`	(watch only) Verified complete; pane closed + state deleted	Done — the watch exited.

Clarifications can chain — Codex may ask several in a row (e.g. a free-text question, then a multiple-choice widget); each reply can surface the next one. That's expected, not a swallowed answer — keep following next_action until completed (or awaiting_approval).

Exit codes: 0 = valid verdict (any state — read result.state); 2 = usage; 3 = herdr env (HERDR_DOWN, retryable); 4 = session/pane not found/dead; 5 = internal. Stdout is always the single envelope; diagnostics go to stderr.

What `codex.py` handles for you (each a live-verified failure mode)

Spawn readiness — waits for a genuinely input-ready composer (a task sent during Codex's ~2s MCP/TUI init is silently lost).
Verified single-line send — confirms each submit landed and re-sends if it was eaten; prompts are one line (an embedded newline can submit early).
Full-width capture — Codex gets its own full-width tab (a narrow split mangles plans/option labels); output is read from scrollback, so long plans and end-of-task reports aren't truncated.
Structured naming / isolation — --slug (required) drives per-mode labeling (--in pane|tab|space); --worktree materializes a fresh git worktree on codex/<slug> and end auto-removes it only when the branch is merged and the tree is clean.
Completion = marker AND artifacts — the marker matches only a standalone output line (never the prompt echo); --expect files are checked on disk.
Plans never truncated; pauses classified (question / plan-menu / blocked widget), not dumped; idle blips between work bursts get a grace window before no_signal.
Session continuity — keyed on the stable terminal_id and re-resolved each call, so pane-slot renumbering never breaks a handle; pass only the durable session id.
Bounded, clean verdict — transcript_tail is the agent's final message with chrome/MCP-noise/diff-log stripped, capped by lines and chars (no bloat).
Cleanup — end closes the pane (and its dedicated tab) and deletes state; sessions prunes dead ones.
Event streaming — watch emits one JSON verdict per real state change (de-duped by content signature) for the Monitor tool, so you react to questions/plans/completion as they happen, never polling.
Auto-plan — a task mentioning a plan engages /plan automatically (override --no-plan); the plan is captured in full and surfaced for approve/revise.
Auto-approve + self-close — watch auto-approves Codex's (rare, YOLO) permission gates and closes the pane on verified success (marker + --expect), so a clean run finishes hands-free.
Concurrency — the cross-process lock guards only the spawn (label/branch check-then-create), so a long-running watch never blocks a concurrent reply/status, and parallel sessions don't serialize.

Plan mode

Plan mode engages automatically when the task mentions a plan (the word plan/plans/planning, case-insensitive) — or force it with --plan, disable with --no-plan. Codex proposes a plan and the verdict is awaiting_approval/plan_approval with the full plan in result.plan and the menu in result.options. reply --approve to implement, --reject to keep planning, --choice N for another path, or --text "<revisions>" to refine. Under watch the plan arrives as a streamed event (present it to the user, then approve/revise); under the one-shot flow, await after approving rides the build to completed.

Beyond one Codex

For parallel fleets, other agents (Pi/Claude/OpenCode/Hermes), or custom tooling, drop to raw herdr — you compose the herdr commands yourself. Start with references/herdr-cli.md (the CLI surface: workspaces / tabs / panes / read / split / wait, from inside a pane); the full substrate (agent-vs-pane namespace, send-keys vocabulary, status model, events/subscribe, pane lifecycle, hidden IPC, traps) is the rest of references/.

References (load by name, only when relevant)

codex-and-agents.md — everything verified live about driving Codex: the why behind every verdict (event→state mapping, spawn-readiness, agent.read recent vs visible, full-width capture, single-line sends, the three pause shapes, idle-blips, YOLO permissions). Read before reasoning about a Codex verdict.
scripting-patterns.md — codex.py + _core.py internals and the verb/envelope/exit-code contract in depth.
herdr-cli.md — controlling herdr itself from inside a pane (workspaces/tabs/panes/read/split/wait) — the generic CLI surface and the gateway beyond one Codex. Confirms HERDR_ENV=1; links the socket API.
herdr substrate (for fleets/other agents/custom tooling): architecture · agent-vs-pane · status-model · waiting-and-async · sending-input · reading-output · events-and-subscribe · permission-handling · pane-lifecycle · multi-agent-patterns · fake-and-custom-agents · cli-and-ipc-reference · pitfalls-and-traps. Open the one that matches your problem; pitfalls-and-traps.md is the diagnostic ladder.

Hard rules

For Codex, reach for codex.py first. Prefer the Monitor-driven flow (start --no-wait → arm the Monitor tool with result.monitor's watch command → react with reply --no-wait); otherwise background the blocking verbs (that's what gives you the one-shot notification).
A streamed watch event is a background notification, not a user reply — act on it (answer/approve), don't treat it as the human answering you.
Always pass --slug (required). Pick --in pane for quick side-tasks (default), --in tab for visually-revisited work, --in space for fully isolated runs. Add --worktree when the task changes code and you want git isolation.
Act on result.next_action, not a screen scrape. idle/done ≠ "complete" — trust state/reason.
Pass only the durable session id across calls (never a captured pane_id — slots renumber). Always end when finished.
Never agent attach from Bash; never run two events.subscribe on one socket; never rename a pane to a reserved type name (pi/claude/codex/opencode/hermes) — see pitfalls-and-traps.md.
If herdr status shows the server down, tell the user — you can't start it from Bash.

claude-to-codex

Invocation

Context Preview

Supporting Files

SKILL.md

claude-to-codex

Invocation

Context Preview

Supporting Files

SKILL.md

herdr — drive a Codex sub-agent from Claude Code

What this is

Invoke this skill when

First, verify herdr

The one tool — `codex.py`

Mode selection (`--in`)

The recommended loop — stream events with the Monitor tool

One-shot alternative — background a blocking verb

The verdict — one JSON envelope (stdout only)

What `codex.py` handles for you (each a live-verified failure mode)

Plan mode

Beyond one Codex

References (load by name, only when relevant)

Hard rules

Similar Skills

herdr — drive a Codex sub-agent from Claude Code

What this is

Invoke this skill when

First, verify herdr

The one tool — `codex.py`

Mode selection (`--in`)

The recommended loop — stream events with the Monitor tool

One-shot alternative — background a blocking verb

The verdict — one JSON envelope (stdout only)

What `codex.py` handles for you (each a live-verified failure mode)

Plan mode

Beyond one Codex

References (load by name, only when relevant)

Hard rules

Similar Skills

claude-to-codex

Invocation

Context Preview

Supporting Files

SKILL.md

claude-to-codex

Invocation

Context Preview

Supporting Files

SKILL.md

herdr — drive a Codex sub-agent from Claude Code

What this is

Invoke this skill when

First, verify herdr

The one tool — codex.py

Mode selection (--in)

The recommended loop — stream events with the Monitor tool

One-shot alternative — background a blocking verb

The verdict — one JSON envelope (stdout only)

What codex.py handles for you (each a live-verified failure mode)

Plan mode

Beyond one Codex

References (load by name, only when relevant)

Hard rules

Similar Skills

herdr — drive a Codex sub-agent from Claude Code

What this is

Invoke this skill when

First, verify herdr

The one tool — codex.py

Mode selection (--in)

The recommended loop — stream events with the Monitor tool

One-shot alternative — background a blocking verb

The verdict — one JSON envelope (stdout only)

What codex.py handles for you (each a live-verified failure mode)

Plan mode

Beyond one Codex

References (load by name, only when relevant)

Hard rules

Similar Skills

The one tool — `codex.py`

Mode selection (`--in`)

What `codex.py` handles for you (each a live-verified failure mode)

The one tool — `codex.py`

Mode selection (`--in`)

What `codex.py` handles for you (each a live-verified failure mode)