From install-codex
Install the OpenAI Codex CLI + bubblewrap, and use Codex as a headless agent Claude can hand clearly-specified code-writing off to — spending Codex/ChatGPT credits instead of Claude budget (credit arbitrage), not because Codex is a weaker model. Use when the user asks to install/set up Codex, or when a code task has a precise input/output spec and the user wants to conserve Claude budget ("hand this to codex/gpt", "offload this", "save tokens", "use my codex credits").
How this skill is triggered — by the user, by Claude, or both
Slash command
/install-codex:install-codexThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Two capabilities:
Two capabilities:
@openai/codex) + bubblewrap sandbox.codex exec): Claude specs and lightly reviews, Codex does the typing.The win is credit arbitrage, not a cheaper model. Codex (gpt-5.5 at high
reasoning) is a capable model — the point is which budget pays: offloading to
codex exec spends Codex/ChatGPT credit headroom instead of Claude budget. Use
it when Codex credits are the more plentiful resource.
The gate is spec clarity, not model tier. Codex is reliable on tasks with a precise, defined input/output spec (functions to a contract, scaffolding, mechanical refactors, test stubs). Trust it less for open-ended thinking, design, or creative problem-solving — keep those on Claude. A tight spec is also what makes Claude's later review cheap, which is where the savings hold up.
Run the bundled installer (idempotent — skips what's present):
bash "$CLAUDE_SKILL_DIR/install-codex.sh"
If $CLAUDE_SKILL_DIR is unset, run install-codex.sh next to this file.
Report codex --version, bwrap --version, and codex login status.
codex exec needs auth. The installer only reports status.
Default: device auth. Always use codex login --device-auth unless the
user explicitly asks for the API-key path. Do not prompt for or ask about
an API key — most users authenticate with a ChatGPT subscription, and device
auth is the only flow that works in headless / remote containers.
codex login --device-auth as a
background process (it must keep running to poll). Read its log for the
one-time code + https://auth.openai.com/codex/device, give both to the
user, wait for the background task to finish, then verify with
codex login status.printenv OPENAI_API_KEY | codex login --with-api-key. Never default here,
never ask the user to supply a key unprompted.codex login opens localhost:1455 and will NOT work in an
isolated remote container — never use it; prefer --device-auth always.Confirm end-to-end: codex exec "acknowledge this message".
Offer a handoff only when ALL of these hold:
Do NOT delegate: ambiguous/under-specified work, security-sensitive code, cross-cutting architecture, anything needing deep repo context or live iteration, or tasks where reviewing the output would cost more than just doing it.
Never auto-delegate silently. Use AskUserQuestion to offer the handoff,
stating: the task in one line, why it's a good candidate (clear spec), and that
it spends Codex/ChatGPT credits instead of Claude budget. The user opts in
per handoff (or can say "always, stop asking" — honor that for the rest of
the session).
Before delegating, Claude writes a short handoff manifest to a temp file — a terse bullet list of the exact deliverables and acceptance criteria. This one artifact is both Codex's task prompt and Claude's review checklist, so the later check is a cheap cross-reference instead of fresh analysis. Keep it bullets, not prose.
# Watermark Claude's transcript tail BEFORE delegating (for the token report).
bash "$CLAUDE_SKILL_DIR/token-report.sh" mark
codex exec \
--cd "$PWD" \
--sandbox workspace-write \
-c approval_policy="never" \
-c model_reasoning_effort="high" \
"$(cat /tmp/handoff-manifest.md)" \
< /dev/null 2>&1 | tee /tmp/codex-handoff.log
< /dev/null is mandatory. codex exec reads stdin and concatenates it
with the prompt, then blocks on stdin EOF. With no controlling tty
(background task, CI, nested agent) stdin never closes and Codex hangs
forever at 0% CPU with no output. Always redirect stdin.-c approval_policy="never" is mandatory. A user ~/.codex/config.toml
may set approval_policy = "on-request", which overrides exec's
non-interactive default and silently blocks waiting for an approval that
never comes. Force it off explicitly; don't rely on the default.-c model_reasoning_effort="high" — Codex is a capable model; high reasoning
makes it reliable on well-specified work, which is what lets Claude's review
stay lightweight.--sandbox workspace-write confines writes to the workdir; it cannot escape
the Claude sandbox dir. Never use danger-full-access. (Check flags with
codex exec --help.)2>&1 | tee /tmp/codex-handoff.log keeps Codex's output visible and
captures it so the token report (below) can read Codex's tokens used total.The whole point is to save tokens, so do not burn them on verbose analysis or narrating Codex's output. Codex ran at high reasoning on a precise spec — trust but verify, proportionate to risk.
Default (mechanical / well-specified task) — lightweight semantic check: read the diff against the handoff manifest and confirm only:
That's it. No line-by-line audit, no diff dumps back to the user.
Ramp up only as complexity/risk rises (security-sensitive, cross-cutting, non-obvious logic, large surface): add a full correctness/edge-case/codebase-fit read and run the real test / lint / type suite.
Outcome:
codex exec prompts.The token win comes from Claude neither generating nor re-deriving the code — keep the spec tight and the review as light as the task safely allows.
The moment a handoff returns and review is done, start a new turn and run the bundled reporter, then print its one-line output verbatim — nothing else:
bash "$CLAUDE_SKILL_DIR/token-report.sh" report --codex-log /tmp/codex-handoff.log
(If $CLAUDE_SKILL_DIR is unset, run token-report.sh next to this file.)
It diffs Claude's session transcript between the watermark set by mark (run
just before codex exec) and the current tail — so it measures exactly the
delegation + review span, not a guessed turn boundary — then parses the
captured Codex log, and prints exactly:
claude: 1,234in/567out --> codex: 15,192 tok
in for Claude is total tokens read that turn (fresh prompt + cached
context); Codex at 0.131 reports a single total, not an in/out split.~/.codex/ and are wiped when an ephemeral
container is reclaimed; re-auth is expected in fresh sessions.Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub mushst/claudeskill_codexhandoff --plugin install-codex