From ai-devkit
Bring in a second AI agent — OpenAI's Codex — for an independent review, a second opinion, deeper debugging, a security audit, or a delegated implementation. Your primary agent gathers repo context, runs Codex non-interactively, then critically cross-checks what comes back. Works on any repo.
How this skill is triggered — by the user, by Claude, or both
Slash command
/ai-devkit:codex-buddy What you want the second agent to do (e.g., 'review my changes on this branch', 'second opinion on this design', 'debug this failing test')What you want the second agent to do (e.g., 'review my changes on this branch', 'second opinion on this design', 'debug this failing test')This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Two agents are better than one. Your primary agent stays in the driver's seat; when you want an independent check, it hands a context-rich prompt to a **second agent — OpenAI's Codex** — runs it non-interactively, then **critically evaluates** the result instead of taking it as gospel. Use it for a second opinion, a code review, a deeper debugging pass, a security audit, a test-gap analysis, or...
Two agents are better than one. Your primary agent stays in the driver's seat; when you want an independent check, it hands a context-rich prompt to a second agent — OpenAI's Codex — runs it non-interactively, then critically evaluates the result instead of taking it as gospel. Use it for a second opinion, a code review, a deeper debugging pass, a security audit, a test-gap analysis, or to delegate a focused implementation.
This is the opposite of "let one model do everything." The value comes from a different model with fresh eyes, with the findings filtered by an agent that actually knows your repo. Codex is a colleague, not an authority.
$ARGUMENTS
Skip it for trivial tasks the primary agent can finish in one step, and always respect "don't use codex".
| Mode | You're asking to… | Touches files? |
|---|---|---|
review | review / check / "second opinion" | No (read-only) |
diagnose | debug / investigate / "why is this failing" | No |
plan | design / architect / "how should we" | No |
audit | security / adversarial / find vulnerabilities | No |
test-analysis | find test gaps / missing coverage | No |
refactor | suggest a restructuring | No (suggest) |
implement | implement / fix / apply changes | Yes (write) |
Read-only is the default. Only implement (and an explicit "apply this refactor") may write. When a request combines read and write — "review and fix", "check and update" — the write intent wins: route to implement.
Map the request to a mode above. Choose a model and reasoning effort — bias to higher effort for plan and audit (architecture and security reward deep reasoning), normal effort for the rest. If the user didn't specify, ask once with AskUserQuestion rather than guessing.
Curated context beats dumping the whole repo. Collect only what the mode needs:
git diff --stat (use git diff --stat <base>...HEAD for a branch review).Point Codex at the conventions the changed code touches — your dependency-injection approach, your test framework, your data layer — so it reviews against your patterns, not generic ones.
Give Codex a concrete <task> and a short <project_context> (conventions, patterns, any house rules). A few targeted sentences of context are the difference between a generic review and one that fits your codebase.
Read-only modes (review, diagnose, plan, audit, test-analysis, refactor-suggest):
STDERR_LOG=$(mktemp)
codex exec \
-m <model> \
--config model_reasoning_effort="<effort>" \
--sandbox read-only \
--skip-git-repo-check \
"<enriched prompt>" < /dev/null 2>"$STDERR_LOG"
There's also a built-in review entry point — codex review runs a code review non-interactively.
Write-capable modes (implement, refactor-apply) — opt in explicitly with --full-auto (it bundles --sandbox workspace-write):
codex exec \
-m <model> \
--config model_reasoning_effort="<effort>" \
--full-auto \
--skip-git-repo-check \
"<enriched prompt>" < /dev/null 2>"$STDERR_LOG"
Rules that keep it reliable:
--skip-git-repo-check and pipe < /dev/null so Codex never blocks waiting on stdin (essential for background runs).2>"$STDERR_LOG", not 2>/dev/null): it carries noisy thinking and the real error diagnostics. On failure, read it.--full-auto only for write modes. Read-only modes stay on --sandbox read-only.- sentinel instead of as an argument.This is the step that makes the pattern worth it:
To argue a point, resume the same Codex session instead of starting fresh:
codex exec --skip-git-repo-check resume --last \
"Following up — I disagree with [X] because [evidence]. Reconsider." < /dev/null 2>"$STDERR_LOG"
Report three things: Codex's findings, your assessment (cross-checks, filtered false positives, disagreements), and suggested next steps — then let the user choose what to act on. Clean up the temp log when done (rm -f "$STDERR_LOG").
npm install -g @openai/codex. Sign in: !codex login (the ! runs it in your own terminal)./codex:* slash commands belong to the official Codex plugin — this skill is for natural-language "ask codex" requests.npx claudepluginhub alexandremorgado/ai-devkit --plugin ai-devkitGuides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.