From hyperskills
Runs cross-model code reviews using the external Codex CLI tool from a Claude session. Catches bugs that single-model self-review would miss by leveraging a different reviewer architecture.
How this skill is triggered — by the user, by Claude, or both
Slash command
/hyperskills:codex-reviewThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Cross-model validation using the `codex` binary directly. Claude writes code, Codex reviews it. Different architecture, different training distribution, no self-approval bias.
Cross-model validation using the codex binary directly. Claude writes code, Codex reviews it. Different architecture, different training distribution, no self-approval bias.
Core insight: Single-model self-review is systematically biased. Cross-model review catches different bug classes because the reviewer has fundamentally different blind spots than the author.
How to read this skill: the patterns and decision trees below are guidelines. Pick what fits, blend when needed. The rules marked ⚠️ are different: they're real codex CLI behaviors, not procedural ceremony. Skipping the scope flag genuinely hangs the call; piping to tail genuinely loses output. Treat ⚠️ rules as facts about the tool, not opinions about workflow.
Prerequisite: The codex CLI must be installed and authenticated. Verify with codex --version. User defaults live in ~/.codex/config.toml; respect them.
On Pi: when the pi-nova xreview extension is installed, prefer /xreview — it wraps external codex exec --sandbox read-only with stdin closed and returns structured Verdict/Findings/Fix Queue. Manual bash calls from Pi follow the same ⚠️ rules below.
Direction: Claude → Codex only. For the bidirectional skill (also handles Codex → Claude with the claude -p gotchas around yield_time_ms and variadic flags), use /hyperskills:cross-model-review instead.
codex reviewA bare codex review (no scope) is the #1 cause of failures: it hangs or produces 100KB+ blob output. Always specify exactly one scope flag:
| Want to review | Command |
|---|---|
| Branch since main | codex review --base main |
| Single commit | codex review --commit <SHA> |
| Working tree (unstaged) | codex review --uncommitted |
For anything outside this trio (spec docs, single files, custom personas, focused passes), use codex exec "PROMPT" with explicit scope in the prompt, never bare codex review.
If output exceeds ~100KB, the diff is too large for one pass. Split per commit, or use codex exec with a narrower prompt ("Review error handling only").
tailNever pipe a review to | tail -N. Three failure modes:
tail reads the whole stream before producing output, so the agent gets nothing until codex exits or times out, no progress signal mid-review.tail -300 cuts exactly the part you want.tail -f /tmp/review.txt in another terminal streams the review in real time, completely independent of the agent's call.Right pattern: pick a non-colliding filename, redirect, then read it back.
# mktemp so parallel/repeat reviews don't clobber each other.
# Bake the scope into the slug so it's self-describing under tail -f.
out=$(mktemp -t codex-review-pre-pr.XXXXXX) && echo "$out"
codex review --base main > "$out" 2>&1
codex exec --sandbox read-only "PROMPT" > "$out" 2>&1
If mktemp isn't handy: out=/tmp/codex-review-$$-$(date +%s).txt. Echo the path before the redirect so a human running tail -f knows where to look. After exit, Read (or cat) the file. It persists across turns, re-read instead of re-running.
| Mode | Command | Best For |
|---|---|---|
codex review | Structured diff review with prioritized findings | Pre-PR reviews, commit reviews, WIP checks |
codex exec | Freeform non-interactive deep-dive with full prompt control | Security audits, architecture, focused investigation, specs |
codex review only)| Flag | Purpose |
|---|---|
--base <BRANCH> | Diff against base branch |
--commit <SHA> | Review a specific commit |
--uncommitted | Review working tree changes |
| Flag | When |
|---|---|
--sandbox read-only | Default for review work, no writes |
--sandbox workspace-write | Review + apply suggested fixes |
--full-auto | Alias for --ask-for-approval never --sandbox workspace-write |
--dangerously-bypass-approvals-and-sandbox | Last resort; explicit user request only |
-C <DIR> / --cd <DIR> | Run in another worktree without cd |
--skip-git-repo-check | Running from a non-repo directory |
--add-dir <DIR> | Extend read access to another path |
--ephemeral | One-shot session, no persistence |
--json / --output-last-message <FILE> | Capture structured output to a file |
-c model_reasoning_effort="xhigh" | Spec/RFC review only (see Effort Policy) |
| Reviewing | Effort flag |
|---|---|
| Code (commit / diff / PR / WIP) | None, defer to ~/.codex/config.toml |
| Spec / RFC / design doc | -c model_reasoning_effort="xhigh" |
Specs are higher-stakes than diffs, a subtle architectural mistake compounds across the eventual implementation. Code diffs are smaller scope and the user's configured effort is fine.
Never specify --model, -m, or -c model= to override the model itself. User config is authoritative.
The standard review before opening a PR. Use for any non-trivial change.
Step 1, structured review (catches correctness + general issues):
codex review --base main
Step 2, security deep-dive (if code touches auth, input handling, or APIs):
codex exec "<security prompt from references/prompts.md>"
Step 3, fix findings, then re-review:
codex review --base main
Quick check after each meaningful commit.
codex review --commit <SHA>
Review uncommitted work mid-development. Catches issues before they're baked in.
codex review --uncommitted
Surgical deep-dive on a specific concern (error handling, concurrency, data flow).
codex exec --sandbox read-only \
"You are a senior <DOMAIN> engineer. Analyze <CONCERN> in the changes
between main and HEAD. For each issue: cite file and line, explain the
risk, suggest a concrete fix. Confidence threshold: 0.7."
Reviewing prose (markdown design docs) before code is written.
codex exec -c model_reasoning_effort="xhigh" --sandbox read-only \
"You are a senior staff engineer doing a candid pre-implementation review of
<PATH>. The author wants sharp, unsentimental analysis. For each finding:
severity (BLOCKER / HIGH / MEDIUM / LOW), confidence (>= 0.7 only), location
(file path + section heading), the issue, a concrete fix.
End with a one-paragraph go/no-go verdict."
Review one file or directory rather than a full diff.
codex exec --sandbox read-only \
"Review only <PATH> for <CONCERN>. Skip style and ergonomics.
Return PASS if no real issues; otherwise concise FAIL findings with
file:line evidence."
Iterative quality enforcement. Three iterations is the practical ceiling; past that, returns diminish and you start re-litigating findings rather than fixing real bugs.
Iteration 1:
Claude → implement feature
codex review --base main → findings
Claude → fix critical/high findings
Iteration 2:
codex review --base main → verify fixes + catch remaining
Claude → fix remaining issues
Iteration 3 (final):
codex review --base main → clean or accept trade-offs
Thorough reviews benefit from multiple focused passes rather than one vague pass. Single passes dilute attention across dimensions and produce shallow findings on each. Each pass gets a specific persona and concern domain.
| Pass | Focus | Mode |
|---|---|---|
| Correctness | Bugs, logic, edge cases, race conditions | codex review |
| Security | OWASP Top 10:2025, injection, auth, secrets | codex exec with security prompt |
| Architecture | Coupling, abstractions, API consistency | codex exec with architecture prompt |
| Performance | O(n²), N+1 queries, memory leaks | codex exec with performance prompt |
Run passes sequentially. Fix critical findings between passes to avoid noise compounding.
| Change size | Strategy |
|---|---|
| < 50 lines, single concern | Single codex review |
| 50-300 lines, feature work | codex review + security pass |
| 300+ lines or architecture change | Full 4-pass |
| Security-sensitive (auth, payments, crypto) | Always include security pass |
digraph review_decision {
rankdir=TB;
node [shape=diamond];
"What's the artifact?" -> "Code (diff)" [label="git changes"];
"What's the artifact?" -> "Spec (markdown)" [label="design doc"];
"What's the artifact?" -> "Single file/dir" [label="focused"];
node [shape=box];
"Code (diff)" -> "When?" [shape=diamond];
"When?" -> "Pre-commit" [label="writing"];
"When?" -> "Pre-PR" [label="branch ready"];
"When?" -> "Post-commit" [label="just committed"];
"When?" -> "Investigating" [label="specific concern"];
"Pre-commit" -> "Pattern 3: WIP Check";
"Pre-PR" -> "How big?" [shape=diamond];
"Post-commit" -> "Pattern 2: Commit Review";
"Investigating" -> "Pattern 4: Focused Investigation";
"How big?" -> "Pattern 1: Pre-PR Review" [label="< 300 lines"];
"How big?" -> "Full Multi-Pass" [label=">= 300 lines"];
"Spec (markdown)" -> "Pattern 5: Spec Review";
"Single file/dir" -> "Pattern 6: Focused-Path";
}
These reliably improve review signal quality:
Ready-to-use prompt templates are in references/prompts.md.
| Anti-Pattern | Why It Fails | Fix |
|---|---|---|
Bare codex review (no scope flag) | Hangs or produces 100KB+ blob output | Always pass --base <ref>, --commit <SHA>, or --uncommitted |
codex review output > 100KB | Diff too large for one pass | Split per commit, or use codex exec with narrower prompt |
timeout 30 codex review | Reviews legitimately take 30s–5min | No timeout, or timeout 300 minimum |
codex exec "PROMPT" | tail -300 or codex review ... | tail -N | Pipe buffers until EOF (no progress); cuts the summary/verdict (usually near top); dumps full review into agent context | Redirect to file: ... > /tmp/review.txt 2>&1, then head, rg severity, sed-by-range. Human can tail -f separately. |
| "Review this code" (no specifics) | Vague, produces bikeshedding | Specific domain prompts with persona |
| Single pass for everything | Context dilution, shallow on every dimension | Multi-pass with one concern per pass |
| Self-review (Claude reviews Claude's code) | Systematic bias, models approve their own patterns | Cross-model: Claude writes, Codex reviews |
| No confidence threshold | Noise floods signal, 0.3 confidence wastes time | Only act on ≥ 0.7 confidence |
| Style comments in review | LLMs default to bikeshedding | "Skip: formatting, naming, minor docs" |
| > 3 review iterations | Diminishing returns, increasing noise, overbaking | Stop at 3. Accept trade-offs. |
| Review without project context | Generic advice disconnected from codebase | Run from repo root |
MCP wrapper around codex | Unnecessary indirection over a CLI binary | Call codex directly via Bash |
Hardcoding --model / -m / -c model= | Overrides user config; stale model names | Defer to ~/.codex/config.toml |
| Effort override on routine code review | Wastes tokens, ignores user defaults | -c model_reasoning_effort="xhigh" is for spec review only |
--full-auto for pure review | Grants write access the review doesn't need | --sandbox read-only for review; --full-auto only when applying fixes |
For ready-to-use prompt templates, see references/prompts.md.
npx claudepluginhub hyperb1iss/hyperskills --plugin hyperskillsInvokes a different AI model for independent code review, catching bugs that self-review misses. Useful for PR review or second opinions on code.
Cross-model review using OpenAI Codex to independently verify plans or code diffs, iterating up to 5 rounds. Useful for architecture decisions, non-trivial refactors, and critical config changes.
Runs a structured code review using Codex, Claude, or other engines as a closeout check before commit or ship.