Skill

Review Orchestration

This skill should be used when the user asks about "multi-AI code review", "codex review", "gemini code review", "claude code review", "review loop", "AI code review workflow", "validate review findings", "three reviewer", "codex and gemini review", or needs guidance on orchestrating code reviews with multiple AI reviewers (Codex CLI, Gemini CLI, Claude Sonnet subagent), validating their findings against actual code, and managing iterative review-fix cycles.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/multi-ai-reviewer:review-orchestration

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Orchestrate code reviews using three AI reviewers (Codex CLI, Gemini CLI, and a Claude Sonnet 4.6 subagent), validate findings against actual source code, and manage iterative review-fix cycles.

SKILL.md

225 lines · ~2.6k tokens

Stats

Parent stars0

MaintenanceGood

Last CommitMar 2, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Multi-AI Review Orchestration

Orchestrate code reviews using three AI reviewers (Codex CLI, Gemini CLI, and a Claude Sonnet 4.6 subagent), validate findings against actual source code, and manage iterative review-fix cycles.

Core Concepts

The review orchestration workflow coordinates three independent AI reviewers, validates their findings with evidence from the actual codebase, and drives an iterative loop of fix-and-re-review until the code is clean. Communication with AI reviewers is always in English. Communication with the user is always in 繁體中文 (Taiwan Chinese).

Reviewer Summary

Reviewer	Invocation	Model	Session Resume	File Access
Codex	Bash (`codex exec review`)	OpenAI	Yes (`codex exec resume --last`)	Via git diff (internal)
Gemini	Bash (`gemini -p --yolo`)	Google	Yes (`gemini --resume latest`)	Via shell commands (`git diff`, `cat`) enabled by `--yolo` flag
Claude	Task tool (subagent)	Sonnet 4.6	No (pass full context each time)	Direct via Read/Grep/Glob tools

All three run in parallel: two Bash tool calls + one Task tool call in a single response.

Codex CLI Patterns

Important: Use `codex exec review`, not `codex review`

codex review is a TUI-only slash command — it does NOT work as a standalone CLI command. For non-interactive (headless) code review, always use codex exec review.

Additionally, 2>/dev/null alone produces empty output for review and resume commands. You must use --json + jq to extract the final review.

Initial Code Review

Run a code review against a base branch with JSONL output:

codex exec review --base main --json 2>/dev/null | jq -rs '
  "=== CODEX REASONING ===\n"
  + ([.[] | select(.item.type == "reasoning") | .item.text] | unique | join("\n"))
  + "\n\n=== CODEX REVIEW ===\n"
  + ([.[] | select(.item.type == "agent_message")] | last | .item.text)
'

codex exec review --base main — headless code review comparing current branch against main
--json — outputs structured JSONL to stdout (required for extraction)
2>/dev/null — suppresses stderr noise (progress indicators)
jq extracts the final review (agent_message) and deduplicated thinking summaries (reasoning)
This creates a resumable session for follow-up

JSONL Output Structure

The --json flag produces one JSON object per line. Relevant item types:

`item.type`	Count	Description
`reasoning`	Many (duplicates exist)	Thinking step summaries. Appear in pairs with identical text but different IDs — deduplicate with `unique`.
`command_execution`	Many	Internal commands Codex ran (git diff, cat, etc.). Usually not needed.
`agent_message`	1	The final review verdict. This is what you want.

Wrapper events (thread.started, turn.started, turn.completed) have no item.type and can be ignored.

Follow-up Review (Resume Session)

Resume the previous codex session with a follow-up prompt:

codex exec resume --last "We fixed X because Y. Please re-review and check if previous concerns are addressed." --json 2>/dev/null | jq -rs '
  "=== CODEX REASONING ===\n"
  + ([.[] | select(.item.type == "reasoning") | .item.text] | unique | join("\n"))
  + "\n\n=== CODEX REVIEW ===\n"
  + ([.[] | select(.item.type == "agent_message")] | last | .item.text)
'

codex exec resume --last — resumes most recent session, keeps full transcript and plan history
The follow-up prompt is appended to the existing conversation
--json 2>/dev/null | jq ... — same extraction pattern as initial review

Key Flags

--json — machine-readable JSONL output (required for headless extraction)
-s read-only — sandbox: prevent any file modifications
-a never — never ask for approval (fully autonomous)
-o output.txt — write final message to file for downstream processing
-m <model> — specify model
-C <dir> — set working directory

Gemini CLI Patterns

Review Style

Gemini reviews are framed as PR-style reviews toward the base branch. The prompt uses a Principal Software Engineer persona with structured severity classification (CRITICAL/HIGH/MEDIUM/LOW) and strict output formatting. Gemini runs with --yolo flag to enable shell command execution, allowing it to run git diff, cat, and other read commands directly instead of receiving large diffs in the prompt (which causes OOM crashes).

Initial Non-Interactive Review

gemini -p "<PR-style review prompt instructing Gemini to run git diff BASE_BRANCH...HEAD itself>" --yolo -o text 2>&1

See commands/review-loop.md Step 2 for the complete prompt template.

Follow-up Review (Resume Session)

Resume a previous Gemini session. Gemini can run shell commands to inspect the current code state:

gemini --resume latest -p "We made changes to address your review. FIXED: [...] NOT FIXED: [...] Run git diff BASE_BRANCH...HEAD to see the current state. Re-review using the same standards." --yolo -o text 2>&1

--resume latest — resumes most recent session for the current project
-p "prompt" — non-interactive mode with the given prompt
--yolo — auto-approves all tool calls, enabling Gemini to run shell commands (git diff, cat, etc.)
If --resume latest fails, fall back to a fresh call with the full PR-style prompt plus previous context. Always include --yolo.

Key Flags

-p "prompt" — non-interactive (headless) mode
--yolo — auto-approve all actions, enables shell command execution (REQUIRED to avoid OOM from large injected diffs)
-o text — plain text output format
--resume latest — resume most recent session

Claude Subagent Patterns

Overview

The Claude reviewer runs as a Task tool subagent with subagent_type: "general-purpose" and model: "sonnet". Unlike Codex and Gemini, it has direct access to Read, Grep, and Glob tools, allowing it to inspect files beyond just diffs.

Initial Code Review

Launch a Task tool subagent with this prompt structure:

You are a code reviewer. Review the code changes between <base-branch> and HEAD in this repository.

Changed files:
[file list]

Recent commits:
[commit log]

Instructions:
1. Use the Read tool to examine each changed file
2. Use `git diff <base-branch>...HEAD -- <file>` via Bash to see the exact changes per file
3. Review all changes thoroughly. Focus on:
   - Bugs and logic errors
   - Security vulnerabilities
   - Performance issues
   - Code quality and maintainability
   - Edge cases and error handling

For each finding, provide:
- File path and line number
- Clear description of the issue
- Severity: critical / warning / info
- Suggested fix

If no issues found, state that explicitly.
Do NOT fix any code. Only report findings.

Follow-up Review (No Session Resume)

Claude subagents have no session resume. For re-reviews, include full context in the prompt:

All previous findings (from all reviewers, with source attribution)
What was fixed (file, change description, reason)
What was NOT fixed (with explanation)
The changed files list (same as initial)

The subagent will use Read/Grep tools to verify the current state of the code.

Key Constraints

Read-only: The subagent prompt must explicitly say "Do NOT fix any code. Only report findings."
Model: Always use model: "sonnet" for Claude Sonnet 4.6
No isolation needed: Runs against the current working directory (no worktree)

Finding Validation Methodology

Never blindly trust AI reviewer findings. Validate each finding:

Validation Steps

Read the source — Use Read tool to load the file at the referenced line
Check context — Read surrounding lines (±10-20 lines) for full context
Verify the claim — Does the issue actually exist in the code?
Assess severity — Is the severity rating accurate?
Identify root cause — If valid, what is the underlying cause?
Evaluate fix — Is the suggested fix correct and complete?

Classification

有效 (Valid) — Issue confirmed in code, reviewer's analysis is correct
無效 (Invalid) — False positive, misunderstanding, or outdated observation
部分有效 (Partially Valid) — Concern has merit but details are inaccurate

Common False Positives

Style/convention issues that match the project's established patterns
"Missing" error handling that is handled at a higher level
Performance concerns in non-hot paths
Security flags on internal-only code
Suggestions that conflict with framework conventions

Re-review Communication Pattern

When sending follow-up reviews to AI reviewers, include:

What was fixed — List each fix with what was changed and why
What was NOT fixed — List each skipped finding with clear reasoning
Request — Ask reviewer to check if previous concerns are addressed and identify remaining or new issues

Codex and Gemini support session resume, so prior context is preserved automatically. Claude subagent has no session resume — pass full context (previous findings + fixes) in the prompt each time.

Timeout and Error Handling

Use 300000ms (5 minute) timeout for all CLI calls (Codex, Gemini)
If a reviewer times out, report the timeout and continue with the other reviewers' findings
If all three time out or fail, report to user and suggest retry
If codex exec resume --last fails, fall back to a fresh codex exec review --base main --json 2>/dev/null | jq ...
If gemini --resume latest fails, fall back to fresh gemini -p "..." with full context
If the Claude subagent fails, report the error and continue with Codex and Gemini findings

Review Orchestration

Invocation

Context Preview

SKILL.md

Review Orchestration

Invocation

Context Preview

SKILL.md

Multi-AI Review Orchestration

Core Concepts

Reviewer Summary

Codex CLI Patterns

Important: Use codex exec review, not codex review

Initial Code Review

JSONL Output Structure

Follow-up Review (Resume Session)

Key Flags

Gemini CLI Patterns

Review Style

Initial Non-Interactive Review

Follow-up Review (Resume Session)

Key Flags

Claude Subagent Patterns

Overview

Initial Code Review

Follow-up Review (No Session Resume)

Key Constraints

Finding Validation Methodology

Validation Steps

Classification

Common False Positives

Re-review Communication Pattern

Timeout and Error Handling

Similar Skills

Multi-AI Review Orchestration

Core Concepts

Reviewer Summary

Codex CLI Patterns

Important: Use codex exec review, not codex review

Initial Code Review

JSONL Output Structure

Follow-up Review (Resume Session)

Key Flags

Gemini CLI Patterns

Review Style

Initial Non-Interactive Review

Follow-up Review (Resume Session)

Key Flags

Claude Subagent Patterns

Overview

Initial Code Review

Follow-up Review (No Session Resume)

Key Constraints

Finding Validation Methodology

Validation Steps

Classification

Common False Positives

Re-review Communication Pattern

Timeout and Error Handling

Similar Skills

Important: Use `codex exec review`, not `codex review`

Important: Use `codex exec review`, not `codex review`