From parallel-adversarial-review
Runs parallel code reviews with multiple AI models (claude, codex, gemini), performs cross-critiques to detect hallucinations and severity issues, then synthesizes a deduplicated report for high-stakes reviews like security or pre-merge.
Slash command: /parallel-adversarial-review:multi-model-adversarial-review
Multi-model adversarial review (MMAR) is a three-stage review pipeline that uses multiple installed coding-agent CLIs as independent reviewers, then has them critique each other's findings, then synthesizes a final report. It catches model-specific blind spots and hallucinations that single-model parallel adversarial review (PAR) cannot.
| Situation | Use |
|---|---|
| Routine review, normal stakes | parallel-adversarial-review (faster, cheaper) |
| Pre-merge review on hot-path code | MMAR |
| Security review | MMAR |
| Production incident postmortem code change | MMAR |
| You suspect a model has a blind spot for this kind of code | MMAR |
| Compliance / audit artifact | MMAR |
MMAR costs more (N+1 model invocations + cross-critique). Don't reach for it on every commit.
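The cost can be estimated from the pipeline shape. The sketch below reads "N+1" as N stage-1 reviews plus one synthesis call, and adds one critique per ordered reviewer pair. `mmar_invocations` and its `critique_cap` parameter are hypothetical names for illustration, not part of the driver, and the cap is assumed to limit how many other reviewers each reviewer critiques.

```python
def mmar_invocations(n_reviewers, critique_cap=None, skip_critique=False):
    """Estimate model calls for one MMAR run (hypothetical helper)."""
    if n_reviewers < 2:
        raise ValueError("MMAR needs at least 2 reviewers")
    reviews = n_reviewers        # stage 1: one review per reviewer
    others = n_reviewers - 1     # stage 2: each reviewer can critique every other one
    if critique_cap is not None:
        others = min(others, critique_cap)  # assumed meaning of the per-reviewer cap
    critiques = 0 if skip_critique else n_reviewers * others
    synthesis = 1                # stage 3: a single synthesizer call
    return reviews + critiques + synthesis


print(mmar_invocations(3))                      # 3 + 6 + 1 = 10
print(mmar_invocations(6))                      # 6 + 30 + 1 = 37
print(mmar_invocations(6, critique_cap=3))      # 6 + 18 + 1 = 25
print(mmar_invocations(3, skip_critique=True))  # 3 + 0 + 1 = 4
```

The quadratic critique term dominates quickly, which is why trimming the reviewer list is the cheapest lever.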
Stage 1: Parallel Reviews
             ┌──────────────┬──────────────┬──────────────┐
diff ──────► │    claude    │    codex     │    gemini    │ ───► findings_<model>.md
             └──────┬───────┴──────┬───────┴──────┬───────┘
                    │              │              │
Stage 2: Cross-Critique (N×(N−1) grid)
             ┌──────────────┬──────────────┬──────────────┐
             │ codex critiq │ gemini critiq│ codex critiq │
             │  of claude   │  of claude   │  of gemini   │ ───► critique_<a>_of_<b>.md
             │                    ...                     │
             └──────┬───────┴──────┬───────┴──────┬───────┘
                    │              │              │
Stage 3: Synthesis
             ┌──────────────────────────────────────────────┐
             │ synthesizer (Claude as subagent or CLI)      │
             │ - dedupe across reviewers                    │
             │ - drop hallucinations flagged by critics     │
             │ - apply severity-disagreement rule           │
             │ - produce final findings report              │
             └──────────────────────────────────────────────┘
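The stage-2 grid is every ordered pair of distinct reviewers. As a minimal sketch (the helper name `critique_artifacts` is hypothetical, and it assumes `<a>` in the artifact name is the critic and `<b>` the reviewer being critiqued, as the diagram suggests), the pairs and their artifact names can be enumerated like this:

```python
from itertools import permutations


def critique_artifacts(reviewers):
    """Return one (critic, author, filename) triple per ordered reviewer pair."""
    return [
        (critic, author, f"critique_{critic}_of_{author}.md")
        for critic, author in permutations(reviewers, 2)
    ]


pairs = critique_artifacts(["claude", "codex", "gemini"])
for critic, author, name in pairs:
    print(name)
print(len(pairs))  # 3 reviewers -> 3 * 2 = 6 critiques
```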
You do not call CLIs by hand. Run the driver:
${CLAUDE_PLUGIN_ROOT}/scripts/mmar.py review <diff_path_or_-> [options]
Options:
- --reviewers claude,codex,gemini: which CLIs to use (default: auto-detect)
- --workdir <dir>: repo to run reviewers in (default: cwd)
- --out <dir>: where to write per-stage artifacts (default: ./.mmar/<timestamp>/)
- --mock-dir <dir>: read pre-recorded responses from <dir>/<stage>/<reviewer>.txt instead of calling CLIs (for evals and CI)
- --skip-critique: stage 1 + stage 3 only, no cross-critique (degrades to multi-model PAR)
- --domain-prompt <file>: path to a file with the domain-specific reviewer instructions (e.g. "review for security bugs"); defaults to a generic code-quality prompt

The driver writes per-stage artifacts and a final findings.md in the output directory. Read it. Pass it to the implementer or whoever owns the next stage.
CLI invocations are defined in scripts/adapters.toml. Each entry maps a CLI name to a command template. To add a new CLI or fix an invocation that broke (CLIs change their flags), edit that file. The driver reads it on every run.
Default-on reviewers (enabled if installed): claude, codex, gemini, pi, opencode.
Opt-in reviewers (enabled=false; flip in adapters.toml after configuring credentials): amp, droid.
If a CLI is not installed or not enabled, it is silently skipped. The driver requires at least 2 reviewers to proceed; with 1 it errors and tells you to install another or use plain PAR.
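The selection rule above can be sketched roughly as follows. This is not the driver's code: the `ADAPTERS` dict is a hypothetical in-memory stand-in for adapters.toml (whose real schema is not shown here), and the sketch assumes each CLI's executable has the same name as its adapter entry.

```python
import shutil

# Hypothetical stand-in for adapters.toml; the real file's schema may differ.
ADAPTERS = {
    "claude": {"enabled": True},
    "codex": {"enabled": True},
    "gemini": {"enabled": True},
    "pi": {"enabled": True},
    "opencode": {"enabled": True},
    "amp": {"enabled": False},    # opt-in
    "droid": {"enabled": False},  # opt-in
}


def _on_path(name):
    """Assumption: the executable is named after the adapter."""
    return shutil.which(name) is not None


def select_reviewers(adapters, is_installed=_on_path):
    """Keep enabled, installed CLIs; skip the rest silently; require at least 2."""
    active = [
        name for name, cfg in adapters.items()
        if cfg.get("enabled", False) and is_installed(name)
    ]
    if len(active) < 2:
        raise RuntimeError(
            "need at least 2 reviewers: install another CLI or use plain PAR"
        )
    return active


# Deterministic demo: pretend only these binaries are on PATH.
installed = {"claude", "gemini", "amp"}
print(select_reviewers(ADAPTERS, is_installed=installed.__contains__))
# ['claude', 'gemini']  (amp is installed but not enabled)
```

Passing `is_installed` as a parameter keeps the rule testable without touching the real PATH.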
Stage 1 reviewers get the prompt in reviewer-wrapper.md. Stage 2 critics get the prompt in critic-wrapper.md. Stage 3 synthesis uses synthesizer-prompt.md. The driver assembles these from the diff and findings; you do not invoke them directly.
The synthesizer applies:
- [high] if found by ≥2 reviewers and not flagged by any critic;
- [medium] if found by 1 reviewer and not flagged;
- [low] if found by 1 reviewer and partially flagged.

These rules are enforced by synthesizer-prompt.md. Do not negotiate with them.
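Written as a function, the three tiers look roughly like this. It is only an illustration of the stated rules: in the pipeline they live in synthesizer-prompt.md, not in code. The name `severity` and the `"none"`/`"partial"`/`"full"` flag labels are assumptions, and combinations the rules do not mention return `None` rather than a guessed tier.

```python
def severity(found_by, flag="none"):
    """Map reviewer count and critic flag status to a tier.

    found_by: number of reviewers that reported the finding.
    flag: "none", "partial", or "full" (hypothetical labels for critic flags).
    """
    if flag == "none":
        if found_by >= 2:
            return "high"
        if found_by == 1:
            return "medium"
    if flag == "partial" and found_by == 1:
        return "low"
    return None  # combination not covered by the stated rules


print(severity(3))             # high
print(severity(1))             # medium
print(severity(1, "partial"))  # low
print(severity(2, "partial"))  # None
```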
| Failure | Response |
|---|---|
| A reviewer CLI hangs | Driver has a per-CLI timeout (default 5 min). Kill that adapter; continue with the rest. |
| A reviewer returns empty/unparseable output | Treated as "no findings"; do not retry — that biases the result. |
| Cross-critique pairs balloon (6 reviewers → 30 critiques) | Driver caps critiques per reviewer at 3 random others. Configurable. |
| Cost concerns | Use --reviewers to pick fewer, or use plain PAR. |
| Network down | Run with --mock-dir over a recorded fixture, or use plain PAR with the local Claude session. |
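The first two rows can be illustrated with a small sketch built on `subprocess.run` and its `timeout` argument. The helper names `run_reviewer` and `collect` are hypothetical, and the commands are stand-in Python one-liners rather than real reviewer CLIs. A reviewer that hangs is dropped, and one that prints nothing is recorded as having no findings, without a retry.

```python
import subprocess
import sys


def run_reviewer(cmd, timeout_s=300):
    """Run one reviewer command; return its stdout, or None if it timed out."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child before raising
    return proc.stdout


def collect(commands, timeout_s=300):
    """Gather outputs; drop hung reviewers, keep empty output as 'no findings'."""
    results = {}
    for name, cmd in commands.items():
        out = run_reviewer(cmd, timeout_s)
        if out is None:
            continue                 # hung reviewer: continue with the rest
        results[name] = out.strip()  # "" means no findings; never retried
    return results


# Stand-in commands (not real reviewer CLIs).
commands = {
    "fast": [sys.executable, "-c", "print('finding: unchecked input')"],
    "silent": [sys.executable, "-c", "pass"],
    "hung": [sys.executable, "-c", "import time; time.sleep(30)"],
}
results = collect(commands, timeout_s=3)
print(results)  # {'fast': 'finding: unchecked input', 'silent': ''}
```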
The eval suite in evals/ measures recall and precision against fixtures with planted defects. Run:
${CLAUDE_PLUGIN_ROOT}/evals/runner.py --mode mock # cheap, deterministic, CI-safe
${CLAUDE_PLUGIN_ROOT}/evals/runner.py --mode live # actually invokes CLIs (costs $$)
If you change the wrappers or synthesizer, re-run evals before merging. A regression on recall or precision is a blocker.
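For reference, precision and recall over planted defects reduce to set arithmetic. The sketch below assumes findings have already been matched to defect identifiers; how evals/runner.py does that matching is not described here, and the identifiers are made up for illustration. Returning 1.0 for an empty set is a convention chosen for this sketch.

```python
def precision_recall(reported, planted):
    """Precision: share of reported findings that are real planted defects.
    Recall: share of planted defects that were reported."""
    reported, planted = set(reported), set(planted)
    true_pos = len(reported & planted)
    precision = true_pos / len(reported) if reported else 1.0  # convention for empty set
    recall = true_pos / len(planted) if planted else 1.0
    return precision, recall


# Made-up defect identifiers, for illustration only.
planted = {"sql-injection", "off-by-one", "race-condition", "missing-auth"}
reported = {"sql-injection", "off-by-one", "unused-import"}

p, r = precision_recall(reported, planted)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

A hallucinated finding lowers precision, and a missed planted defect lowers recall, so a change to the wrappers can regress either one independently.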
- reviewer-wrapper.md, critic-wrapper.md, synthesizer-prompt.md: the prompts the driver uses.
- parallel-adversarial-review skill: single-model PAR for routine reviews.
- scripts/adapters.toml: how to add or fix a CLI invocation.

npx claudepluginhub prime-radiant-inc/parallel-adversarial-review