Skill

debating-code-reviews

Use this skill when orchestrating multi-agent adversarial code reviews using Claude Code agent teams. Activates when conducting thorough code reviews, setting up review debate teams, spawning specialized review personas, or synthesizing findings from multiple reviewers. Provides spawn prompts, debate protocol, and synthesis templates.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/debate-skills:debating-code-reviews

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Orchestrate a team of 4 specialized reviewers who independently analyze code, then debate each other's findings. Adversarial debate between diverse review perspectives is designed to surface bugs that a single-pass review misses and to filter out false positives.

Supporting Files

references/REFERENCE.md
references/debate-protocol.md
references/reviewer-personas.md
references/synthesis-template.md

SKILL.md

158 lines · ~1.8k tokens

Stats

Parent stars: 0

Maintenance: Good

Last Commit: Mar 3, 2026

Adversarial Debate Code Reviews

Why Debate Works

Single-reviewer code review suffers from blind spots: every reviewer has biases and a preferred area of focus. Adversarial debate forces each finding to survive scrutiny:

Independent analysis prevents groupthink
Cross-examination eliminates false positives
Diverse techniques (path tracing, pre-mortem, adversarial testing) cover different bug categories
Devil's advocate challenges ensure every finding has real evidence

Prerequisites

Agent teams are experimental and disabled by default. Enable them by adding the following to your settings.json, or by setting the same variable (CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1) in your shell environment:

{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
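
If you would rather not edit the file by hand, the merge can be scripted. The following is a minimal sketch, not part of the plugin: the helper name `enable_agent_teams` is hypothetical, and the settings file location is left to the caller. Only the `env` key and the variable name are taken from the snippet above.

```python
import json
from pathlib import Path


def enable_agent_teams(settings_path: Path) -> dict:
    """Merge the agent-teams flag into a settings.json file, keeping other keys.

    Hypothetical helper for illustration; the caller chooses the file path.
    """
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        settings = json.loads(text) if text.strip() else {}
    # Preserve any environment variables that are already configured.
    env = settings.setdefault("env", {})
    env["CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS"] = "1"
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Calling `enable_agent_teams(Path("settings.json"))` leaves existing keys untouched and only adds or overwrites the one flag.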

When to Use

Use the full debate protocol for:

High-risk changes - Authentication, payment, data migration, security-sensitive code
Complex cross-module changes - Changes touching 5+ files or multiple system boundaries
AI-authored code - Code generated by AI that needs confidence before merging
Critical bug fixes - Fixes where getting it wrong is worse than the original bug

See "When to Skip" at the bottom for lighter alternatives.

The Review Team

Reviewer	Role	Primary Technique	Focus
The Tracer	Correctness	Execution path tracing	Logic errors, edge cases, state bugs
The Architect	Design	Pre-mortem analysis	Patterns, complexity, coupling
The Breaker	Adversarial Tester	Adversarial input + test review	Breaking inputs, test quality gaps
The Prosecutor	Devil's Advocate	Five Whys + assertion verification	False positives, overstated severity

Each reviewer is available as a custom agent in ../../agents/ and is configured with Opus, read-only tools, and this skill preloaded. See ./references/reviewer-personas.md for detailed persona rationale and thinking styles.

Running the Debate

Step 1: Create the Agent Team

Create an agent team with the 4 reviewer agents. They are pre-configured in this plugin's agents/ directory (tracer, architect, breaker, prosecutor).

Teammates don't inherit the lead's conversation history - they only get project context (CLAUDE.md, skills, MCP servers) plus the spawn prompt. Include enough detail about the code change in the prompt for reviewers to work independently.

Create an agent team to review [describe the code change - include file paths,
what the change does, and why].

Spawn 4 teammates using the tracer, architect, breaker, and prosecutor agents.
Require plan approval for the prosecutor (it must wait for Round 1 findings).

Create tasks with dependencies:
- Round 1 tasks (no dependencies, run in parallel):
  - "Tracer: review [files] for correctness" → assign to tracer
  - "Architect: review [files] for design" → assign to architect
  - "Breaker: review [files] for adversarial inputs and test quality" → assign to breaker
- Round 2 tasks (depend on all Round 1 tasks):
  - "Prosecutor: challenge Round 1 findings" → assign to prosecutor
  - "All reviewers: respond to challenges and each other's findings"
- Round 3 task (depends on Round 2):
  - "All reviewers: state final positions with confidence levels"

After Round 3, synthesize all findings into a single report.
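
To make the dependency structure in the prompt above concrete, here is a small model of it in Python. This is an illustrative sketch only: the `Task` class, the task names, and the `unblocked` helper are assumptions and do not correspond to any agent-teams API. It shows which tasks become available as earlier ones complete.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    assignee: str
    depends_on: list[str] = field(default_factory=list)


def build_debate_tasks() -> list[Task]:
    """Mirror the three rounds described in the spawn prompt (names are made up)."""
    round1 = [
        Task("tracer-review", "tracer"),
        Task("architect-review", "architect"),
        Task("breaker-review", "breaker"),
    ]
    round1_names = [t.name for t in round1]
    # Round 2 tasks depend on every Round 1 task.
    round2 = [
        Task("prosecutor-challenge", "prosecutor", round1_names),
        Task("respond-to-challenges", "all", round1_names),
    ]
    # The Round 3 task depends on Round 2.
    round3 = [Task("final-positions", "all", [t.name for t in round2])]
    return round1 + round2 + round3


def unblocked(tasks: list[Task], completed: set[str]) -> list[str]:
    """Names of unfinished tasks whose dependencies are all complete."""
    return [
        t.name
        for t in tasks
        if t.name not in completed and all(d in completed for d in t.depends_on)
    ]
```

With nothing completed, only the three Round 1 reviews are unblocked, which is why they can run in parallel while the Prosecutor waits.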

Step 2: Round 1 - Independent Review (Parallel)

Tracer, Architect, and Breaker review the code independently. All three run in parallel via the shared task list.

The Prosecutor is in plan approval mode during Round 1 - the lead won't approve its plan until Round 1 tasks are complete.

Each reviewer produces findings with:

Severity: BLOCKING / WARNING / NOTE
Evidence: Code quotes, execution traces, or specific scenarios
Impact: What happens if not fixed
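
One way to keep findings uniform across reviewers is to treat each as a small record with the three fields above. The sketch below is an assumption about how such a record could look; the `Finding` class and its `reviewer` and `title` fields are hypothetical and are not defined by this skill.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    BLOCKING = "BLOCKING"
    WARNING = "WARNING"
    NOTE = "NOTE"


@dataclass(frozen=True)
class Finding:
    reviewer: str   # hypothetical field: which reviewer raised it
    title: str      # hypothetical field: one-line summary
    severity: Severity
    evidence: str   # code quote, execution trace, or specific scenario
    impact: str     # what happens if the issue is not fixed

    def __post_init__(self) -> None:
        # Reject findings that omit evidence or impact, mirroring the required fields.
        if not self.evidence.strip():
            raise ValueError("finding needs evidence")
        if not self.impact.strip():
            raise ValueError("finding needs an impact statement")
```

Rejecting empty evidence at construction time matches the spirit of the debate: a finding with nothing to examine gives the Prosecutor nothing to confirm or dispute.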

Step 3: Round 2 - Challenge (Sequential)

Round 2 tasks automatically unblock when Round 1 completes (via task dependencies).

1. Lead broadcasts Round 1 findings to all teammates. Use broadcast sparingly: it sends to every teammate, and costs scale with team size.
2. Prosecutor's plan is approved. It reviews every finding and provides a verdict for each (CONFIRMED / DISPUTED / DISMISSED).
3. Lead messages Tracer, Architect, and Breaker individually with the Prosecutor's challenges and each other's findings. Each responds.

See ./references/debate-protocol.md for detailed messaging patterns.

Step 4: Round 3 - Final Positions

Each reviewer states final findings with confidence levels (HIGH / MEDIUM / LOW).

The Prosecutor provides final triage for each finding.

Step 5: Synthesize

The lead produces the final report using the template in ./references/synthesis-template.md.

The report sections:

Confirmed Findings - Survived debate, ranked by severity
Disputed Findings - Reviewers disagreed, needs human judgment
Dismissed Findings - False positives or retracted
Test Quality Assessment - From the Breaker's test review
Debate Highlights - Notable disagreements and discoveries
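
The first three sections follow mechanically from the Prosecutor's verdicts, so the grouping can be sketched in a few lines. This is an illustration under assumptions: findings are represented as plain dicts with hypothetical `title`, `severity`, and `verdict` keys, and the free-form sections (Test Quality Assessment, Debate Highlights) are left out because they are written by hand.

```python
SEVERITY_ORDER = {"BLOCKING": 0, "WARNING": 1, "NOTE": 2}

SECTION_FOR_VERDICT = {
    "CONFIRMED": "Confirmed Findings",
    "DISPUTED": "Disputed Findings",
    "DISMISSED": "Dismissed Findings",
}


def group_findings(findings: list[dict]) -> dict[str, list[dict]]:
    """Sort findings into report sections by the Prosecutor's verdict."""
    sections: dict[str, list[dict]] = {name: [] for name in SECTION_FOR_VERDICT.values()}
    for finding in findings:
        sections[SECTION_FOR_VERDICT[finding["verdict"]]].append(finding)
    # Confirmed findings are ranked by severity, most severe first.
    sections["Confirmed Findings"].sort(key=lambda f: SEVERITY_ORDER[f["severity"]])
    return sections
```

Disputed and dismissed findings keep their original order here; the template in ./references/synthesis-template.md remains the authority on how the final report is laid out.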

Severity Definitions

Severity	Meaning	Action
BLOCKING	Incorrect behavior, data loss, or security issue	Must fix before merge
WARNING	Design issue, missing edge case, or test gap	Should fix, may defer with justification
NOTE	Minor improvement or observation	Consider, no action required

Verdicts

Verdict	When to Use
APPROVE	No BLOCKING findings, warnings are minor
APPROVE WITH CHANGES	No BLOCKING findings, but warnings worth addressing
REQUEST CHANGES	One or more confirmed BLOCKING findings
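
The verdict table reduces to a short decision rule. The sketch below is one possible reading of it, not a definitive implementation: the function name is hypothetical, and whether warnings are "worth addressing" is a human judgment that is passed in as a flag rather than computed.

```python
def overall_verdict(confirmed_severities: list[str], warnings_worth_addressing: bool = False) -> str:
    """Pick a review verdict from the severities of confirmed findings.

    `warnings_worth_addressing` is an assumed input: the table leaves that
    call to the lead, so it is not derived from the findings themselves.
    """
    if "BLOCKING" in confirmed_severities:
        return "REQUEST CHANGES"
    if "WARNING" in confirmed_severities and warnings_worth_addressing:
        return "APPROVE WITH CHANGES"
    return "APPROVE"
```

Only confirmed findings should be passed in; disputed findings go to a human rather than driving the verdict automatically.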

When to Skip

The full 4-reviewer debate is thorough but has a cost (time and tokens). Use lighter approaches for lower-risk changes:

Change Type	Recommended Approach
High-risk, cross-module, AI-authored	Full debate (4 reviewers, 3 rounds)
Medium-risk, single module	2 reviewers (Tracer + Breaker), no Prosecutor
Low-risk, small change	Single reviewer (Tracer for logic, Architect for design)
Documentation, config, formatting	Standard code review, no debate needed
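
As a rough sketch, the table above can be expressed as a lookup from risk level to the reviewers to spawn. The risk labels (`high`, `medium`, `low`, `trivial`) and the `concern` parameter are assumptions introduced for illustration; the agent names come from the plugin's agents/ directory.

```python
ALL_REVIEWERS = ["tracer", "architect", "breaker", "prosecutor"]


def reviewers_for(risk: str, concern: str = "logic") -> list[str]:
    """Return the reviewer agents to spawn for a change, following the table above."""
    if risk == "high":
        return list(ALL_REVIEWERS)  # full debate, 3 rounds
    if risk == "medium":
        return ["tracer", "breaker"]  # no Prosecutor
    if risk == "low":
        # Single reviewer: Architect for design concerns, Tracer for logic.
        return ["architect"] if concern == "design" else ["tracer"]
    if risk == "trivial":
        return []  # documentation, config, formatting: standard review, no debate
    raise ValueError(f"unknown risk level: {risk!r}")
```

An empty list means no agent team is needed at all, so the change goes through a standard code review.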

Quick Reference

Reviewer agents: ../../agents/ (tracer, architect, breaker, prosecutor)
Persona rationale and thinking styles: ./references/reviewer-personas.md
Round structure and messaging: ./references/debate-protocol.md
Report template and triage: ./references/synthesis-template.md
