AI Review Arena

AI Review Arena is a CLI-first review harness for running multiple AI code reviewers, attaching local RAG evidence, comparing results with deterministic benchmarks, and exporting integration files for Claude Code, Codex, and Gemini CLI workflows.

The project does not depend on hosted provider APIs. Provider execution is intentionally CLI-based so it can run with the same tools a developer already uses locally.
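As a rough illustration of what CLI-based provider execution can look like, here is a minimal sketch of an adapter that shells out to a provider binary with a timeout. The names `ProviderResult` and `run_provider`, the error labels, and the convention of passing the prompt on stdin are assumptions made for this example. They are not taken from `arena_runtime/provider_runner.py`, and no real provider flags are shown.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class ProviderResult:
    ok: bool
    stdout: str
    error: str = ""


def run_provider(argv: list[str], prompt: str, timeout: float = 30.0) -> ProviderResult:
    """Run one provider CLI, feeding the prompt on stdin (hypothetical contract)."""
    try:
        proc = subprocess.run(
            argv,
            input=prompt,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except FileNotFoundError:
        # The CLI is not installed or not on PATH.
        return ProviderResult(False, "", "cli_not_found")
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this.
        return ProviderResult(False, "", "timeout")
    if proc.returncode != 0:
        return ProviderResult(False, proc.stdout, f"exit_{proc.returncode}")
    return ProviderResult(True, proc.stdout)
```

Because the adapter only takes an argument vector, it can be exercised with any local executable, which keeps tests independent of live models.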

What it does

Runs Codex, Gemini, and Claude-oriented review flows through a typed Python runtime.
Retrieves local evidence chunks with BM25, symbol, import, changed-file, and file-hint scoring.
Attaches evidence chunks to findings so reports can show why a finding was produced.
Benchmarks severity calibration, retrieval recall, harness ablations, and live provider samples.
Emits structured harness events for tracing, cost, latency, tool calls, RAG, debate, aggregation, report generation, benchmark, and auto-fix phases.
Exports OpenTelemetry-compatible JSONL or OTLP JSON, and can push OTLP JSON to an HTTP collector endpoint.
Provides a policy-gated MCP runtime, including a JSON-RPC stdio server for configured MCP tools.
Installs Claude Code hooks and agent files for project-local integration when explicitly enabled.
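To make the retrieval bullet concrete, the following is a minimal sketch of BM25 scoring over text chunks with an additive boost for chunks from changed files. It is an assumption-laden illustration, not the project's implementation: `Chunk`, `bm25_scores`, `rank`, the tokenizer, the `k1`/`b` defaults, and the boost weight are all hypothetical, and the symbol, import, and file-hint signals are left out.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str
    text: str


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_scores(query: str, chunks: list[Chunk], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every chunk against the query with the standard BM25 formula."""
    docs = [tokenize(c.text) for c in chunks]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: in how many chunks each term appears.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (adds 1 inside the log).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(query, chunks, changed_files=frozenset(), changed_boost=1.0, top_k=5):
    """Combine BM25 with a flat boost for chunks in changed files (assumed weighting)."""
    base = bm25_scores(query, chunks)
    scored = [
        (s + (changed_boost if c.path in changed_files else 0.0), c)
        for s, c in zip(base, chunks)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

An additive boost is only one way to combine signals. A weighted sum with per-signal weights from configuration would be an equally plausible design.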

Architecture

flowchart LR
    A[User /<br/>Claude Code Hook] --> B[Runner<br/>state machine]
    B --> C[RAG Engine<br/>BM25 + symbol/import]
    C --> D{Provider Runner}
    D --> E[Claude CLI]
    D --> F[Codex CLI]
    D --> G[Gemini CLI]
    E --> H[Aggregator<br/>+ Debate]
    F --> H
    G --> H
    H --> I[Report Gen<br/>+ Auto-fix]
    I --> J[OTel Export<br/>JSONL/OTLP/HTTP]
    B -.->|policy gate| K[MCP Runtime<br/>tool allowlist]

    style A fill:#dbeafe,stroke:#2563eb
    style D fill:#fef3c7,stroke:#d97706
    style H fill:#dcfce7,stroke:#16a34a
    style J fill:#f3e8ff,stroke:#9333ea

The runner is a typed Python state machine that owns phases, retries, timeouts, and error recovery. CLI providers are isolated adapters. RAG runs locally over the project tree, with no external vector store. MCP tool calls pass through a policy gate that enforces a tool allowlist and requires approval for calls with side effects.
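The policy gate described above can be sketched in a few lines. This is a hypothetical shape, assuming the policy is a set of allowed tool names plus a subset flagged as having side effects. `ToolPolicy`, `gate_tool_call`, and the reason strings are invented for illustration and may not match `arena_runtime/mcp_runtime.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    allowlist: frozenset
    side_effect_tools: frozenset = frozenset()


def gate_tool_call(policy: ToolPolicy, tool: str, approved: bool = False) -> tuple:
    """Return (allowed, reason) for a requested tool call."""
    if tool not in policy.allowlist:
        # Deny by default: anything not explicitly listed is rejected.
        return False, "not_allowlisted"
    if tool in policy.side_effect_tools and not approved:
        # Side-effecting tools need an explicit approval flag.
        return False, "approval_required"
    return True, "allowed"
```

Checking the allowlist first means an approval flag can never widen the set of callable tools. It only unlocks tools that were already listed.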

Runtime model

The modern runtime lives in arena_runtime/ and is entered through:

python3 scripts/arena-runtime.py <command> [args]

Shell is not used as the orchestration layer. Remaining shell files are tests or developer support scripts, not the review runtime.

Core areas:

arena_runtime/entrypoint.py: command dispatch.
arena_runtime/provider_runner.py: CLI provider adapters.
arena_runtime/rag_runtime.py: evidence retrieval.
arena_runtime/benchmarking.py: deterministic and live benchmark commands.
arena_runtime/harness.py: event bus and OTel export/push.
arena_runtime/mcp_runtime.py: MCP tool-call wrapper and stdio server.
arena_runtime/exporters.py: Claude, Codex, and Gemini integration exports.
config/default-config.json: default model, policy, RAG, benchmark, and MCP settings.
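For readers unfamiliar with the event-bus pattern that the harness entry refers to, here is a minimal sketch of a bus that fans events out to subscribers, with one subscriber writing JSONL. Everything in it is assumed for illustration: `HarnessEvent`, `EventBus`, `jsonl_writer`, and the field names are hypothetical, and the output is plain JSON lines rather than the OTLP JSON schema.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Callable


@dataclass
class HarnessEvent:
    phase: str
    name: str
    attributes: dict = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)


class EventBus:
    def __init__(self) -> None:
        self._subscribers: list[Callable[[HarnessEvent], None]] = []

    def subscribe(self, handler: Callable[[HarnessEvent], None]) -> None:
        self._subscribers.append(handler)

    def emit(self, phase: str, name: str, **attributes) -> HarnessEvent:
        event = HarnessEvent(phase, name, attributes)
        # Deliver synchronously, in subscription order.
        for handler in self._subscribers:
            handler(event)
        return event


def jsonl_writer(stream) -> Callable[[HarnessEvent], None]:
    """Build a subscriber that appends one JSON object per line to `stream`."""
    def write(event: HarnessEvent) -> None:
        stream.write(json.dumps(asdict(event), sort_keys=True) + "\n")
    return write
```

A typical use would be `bus.subscribe(jsonl_writer(open_file))` followed by `bus.emit("rag", "evidence_attached", chunks=3)`. An HTTP push exporter could be added as another subscriber without touching the emitters.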

Quick start

Install the package from a checkout, then check the configuration with the installed arena command:

python3 -m pip install -e .
arena validate-config config/default-config.json

Validate the local configuration:

python3 scripts/arena-runtime.py validate-config config/default-config.json

Run CLI diagnostics without calling live models:

arena cli-diagnostics --config config/default-config.json
arena provider-smoke --models codex,gemini,claude --timeout 30

Index and retrieve local RAG evidence:

python3 scripts/arena-runtime.py rag-indexer . --config config/default-config.json
python3 scripts/arena-runtime.py rag-evidence . security "credential handling" --config config/default-config.json --top-k 5

Run deterministic benchmark paths:

python3 scripts/arena-runtime.py retrieval-benchmark --config config/default-config.json --max-cases 3
python3 scripts/arena-runtime.py benchmark-harness-ablation --config config/default-config.json --max-cases 3

Run a bounded live provider sample when the CLIs are installed and authenticated:

arena benchmark-models --category security --models codex,gemini --live --smoke --timeout 90 --preflight-timeout 30 --require-live-success

Product docs

Claude Code integration

Claude Code integration is project-local and explicit. Install the hook and agent files into the current project with:

python3 scripts/arena-runtime.py install-claude-integration --project-root .
