By HajinJ
Full AI development and business lifecycle orchestrator: Always-On routing, codebase analysis, MCP detection, static analysis integration, STRIDE threat modeling, multi-AI adversarial code and business review with external CLI models (Codex subagents with per-agent model configuration, plus Gemini), evidence tiering, adversarial red team, business model benchmarking, 3-round multi-agent debate with Round 4 escalation, CSV batch review, auto-fix loop, test generation, fallback framework, cost estimation, commit/PR safety gate, and feedback-based routing.
Run AI Review Arena business review through the typed runtime
Run AI Review Arena documentation review through the typed runtime
Run AI Review Arena research through the typed runtime
Detect project technology stack through the runtime compatibility layer
Run AI Review Arena through the typed runtime
Agent Team teammate. Platform compliance and guideline specialist. Detects feature types, identifies required platform guidelines, and verifies implementation compliance.
Agent Team teammate. Accessibility reviewer. Evaluates WCAG 2.1 AA compliance, ARIA usage correctness, keyboard navigation, color contrast, screen reader compatibility, and semantic HTML structure.
Agent Team teammate. Accuracy and evidence reviewer. Validates business claims against verifiable data, evaluates statistical support quality, verifies data sources, assesses projection methodology soundness, and checks benchmark comparisons.
Agent Team teammate. API contract reviewer. Validates REST/GraphQL/gRPC schema consistency, HTTP semantics, versioning strategy, backward compatibility, and breaking change detection.
Agent Team teammate. Architecture and design reviewer. Evaluates SOLID principles, design patterns, coupling/cohesion, dependency management, and code organization.
Modifies files
Hook triggers on file write and edit operations
Uses power tools
Uses Bash, Write, or Edit tools
AI Review Arena is a CLI-first review harness for running multiple AI code reviewers, attaching local RAG evidence, comparing results with deterministic benchmarks, and exporting integration files for Claude Code, Codex, and Gemini CLI workflows.
The project does not depend on hosted provider APIs. Provider execution is intentionally CLI-based so it can run with the same tools a developer already uses locally.
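To make the CLI-based provider model concrete, here is a minimal sketch of a provider adapter that shells out to a CLI with a timeout and normalizes the outcome. It is not the code in arena_runtime/provider_runner.py: `ProviderResult` and `run_cli_provider` are hypothetical names, and the real command-line flags for the Claude, Codex, and Gemini CLIs are deliberately not shown. The demo at the bottom uses the current Python interpreter as a stand-in provider so the sketch runs without any model CLI installed.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class ProviderResult:
    """Normalized outcome of one CLI provider call (hypothetical shape)."""
    provider: str
    ok: bool
    output: str
    error: str = ""


def run_cli_provider(provider: str, argv: List[str], prompt: str, timeout: float) -> ProviderResult:
    """Run a provider CLI as a subprocess, passing the prompt on stdin.

    argv is the full command line for the provider; the real flags for each
    CLI must come from that tool's own documentation.
    """
    try:
        proc = subprocess.run(
            argv,
            input=prompt,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except FileNotFoundError:
        # The binary is not installed or not on PATH.
        return ProviderResult(provider, False, "", f"{argv[0]} not found on PATH")
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return ProviderResult(provider, False, "", f"timed out after {timeout}s")
    if proc.returncode != 0:
        return ProviderResult(provider, False, proc.stdout, proc.stderr.strip())
    return ProviderResult(provider, True, proc.stdout)


# Demo: the current Python interpreter stands in for a provider CLI.
demo = run_cli_provider(
    "demo",
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"],
    "review this",
    timeout=10,
)
print(demo.ok, demo.output.strip())
```

Treating a missing binary, a timeout, and a non-zero exit as ordinary results rather than exceptions lets the caller decide per provider whether to retry, skip, or fall back.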
```mermaid
flowchart LR
    A[User /<br/>Claude Code Hook] --> B[Runner<br/>state machine]
    B --> C[RAG Engine<br/>BM25 + symbol/import]
    C --> D{Provider Runner}
    D --> E[Claude CLI]
    D --> F[Codex CLI]
    D --> G[Gemini CLI]
    E --> H[Aggregator<br/>+ Debate]
    F --> H
    G --> H
    H --> I[Report Gen<br/>+ Auto-fix]
    I --> J[OTel Export<br/>JSONL/OTLP/HTTP]
    B -.->|policy gate| K[MCP Runtime<br/>tool allowlist]
    style A fill:#dbeafe,stroke:#2563eb
    style D fill:#fef3c7,stroke:#d97706
    style H fill:#dcfce7,stroke:#16a34a
    style J fill:#f3e8ff,stroke:#9333ea
```
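The dashed policy-gate edge in the flowchart can be illustrated with a small allowlist check. This is a hypothetical sketch, not the arena_runtime/mcp_runtime.py API: `ToolPolicy`, `gate_tool_call`, `PolicyDenied`, and the tool names in the usage note are invented for illustration. A call is rejected unless its tool is allowlisted, and tools marked as side-effecting additionally need an approval callback to return True.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Set

# An approver receives the tool name and its arguments and says yes or no.
Approver = Callable[[str, Dict[str, Any]], bool]


class PolicyDenied(Exception):
    """Raised when a tool call is blocked by policy."""


@dataclass
class ToolPolicy:
    """Hypothetical policy: tools allowed at all, and those needing approval."""
    allowlist: Set[str]
    side_effect_tools: Set[str] = field(default_factory=set)


def gate_tool_call(policy: ToolPolicy, tool: str, args: Dict[str, Any], approve: Approver) -> None:
    """Return None if the call may proceed; raise PolicyDenied otherwise."""
    if tool not in policy.allowlist:
        raise PolicyDenied(f"tool {tool!r} is not on the allowlist")
    if tool in policy.side_effect_tools and not approve(tool, args):
        raise PolicyDenied(f"side-effecting tool {tool!r} was not approved")
```

With `ToolPolicy({"read_file", "write_file"}, {"write_file"})`, a `read_file` call passes without consulting the approver, a `write_file` call passes only if the approver agrees, and any other tool is refused outright.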
The runner is a typed Python state machine that owns phases, retries, timeouts, and error recovery. CLI providers are isolated adapters. RAG runs locally over the project tree (no external vector store). MCP tool calls pass through a policy gate with allowlist and side-effect approval.
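A minimal sketch of a phase-driven runner with bounded retries, assuming phases run strictly in order and each phase is a callable that mutates shared state. The phase names, `run_pipeline`, and the exponential backoff are assumptions for illustration. Timeouts are left to the individual handlers (for example, a subprocess timeout inside a provider adapter), which raise on expiry like any other failure.

```python
import time
from enum import Enum
from typing import Callable, Dict, Tuple


class Phase(Enum):
    # Hypothetical phase names, loosely following the flowchart.
    RETRIEVE = "retrieve"
    REVIEW = "review"
    AGGREGATE = "aggregate"
    REPORT = "report"
    DONE = "done"
    FAILED = "failed"


ORDER = [Phase.RETRIEVE, Phase.REVIEW, Phase.AGGREGATE, Phase.REPORT]
Handler = Callable[[dict], None]


def run_pipeline(handlers: Dict[Phase, Handler], max_retries: int = 2,
                 backoff_s: float = 0.0) -> Tuple[Phase, dict]:
    """Run phases in order, retrying a failing phase up to max_retries times.

    Returns (Phase.DONE, state) on success, or (Phase.FAILED, state) once a
    phase exhausts its retries. Every failure is recorded in state["errors"].
    """
    state: dict = {"errors": []}
    for phase in ORDER:
        for attempt in range(max_retries + 1):
            try:
                handlers[phase](state)
                break  # phase succeeded; move on to the next one
            except Exception as exc:
                state["errors"].append((phase.value, attempt, repr(exc)))
                if attempt == max_retries:
                    return Phase.FAILED, state
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    return Phase.DONE, state
```

Keeping the error log in the returned state, rather than only raising, lets a caller report partial progress even when a later phase fails.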
The modern runtime lives in arena_runtime/ and is entered through:
```bash
python3 scripts/arena-runtime.py <command> [args]
```
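A single Python entry point can dispatch subcommands with `argparse` subparsers. The sketch below assumes that approach; arena_runtime/entrypoint.py may be organized differently. The command names and the `--config` and `--top-k` flags are taken from this README's examples, while the handler bodies and the default of 5 for `--top-k` are placeholders.

```python
import argparse
from typing import List, Optional


def cmd_validate_config(args: argparse.Namespace) -> int:
    # Placeholder: the real validation rules are not described in this README.
    print(f"validate-config {args.path}")
    return 0


def cmd_rag_evidence(args: argparse.Namespace) -> int:
    # Placeholder: a real handler would query the retrieval layer.
    print(f"rag-evidence {args.root} {args.category} {args.query!r} top_k={args.top_k}")
    return 0


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="arena-runtime")
    sub = parser.add_subparsers(dest="command", required=True)

    p = sub.add_parser("validate-config")
    p.add_argument("path")
    p.set_defaults(func=cmd_validate_config)

    p = sub.add_parser("rag-evidence")
    p.add_argument("root")
    p.add_argument("category")
    p.add_argument("query")
    p.add_argument("--config")
    p.add_argument("--top-k", type=int, default=5)  # assumed default
    p.set_defaults(func=cmd_rag_evidence)
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    return args.func(args)  # each handler returns a process exit code
```

Calling `main(["rag-evidence", ".", "security", "credential handling", "--top-k", "5"])` parses the same arguments as the README's `rag-evidence` example.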
Shell is not used as the orchestration layer. Remaining shell files are tests or developer support scripts, not the review runtime.
Core areas:
- arena_runtime/entrypoint.py: command dispatch.
- arena_runtime/provider_runner.py: CLI provider adapters.
- arena_runtime/rag_runtime.py: evidence retrieval.
- arena_runtime/benchmarking.py: deterministic and live benchmark commands.
- arena_runtime/harness.py: event bus and OTel export/push.
- arena_runtime/mcp_runtime.py: MCP tool-call wrapper and stdio server.
- arena_runtime/exporters.py: Claude, Codex, and Gemini integration exports.
- config/default-config.json: default model, policy, RAG, benchmark, and MCP settings.

Install the package from a checkout:
```bash
python3 -m pip install -e .
arena validate-config config/default-config.json
```
Validate the local configuration:
```bash
python3 scripts/arena-runtime.py validate-config config/default-config.json
```
Run CLI diagnostics without calling live models:
```bash
arena cli-diagnostics --config config/default-config.json
arena provider-smoke --models codex,gemini,claude --timeout 30
```
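One plausible building block for a diagnostic that never calls a live model is a PATH lookup for each provider binary. What `arena cli-diagnostics` actually checks is not documented here; this sketch only assumes the CLIs are invoked as `claude`, `codex`, and `gemini`, and `diagnose_clis` and `report` are hypothetical helpers.

```python
import shutil
from typing import Dict, List, Optional


def diagnose_clis(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each CLI name to its resolved path, or None if it is not on PATH.

    This only checks presence; it does not start the tool or contact a model.
    """
    return {name: shutil.which(name) for name in names}


def report(found: Dict[str, Optional[str]]) -> List[str]:
    """Render one human-readable line per CLI, sorted by name."""
    return [
        f"{name}: {'ok ' + path if path else 'missing'}"
        for name, path in sorted(found.items())
    ]


if __name__ == "__main__":
    for line in report(diagnose_clis(["claude", "codex", "gemini"])):
        print(line)
```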
Index and retrieve local RAG evidence:
```bash
python3 scripts/arena-runtime.py rag-indexer . --config config/default-config.json
python3 scripts/arena-runtime.py rag-evidence . security "credential handling" --config config/default-config.json --top-k 5
```
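The flowchart describes the RAG engine as BM25 plus symbol/import signals over the local tree. The sketch below covers only the BM25 part: a self-contained Okapi BM25 ranker with the commonly used values k1 = 1.5 and b = 0.75 and the non-negative IDF variant used by Lucene. The tokenizer, the parameter values, and the `BM25` class are assumptions; the symbol/import signals are omitted.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(text: str) -> List[str]:
    """Lowercased identifier-like tokens; a deliberately simple tokenizer."""
    return [t.lower() for t in TOKEN.findall(text)]


class BM25:
    """Okapi BM25 over an in-memory corpus (no external vector store)."""

    def __init__(self, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.length = {doc_id: sum(c.values()) for doc_id, c in self.tf.items()}
        self.avgdl = sum(self.length.values()) / max(len(self.length), 1)
        df: Counter = Counter()
        for counts in self.tf.values():
            df.update(counts.keys())  # document frequency: one count per doc
        n = len(self.tf)
        # IDF with +1 inside the log so common terms never score negative.
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}

    def score(self, query: str, doc_id: str) -> float:
        counts, dl = self.tf[doc_id], self.length[doc_id]
        total = 0.0
        for term in tokenize(query):
            f = counts.get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int = 5) -> List[Tuple[str, float]]:
        """Highest-scoring documents with a positive score, ties broken by id."""
        scored = [(d, self.score(query, d)) for d in self.tf]
        scored = [pair for pair in scored if pair[1] > 0]
        return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:k]
```

For example, `BM25(files).top_k("credential handling", k=5)` returns up to five `(doc_id, score)` pairs, where `files` maps a path to its text.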
Run deterministic benchmark paths:
```bash
python3 scripts/arena-runtime.py retrieval-benchmark --config config/default-config.json --max-cases 3
python3 scripts/arena-runtime.py benchmark-harness-ablation --config config/default-config.json --max-cases 3
```
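Retrieval benchmarks stay deterministic when they score a fixed ranked list against fixed relevance labels, with no model calls. The sketch below computes recall@k and mean reciprocal rank over such cases and honors a `max_cases` cap in the spirit of `--max-cases 3`. The metric choice, the case format, and the function names are assumptions, not the project's documented benchmark.

```python
from typing import Dict, List, Optional, Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(cases: List[dict], k: int = 5, max_cases: Optional[int] = None) -> Dict[str, float]:
    """Average recall@k and MRR over cases shaped {"ranked": [...], "relevant": [...]}."""
    selected = cases if max_cases is None else cases[:max_cases]
    if not selected:
        return {"recall_at_k": 0.0, "mrr": 0.0, "cases": 0.0}
    n = len(selected)
    recall = sum(recall_at_k(c["ranked"], set(c["relevant"]), k) for c in selected) / n
    mrr = sum(reciprocal_rank(c["ranked"], set(c["relevant"])) for c in selected) / n
    return {"recall_at_k": recall, "mrr": mrr, "cases": float(n)}
```

Because both metrics depend only on the ranked lists and the labels, rerunning the benchmark against the same index yields identical numbers.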
Run a bounded live provider sample when the CLIs are installed and authenticated:
```bash
arena benchmark-models --category security --models codex,gemini --live --smoke --timeout 90 --preflight-timeout 30 --require-live-success
```
Claude Code integration is project-local and explicit. Install the hook and agent files into the current project with:
```bash
python3 scripts/arena-runtime.py install-claude-integration --project-root .
```
```bash
npx claudepluginhub hajinj/ai-review-arena --plugin ai-review-arena
```
Multi-agent adversarial review panel: 4-6 AI reviewers debate your code/plans, then a judge delivers a structured verdict with epistemic labels. Bundles plan-review-integrator for applying review findings back into implementation plans.
v9.44.1 — Patch release for Gemini environment/version detection and qwen auth gating. Run /octo:setup.
Automatic code review, adversarial review, and rescue via Codex.
AI-powered code review analysis — Run three-level AI analysis and implement-review-fix loops directly in your coding agent. Works standalone, no server required.
Personal Claude Code + Codex dev stack: security hooks, AI-first code conventions, /security-review, /repo-map, /stack-check, portable statusline. Designed to complement other skills-based plugins, not replace them.
Comprehensive skill pack with 66 specialized skills for full-stack developers: 12 language experts (Python, TypeScript, Go, Rust, C++, Swift, Kotlin, C#, PHP, Java, SQL, JavaScript), 10 backend frameworks, 6 frontend/mobile, plus infrastructure, DevOps, security, and testing. Features progressive disclosure architecture for 50% faster loading.