From buymeagoat-skills
Full multi-model codebase review. Dispatches Codex, Gemini, and Copilot in parallel to find bugs, security issues, tech debt, and gaps. Deduplicates findings via Gemini, classifies into structured backlogs under .planning/, and auto-routes findings to agent-remediate-loop by severity tier. Outputs review artifacts to .planning/reviews/YYYYMMDD-<agent>-findings.md. Trigger: "review codebase", "run triage", "full review", "find issues", "/agent-review-loop".
Slash command
/buymeagoat-skills:agent-review-loop
You are running the **agent-review-loop** pipeline.
| Agent | Role in this skill |
|---|---|
| Claude Code | Orchestrates — builds prompts, routes results, classifies findings, writes state |
| Codex | Line-level review — bugs, logic errors, dead code, null checks, type mismatches |
| Gemini | Cross-file review — architecture, schema drift, system-level security, data flow |
| Copilot | Security + quality review — antipatterns, dependency hygiene, test gaps, API design |
Claude Code owns all state mutations (backlogs, active-context). Reviewers only produce output.
Codex reviews best when given explicit read authorization and a strict output contract. Key directives to include:
- Write findings to an explicit output <path> (prevents stdout dumping)
- Strict output format: **[SEVERITY]** `file:line` — [CATEGORY] — Description. Fix.
Codex failure modes:
Gemini's edge is large-context cross-file reasoning. Prime it to use that advantage:
- timeout_seconds=1200 for large repos
- Fallback chain if MCP fails: gemini -m gemini-2.5-pro → gemini-2.5-flash → gemini-1.5-pro
Gemini failure modes:
- MODEL_CAPACITY_EXHAUSTED: use the fallback chain, never block the pipeline
Copilot adds a confidence dimension other reviewers lack. Use it:
- "confidence": "high|medium|low" on every finding
- "fingerprint" field (slug of file:line:category) for cross-run dedup
- NO_FINDINGS sentinel means clean: write that to file, don't treat as failure
After all three reviews complete, use Gemini again for cross-source deduplication. This is Gemini's highest-value use in this pipeline: its large context lets it read all three review files simultaneously and merge findings without truncation. Claude Code doing this inline would burn context; Gemini does it in one call.
After dedup, Claude Code classifies and writes to backlogs incrementally. Write each finding immediately after classifying — do not batch. Batching risks losing state if the session is interrupted.
REPO=$(basename "$(git rev-parse --show-toplevel)")
DATE=$(date +%Y%m%d)
mkdir -p .planning/reviews .planning/archive
[ -s .claude/model-routing.md ] && echo "routing: ok" || echo "ERROR: .claude/model-routing.md missing or empty"
echo "REPO=$REPO DATE=$DATE"
If model-routing.md missing or empty: stop. Create it first. Default for unknown task types: Gemini.
Extract planned-work preamble to prevent false positives on in-progress items:
grep -A 30 "## Next Phase\|## In Progress\|## Planned" active-context.md 2>/dev/null | head -40
Store as $PLANNED_WORK. If nothing found: "No planned-work context available."
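The extraction and its fallback message can be sketched as one helper. This is a minimal sketch, not the skill's own code: the `planned_work` function name is hypothetical, and `grep -E` is swapped in for the `\|` alternation on the assumption that it matches the same three headings.

```shell
# Hypothetical helper: print planned-work context from the given file, or the
# fallback message when the file is missing or no heading matches.
planned_work() {
  found=$(grep -E -A 30 '## (Next Phase|In Progress|Planned)' "$1" 2>/dev/null | head -40) || true
  if [ -n "$found" ]; then
    printf '%s\n' "$found"
  else
    printf '%s\n' "No planned-work context available."
  fi
}

PLANNED_WORK=$(planned_work active-context.md)
```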
Announce: agent-review-loop start — repo: $REPO — date: $DATE
Build all three prompts. After building, verify DATE and REPO are substituted — print resolved Codex output path:
Codex output path: .planning/reviews/$DATE-codex-findings.md
If path contains literal DATE or REPO: stop and fix substitution before proceeding.
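A rough sketch of that substitution check follows. The `is_resolved` helper is hypothetical, and it assumes that a leftover `$`, `DATE`, or `REPO` in the resolved path always means a failed substitution.

```shell
# Hypothetical check: succeed only when no placeholder survived substitution.
is_resolved() {
  case "$1" in
    *'$'*|*DATE*|*REPO*) return 1 ;;
    *) return 0 ;;
  esac
}

DATE=$(date +%Y%m%d)
CODEX_OUT=".planning/reviews/$DATE-codex-findings.md"
if is_resolved "$CODEX_OUT"; then
  echo "Codex output path: $CODEX_OUT"
else
  echo "ERROR: unresolved placeholder in $CODEX_OUT" >&2
fi
```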
SEVERITY definitions:
CRITICAL: data loss, auth bypass, crash, security breach
HIGH: broken feature, wrong output, missing validation at boundary
MEDIUM: degraded behavior, performance regression, missing error handling
LOW: code quality, style, dead code, minor UX friction
Exclude: .planning/archive/, keep/, node_modules/, __pycache__/, dist/, build/,
*.lock, uploads/, generated assets, .claude/, AGENTS.md, CLAUDE.md, GEMINI.md
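When Claude Code needs to apply the same exclusions itself (for example, before accepting a cited path), the list above could be mirrored as a path filter. This is a sketch under assumptions: `is_excluded` is a hypothetical helper, paths are repo-relative, and "generated assets" is left out because it has no fixed pattern.

```shell
# Hypothetical filter mirroring the exclude list for repo-relative paths.
# Returns 0 (true) when the path should be ignored by reviewers.
is_excluded() {
  case "$1" in
    .planning/archive/*|keep/*|uploads/*|.claude/*) return 0 ;;
    node_modules/*|*/node_modules/*) return 0 ;;
    __pycache__/*|*/__pycache__/*) return 0 ;;
    dist/*|build/*) return 0 ;;
    *.lock|AGENTS.md|CLAUDE.md|GEMINI.md) return 0 ;;
  esac
  return 1
}
```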
For this review task only, you are explicitly authorized to inspect source files broadly
and write exactly one review artifact under .planning/reviews/. Do not edit application
code, backlog files, active-context.md, or any other docs.
Scope: source code, config, tests, API boundaries, security-sensitive paths, persistence,
auth/session logic, migrations, scripts, frontend user flows, integration calls.
[SEVERITY RUBRIC]
[EXCLUDE PATTERNS]
The following are intentional, planned, or in-progress — do not flag as issues:
$PLANNED_WORK
Findings must be:
- Actionable and tied to an existing file and line number.
- Reproducible or strongly evidenced — include a concrete fix.
- If no line-specific evidence exists, omit the finding.
- Speculative architecture opinions: LOW only, or omit.
- Prefer fewer high-signal findings over exhaustive noise.
Output format — one finding per line, no prose, no headers, no summaries:
**[SEVERITY]** `file:line` — **[CATEGORY]** — Description. Suggested fix.
CATEGORY options: BUG | SECURITY | TECH-DEBT | FEATURE-GAP | UX | PERFORMANCE | ARCHITECTURE | DEPENDENCY | TEST-GAP
Write ALL findings to: .planning/reviews/$DATE-codex-findings.md
Confirm: absolute path + finding count only. Do not return findings in your response.
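The one-finding-per-line contract above can be checked mechanically before the Codex file is trusted. The sketch below is an assumption about how strict that check should be: `is_valid_finding` is a hypothetical helper, and the square brackets around SEVERITY and CATEGORY are accepted but not required, since the contract does not say whether they are literal.

```shell
# Hypothetical validator for a single Codex finding line.
SEV='(CRITICAL|HIGH|MEDIUM|LOW)'
CAT='(BUG|SECURITY|TECH-DEBT|FEATURE-GAP|UX|PERFORMANCE|ARCHITECTURE|DEPENDENCY|TEST-GAP)'
FINDING_RE="^\*\*\[?$SEV\]?\*\* \`[^\`]+:[0-9]+\` — \*\*\[?$CAT\]?\*\* — .+"

is_valid_finding() {
  printf '%s\n' "$1" | grep -Eq "$FINDING_RE"
}
```

Lines that fail the check are likely prose or headers that slipped past the "no prose" instruction and can be dropped or logged.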
You have access to filesystem tools. Use them to read all source files before beginning.
Do not skip files due to length or count.
Repository root: <project_root>
Read all .py, .ts, .tsx, .json, .yaml, .toml files recursively. Do not skip config files.
[SEVERITY RUBRIC]
[EXCLUDE PATTERNS]
Exhaustive cross-file architectural audit.
Your primary value is cross-file reasoning that file-by-file tools miss. Prioritize:
- Cross-file inconsistencies: schema/contract drift between modules
- Data flow bugs spanning multiple files
- Architecture violations and coherence failures
- Security vulnerabilities at system level (auth flow, session handling, data exposure)
- Missing abstractions or boundary violations
Also cover: dependency hygiene, performance at architectural level, API surface design.
Do NOT duplicate line-level bug findings — Codex handles those.
Return findings as a JSON array. Each object:
{"severity": "", "category": "", "file": "", "line": 0, "issue": "", "fix": ""}
If no findings: return []
No prose, no section headers, no summaries outside the JSON array.
Code security and quality review.
Scope: security antipatterns, dependency hygiene, test coverage gaps, API design issues,
frontend component quality, error handling exposed to users, external service calls,
webhook handlers, ancillary services.
[SEVERITY RUBRIC]
[EXCLUDE PATTERNS]
Return findings as a JSON array. Each object:
{
"severity": "",
"category": "",
"file": "",
"line": 0,
"issue": "",
"fix": "",
"confidence": "high|medium|low",
"fingerprint": ""
}
fingerprint: short slug of "file:line:category:brief-issue" — used for cross-run dedup.
If no findings: return the exact token NO_FINDINGS
No prose, no section headers, no summaries outside the JSON array.
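For reference, a fingerprint of the "file:line:category:brief-issue" shape could be built as below. This is only an illustration of the slug idea: the `fingerprint` helper, the lowercase normalization, and the 40-character cap on the issue part are all assumptions, not something the prompt specifies.

```shell
# Hypothetical slug builder: fingerprint FILE LINE CATEGORY "brief issue"
fingerprint() {
  category=$(printf '%s' "$3" | tr '[:upper:]' '[:lower:]')
  # Lowercase, turn every run of non-alphanumerics into one "-", cap the
  # length, then trim a leading or trailing "-".
  issue=$(printf '%s' "$4" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | cut -c1-40 | sed 's/^-//; s/-$//')
  printf '%s:%s:%s:%s\n' "$1" "$2" "$category" "$issue"
}

fingerprint src/auth.py 42 SECURITY "Missing CSRF check on POST"
# prints: src/auth.py:42:security:missing-csrf-check-on-post
```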
Before dispatch:
- Confirm no literal DATE or REPO remain in prompts.
- mkdir -p .planning/reviews
- git worktree add /tmp/agent-review-codex HEAD 2>/dev/null && echo "worktree: ok" || echo "worktree exists"
In a single response turn, dispatch all three simultaneously:
1. Codex — Bash, timeout: 300000:
cd /tmp/agent-review-codex && codex exec --dangerously-bypass-approvals-and-sandbox - <<'EOF'
<codex prompt with all substitutions applied>
EOF
2. Gemini — MCP bridge (same turn):
mcp__agent-bridge__call_gemini(prompt="<gemini prompt>", cwd="<project_root>", timeout_seconds=1200)
3. Copilot — MCP bridge (same turn):
mcp__agent-bridge__call_copilot(prompt="<copilot prompt>", cwd="<project_root>")
Announce: 3 reviewers dispatched in parallel — waiting
wc -l .planning/reviews/$DATE-codex-findings.md 2>/dev/null || echo "codex output: missing"
git worktree remove /tmp/agent-review-codex --force 2>/dev/null; echo "worktree cleaned"
If missing after timeout: identify high-risk files and re-dispatch with explicit file list.
If second attempt fails: skip Codex, announce codex: skipped (both attempts failed), continue.
- Write output to .planning/reviews/$DATE-gemini-findings.md
- [] → write NO_FINDINGS
If the MCP call fails, fall back to the CLI:
gemini -m gemini-2.5-pro --approval-mode yolo - <<'PROMPT'
<gemini prompt>
PROMPT
If that fails → gemini-2.5-flash → gemini-1.5-pro. Write first successful output to file.
All fallbacks fail → skip Gemini, announce gemini: skipped (all models at capacity).
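The fallback chain above amounts to "try each model in order, keep the first success". A minimal sketch follows; `run_with_fallback`, `gemini_review`, and the `PROMPT_FILE` variable are hypothetical names, and the runner reuses only the CLI flags already shown above.

```shell
# run_with_fallback RUNNER MODEL...
# Calls "RUNNER MODEL" for each model in order. Prints the first successful
# output and returns 0; returns 1 if every model fails.
run_with_fallback() {
  runner=$1
  shift
  for model in "$@"; do
    if output=$("$runner" "$model" 2>/dev/null); then
      printf '%s\n' "$output"
      return 0
    fi
  done
  return 1
}

# Hypothetical runner: sends the prompt stored in $PROMPT_FILE to one model.
gemini_review() {
  gemini -m "$1" --approval-mode yolo - < "$PROMPT_FILE"
}

# Usage (not executed here):
#   if out=$(run_with_fallback gemini_review gemini-2.5-pro gemini-2.5-flash gemini-1.5-pro); then
#     printf '%s\n' "$out" > ".planning/reviews/$DATE-gemini-findings.md"
#   else
#     echo "gemini: skipped (all models at capacity)"
#   fi
```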
Write to .planning/reviews/$DATE-copilot-findings.md.
NO_FINDINGS → write sentinel. Failure → skip, announce, continue.
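The Gemini and Copilot result handling above shares one rule: a clean result becomes the sentinel, anything else is written as-is, and an empty response counts as a failure. A sketch of that rule, with `write_findings` as a hypothetical helper:

```shell
# write_findings RAW_OUTPUT DEST_FILE
# Collapses "no findings" answers ([] or NO_FINDINGS, ignoring whitespace)
# into the sentinel. Empty output is treated as a failure and writes nothing.
write_findings() {
  compact=$(printf '%s' "$1" | tr -d ' \t\r\n')
  case "$compact" in
    "") return 1 ;;
    "[]"|NO_FINDINGS) printf 'NO_FINDINGS\n' > "$2" ;;
    *) printf '%s\n' "$1" > "$2" ;;
  esac
}
```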
wc -l .planning/reviews/$DATE-*.md
Announce: reviews complete — codex: N lines, gemini: N lines, copilot: N lines
mcp__agent-bridge__call_gemini(prompt="""
Read these review files and return a single deduplicated consolidated findings list.
Files:
- .planning/reviews/$DATE-codex-findings.md
- .planning/reviews/$DATE-gemini-findings.md
- .planning/reviews/$DATE-copilot-findings.md
Skip any file containing only NO_FINDINGS or absent.
Dedup rules:
- Same file + lines within ±3 + same CATEGORY = same finding. Keep most detailed.
- Same underlying defect at different line numbers = same finding. Merge.
- List all source models for each merged finding.
Return one consolidated JSON array:
{"severity": "", "category": "", "file": "", "line": 0, "issue": "", "fix": "", "sources": []}
No prose. JSON only. If nothing: []
""", cwd="<project_root>")
JSON parse failure → fall back to line-by-line processing of each file. Log: dedup: fallback.
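For the line-by-line fallback, the first dedup rule (same file, same category, lines within ±3) can be written as a predicate. This is a sketch only: `same_finding` is a hypothetical helper, and the second rule (same underlying defect at different lines) needs judgment that a predicate like this does not capture.

```shell
# same_finding FILE1 LINE1 CAT1 FILE2 LINE2 CAT2
# Returns 0 when both findings share file and category and their line
# numbers differ by at most 3.
same_finding() {
  [ "$1" = "$4" ] || return 1
  [ "$3" = "$6" ] || return 1
  delta=$(( $2 - $5 ))
  if [ "$delta" -lt 0 ]; then
    delta=$(( 0 - delta ))
  fi
  [ "$delta" -le 3 ]
}
```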
Before classifying each finding:
rg --files | grep -F "<cited file>"
Drop findings citing non-existent files. Log as invalid.
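Because `grep -F` matches substrings, a cited `app.py` would also match `src/old_app.py`. A stricter variant is sketched below; it swaps in a direct file test for the `rg` listing and assumes findings cite repo-relative paths in `file:line` form. `cited_file_exists` is a hypothetical helper.

```shell
# cited_file_exists "path/to/file.py:42"
# Strips everything from the first ":" onward and tests the exact path.
cited_file_exists() {
  path=${1%%:*}
  [ -n "$path" ] && [ -f "$path" ]
}
```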
Run ID: $DATE-$(date +%H%M)
| Bucket | File | Categories |
|---|---|---|
| Bugs & Security | .planning/open-issues.md | BUG, SECURITY |
| Code Health | .planning/tech-debt.md | TECH-DEBT, ARCHITECTURE, DEPENDENCY, TEST-GAP |
| Missing Capability | .planning/feature-backlog.md | FEATURE-GAP |
| User Experience | .planning/ux-backlog.md | UX |
PERFORMANCE: user-facing → ux-backlog.md, internal → tech-debt.md.
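The routing table and the PERFORMANCE rule can be sketched as a single lookup. `bucket_for` is a hypothetical helper; its second argument (`user-facing` or `internal`) is an assumed way of passing the PERFORMANCE distinction and defaults to internal.

```shell
# bucket_for CATEGORY [user-facing|internal]
# Prints the backlog file for a category; returns 1 for unknown categories.
bucket_for() {
  case "$1" in
    BUG|SECURITY) echo ".planning/open-issues.md" ;;
    TECH-DEBT|ARCHITECTURE|DEPENDENCY|TEST-GAP) echo ".planning/tech-debt.md" ;;
    FEATURE-GAP) echo ".planning/feature-backlog.md" ;;
    UX) echo ".planning/ux-backlog.md" ;;
    PERFORMANCE)
      if [ "${2:-internal}" = "user-facing" ]; then
        echo ".planning/ux-backlog.md"
      else
        echo ".planning/tech-debt.md"
      fi ;;
    *) return 1 ;;
  esac
}
```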
Drop before writing: vague findings, findings citing deleted code, non-actionable findings. Normalize severity against rubric before writing.
Cross-run dedup: before appending, grep the existing backlog for the finding's **Fingerprint** line.
If a match is found: skip the finding.
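A sketch of the cross-run check, assuming backlog entries carry the fingerprint in backticks as in the per-item output format. `already_logged` is a hypothetical helper.

```shell
# already_logged BACKLOG_FILE FINGERPRINT
# Returns 0 when the backlog already holds an entry with this fingerprint.
already_logged() {
  [ -f "$1" ] || return 1
  grep -qF -e "**Fingerprint**: \`$2\`" "$1"
}
```

Matching the closing backtick keeps `src/a.py:1:bug` from matching an existing `src/a.py:10:bug` entry.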
Output format per item:
## [SEVERITY] Short title (max 10 words)
- **Location**: `file:line`
- **Source**: codex | gemini | copilot
- **Run**: $RUN_ID
- **Fingerprint**: `file:line:category-slug`
- **Issue**: One sentence.
- **Fix**: One sentence.
---
Sort within each file: CRITICAL → HIGH → MEDIUM → LOW.
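One way to get that order for a batch of findings is to prefix each with a numeric rank and sort on it. This sketch assumes a tab-separated `SEVERITY<TAB>title` line per finding, which is an illustrative format rather than the backlog format itself. Both helper names are hypothetical.

```shell
# Maps a severity to its sort position; unknown values sort last.
severity_rank() {
  case "$1" in
    CRITICAL) echo 0 ;;
    HIGH) echo 1 ;;
    MEDIUM) echo 2 ;;
    LOW) echo 3 ;;
    *) echo 9 ;;
  esac
}

# Reads "SEVERITY<TAB>title" lines on stdin and writes them in rubric order.
# The stable sort keeps the original order within each severity.
sort_by_severity() {
  while IFS= read -r line; do
    severity=$(printf '%s\n' "$line" | cut -f1)
    printf '%s\t%s\n' "$(severity_rank "$severity")" "$line"
  done | sort -s -n -k1,1 | cut -f2-
}
```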
Write header if file empty:
# [Filename] — Last updated: $DATE
Write incrementally — append each finding immediately. Do not batch.
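The header rule and the per-item format above can be combined into one append step. This is a sketch under assumptions: `append_finding` and its argument order are hypothetical, and the header uses the file's basename for `[Filename]`.

```shell
DATE=$(date +%Y%m%d)
RUN_ID="$DATE-$(date +%H%M)"

# append_finding FILE SEVERITY TITLE LOCATION SOURCE FINGERPRINT ISSUE FIX
# Writes the header when the file is missing or empty, then appends one entry.
append_finding() {
  file=$1
  if [ ! -s "$file" ]; then
    printf '%s\n\n' "# $(basename "$file") — Last updated: $DATE" > "$file"
  fi
  {
    printf '%s\n' "## [$2] $3"
    printf '%s\n' "- **Location**: \`$4\`"
    printf '%s\n' "- **Source**: $5"
    printf '%s\n' "- **Run**: $RUN_ID"
    printf '%s\n' "- **Fingerprint**: \`$6\`"
    printf '%s\n' "- **Issue**: $7"
    printf '%s\n' "- **Fix**: $8"
    printf '%s\n' "---"
  } >> "$file"
}
```

Each call appends immediately, so an interrupted session keeps everything classified up to that point.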
Announce: classified: N open-issues, N tech-debt, N feature-backlog, N ux-backlog | dropped: N invalid, N dupes, N quality-filtered
After Phase 4 writes all findings, scan each backlog for entries with **Run**: $RUN_ID. Group by severity.
Present numbered list:
CRITICAL/HIGH findings this run — approve each for agent-remediate-loop:
[1] CRITICAL: <title> — <file:line>
Issue: <issue>
[2] HIGH: <title> — <file:line>
Issue: <issue>
Approve items (e.g. "1 3 5"), "all", or "none":
Wait for user response. Collect fingerprints of approved items.
If approved: invoke agent-remediate-loop scoped to those fingerprints.
Announce: Routing N CRITICAL/HIGH item(s) to agent-remediate-loop.
Skipped items remain in backlog.
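The approval reply ("1 3 5", "all", or "none") can be parsed roughly as follows. `parse_approval` is a hypothetical helper; silently dropping out-of-range numbers and rejecting any other text are assumptions about how lenient the gate should be.

```shell
# parse_approval RESPONSE ITEM_COUNT
# Prints the approved item numbers, space-separated. Returns 1 when the
# response is neither "all", "none", nor a list of numbers, so the caller
# can re-ask.
parse_approval() {
  picked=""
  case "$1" in
    none) ;;
    all)
      i=1
      while [ "$i" -le "$2" ]; do
        picked="$picked $i"
        i=$(( i + 1 ))
      done ;;
    *)
      digits=$(printf '%s' "$1" | tr -d ' ')
      case "$digits" in
        ""|*[!0-9]*) return 1 ;;
      esac
      for token in $1; do
        if [ "$token" -ge 1 ] && [ "$token" -le "$2" ]; then
          picked="$picked $token"
        fi
      done ;;
  esac
  echo $picked   # unquoted on purpose: drops the leading space
}
```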
MEDIUM findings this run (N items):
- <title> (<file:line>)
...
Run agent-remediate-loop on all N MEDIUM findings? [y / n / list]
list → show full issue + fix for each, then re-ask.
y → collect fingerprints → invoke agent-remediate-loop.
n → leave in backlog.
No user gate. Collect all LOW fingerprints this run.
Announce: Auto-routing N LOW findings to agent-remediate-loop.
Invoke agent-remediate-loop scoped to those fingerprints.
mv .planning/reviews/$DATE-*-findings.md .planning/archive/ 2>/dev/null; echo "archived"
Validate length first: wc -l active-context.md. Over 150 lines: strip completed-phase details first.
Replace ## Last Review section (or append if absent):
## Last Review — $DATE
Reviewers: [list which ran — note skipped and reason]
Findings classified into:
- .planning/open-issues.md (N items)
- .planning/tech-debt.md (N items)
- .planning/feature-backlog.md (N items)
- .planning/ux-backlog.md (N items)
Next: work through backlogs by severity. Start with open-issues CRITICAL/HIGH.
agent-review-loop complete.
Reviews: [list reviewers that produced output — note skipped and reason]
Received findings: N
Invalid (non-existent files) dropped: N
Duplicates merged: N
Quality-filtered dropped: N
Final unique findings: N
open-issues: N items
tech-debt: N items
feature-backlog: N items
ux-backlog: N items
active-context.md updated.
npx claudepluginhub buymeagoat/agent-skills --plugin buymeagoat-skills