Skill

dev-review

Runs structured code reviews with parallel multi-reviewer fan-out, test output gates, and confidence scoring. Supports adversarial review via Codex or thorough Claude-only review.

code-quality

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/workflows:dev-review

Not user invocable

Model invocation disabled

Inline context

Default effort

Uses dynamic context injection — preprocesses shell commands at runtime

Hooks

PreToolUse

Matcher: Write|Edit

Hooks:

commanduv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/dev-delegation-guard.py

Matcher: Agent

Hooks:

command

GATE_ARTIFACT=.planning/VALIDATION.md GATE_STATUS=validated GATE_BLOCKED_TOOLS=Agent GATE_DESCRIPTION="Test gap validation" GATE_REMEDY="Return to dev-implement and run dev-test-gaps (Phase 5.5). VALIDATION.md must have status: validated (not gaps_found) before review starts." uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/phase-gate-guard.py

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

**Iteration topology:** parallel multi-reviewer fan-out (fresh read-only subagents; main chat reconciles)

Supporting Files

references/performance-reviewer.mdreferences/security-reviewer.mdreferences/tests-reviewer.md

SKILL.md

883 lines · ~8.3k tokens(exceeds 5k compaction limit)

Stats

LanguageJupyter Notebook

Stars16

Forks5

MaintenanceExcellent

Last CommitJun 17, 2026

Actions

View Source View Plugin View on GitHub View README

Context Check

Before starting this phase, check remaining context:

Level	Remaining	Action
Normal	>35%	Proceed
Warning	25-35%	Finish the current step, then invoke dev-handoff
Critical	≤25%	Invoke dev-handoff immediately — resume fresh

At Warning/Critical: Read ${CLAUDE_SKILL_DIR}/../../skills/dev-handoff/SKILL.md and follow its instructions.

The Iron Law of Topic Changes

If the user sends a message NOT about the current review, announce the loop pause before responding — then resume. dev-review runs a REVIEW_STATE.md fix-and-re-review loop; silently abandoning it (as dev-debug:121-139 documents) drops the structure the user invoked.

Protocol:

Announce: "Pausing the dev-review loop to address your request."
Handle the off-topic request (normal tools allowed — you're outside the loop).
Announce: "Resuming dev-review. Re-reading .planning/REVIEW_STATE.md for current state."
Re-read .planning/REVIEW_STATE.md and continue the review/fix iteration.

If the message could be EITHER a new topic OR part of the review, ask before assuming — do NOT silently abandon the loop.

Prerequisites - Test Output Gate
Review Strategy Choice
Codex Adversarial Review (default when Codex is installed)
Parallel Review (Thorough) (Claude-only fallback)
The Iron Law of Review (single Claude reviewer path)
Review Focus Areas
Confidence Scoring
Required Output Structure
Agent Invocation
Quality Standards

Code Review

Load shared enforcement:

Auto-load all constraints matching applies-to: dev-review:

!uv run python3 ${CLAUDE_SKILL_DIR}/../../scripts/load-constraints.py dev-review

You MUST have these constraints loaded before proceeding. No claiming you "remember" them.

Dynamic plan re-read: Before starting review, re-read .planning/SPEC.md and .planning/PLAN.md to catch any requirements or tasks added during implementation. Do not rely on cached state from prior phases.

Single-pass code review combining spec compliance and quality checks. Uses confidence-based filtering to report only high-priority issues.

## Prerequisites - Test Output Gate

Do NOT start review without test evidence.

Before reviewing, verify these preconditions:

.planning/LEARNINGS.md contains actual test output
Tests were run (not just written)
Test output shows PASS (not SKIP, not assumed)

What Counts as Test Evidence

Valid Evidence	NOT Valid
`meson test` output with results	"Tests should pass"
`pytest` output showing PASS	"I wrote tests"
Screenshot of working UI	"It looks correct"
Playwright snapshot showing expected state	"User can verify"
D-Bus command output	"The feature works"
E2E test output with user flow verified	"Unit tests pass" (for UI changes)

### The E2E Evidence Requirement

FOR USER-FACING CHANGES: Unit test evidence is INSUFFICIENT.

Before approving user-facing changes, verify:

Unit tests pass (necessary but not sufficient)
E2E tests pass (required for approval)
Visual evidence exists (screenshots/snapshots for UI)

Change Type	Unit Evidence	E2E Evidence	Approval?
Internal refactor	Yes	N/A	APPROVE
API change	Yes	Missing	BLOCKED
UI change	Yes	Missing	BLOCKED
User workflow	Yes	Missing	BLOCKED

Return BLOCKED if E2E evidence is missing for user-facing changes.

"Unit tests pass" without E2E for UI changes is NOT approvable.

Gate Check

Check LEARNINGS.md for test output:

rg -E "(PASS|OK|SUCCESS|\d+ passed)" .planning/LEARNINGS.md

If no test output is found, STOP and return to /dev-implement.

"It should work" is NOT evidence. Test output IS evidence.

Review Strategy Choice

After verifying test output in LEARNINGS.md, choose review strategy.

Skip this choice when:

Trivial changes (< 50 LOC, single file)
Purely cosmetic changes (formatting, comments)
Automated refactoring (rename, extract)
Internal utility functions (not user-facing or security-sensitive)

Step 1: Probe Codex availability (silent)

Codex provides an out-of-process adversarial reviewer that uses a different model family than Claude — the diversity catches issues a Claude-reviewing-Claude loop would miss. When installed and authenticated, it is the default adversarial path. When unavailable, fall back to the existing Claude-based flow without prompting the user about installation.

Read ${CLAUDE_SKILL_DIR}/../../references/codex-availability.md for the full probe and invocation contract. Execute the probe before asking the user:

CODEX_SCRIPT=$(find "$HOME/.claude/plugins/cache/openai-codex/codex" -maxdepth 3 -name codex-companion.mjs -type f 2>/dev/null | sort -rV | head -1)
if [ -n "$CODEX_SCRIPT" ]; then
  node "$CODEX_SCRIPT" setup --json 2>/dev/null | jq -r '.ready // false'
else
  echo "false"
fi

Set CODEX_READY=true only when the probe prints true. Otherwise CODEX_READY=false and skip Codex entirely — do not announce its absence.

Step 2: Ask the user

If CODEX_READY=true:

AskUserQuestion(questions=[{
  "question": "How should we review this implementation?",
  "header": "Review Strategy",
  "options": [
    {"label": "Codex adversarial review (Recommended)", "description": "Out-of-process adversarial review via Codex. Different model family from Claude — catches issues a Claude-on-Claude loop misses. Default for adversarial review."},
    {"label": "Single Claude reviewer", "description": "Combined Claude review covering spec compliance and code quality. Faster, lower overhead. Use when Codex is overkill."},
    {"label": "Parallel Claude review (Thorough)", "description": "Spawn 3 specialized Claude reviewers (Security, Performance, Tests). Use when Codex is unavailable or you want multi-perspective Claude review."}
  ],
  "multiSelect": false
}])

If CODEX_READY=false:

AskUserQuestion(questions=[{
  "question": "How should we review this implementation?",
  "header": "Review Strategy",
  "options": [
    {"label": "Single reviewer (Default)", "description": "Combined review covering spec compliance and code quality. Faster, lower overhead."},
    {"label": "Parallel review (Thorough)", "description": "Spawn 3 specialized reviewers (Security, Performance, Tests). Use for security-sensitive, performance-critical, or test-heavy PRs. Requires reconciliation."}
  ],
  "multiSelect": false
}])

Routing:

Choice	Go to
Codex adversarial review	Codex Adversarial Review
Single (Claude) reviewer	The Iron Law of Review
Parallel (Claude) review	Parallel Review (Thorough)

Codex Adversarial Review

Use this section when the user chose Codex adversarial review.

Reference: See references/codex-availability.md for the full invocation contract, JSON schema, and verdict mapping table.

1. Prerequisites Check

Before invoking Codex, verify (same as the other review paths):

Test evidence exists — LEARNINGS.md contains actual test output
E2E evidence for UI changes — user-facing changes have E2E test output
SPEC.md exists — for REQ-ID tagging of findings post-hoc
Git repo present — Codex adversarial review is git-diff scoped

If any prerequisite fails, STOP and return BLOCKED to /dev-implement.

2. Estimate Scope and Choose Wait vs Background

# Working-tree review
git status --short --untracked-files=all
git diff --shortstat --cached
git diff --shortstat

Wait when the diff is clearly tiny (1-2 files, no untracked dir-sized changes). Otherwise launch in background.

3. Invoke Codex

Foreground (small diff):

CODEX_SCRIPT=$(find "$HOME/.claude/plugins/cache/openai-codex/codex" -maxdepth 3 -name codex-companion.mjs -type f 2>/dev/null | sort -rV | head -1)
node "$CODEX_SCRIPT" adversarial-review --wait

Background (anything bigger):

Launch with Bash(..., run_in_background: true) and tell the user: "Codex adversarial review started in the background. Check /codex:status for progress."

Then await completion notification before proceeding to step 4.

Optional focus text — append SPEC.md context to weight the review:

node "$CODEX_SCRIPT" adversarial-review --wait "focus: REQ-AUTH-01 token rotation under retry"

4. Parse Verdict

Codex returns JSON validated against its review-output schema. Top-level fields: verdict (approve | needs-attention), summary, findings[], next_steps[]. Each finding has severity, title, body, file, line_start, line_end, confidence (0-1 float), recommendation.

Apply the iron law: only confidence >= 0.8 findings block. Multiply by 100 when displaying alongside Claude-style scores.

Codex result	dev-review verdict
`verdict: approve`	APPROVED
`needs-attention` + any finding ≥ 0.8 confidence	CHANGES_REQUIRED
`needs-attention` + all findings < 0.8 confidence	APPROVED (log advisory findings to LEARNINGS.md)

5. Tag Findings to Requirements

Codex doesn't know SPEC.md REQ-IDs. For each blocking finding:

Read .planning/SPEC.md
Tag the finding with the most likely REQ-ID (or OUT-OF-SPEC)
OUT-OF-SPEC findings are advisory unless user opts in

6. Report

Use the same output structure as ## Required Output Structure below, with Reviewer: Codex (adversarial) in the header. Each issue includes the Codex confidence (×100) and the REQ-ID you tagged in step 5.

7. Iteration & Re-Review

Codex adversarial review participates in the same REVIEW_STATE.md loop as Claude reviewers — increment iteration on CHANGES_REQUIRED, escalate at iteration 3. Re-runs stay on Codex unless it becomes unavailable between runs (re-probe each iteration).

The "Iron Law of Re-Review" still applies: implementer claims "fixed" → main chat re-invokes Codex via the same command — no spot-checks.

Phase Complete (Codex Adversarial)

After Codex review completes:

If APPROVED: Immediately invoke the dev-verify skill:

Read ${CLAUDE_SKILL_DIR}/../../skills/dev-verify/SKILL.md and follow its instructions.

If CHANGES_REQUIRED: Return to /dev-implement with the parsed findings.

If BLOCKED (test evidence missing): Return to /dev-implement to collect test evidence.

Parallel Review (Thorough)

Use this section when user chose "Parallel review (Thorough)" above.

Prerequisite: Requires CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS enabled. If unavailable, fall back to single reviewer.

1. Prerequisites Check

Before spawning reviewers, verify:

Test evidence exists - LEARNINGS.md contains actual test output (check first!)
E2E evidence for UI changes - User-facing changes have E2E test output (not just unit tests)
Changed files identified - git diff --name-only to scope review
SPEC.md exists - reviewers verify against spec, not assumptions

If any prerequisite fails, STOP and return BLOCKED to /dev-implement.

2. When to Use Parallel Review

Use parallel review when:

Security-sensitive changes (auth, permissions, data access, crypto, input validation)
Performance-critical paths (tight loops, database queries, API endpoints)
Test-heavy PRs (new test infrastructure, testing frameworks, E2E flows)
Complex PRs (4+ files changed, multiple subsystems affected)
High-stakes deployments (production hotfixes, customer-facing releases)

Do NOT use when:

Simple bug fixes (< 50 LOC, single file)
Documentation or config changes
Automated refactoring (no logic changes)
Internal utilities (not security-sensitive or performance-critical)
Overhead exceeds benefit (< 4 files changed)

3. Create Team and Spawn Reviewers

Team Creation

TeamCreate(name="Code Review", task_description="Parallel code review with 3 specialized reviewers")

Press Shift+Tab to enter delegate mode. The lead coordinates reviews, does NOT review code directly.

Spawn 3 Reviewers

Each reviewer receives a self-contained prompt from a reference file. Reviewers start with a blank conversation and do NOT auto-load skills. Read the prompt, substitute variables, and paste it in full.

Tool Restrictions: All reviewers are READ-ONLY. Dispatch each with allowed_tools=["Read", "Glob", "Grep", "Bash(read-only)"]. Reviewers MUST NOT use Write or Edit tools. They read code, analyze it, and report findings — the main chat handles all fixes.

Generate the review package ONCE (shared across all lenses):

The diff is the largest artifact in review. Write it to a file ONCE and point every reviewer at the path — do not paste the diff into three prompts (it parks the most expensive bytes in context three times over).

# BASE = the commit before this implementation began (the task/level base SHA you
# recorded — NOT HEAD~1, which drops all but the last commit of a multi-commit task).
bash ${CLAUDE_SKILL_DIR}/../../scripts/dev/review-package.sh <BASE_SHA> HEAD
# prints: wrote .planning/handoff/review-<base>..<head>.diff  (.planning/ excluded)

Before spawning, substitute these variables in each prompt:

REVIEW_PACKAGE_PATH -> the path review-package.sh printed (each reviewer reads it ONCE; do NOT paste the diff)
SPEC_CONTEXT -> relevant sections of .planning/SPEC.md (paste inline, do NOT reference file)
LEARNINGS_TEST_OUTPUT -> test output from .planning/LEARNINGS.md (paste actual output)
PLUGIN_ROOT -> resolved base directory for skill paths (relative to this skill's base directory)

Reviewer prompts (read, substitute variables, send as message):

Reviewer	Focus	Prompt Source
1. Security	Vulnerabilities, auth, data exposure, crypto	`references/security-reviewer.md`
2. Performance	Complexity, queries, memory, hot paths	`references/performance-reviewer.md`
3. Tests	Coverage, correctness, reliability, E2E	`references/tests-reviewer.md`

4. Lead Monitoring

While reviewers work, the lead:

Watches for completion messages from all 3 reviewers
Does NOT review code directly - your job is coordination and reconciliation
If a reviewer asks a question: Answer it, then broadcast to other reviewers if relevant
If a reviewer is taking significantly longer than others: Message them for status
When all 3 reviewers complete: Proceed to reconciliation

5. Reconciliation Protocol (3 Passes)

Post-subagent boundary (the highest-risk moment). During reconciliation the lead consolidates findings — it does NOT re-review the code or fix anything:

Lead CAN (verification/coordination)	Lead CANNOT (investigation/fixing)
Read reviewer findings, dedup, prioritize	Re-read source to second-guess a finding
Record verdict in REVIEW_STATE.md	Edit code to fix an issue a reviewer raised
Route CHANGES_REQUIRED back to dev-implement	Grep the codebase to build a new finding the reviewers missed

Fixes are dev-implement's job, never the review lead's. (Full rule: auto-loaded verification-vs-investigation / delegation-law constraints.)

After ALL reviewers message completion, the lead performs three passes:

**Pass 1 -- Deduplication:**

Multiple reviewers may find the same issue (e.g., input validation gap found by both Security and Tests reviewers).

Read all reviewer findings
Group by file and line number
Identify duplicates:
- Same file:line
- Same root cause (even if described differently)
Merge duplicates:
- Keep the highest confidence score
- Combine descriptions if both add value
- Attribute to both reviewers

Example:

Security found: "file.py:42 - Input not validated (Confidence: 85)"
Tests found: "file.py:42 - Missing test for invalid input (Confidence: 80)"

-> Merge: "file.py:42 - Input validation missing + no test coverage (Confidence: 85, found by Security + Tests)"

Pass 2 -- Prioritization:

Not all issues are equally important. Rank by:

Severity x Confidence:
- Critical (90-100 confidence) > Important (80-89)
- Security > Performance > Tests (when confidence is equal)
Impact on users:
- User-facing > Internal
- Data loss risk > Slowness > Test gaps
Fix effort:
- Quick wins (< 30 min) should be fixed now
- Large refactors (> 2 hours) should be filed as tech debt

Create final prioritized list:

1. [CRITICAL] Security: XSS in user input (Confidence: 95)
2. [CRITICAL] Tests: User workflow untested (Confidence: 90)
3. [IMPORTANT] Performance: N+1 query in hot path (Confidence: 85)
4. [IMPORTANT] Tests: Error path missing coverage (Confidence: 80)

Pass 3 -- Integration Check:

Proposed fixes may conflict with each other.

Read each reviewer's suggested fixes
Check for conflicts:
- Do two fixes modify the same code?
- Does one fix introduce a problem the other reviewer would flag?
- Do fixes require contradictory approaches?
If conflicts exist:
- Design a unified fix addressing both concerns
- OR: Flag the conflict and ask reviewers for input

Example conflict:

Security: "Add input validation on every field"
Performance: "Batch validate to reduce overhead"

-> Unified: "Batch validate with early exit on first invalid field (security + performance)"

If ANY pass finds conflicts -> resolve before reporting final verdict.

6. Final Verdict

After reconciliation, the lead reports:

## Parallel Code Review: [Feature Name]

Reviewed by: Security, Performance, Tests

### Reconciliation Summary

**Issues found:** X total (Y critical, Z important)
**Duplicates merged:** N
**Conflicts resolved:** M

### Critical Issues (Must Fix)

[Deduplicated, prioritized list from Pass 1 + 2]

### Important Issues (Should Fix)

[Deduplicated, prioritized list from Pass 1 + 2]

### Verdict: APPROVED | CHANGES REQUIRED

[If APPROVED]
All 3 reviewers approved with no issues >= 80 confidence.

[If CHANGES REQUIRED]
X critical and Y important issues must be addressed. Return to /dev-implement.

Phase Complete (Parallel Review)

After parallel review completes:

If APPROVED: Immediately invoke the dev-verify skill:

Read ${CLAUDE_SKILL_DIR}/../../skills/dev-verify/SKILL.md and follow its instructions.

If CHANGES REQUIRED: Return to /dev-implement to fix reported issues.

If BLOCKED (test evidence missing): Return to /dev-implement to collect test evidence.

## The Iron Law of Review

You MUST report only issues with >= 80% confidence. This is not negotiable.

Before reporting ANY issue, complete these verification steps:

Verify it's not a false positive
Verify it's not a pre-existing issue
Assign a confidence score
Report only if score >= 80

You MUST apply this rule even when encountering:

"This looks suspicious"
"I think this might be wrong"
"The style seems inconsistent"
"I would have done it differently"

You MUST discard any low-confidence issue found during review.

## The Iron Law of Re-Review

NO "FIXED" CLAIMS WITHOUT FRESH RE-REVIEW. This is not negotiable.

When review returns CHANGES REQUIRED and the implementer applies fixes, you MUST:

Re-run the SAME review criteria (not lighter, not spot-check)
Verify issues are actually resolved (not assumed)
Check for new issues introduced by fixes (regression)
Only THEN return APPROVED

"I fixed it" without re-reviewing is NOT HELPFUL — unverified fixes ship bugs to the user.

The Audit-Fix Loop (Max 3 Iterations)

Iteration 1: Review → CHANGES REQUIRED → Fix → Re-Review
              ↓
Iteration 2: Re-Review → CHANGES REQUIRED → Fix → Re-Review
              ↓
Iteration 3: Re-Review → CHANGES REQUIRED → Fix → Re-Review
              ↓
         Still issues? → ESCALATE to user
         All clean? → APPROVED

Track iterations in .planning/REVIEW_STATE.md:

---
iteration: 1
max_iterations: 3
last_review_date: 2026-03-09
issues_found_count: 5
---

Exit criteria:

APPROVED: Zero issues >= 80 confidence
ESCALATE: iteration >= 3 AND issues remain
CONTINUE: iteration < 3 AND issues remain → loop back

Before returning any verdict, check iteration count:

READ .planning/REVIEW_STATE.md (create if missing with iteration: 1)
If iteration >= 3 and issues remain: ESCALATE (don't return CHANGES REQUIRED)
If iteration < 3 and issues remain: INCREMENT iteration, return CHANGES REQUIRED
If no issues: APPROVED

Claiming APPROVED without re-review after fixes is NOT HELPFUL — you're rubber-stamping unverified work that ships bugs to the user.

Re-Review Facts

A re-review after fixes runs the FULL review with the same criteria — spot-checking only the fixed lines misses the regressions a fix introduces elsewhere, which is the failure mode re-review exists to catch.
At iteration 3 with issues remaining the verdict is ESCALATE, never APPROVED — an approval issued to end the loop is a fabricated verdict, not a judgment.

Review Focus Areas

Test Evidence (Check First!)

LEARNINGS.md contains actual test command output
Tests show PASS/OK (not SKIP, FAIL, or missing)
UI changes have screenshot/snapshot evidence
All test types run (unit, integration, UI as applicable)
E2E tests exist and pass for user-facing changes
E2E test simulates actual user flow, not just component render

Spec Compliance

All requirements from .planning/SPEC.md are implemented
Acceptance criteria are met
No requirements were skipped or partially implemented
Edge cases mentioned in spec are handled

Code Quality

Code is simple and DRY (no unnecessary duplication)
Logic is correct (no bugs, handles edge cases)
Codebase conventions followed (naming, patterns, structure)
Error handling is complete
No security vulnerabilities detected

Confidence Scoring

Rate each potential issue from 0-100:

Score	Meaning
0	False positive or pre-existing issue
25	Might be real, might not. Stylistic without guideline backing
50	Real issue but nitpick or rare in practice
75	Verified real issue, impacts functionality
100	Absolutely certain, confirmed with direct evidence

CRITICAL: Only report issues with confidence >= 80.

Required Output Structure

## Code Review: [Feature/Change Name]
Reviewing: [files/scope being reviewed]

### Test Evidence Verified
- Unit tests: [PASS/FAIL/MISSING] - [paste key output line]
- Integration: [PASS/FAIL/N/A]
- UI/Visual: [Screenshot taken / Snapshot verified / N/A]

### Critical Issues (Confidence >= 90)

#### [Issue Title] (Confidence: XX)

**Location:** `file/path.ext:line_number`

**Requirement:** [REQ-ID from SPEC.md — every issue MUST trace to a requirement ID]

**Problem:** Clear description of the issue

**Fix:**
```[language]
// Specific code fix

Important Issues (Confidence 80-89)

[Same format as Critical Issues]

Summary

Verdict: APPROVED | CHANGES REQUIRED | BLOCKED (no test evidence)

[If APPROVED] The reviewed code meets project standards. Tests pass. No issues with confidence >= 80 detected.

[If CHANGES REQUIRED] X critical issues and Y important issues must be addressed before proceeding.

[If BLOCKED] Cannot approve without test evidence. Return to /dev-implement and run tests.


**If review finds the implementation fundamentally violates the spec (not just minor issues), DELETE the contaminated implementation and return to dev-implement for a fresh attempt. Do not patch a structurally wrong approach.**

### Delete & Restart Protocol

**When implementation deviates fundamentally from spec, DELETE and restart entirely.**

| Situation | Action |
|-----------|--------|
| Code uses wrong protocol/architecture than spec | DELETE. Rewrite from scratch with correct approach. |
| Code implements different approach than PLAN.md | DELETE. User approved specific approach for a reason. |
| Fundamental misunderstanding of requirements | DELETE. Don't patch. Fresh subagent with correct understanding. |
| Patch would require 30%+ of implementation to change | DELETE. Rewrite is cleaner than patching wrong foundation. |

**Why delete instead of patch:** Patching a structurally wrong approach creates technical debt. Fresh implementation from correct architecture is faster than fixing wrong foundation.

**When to patch instead:** Bug in otherwise-correct implementation, missing edge case, performance tweak, minor deviation that doesn't affect core behavior.

**The test:** If the subagent says "oh, I misunderstood the whole approach" → DELETE and restart.

## Agent Invocation

Spawn Task agent for review execution:

Task(subagent_type="general-purpose", allowed_tools=["Read", "Glob", "Grep", "Bash(read-only)"]): "Review implementation against .planning/SPEC.md.

Tool Restrictions: You are READ-ONLY. You MUST NOT use Write or Edit tools. You read code, check test evidence, and report issues — you do NOT fix them. The main chat handles all fixes.

FIRST: Check .planning/LEARNINGS.md for test output. Return BLOCKED immediately if no test output is found.

Complete single-pass review covering:

Test evidence - tests actually run and pass?
Spec compliance - all requirements met?
Code quality - simple, correct, follows conventions?

Confidence score each issue (0-100). Report only issues with >= 80 confidence. Return structured output per /dev-review format."


## Review Facts

- An APPROVED verdict asserts three things: tests actually ran, the output shows PASS (not SKIP, not assumed), and the evidence was verified by the reviewer rather than trusted from a report. An APPROVED with any leg missing is a fabricated verdict — review is the last gate before bugs ship, and BLOCKED is the honest answer when evidence is absent.
- During reconciliation the controller dedups and prioritizes; it does not get to drop a reviewer's qualifying finding or pre-rate its severity down. A finding suppressed at reconciliation ships the bug with the review's sign-off attached — the one outcome review exists to prevent. (Conflicts between lenses get a *unified* fix in Pass 3, never a silent deletion.)
- A rationale a reviewer or implementer offers ("it's intentional", "covered elsewhere") is a claim to verify against the diff and SPEC, never a reason to downgrade the finding. Lowering severity on the strength of an unverified rationale is trusting the report — the exact failure the read-only-reviewer design rules out.
- A finding the diff cannot settle is labeled **"Cannot verify from diff"** with the missing context named — not silently dropped and not guessed into Critical. Dropping it understates risk; inventing a verdict fabricates one; naming the gap is the honest reviewer move and routes the right follow-up.
- Every lens reads the SAME review package file (one `review-package.sh` run), so re-deriving the diff per reviewer (or pasting it per prompt) is wasted turns and wasted context for an identical artifact — generate once, share the path.

## Quality Standards

- **Test evidence is mandatory** - do not approve without test output
- Do not report style preferences lacking project guideline backing
- Do not report pre-existing issues (confidence = 0)
- Make each reported issue immediately actionable
- Use absolute file paths with line numbers in reports
- Treat uncertainty as below 80 confidence

## Gate: Exit Review Loop

**Checkpoint type:** human-verify (test evidence and confidence scores are machine-verifiable)

Before claiming review is complete (APPROVED or ESCALATE):

IDENTIFY → What proves the review verdict is valid? - APPROVED: Zero issues >= 80 confidence - ESCALATE: iteration >= 3 AND issues remain
RUN → Check .planning/REVIEW_STATE.md for iteration count Read review output for issue count
READ → Examine both: - Review output (issues list) - REVIEW_STATE.md (iteration number)
VERIFY → Verdict matches state: - APPROVED only if 0 issues - ESCALATE only if iteration >= 3 - CHANGES REQUIRED only if iteration < 3
CLAIM → Only after steps 1-4 pass, return verdict


**If iteration >= 3 and you're returning CHANGES REQUIRED instead of ESCALATE, you're ignoring the iteration limit — escalate to the user instead of looping forever.**

## Phase Complete

**Phase summary (append to LEARNINGS.md):**

```yaml
## Phase: Review

---
phase: review
status: completed
implements: []          # review verifies; implements no new requirement IDs
requires: [VALIDATION.md, LEARNINGS.md]
provides: [REVIEW_STATE.md, review-verdict]
affects: []             # read-only review; fixes happen in dev-implement
verdict: APPROVED | CHANGES_REQUIRED | ESCALATE | BLOCKED
iterations: N
issues-found: X (Y critical, Z important)
---

After review completes, handle verdict-specific transitions:

If APPROVED (no issues >= 80 confidence)

Mark review complete in .planning/REVIEW_STATE.md:

---
status: APPROVED
iteration: [N]
max_iterations: 3
last_review_date: [date]
issues_found_count: 0
verdict: APPROVED
---

The status: APPROVED line is the structural gate dev-verify checks — only an APPROVED review admits verification. On non-approved paths (CHANGES_REQUIRED / ESCALATE / BLOCKED) set status: to that verdict so the gate correctly blocks.

Immediately invoke dev-verify:

Read ${CLAUDE_SKILL_DIR}/../../skills/dev-verify/SKILL.md and follow its instructions.

If CHANGES REQUIRED (issues >= 80 confidence found, iteration < 3)

Update .planning/REVIEW_STATE.md:

---
status: CHANGES_REQUIRED
iteration: [N+1]
max_iterations: 3
last_review_date: [date]
issues_found_count: [count]
verdict: CHANGES_REQUIRED
---

Return to /dev-implement with specific issues. Implementer MUST re-invoke /dev-review after fixes.

Critical: When implementer returns claiming "fixed", you MUST re-run the FULL review. No shortcuts.

If ESCALATE (iteration >= 3, issues remain)

Update .planning/REVIEW_STATE.md:

---
status: ESCALATE
iteration: 3
max_iterations: 3
last_review_date: [date]
issues_found_count: [count]
verdict: ESCALATE
---

Report to user:

Review Loop Escalation (3 iterations completed)

After 3 fix-review cycles, [N] issues remain:

[List issues]

Options:
1. Accept current state and proceed (issues become tech debt)
2. Extend review (manual approval for iteration 4+)
3. Rethink approach (return to /dev-design)

Which option do you prefer?

If BLOCKED (no test evidence)

Return immediately to /dev-implement to collect test evidence. Do NOT increment iteration counter - no review occurred.

Workflow Continuity After Review

Verdict	Next Action	Iteration Counter
APPROVED	Invoke /dev-verify immediately, mark task `[x]` in PLAN.md	Reset to 1 for next task
CHANGES REQUIRED	Return to /dev-implement, implementer fixes then re-invokes /dev-review	Increment
ESCALATE	Ask user for direction	Keep at max
BLOCKED	Return to /dev-implement for test evidence	No change (no review ran)

Do NOT pause between review completion and next action. The workflow is sequential.

dev-review

Popularity

Invocation

Hooks

PreToolUse

Context Preview

Supporting Files

SKILL.md

dev-review

Popularity

Invocation

Hooks

PreToolUse

Context Preview

Supporting Files

SKILL.md

Context Check

The Iron Law of Topic Changes

Contents

Code Review

What Counts as Test Evidence

Gate Check

Review Strategy Choice

Step 1: Probe Codex availability (silent)

Step 2: Ask the user

Codex Adversarial Review

1. Prerequisites Check

2. Estimate Scope and Choose Wait vs Background

3. Invoke Codex

4. Parse Verdict

5. Tag Findings to Requirements

6. Report

7. Iteration & Re-Review

Phase Complete (Codex Adversarial)

Parallel Review (Thorough)

1. Prerequisites Check

2. When to Use Parallel Review

3. Create Team and Spawn Reviewers

Team Creation

Spawn 3 Reviewers

4. Lead Monitoring

5. Reconciliation Protocol (3 Passes)

6. Final Verdict

Phase Complete (Parallel Review)

The Audit-Fix Loop (Max 3 Iterations)

Re-Review Facts

Review Focus Areas

Test Evidence (Check First!)

Spec Compliance

Code Quality

Confidence Scoring

Required Output Structure

Important Issues (Confidence 80-89)

Summary

If APPROVED (no issues >= 80 confidence)

If CHANGES REQUIRED (issues >= 80 confidence found, iteration < 3)

If ESCALATE (iteration >= 3, issues remain)

If BLOCKED (no test evidence)

Workflow Continuity After Review

Similar Skills

Context Check

The Iron Law of Topic Changes

Contents

Code Review

What Counts as Test Evidence

Gate Check

Review Strategy Choice

Step 1: Probe Codex availability (silent)

Step 2: Ask the user

Codex Adversarial Review

1. Prerequisites Check

2. Estimate Scope and Choose Wait vs Background

3. Invoke Codex

4. Parse Verdict

5. Tag Findings to Requirements

6. Report

7. Iteration & Re-Review

Phase Complete (Codex Adversarial)

Parallel Review (Thorough)

1. Prerequisites Check