From workflows
Runs structured code reviews with parallel multi-reviewer fan-out, test output gates, and confidence scoring. Supports adversarial review via Codex or thorough Claude-only review.
How this skill is triggered — by the user, by Claude, or both
Slash command
/workflows:dev-reviewWrite|Edituv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/dev-delegation-guard.pyAgentGATE_ARTIFACT=.planning/VALIDATION.md GATE_STATUS=validated GATE_BLOCKED_TOOLS=Agent GATE_DESCRIPTION="Test gap validation" GATE_REMEDY="Return to dev-implement and run dev-test-gaps (Phase 5.5). VALIDATION.md must have status: validated (not gaps_found) before review starts." uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/phase-gate-guard.pyThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
**Iteration topology:** parallel multi-reviewer fan-out (fresh read-only subagents; main chat reconciles)
Iteration topology: parallel multi-reviewer fan-out (fresh read-only subagents; main chat reconciles)
Before starting this phase, check remaining context:
| Level | Remaining | Action |
|---|---|---|
| Normal | >35% | Proceed |
| Warning | 25-35% | Finish the current step, then invoke dev-handoff |
| Critical | ≤25% | Invoke dev-handoff immediately — resume fresh |
At Warning/Critical: Read ${CLAUDE_SKILL_DIR}/../../skills/dev-handoff/SKILL.md and follow its instructions.
If the user sends a message NOT about the current review, announce the loop pause before responding — then resume. dev-review runs a REVIEW_STATE.md fix-and-re-review loop; silently abandoning it (as dev-debug:121-139 documents) drops the structure the user invoked.
Protocol:
.planning/REVIEW_STATE.md and continue the review/fix iteration.If the message could be EITHER a new topic OR part of the review, ask before assuming — do NOT silently abandon the loop.
Load shared enforcement:
Auto-load all constraints matching applies-to: dev-review:
!uv run python3 ${CLAUDE_SKILL_DIR}/../../scripts/load-constraints.py dev-review
You MUST have these constraints loaded before proceeding. No claiming you "remember" them.
Dynamic plan re-read: Before starting review, re-read .planning/SPEC.md and .planning/PLAN.md to catch any requirements or tasks added during implementation. Do not rely on cached state from prior phases.
Single-pass code review combining spec compliance and quality checks. Uses confidence-based filtering to report only high-priority issues.
## Prerequisites - Test Output GateDo NOT start review without test evidence.
Before reviewing, verify these preconditions:
.planning/LEARNINGS.md contains actual test output| Valid Evidence | NOT Valid |
|---|---|
meson test output with results | "Tests should pass" |
pytest output showing PASS | "I wrote tests" |
| Screenshot of working UI | "It looks correct" |
| Playwright snapshot showing expected state | "User can verify" |
| D-Bus command output | "The feature works" |
| E2E test output with user flow verified | "Unit tests pass" (for UI changes) |
FOR USER-FACING CHANGES: Unit test evidence is INSUFFICIENT.
Before approving user-facing changes, verify:
| Change Type | Unit Evidence | E2E Evidence | Approval? |
|---|---|---|---|
| Internal refactor | Yes | N/A | APPROVE |
| API change | Yes | Missing | BLOCKED |
| UI change | Yes | Missing | BLOCKED |
| User workflow | Yes | Missing | BLOCKED |
Return BLOCKED if E2E evidence is missing for user-facing changes.
"Unit tests pass" without E2E for UI changes is NOT approvable.
Check LEARNINGS.md for test output:
rg -E "(PASS|OK|SUCCESS|\d+ passed)" .planning/LEARNINGS.md
If no test output is found, STOP and return to /dev-implement.
"It should work" is NOT evidence. Test output IS evidence.
After verifying test output in LEARNINGS.md, choose review strategy.
Skip this choice when:
Codex provides an out-of-process adversarial reviewer that uses a different model family than Claude — the diversity catches issues a Claude-reviewing-Claude loop would miss. When installed and authenticated, it is the default adversarial path. When unavailable, fall back to the existing Claude-based flow without prompting the user about installation.
Read ${CLAUDE_SKILL_DIR}/../../references/codex-availability.md for the full
probe and invocation contract. Execute the probe before asking the user:
CODEX_SCRIPT=$(find "$HOME/.claude/plugins/cache/openai-codex/codex" -maxdepth 3 -name codex-companion.mjs -type f 2>/dev/null | sort -rV | head -1)
if [ -n "$CODEX_SCRIPT" ]; then
node "$CODEX_SCRIPT" setup --json 2>/dev/null | jq -r '.ready // false'
else
echo "false"
fi
Set CODEX_READY=true only when the probe prints true. Otherwise
CODEX_READY=false and skip Codex entirely — do not announce its absence.
If CODEX_READY=true:
AskUserQuestion(questions=[{
"question": "How should we review this implementation?",
"header": "Review Strategy",
"options": [
{"label": "Codex adversarial review (Recommended)", "description": "Out-of-process adversarial review via Codex. Different model family from Claude — catches issues a Claude-on-Claude loop misses. Default for adversarial review."},
{"label": "Single Claude reviewer", "description": "Combined Claude review covering spec compliance and code quality. Faster, lower overhead. Use when Codex is overkill."},
{"label": "Parallel Claude review (Thorough)", "description": "Spawn 3 specialized Claude reviewers (Security, Performance, Tests). Use when Codex is unavailable or you want multi-perspective Claude review."}
],
"multiSelect": false
}])
If CODEX_READY=false:
AskUserQuestion(questions=[{
"question": "How should we review this implementation?",
"header": "Review Strategy",
"options": [
{"label": "Single reviewer (Default)", "description": "Combined review covering spec compliance and code quality. Faster, lower overhead."},
{"label": "Parallel review (Thorough)", "description": "Spawn 3 specialized reviewers (Security, Performance, Tests). Use for security-sensitive, performance-critical, or test-heavy PRs. Requires reconciliation."}
],
"multiSelect": false
}])
Routing:
| Choice | Go to |
|---|---|
| Codex adversarial review | Codex Adversarial Review |
| Single (Claude) reviewer | The Iron Law of Review |
| Parallel (Claude) review | Parallel Review (Thorough) |
Use this section when the user chose Codex adversarial review.
Reference: See
references/codex-availability.mdfor the full invocation contract, JSON schema, and verdict mapping table.
Before invoking Codex, verify (same as the other review paths):
If any prerequisite fails, STOP and return BLOCKED to /dev-implement.
# Working-tree review
git status --short --untracked-files=all
git diff --shortstat --cached
git diff --shortstat
Wait when the diff is clearly tiny (1-2 files, no untracked dir-sized changes). Otherwise launch in background.
Foreground (small diff):
CODEX_SCRIPT=$(find "$HOME/.claude/plugins/cache/openai-codex/codex" -maxdepth 3 -name codex-companion.mjs -type f 2>/dev/null | sort -rV | head -1)
node "$CODEX_SCRIPT" adversarial-review --wait
Background (anything bigger):
Launch with Bash(..., run_in_background: true) and tell the user:
"Codex adversarial review started in the background. Check /codex:status for progress."
Then await completion notification before proceeding to step 4.
Optional focus text — append SPEC.md context to weight the review:
node "$CODEX_SCRIPT" adversarial-review --wait "focus: REQ-AUTH-01 token rotation under retry"
Codex returns JSON validated against its review-output schema. Top-level fields:
verdict (approve | needs-attention), summary, findings[], next_steps[].
Each finding has severity, title, body, file, line_start, line_end,
confidence (0-1 float), recommendation.
Apply the iron law: only confidence >= 0.8 findings block. Multiply by 100
when displaying alongside Claude-style scores.
| Codex result | dev-review verdict |
|---|---|
verdict: approve | APPROVED |
needs-attention + any finding ≥ 0.8 confidence | CHANGES_REQUIRED |
needs-attention + all findings < 0.8 confidence | APPROVED (log advisory findings to LEARNINGS.md) |
Codex doesn't know SPEC.md REQ-IDs. For each blocking finding:
.planning/SPEC.mdOUT-OF-SPEC)OUT-OF-SPEC findings are advisory unless user opts inUse the same output structure as ## Required Output Structure below, with
Reviewer: Codex (adversarial) in the header. Each issue includes the
Codex confidence (×100) and the REQ-ID you tagged in step 5.
Codex adversarial review participates in the same REVIEW_STATE.md loop as
Claude reviewers — increment iteration on CHANGES_REQUIRED, escalate at
iteration 3. Re-runs stay on Codex unless it becomes unavailable between
runs (re-probe each iteration).
The "Iron Law of Re-Review" still applies: implementer claims "fixed" → main chat re-invokes Codex via the same command — no spot-checks.
After Codex review completes:
If APPROVED: Immediately invoke the dev-verify skill:
Read ${CLAUDE_SKILL_DIR}/../../skills/dev-verify/SKILL.md and follow its instructions.
If CHANGES_REQUIRED: Return to /dev-implement with the parsed findings.
If BLOCKED (test evidence missing): Return to /dev-implement to collect test evidence.
Use this section when user chose "Parallel review (Thorough)" above.
Prerequisite: Requires
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMSenabled. If unavailable, fall back to single reviewer.
Before spawning reviewers, verify:
git diff --name-only to scope reviewIf any prerequisite fails, STOP and return BLOCKED to /dev-implement.
Use parallel review when:
Do NOT use when:
TeamCreate(name="Code Review", task_description="Parallel code review with 3 specialized reviewers")
Press Shift+Tab to enter delegate mode. The lead coordinates reviews, does NOT review code directly.
Each reviewer receives a self-contained prompt from a reference file. Reviewers start with a blank conversation and do NOT auto-load skills. Read the prompt, substitute variables, and paste it in full.
Tool Restrictions: All reviewers are READ-ONLY. Dispatch each with allowed_tools=["Read", "Glob", "Grep", "Bash(read-only)"]. Reviewers MUST NOT use Write or Edit tools. They read code, analyze it, and report findings — the main chat handles all fixes.
Generate the review package ONCE (shared across all lenses):
The diff is the largest artifact in review. Write it to a file ONCE and point every reviewer at the path — do not paste the diff into three prompts (it parks the most expensive bytes in context three times over).
# BASE = the commit before this implementation began (the task/level base SHA you
# recorded — NOT HEAD~1, which drops all but the last commit of a multi-commit task).
bash ${CLAUDE_SKILL_DIR}/../../scripts/dev/review-package.sh <BASE_SHA> HEAD
# prints: wrote .planning/handoff/review-<base>..<head>.diff (.planning/ excluded)
Before spawning, substitute these variables in each prompt:
REVIEW_PACKAGE_PATH -> the path review-package.sh printed (each reviewer reads it ONCE; do NOT paste the diff)SPEC_CONTEXT -> relevant sections of .planning/SPEC.md (paste inline, do NOT reference file)LEARNINGS_TEST_OUTPUT -> test output from .planning/LEARNINGS.md (paste actual output)PLUGIN_ROOT -> resolved base directory for skill paths (relative to this skill's base directory)Reviewer prompts (read, substitute variables, send as message):
| Reviewer | Focus | Prompt Source |
|---|---|---|
| 1. Security | Vulnerabilities, auth, data exposure, crypto | references/security-reviewer.md |
| 2. Performance | Complexity, queries, memory, hot paths | references/performance-reviewer.md |
| 3. Tests | Coverage, correctness, reliability, E2E | references/tests-reviewer.md |
While reviewers work, the lead:
Post-subagent boundary (the highest-risk moment). During reconciliation the lead consolidates findings — it does NOT re-review the code or fix anything:
| Lead CAN (verification/coordination) | Lead CANNOT (investigation/fixing) |
|---|---|
| Read reviewer findings, dedup, prioritize | Re-read source to second-guess a finding |
| Record verdict in REVIEW_STATE.md | Edit code to fix an issue a reviewer raised |
| Route CHANGES_REQUIRED back to dev-implement | Grep the codebase to build a new finding the reviewers missed |
Fixes are dev-implement's job, never the review lead's. (Full rule: auto-loaded verification-vs-investigation / delegation-law constraints.)
After ALL reviewers message completion, the lead performs three passes:
**Pass 1 -- Deduplication:**Multiple reviewers may find the same issue (e.g., input validation gap found by both Security and Tests reviewers).
Example:
Security found: "file.py:42 - Input not validated (Confidence: 85)"
Tests found: "file.py:42 - Missing test for invalid input (Confidence: 80)"
-> Merge: "file.py:42 - Input validation missing + no test coverage (Confidence: 85, found by Security + Tests)"
Pass 2 -- Prioritization:
Not all issues are equally important. Rank by:
Create final prioritized list:
1. [CRITICAL] Security: XSS in user input (Confidence: 95)
2. [CRITICAL] Tests: User workflow untested (Confidence: 90)
3. [IMPORTANT] Performance: N+1 query in hot path (Confidence: 85)
4. [IMPORTANT] Tests: Error path missing coverage (Confidence: 80)
Pass 3 -- Integration Check:
Proposed fixes may conflict with each other.
Example conflict:
Security: "Add input validation on every field"
Performance: "Batch validate to reduce overhead"
-> Unified: "Batch validate with early exit on first invalid field (security + performance)"
If ANY pass finds conflicts -> resolve before reporting final verdict.
After reconciliation, the lead reports:
## Parallel Code Review: [Feature Name]
Reviewed by: Security, Performance, Tests
### Reconciliation Summary
**Issues found:** X total (Y critical, Z important)
**Duplicates merged:** N
**Conflicts resolved:** M
### Critical Issues (Must Fix)
[Deduplicated, prioritized list from Pass 1 + 2]
### Important Issues (Should Fix)
[Deduplicated, prioritized list from Pass 1 + 2]
### Verdict: APPROVED | CHANGES REQUIRED
[If APPROVED]
All 3 reviewers approved with no issues >= 80 confidence.
[If CHANGES REQUIRED]
X critical and Y important issues must be addressed. Return to /dev-implement.
After parallel review completes:
If APPROVED: Immediately invoke the dev-verify skill:
Read ${CLAUDE_SKILL_DIR}/../../skills/dev-verify/SKILL.md and follow its instructions.
If CHANGES REQUIRED: Return to /dev-implement to fix reported issues.
If BLOCKED (test evidence missing): Return to /dev-implement to collect test evidence.
You MUST report only issues with >= 80% confidence. This is not negotiable.
Before reporting ANY issue, complete these verification steps:
You MUST apply this rule even when encountering:
You MUST discard any low-confidence issue found during review.
## The Iron Law of Re-ReviewNO "FIXED" CLAIMS WITHOUT FRESH RE-REVIEW. This is not negotiable.
When review returns CHANGES REQUIRED and the implementer applies fixes, you MUST:
"I fixed it" without re-reviewing is NOT HELPFUL — unverified fixes ship bugs to the user.
Iteration 1: Review → CHANGES REQUIRED → Fix → Re-Review
↓
Iteration 2: Re-Review → CHANGES REQUIRED → Fix → Re-Review
↓
Iteration 3: Re-Review → CHANGES REQUIRED → Fix → Re-Review
↓
Still issues? → ESCALATE to user
All clean? → APPROVED
Track iterations in .planning/REVIEW_STATE.md:
---
iteration: 1
max_iterations: 3
last_review_date: 2026-03-09
issues_found_count: 5
---
Exit criteria:
Before returning any verdict, check iteration count:
.planning/REVIEW_STATE.md (create if missing with iteration: 1)Claiming APPROVED without re-review after fixes is NOT HELPFUL — you're rubber-stamping unverified work that ships bugs to the user.
Rate each potential issue from 0-100:
| Score | Meaning |
|---|---|
| 0 | False positive or pre-existing issue |
| 25 | Might be real, might not. Stylistic without guideline backing |
| 50 | Real issue but nitpick or rare in practice |
| 75 | Verified real issue, impacts functionality |
| 100 | Absolutely certain, confirmed with direct evidence |
CRITICAL: Only report issues with confidence >= 80.
## Code Review: [Feature/Change Name]
Reviewing: [files/scope being reviewed]
### Test Evidence Verified
- Unit tests: [PASS/FAIL/MISSING] - [paste key output line]
- Integration: [PASS/FAIL/N/A]
- UI/Visual: [Screenshot taken / Snapshot verified / N/A]
### Critical Issues (Confidence >= 90)
#### [Issue Title] (Confidence: XX)
**Location:** `file/path.ext:line_number`
**Requirement:** [REQ-ID from SPEC.md — every issue MUST trace to a requirement ID]
**Problem:** Clear description of the issue
**Fix:**
```[language]
// Specific code fix
[Same format as Critical Issues]
Verdict: APPROVED | CHANGES REQUIRED | BLOCKED (no test evidence)
[If APPROVED] The reviewed code meets project standards. Tests pass. No issues with confidence >= 80 detected.
[If CHANGES REQUIRED] X critical issues and Y important issues must be addressed before proceeding.
[If BLOCKED] Cannot approve without test evidence. Return to /dev-implement and run tests.
**If review finds the implementation fundamentally violates the spec (not just minor issues), DELETE the contaminated implementation and return to dev-implement for a fresh attempt. Do not patch a structurally wrong approach.**
### Delete & Restart Protocol
**When implementation deviates fundamentally from spec, DELETE and restart entirely.**
| Situation | Action |
|-----------|--------|
| Code uses wrong protocol/architecture than spec | DELETE. Rewrite from scratch with correct approach. |
| Code implements different approach than PLAN.md | DELETE. User approved specific approach for a reason. |
| Fundamental misunderstanding of requirements | DELETE. Don't patch. Fresh subagent with correct understanding. |
| Patch would require 30%+ of implementation to change | DELETE. Rewrite is cleaner than patching wrong foundation. |
**Why delete instead of patch:** Patching a structurally wrong approach creates technical debt. Fresh implementation from correct architecture is faster than fixing wrong foundation.
**When to patch instead:** Bug in otherwise-correct implementation, missing edge case, performance tweak, minor deviation that doesn't affect core behavior.
**The test:** If the subagent says "oh, I misunderstood the whole approach" → DELETE and restart.
## Agent Invocation
Spawn Task agent for review execution:
Task(subagent_type="general-purpose", allowed_tools=["Read", "Glob", "Grep", "Bash(read-only)"]): "Review implementation against .planning/SPEC.md.
Tool Restrictions: You are READ-ONLY. You MUST NOT use Write or Edit tools. You read code, check test evidence, and report issues — you do NOT fix them. The main chat handles all fixes.
FIRST: Check .planning/LEARNINGS.md for test output. Return BLOCKED immediately if no test output is found.
Complete single-pass review covering:
Confidence score each issue (0-100). Report only issues with >= 80 confidence. Return structured output per /dev-review format."
## Review Facts
- An APPROVED verdict asserts three things: tests actually ran, the output shows PASS (not SKIP, not assumed), and the evidence was verified by the reviewer rather than trusted from a report. An APPROVED with any leg missing is a fabricated verdict — review is the last gate before bugs ship, and BLOCKED is the honest answer when evidence is absent.
- During reconciliation the controller dedups and prioritizes; it does not get to drop a reviewer's qualifying finding or pre-rate its severity down. A finding suppressed at reconciliation ships the bug with the review's sign-off attached — the one outcome review exists to prevent. (Conflicts between lenses get a *unified* fix in Pass 3, never a silent deletion.)
- A rationale a reviewer or implementer offers ("it's intentional", "covered elsewhere") is a claim to verify against the diff and SPEC, never a reason to downgrade the finding. Lowering severity on the strength of an unverified rationale is trusting the report — the exact failure the read-only-reviewer design rules out.
- A finding the diff cannot settle is labeled **"Cannot verify from diff"** with the missing context named — not silently dropped and not guessed into Critical. Dropping it understates risk; inventing a verdict fabricates one; naming the gap is the honest reviewer move and routes the right follow-up.
- Every lens reads the SAME review package file (one `review-package.sh` run), so re-deriving the diff per reviewer (or pasting it per prompt) is wasted turns and wasted context for an identical artifact — generate once, share the path.
## Quality Standards
- **Test evidence is mandatory** - do not approve without test output
- Do not report style preferences lacking project guideline backing
- Do not report pre-existing issues (confidence = 0)
- Make each reported issue immediately actionable
- Use absolute file paths with line numbers in reports
- Treat uncertainty as below 80 confidence
## Gate: Exit Review Loop
**Checkpoint type:** human-verify (test evidence and confidence scores are machine-verifiable)
Before claiming review is complete (APPROVED or ESCALATE):
IDENTIFY → What proves the review verdict is valid? - APPROVED: Zero issues >= 80 confidence - ESCALATE: iteration >= 3 AND issues remain
RUN → Check .planning/REVIEW_STATE.md for iteration count
Read review output for issue count
READ → Examine both: - Review output (issues list) - REVIEW_STATE.md (iteration number)
VERIFY → Verdict matches state: - APPROVED only if 0 issues - ESCALATE only if iteration >= 3 - CHANGES REQUIRED only if iteration < 3
CLAIM → Only after steps 1-4 pass, return verdict
**If iteration >= 3 and you're returning CHANGES REQUIRED instead of ESCALATE, you're ignoring the iteration limit — escalate to the user instead of looping forever.**
## Phase Complete
**Phase summary (append to LEARNINGS.md):**
```yaml
## Phase: Review
---
phase: review
status: completed
implements: [] # review verifies; implements no new requirement IDs
requires: [VALIDATION.md, LEARNINGS.md]
provides: [REVIEW_STATE.md, review-verdict]
affects: [] # read-only review; fixes happen in dev-implement
verdict: APPROVED | CHANGES_REQUIRED | ESCALATE | BLOCKED
iterations: N
issues-found: X (Y critical, Z important)
---
After review completes, handle verdict-specific transitions:
Mark review complete in .planning/REVIEW_STATE.md:
---
status: APPROVED
iteration: [N]
max_iterations: 3
last_review_date: [date]
issues_found_count: 0
verdict: APPROVED
---
The status: APPROVED line is the structural gate dev-verify checks — only an APPROVED review admits verification. On non-approved paths (CHANGES_REQUIRED / ESCALATE / BLOCKED) set status: to that verdict so the gate correctly blocks.
Immediately invoke dev-verify:
Read ${CLAUDE_SKILL_DIR}/../../skills/dev-verify/SKILL.md and follow its instructions.
Update .planning/REVIEW_STATE.md:
---
status: CHANGES_REQUIRED
iteration: [N+1]
max_iterations: 3
last_review_date: [date]
issues_found_count: [count]
verdict: CHANGES_REQUIRED
---
Return to /dev-implement with specific issues. Implementer MUST re-invoke /dev-review after fixes.
Critical: When implementer returns claiming "fixed", you MUST re-run the FULL review. No shortcuts.
Update .planning/REVIEW_STATE.md:
---
status: ESCALATE
iteration: 3
max_iterations: 3
last_review_date: [date]
issues_found_count: [count]
verdict: ESCALATE
---
Report to user:
Review Loop Escalation (3 iterations completed)
After 3 fix-review cycles, [N] issues remain:
[List issues]
Options:
1. Accept current state and proceed (issues become tech debt)
2. Extend review (manual approval for iteration 4+)
3. Rethink approach (return to /dev-design)
Which option do you prefer?
Return immediately to /dev-implement to collect test evidence. Do NOT increment iteration counter - no review occurred.
| Verdict | Next Action | Iteration Counter |
|---|---|---|
| APPROVED | Invoke /dev-verify immediately, mark task [x] in PLAN.md | Reset to 1 for next task |
| CHANGES REQUIRED | Return to /dev-implement, implementer fixes then re-invokes /dev-review | Increment |
| ESCALATE | Ask user for direction | Keep at max |
| BLOCKED | Return to /dev-implement for test evidence | No change (no review ran) |
Do NOT pause between review completion and next action. The workflow is sequential.
npx claudepluginhub edwinhu/workflows --plugin workflowsReviews implementation code for bugs, security issues, and quality problems. Creates FIX tasks for blocking issues before merge.
Dispatches 5 specialized agents for multi-perspective code review on correctness, architecture, security, production readiness, and test quality. Merges findings, auto-fixes Critical/Important issues up to 3 rounds.
Multi-phase code review pipeline with mechanical checks, graph-scoped context, parallel review agents, cross-agent deduplication, and structured output. Use for completed work reviews.