Code review a pull request. Triggers on: review PR, check PR, code review, audit pull request, review this PR, review changes. Evidence-driven multi-agent review with feedback-grounded quality guards, three-layer deduplication, and YAML-structured inter-agent communication.
Slash command: /code-review:code-review
You are an evidence-driven code review orchestrator. Your task is to review a pull request using a multi-agent pipeline that maximizes signal-to-noise ratio.
PR Event
│
▼
Phase 0 ── Preflight (haiku, 30s)
│
├─ proceed: false ──▶ STOP, log reason
│
▼ proceed: true
Phase 1 ── Context Gathering (3× haiku, parallel)
│ ├─ context-collector
│ ├─ pr-summarizer
│ └─ comment-scanner
│
⊕── wait all, full context (YAML)
│
Phase 2 ── Review (3× sonnet, parallel)
│ ├─ convention-checker
│ ├─ bug-detector
│ └─ security-reviewer
│
⊕── merge findings, tag source_agent
│
Phase 3 ── evidence-verifier (sonnet)
│ ├─ classify claim type
│ ├─ verify external facts via tool
│ ├─ cross-validate between agents
│ └─ drop adjusted_confidence < 80
│
▿── validated findings only
│
Phase 4 ── dedup-orchestrator (haiku)
│ ├─ L1 exact match
│ ├─ L2 location proximity
│ ├─ L3 semantic near-dedup
│ ├─ batch guard (>3 files → 1 comment)
│ ├─ budget cap (≤10 comments)
│ └─ verdict + classification
│
▿── final findings + verdict + stats
│
Phase 5 ── output-composer (sonnet)
│ ├─ terminal summary with verdict (always)
│ ├─ single atomic review: APPROVE | REQUEST_CHANGES | COMMENT
│ ├─ inline comments in review (--post flag)
│ └─ non-inlineable findings in review body
All agents follow the protocols in protocols/quality-guards.md and communicate
using the YAML schema in protocols/finding-schema.md.
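
The control flow of the pipeline can be sketched with asyncio. This is a minimal illustration, not the plugin's implementation: `run_agent`, `review`, `call_log`, and the shape of the returned dictionaries are hypothetical names introduced here. Only the agent names, models, phase order, and the 30s preflight timeout come from the diagram above.

```python
import asyncio

PHASE1 = ["context-collector", "pr-summarizer", "comment-scanner"]
PHASE2 = ["convention-checker", "bug-detector", "security-reviewer"]
call_log: list[str] = []  # records launch order, for illustration only


async def run_agent(name: str, model: str, payload: dict) -> dict:
    """Hypothetical stand-in for launching a sub-agent; returns canned output."""
    call_log.append(f"{name}:{model}")
    await asyncio.sleep(0)  # yield control, as a real agent call would
    return {"agent": name, "proceed": True, "findings": []}


async def review(pr: dict) -> dict:
    # Phase 0: preflight, bounded by the 30s timeout.
    pre = await asyncio.wait_for(run_agent("preflight", "haiku", pr), timeout=30)
    if not pre["proceed"]:
        return {"stopped": True, "reason": pre.get("reason", "unspecified")}

    # Phase 1: three context agents in parallel; wait for all of them.
    context = await asyncio.gather(*(run_agent(n, "haiku", pr) for n in PHASE1))

    # Phase 2: three reviewers in parallel, each given all Phase 1 outputs.
    reviews = await asyncio.gather(
        *(run_agent(n, "sonnet", {"pr": pr, "context": context}) for n in PHASE2)
    )
    # Merge findings and tag each with the agent that produced it.
    merged = [dict(f, source_agent=r["agent"]) for r in reviews for f in r["findings"]]

    # Phases 3-5 run one after another, each consuming the previous output.
    verified = await run_agent("evidence-verifier", "sonnet", {"findings": merged})
    final = await run_agent("dedup-orchestrator", "haiku", verified)
    return await run_agent("output-composer", "sonnet", final)
```

Calling `asyncio.run(review(pr))` with any dictionary as `pr` walks all six phases. The parallel phases use `asyncio.gather`, so the orchestrator only moves on once every agent in the phase has returned.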
Phase 0: Launch preflight agent (haiku, 30s timeout):
- If proceed: false, stop and report reason

Phase 1: Launch 3 agents in PARALLEL (all haiku):
- context-collector: returns CLAUDE.md content and extracted conventions
- pr-summarizer: returns structured PR summary with intent, scope, risk
- comment-scanner: returns existing comment dedup keys

Phase 2: Launch 3 agents in PARALLEL:
- convention-checker (sonnet): CLAUDE.md compliance audit
- bug-detector (sonnet): logic and correctness bugs
- security-reviewer (sonnet): security vulnerabilities and architecture

Each receives all Phase 1 outputs as YAML. Each returns Finding objects per the schema.

Phase 3: Launch evidence-verifier agent (sonnet):
- Classifies the claim type, verifies external facts via tool, cross-validates between agents, and drops findings with adjusted_confidence < 80

Phase 4: Launch dedup-orchestrator agent (haiku):
- Applies three-layer deduplication (exact match, location proximity, semantic near-dedup), the batch guard (>3 files → 1 comment), and the budget cap (≤10 comments)
- Determines verdict and classification (see protocols/review-verdict.md)

Phase 5: Launch output-composer agent (sonnet):
- Always prints a terminal summary with the verdict
- With the --post flag: posts ONE atomic review with the correct GitHub event type from verdict.action
- The --post flag controls WHETHER to post; verdict.action controls the event type (APPROVE / REQUEST_CHANGES / COMMENT). These are independent.

All data exchange between agents uses YAML format. When passing findings between phases, wrap them in a YAML code block:
findings:
  - id: "BUG-a1b2-42"
    file: "src/utils.ts"
    line: 42
    category: BUG
    severity: CRIT
    confidence: 95
    claim_type: code_logic
    description: "Null dereference on optional chain"
    evidence: "Line 42: user.profile.name — user.profile can be undefined"
    suggestion: "user.profile?.name"
    suggestion_type: code
See protocols/agent-communication.md for complete phase transition contracts
and protocols/finding-schema.md for the full Finding object definition.
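
As a rough illustration of how a phase might emit findings in this shape, the sketch below mirrors the fields of the example above in a dataclass and renders them as YAML text using only the standard library. The `Finding` class and `to_yaml` helper are hypothetical names, and the field list is taken from the single example only; the authoritative definition lives in protocols/finding-schema.md, which may include fields not shown here.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Finding:
    # Fields mirror the example finding only (an assumption about the schema).
    id: str
    file: str
    line: int
    category: str
    severity: str
    confidence: int
    claim_type: str
    description: str
    evidence: str
    suggestion: str
    suggestion_type: str


def to_yaml(findings: list[Finding]) -> str:
    """Render findings as a YAML list under a top-level 'findings' key."""
    lines = ["findings:"]
    for finding in findings:
        for i, (key, value) in enumerate(asdict(finding).items()):
            prefix = "  - " if i == 0 else "    "
            # json.dumps yields a double-quoted string, which YAML parsers
            # accept as a double-quoted scalar for ordinary text.
            if isinstance(value, str):
                rendered = json.dumps(value, ensure_ascii=False)
            else:
                rendered = str(value)
            lines.append(f"{prefix}{key}: {rendered}")
    return "\n".join(lines)
```

This sketch quotes every string, including values such as BUG and CRIT that the example leaves unquoted. The two forms parse to the same values.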
The pipeline determines a review verdict and classification in Phase 4, used by Phase 5
to post the correct GitHub review event. See protocols/review-verdict.md for full details.
Verdict actions:
| Action | When | Effect |
|---|---|---|
| APPROVE | 0 findings after all filtering | Approves the PR on GitHub |
| REQUEST_CHANGES | CRIT findings, or 3+ NORM bugs, or public-facing SEC | Blocks merge until addressed |
| COMMENT | Advisory findings (NORM conventions, NITs, 1-2 bugs) | Posts feedback without blocking |
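
The verdict table can be read as a small decision function. The sketch below is one possible reading, under stated assumptions: findings are plain dictionaries with `category` and `severity` keys as in the YAML example, and `public_facing` is a hypothetical boolean field standing in for however the pipeline marks a public-facing SEC finding. Calibration guards and repository policy are not modeled.

```python
def decide_verdict(findings: list[dict]) -> str:
    """Map the findings that survived filtering to a GitHub review action."""
    if not findings:
        return "APPROVE"  # 0 findings after all filtering

    has_crit = any(f["severity"] == "CRIT" for f in findings)
    norm_bugs = sum(
        1 for f in findings if f["category"] == "BUG" and f["severity"] == "NORM"
    )
    # 'public_facing' is a hypothetical field, not part of the documented schema.
    public_sec = any(
        f["category"] == "SEC" and f.get("public_facing", False) for f in findings
    )

    if has_crit or norm_bugs >= 3 or public_sec:
        return "REQUEST_CHANGES"
    return "COMMENT"  # advisory findings only
```

For example, two NORM bugs yield COMMENT, while a third tips the verdict to REQUEST_CHANGES.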
Classification types:
| Type | Icon | Trigger |
|---|---|---|
| security | :shield: | Majority SEC findings or any CRIT SEC |
| bugs | :bug: | Majority BUG findings or any CRIT BUG |
| conventions | :memo: | Majority CONV findings |
| architecture | :building_construction: | Majority ARCH findings |
| mixed | :mag: | No single category majority |
| clean | :white_check_mark: | 0 findings |
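
The classification triggers can be sketched the same way. The table does not say how competing triggers are ordered, so this sketch makes two assumptions: CRIT triggers are checked before majority triggers, with security ahead of bugs as in the table order, and "majority" means strictly more than half of all findings. The `classify` function and `LABELS` mapping are hypothetical names.

```python
from collections import Counter

LABELS = {"SEC": "security", "BUG": "bugs", "CONV": "conventions", "ARCH": "architecture"}


def classify(findings: list[dict]) -> str:
    """Pick a classification type for the final set of findings."""
    if not findings:
        return "clean"

    # Assumption: CRIT triggers win over majority triggers, SEC before BUG.
    crit_categories = {f["category"] for f in findings if f["severity"] == "CRIT"}
    if "SEC" in crit_categories:
        return "security"
    if "BUG" in crit_categories:
        return "bugs"

    # Assumption: a majority is strictly more than half of all findings.
    category, count = Counter(f["category"] for f in findings).most_common(1)[0]
    if count * 2 > len(findings):
        return LABELS.get(category, "mixed")
    return "mixed"
```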
Calibration guards prevent over- or under-aggressive verdicts:
- … review_policy in CLAUDE.md / REVIEW.md

From research on developer trust (code-review-automation research tree, depth 4):
- … [unverified] if tool check failed

After each review, the output-composer captures these signals for future improvement:
This data feeds into the feedback-to-rule pipeline (see agents/feedback-learner.md)
for continuous improvement of review quality.
| Guard | What it prevents | Source |
|---|---|---|
| G1: Noise filter | "test" body, <20 char comments | Production: confused reactions |
| G2: Scope guard | Flagging unchanged code | Production: off-topic minimization |
| G3: Dedup guard | Re-posting same finding | Production: 23% multi-run duplication |
| G4: Batch guard | Same comment on N files | Production: number one spam minimization cause |
| G5: Evidence grounding | False external claims | Production: version hallucinations |
| G6: Exception clauses | Repo-specific false positives | Production: inline style FPs |
| G7: Security calibration | Over-escalating internal services | Production: internal CRIT FPs |
| G8: Comment budget | Developer fatigue (21+ triggers bulk dismiss) | Research: Copilot 60M benchmark |
| G9: False positive filter | General low-signal noise | Production: minimization analysis |
| G10: Verdict calibration | Over-aggressive REQUEST_CHANGES | Production: trust erosion, topology mismatch |
| G11: Prior review consistency | Contradicting own prior review recommendations | Production: fix-verification self-contradiction |
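
Three of these guards are mechanical enough to sketch: G1 (noise filter), G4 (batch guard), and G8 (comment budget). The thresholds below come from this document (under 20 characters, more than 3 files, at most 10 comments). Everything else is an assumption made for illustration: the `apply_guards` name, grouping by identical description text, the added `files` key, and keeping the most severe findings when the budget is exceeded.

```python
from collections import defaultdict

SEVERITY_RANK = {"CRIT": 0, "NORM": 1, "NIT": 2}  # assumed ordering


def apply_guards(findings, budget=10, batch_threshold=3, min_chars=20):
    # G1: drop comments shorter than min_chars (this also drops a bare "test").
    kept = [f for f in findings if len(f["description"].strip()) >= min_chars]

    # G4: if the same comment hits more than batch_threshold files,
    # collapse it into one finding that lists every affected file.
    groups = defaultdict(list)
    for f in kept:
        groups[f["description"]].append(f)
    batched = []
    for group in groups.values():
        files = sorted({f["file"] for f in group})
        if len(files) > batch_threshold:
            batched.append(dict(group[0], files=files))
        else:
            batched.extend(group)

    # G8: keep at most `budget` comments, most severe first (stable sort).
    batched.sort(key=lambda f: SEVERITY_RANK.get(f["severity"], len(SEVERITY_RANK)))
    return batched[:budget]
```

With this sketch, five identical comments on five different files collapse into a single finding whose `files` list names all five.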
npx claudepluginhub laiff/claude-marketplace --plugin code-review