Unified code-review orchestrator. Dispatches tiered reviewer personas (correctness, testing, maintainability, plus conditional security, performance, reliability, api-contract, data-migrations) in parallel, runs an adversarial/red-team lens AFTER specialists with access to their findings, merges JSON findings with P0-P3 severity and confidence (0.0-1.0), and routes by an autofix_class (safe_auto / gated_auto / manual / advisory). Prior learnings from docs/solutions/ are pulled via learnings-researcher before dispatch. Modes - interactive / autofix / report-only / headless. Triggers: ce-review, review this PR, orchestrated review, tiered review, red-team review, adversarial review, review with confidence.
How this skill is triggered — by the user, by Claude, or both
Slash command
/ce-review:ce-reviewThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
A single entry point that composes CCGM's review surface into one structured pipeline. Pulls relevant prior learnings, runs scope-drift first, fans out tiered reviewer personas in parallel, closes with an adversarial lens that reads the other findings, merges everything, and routes by confidence and autofix class.
A single entry point that composes CCGM's review surface into one structured pipeline. Pulls relevant prior learnings, runs scope-drift first, fans out tiered reviewer personas in parallel, closes with an adversarial lens that reads the other findings, merges everything, and routes by confidence and autofix class.
This skill supersedes ad-hoc review flows that stitch /review, /security-review, code-review:code-review, and pr-review-toolkit:review-pr by hand. Those commands remain installed; this one composes their intent into one pass with structured output.
Run /ce-review when:
Do NOT run for:
pr-review-toolkit directly)/document-review once it ships - see issue #278)On invocation, parse $ARGUMENTS for a mode token:
mode:interactive (or no mode token) - Full pipeline, ask one batched question for taste calls, apply safe_auto fixes, report everything. Default.mode:autofix - No questions. Apply safe_auto fixes immediately. Write the run artifact and report gated_auto / manual findings as todos. Use for headless CI-like runs.mode:report-only - Strictly read-only. No edits, no questions. Safe for concurrent runs and for reviewing someone else's branch.mode:headless - Skill-to-skill invocation. No edits, no questions, no chatty narration. Emit only the structured output envelope and the terminal line Review complete. so a calling skill can parse it.The four modes also govern what happens on ambiguity. Interactive asks; autofix applies the cautious default; report-only records the question; headless records and moves on.
The orchestrator is responsible for:
Each reviewer is a separate agent under agents/reviewers/. The orchestrator never inlines reviewer logic - agents own their "what you flag / what you don't flag" boundaries.
Every reviewer phase runs in fresh context. A reviewer must judge the change on its own merits, not inherit the rationale of whoever produced it. A reviewer who has read the author's "here is why I did it this way" narrative grades the defense of the change, not the change. That inflates sign-off - the same failure mode the Argus judge avoids by never seeing the diff-author's reasoning, and that heliohq/ship enforces by running review in a clean session.
Concretely, the orchestrator MUST and MUST NOT pass the following to any reviewer subagent:
Allowed inputs (the only context a reviewer receives):
diff_files, diff_stat, and the refs needed to read them. Reviewers read the files themselves.Forbidden inputs (never forward to a reviewer):
DONE self-report, completion summary, or "what I changed and why" narrative.If the only available description of intent is entangled with the author's rationale (e.g., a PR body that mixes "this must do X" with "I did it this way because…"), extract the requirement clauses and discard the justification before passing it on. When in doubt, pass less: the diff plus the issue's acceptance criteria is always a safe reviewer input.
The single intended exception is Phase 4: the adversarial reviewer reads the other reviewers' findings (prior_findings). Those are independent review output, not the implementer's rationale - reading them is the point.
Reviewer findings are consumed from a written artifact, not trusted from the chat narrative. This mirrors Argus: the orchestrator (the judge's caller) routes on the structured result it reads from disk, not on a reviewer's prose summary in the reply.
.context/ce-review/reviewers/{run_id}/{reviewer-name}.json, and its reply contains only that path plus the terminal line Findings written.subagent-patterns); do not reconstruct its findings from the chat.This keeps the return path auditable: every finding traces back to a file, and no sign-off rests on a narrative that cannot be re-read.
Collect the inputs once and pass them by path, not content, to the subagents (see modules/subagent-patterns/rules/subagent-patterns.md). Everything passed to a reviewer must be an allowed input per the Fresh-Context Integrity section above - spec/intent, diff, or build/test output, never the implementer's rationale:
base_ref - usually origin/main; override from $ARGUMENTS if the user said base:{ref}head_ref - HEAD unless specifieddiff_files - paths returned by git diff --name-only {base_ref}...{head_ref}diff_stat - output of git diff --stat {base_ref}...{head_ref} (short; safe to include inline)intent - the requirement statements only: the issue body (gh issue view {num} for any issue the PR closes) and the PR title plus any acceptance-criteria clauses from gh pr view --json body,title. Strip out the author's justification ("I did it this way because…") before forwarding - that is rationale, not intent.commit_subjects - git log {base_ref}..{head_ref} --pretty=format:"%s" (subjects only). Do not collect commit-message bodies; they routinely carry the author's reasoning, which reviewers must not see.build_test_output - fresh output of the project's verification suite (lint, type-check, tests, build), captured by the orchestrator. This is the deterministic evidence reviewers ground on instead of the author's "tests pass" claim.If no base ref resolves (rare; e.g., detached HEAD with no upstream), state that explicitly and fall back to reviewing the last commit only.
Dispatch the learnings-researcher agent (from modules/compound-knowledge/agents/learnings-researcher.md). Pass:
task_summary - one paragraph derived from the PR title, the intent requirement clauses, and the diff stat (not the author's rationale)files_hint - diff_filestags_hint - tags inferred from touched directories (e.g., supabase if supabase/migrations/ changed; chrome-extension if manifest.json changed; language tags by file extension)problem_type_filter - absent (pull both bugs and knowledge priors)max_results - 5Include the returned prior blocks as grounding context for every downstream reviewer. This is what compound-knowledge buys the review loop - prior decisions and known-bad patterns surface before specialists pattern-match from scratch.
If the repo has no docs/solutions/ directory, the agent returns no_solutions_directory: true. Record that, skip the priors block, and proceed.
Invoke the scope-drift skill (from modules/pr-review-toolkit/skills/scope-drift/SKILL.md). This runs the intent-versus-diff audit and returns:
In interactive mode, if scope-drift returns a HIGH gating question and the user answers "block", STOP. Do not dispatch specialists on code the author has not decided to keep.
In autofix, report-only, and headless modes, record the scope-drift findings as part of the final envelope but do not block - the gating decision is informational for the caller.
Dispatch reviewer agents in parallel using the Agent tool. Each reviewer returns a JSON array of findings matching references/finding.schema.yaml.
correctness-reviewer - logic errors, off-by-ones, wrong branch handling, control-flow mistakestesting-reviewer - missing tests, untested branches, missing-for-the-right-reason assertions, flake-prone patternsmaintainability-reviewer - duplication, naming, dead code, excessive complexity, unclear boundariesproject-standards-reviewer - conformance to the repo's stated conventions (CLAUDE.md, AGENTS.md, lint/type configs, existing patterns in sibling files)The learnings-researcher priors from Phase 1 are already in the context, so each reviewer sees relevant past decisions without re-retrieving.
Decide per run based on what the diff touches. Use the table below. Multiple conditions can fire.
| Reviewer | Fire condition |
|---|---|
security-reviewer | Diff touches auth, session, crypto, input validation, SQL, env/secret handling, permissions, OAuth, RLS |
performance-reviewer | Diff touches loops in hot paths, N+1 DB call patterns, bundle size, memoization, large data structures, animation code |
reliability-reviewer | Diff touches retries, timeouts, circuit breakers, background jobs, queues, idempotency keys, error recovery |
api-contract-reviewer | Diff changes a public API surface, route handler, exported type, SDK function, or RPC schema |
data-migrations-reviewer | Diff touches migration files, schema definitions, RLS policies, index changes, or data-backfill scripts |
If none of the conditions fire, run only the always-on set.
Use the parallel-research pattern from modules/subagent-patterns/rules/subagent-patterns.md. Pass each reviewer ONLY the allowed inputs (Fresh-Context Integrity above):
base_ref, head_ref, diff_files, diff_statintent - the requirement clauses (not the author's rationale)build_test_output from Phase 0run_id and the output path .context/ce-review/reviewers/{run_id}/{reviewer-name}.json the reviewer must write toDo NOT pass the implementer's conversation, completion report, or commit-message bodies to any reviewer. If you find yourself about to paste "the author says…" into a reviewer prompt, stop - that is the rationale leak this phase exists to prevent.
Reviewers are independent. Do not share their drafts between each other in this phase. Each reviewer writes its findings to its artifact path and replies with only that path plus Findings written.; the orchestrator reads the files (Results Stay in Files, above), not the replies.
After Phase 3 completes, dispatch the adversarial-reviewer agent with:
.context/ce-review/reviewers/{run_id}/ (this is the key difference - the adversarial reviewer reads their findings from the files, not from a narrative). Reviewer findings are independent review output, not author rationale, so forwarding them is the one intended cross-phase context flow.The adversarial reviewer writes its own findings to .context/ce-review/reviewers/{run_id}/adversarial-reviewer.json and replies with only that path plus Findings written., same as every other reviewer.
The adversarial reviewer runs five lenses (attack the happy path, find silent failures, exploit trust assumptions, break edge cases, find integration-boundary issues) and is specifically told to find what the specialists MISSED. It does not re-state their findings; it augments.
Gating condition - the adversarial reviewer fires in every mode, but its autofix_class is always gated_auto or weaker. Nothing an adversarial reviewer reports is safe_auto - the whole point is that these findings deserve a human decision.
Read every reviewer artifact under .context/ce-review/reviewers/{run_id}/ (one file per Phase 3 and Phase 4 reviewer) and collect the findings from the files - not from the reviewer replies (Results Stay in Files, above). Each finding is a JSON object shaped like:
{
"reviewer": "correctness-reviewer",
"file": "src/foo.ts",
"line": 42,
"severity": "P1",
"confidence": 0.85,
"category": "off-by-one",
"title": "Loop runs one iteration too many",
"detail": "The condition `i <= arr.length` should be `i < arr.length`.",
"autofix_class": "safe_auto",
"fix": "change `<=` to `<`",
"test_stub": "expect(process([1,2,3]).length).toBe(3)"
}
See references/finding.schema.yaml for the full schema.
Every reviewer prompt states this convention; the orchestrator re-checks it:
>= 0.80 - HIGH confidence. Reviewer has direct evidence in the diff.0.60-0.79 - MODERATE. Pattern-match from code, plausible but not confirmed.0.50-0.59 - LOW. Suspicion; surface only if the category is security or data-integrity.< 0.50 - SUPPRESS. Drop silently.If a reviewer returns a finding with confidence below 0.50, the orchestrator drops it before routing.
P0 - Blocking. Merge would break production, leak data, corrupt state, or violate a stated hard requirement.P1 - Must fix before merge. Observable bug, test gap in critical path, contract break.P2 - Should fix before merge. Maintainability, clarity, or a narrow-impact bug.P3 - Nice to fix. Nits, cosmetics, optional improvements.Severity and confidence are orthogonal. A P0 finding at confidence 0.55 is still suppressed - you cannot block merge on a suspicion.
Two findings are duplicates if they have the same file, same line within ±3 lines, and overlapping category. Keep the one with higher confidence; if tied, keep the one from the reviewer closer to the category root (e.g., security-reviewer wins over correctness-reviewer for a security category finding).
safe_auto - Apply immediately in interactive and autofix modes. Record the applied edit. Never in report-only or headless modes.gated_auto - Propose the fix; include it in the batched question in interactive mode; skip in autofix mode (too risky without a human) but list as a todo.manual - No auto-fix possible. Report with recommended direction.advisory - Informational only; do not block, do not propose a fix.The scope-drift and adversarial layers have a hard ceiling - they can at most emit gated_auto.
The output uses the Fix-First format (see modules/pr-review-toolkit/rules/fix-first-review.md). The orchestrator is responsible for producing one envelope; individual reviewers are not.
## Review Summary
**Mode**: {interactive|autofix|report-only|headless}
**Base**: {base_ref} **Head**: {head_ref}
**Priors loaded**: {N} from docs/solutions/
**Scope drift**: {verdict line from Phase 2}
**Reviewers run**: {comma-separated list}
### AUTO-FIXED ({count})
- {file}:{line} - {what was changed} - applied ({reviewer}, {severity}, conf={confidence})
### NEEDS INPUT ({count})
Batched question follows. Please answer once:
1. {file}:{line} - {finding} - {proposed direction} ({reviewer}, {severity}, conf={confidence})
2. ...
### RED-TEAM LENS ({count})
- {file}:{line} - {finding} ({severity}, conf={confidence}) - {one-line next step}
### Strengths (optional)
- {what this PR did well}
### Next Step
{Single sentence - either "Answer the batched question" or "Ready to merge" or "Blocked on scope-drift"}
The AUTO-FIXED list shows only findings that were actually applied in this run. If the mode suppresses auto-apply (report-only, headless), move those findings to a PROPOSED AUTO-FIXES list instead.
In any mode, write the full JSON envelope to a run artifact so other skills and later runs can consume it:
.context/ce-review/{timestamp}-{base_ref_short}-{head_ref_short}.json
The envelope schema:
{
"run_id": "2026-04-16T12:34:56Z-abc1234-def5678",
"mode": "interactive",
"base_ref": "origin/main",
"head_ref": "HEAD",
"priors": [
{"path": "docs/solutions/.../foo.md", "score": 7, "why_relevant": "..."}
],
"scope_drift": { "plan_completion": [...], "drift": [...], "verdict": "..." },
"findings": [
{
"reviewer": "...", "file": "...", "line": 42,
"severity": "P1", "confidence": 0.85,
"category": "...", "title": "...", "detail": "...",
"autofix_class": "safe_auto",
"applied": true,
"fix": "...", "test_stub": "..."
}
],
"adversarial": [ /* same shape as findings */ ],
"summary": {
"findings_total": 12,
"auto_fixed": 4,
"needs_input": 5,
"red_team": 3,
"p0": 0, "p1": 2, "p2": 6, "p3": 4,
"suppressed_low_confidence": 7
}
}
In headless mode, the stdout is limited to the envelope path and the terminal line Review complete.. Calling skills parse the envelope file.
| Mode | Asks question? | Applies fixes? | Writes run artifact? | Stdout |
|---|---|---|---|---|
interactive | Yes, one batched | safe_auto applied, others proposed | Yes | Full Summary |
autofix | No | safe_auto applied; gated_auto listed as todo | Yes | Full Summary |
report-only | No | No | Yes | Full Summary |
headless | No | No | Yes | Envelope path + Review complete. |
prior_findings is the one allowed cross-phase context, and it is review output, not author rationale.).context/ce-review/reviewers/{run_id}/. A reviewer's prose summary in the reply is not the source of truth; if the file is missing or unparseable, the reviewer failed - do not reconstruct findings from the chat.gated_auto or weaker.what you flag / what you don't flag boundary; keeping it in the agent file prevents overlap. Do not absorb reviewer prompts into the orchestrator.AskUserQuestion. Multiple questions fragment the review.report-only as interactive minus the question. Report-only never applies edits. If a reviewer's safe_auto fix would land on disk in interactive, it must become a PROPOSED AUTO-FIX list entry in report-only.learnings-researcher (compound-knowledge module). If that module is not installed, Phase 1 is a no-op and the review proceeds without priors.scope-drift skill (pr-review-toolkit module). If that skill is not installed, the orchestrator runs an inline 10-item scope check derived from the PR body and moves on.rules/fix-first-review.md from the pr-review-toolkit module. That rule is installed globally; the orchestrator does not re-state the format./review, /security-review, code-review:code-review, and pr-review-toolkit:review-pr remain available. Each module's README marks them as legacy in favor of /ce-review when the full pipeline is warranted.Ported from EveryInc/compound-engineering's ce-review skill (orchestrator + 27 reviewer agents, ~743 lines of skill + per-agent prompts). Adversarial lens structure adapted from garrytan/gstack's review/specialists/red-team.md. CCGM adaptations:
mode:interactive / mode:autofix / mode:report-only / mode:headless)agents/reviewers/ per CCGM's agents/ directory convention (issue #273)learnings-researcher agent (compound-knowledge, issue #276) instead of a CE-specific retrieveragents/reviewers/stack/ as a follow-upProvides CDSS development patterns for drug interaction checking, dose validation, clinical scoring (NEWS2, qSOFA), and alert classification integrated into EMR workflows.
npx claudepluginhub lucasmccomb/ccgm --plugin ce-review