Skill

codebase-review

Run a comprehensive, autonomous codebase review across six floor lenses (architecture, security, code quality, testing, operations, maintenance) plus lenses derived from the repo's own risk surface (agentic/LLM, canon drift, …). Findings are adversarially verified, deduplicated against the repo's risk registers and the cumulative findings ledger, scored mechanically, and persisted for trend analysis. Emits proposal-only plan stubs; elaboration and validation are delegated to feature-elaborate / feature-review.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/workflow-skills:codebase-review

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Run a full, autonomous codebase review. Produces a timestamped review report with verified, scored findings; an updated cumulative findings ledger; trend analysis against prior reviews; and proposal-only plan stubs routed to the repo's backlog.

Supporting Files

ledger.md, lenses.md, report-format.md

SKILL.md

209 lines · ~4.8k tokens

Stats

Language: JavaScript

Parent stars: 0

Maintenance: Excellent

Last Commit: Jun 14, 2026

/codebase-review

This skill is repo-agnostic: everything repo-specific is read at runtime — .claude/repo-conventions.yaml for layout, CLAUDE.md for the repo's own disciplines, and the repo's risk registers for what to review hardest. The conventions file is a claims file, not ground truth: Phase 0 verifies its load-bearing keys against the tree, and a stale conventions file is itself a finding. One deliberate exception: the output root docs/reviews/ (reports + ledger) is fixed by this skill's convention — it's tool-owned output, not repo structure, and a stable location is what makes prior-review discovery mechanical.

Companion files (read each when its phase needs it):

File	Contents
`lenses.md`	The six floor-lens charters, the derived-lens procedure + trigger table, and the per-lens report contract
`ledger.md`	Findings-ledger schema, ID scheme, carry-forward/dedup rules, scoring rules, verification protocol
`report-format.md`	discovery.md / synthesis.md / final-summary templates

Usage

/codebase-review                  # full review: floor lenses + derived lenses
/codebase-review --lens <key>     # single-lens pass (e.g. --lens canon-drift) — same ledger,
                                  #   dedup, and verification machinery at a fraction of the cost
/codebase-review --full           # disable diff-scoping: full attention everywhere

Flags compose: --lens <key> --full runs the single lens with diff-scoping disabled (full attention within that lens).

Execution overview

Seven phases, end-to-end, without stopping for user input. On ambiguity: make the best judgment, apply it, document it in the synthesis Process notes. Never ask the user mid-run — that breaks one-shot and cloud runs; auto-detect and record assumptions instead.

Phase 0: Ground        — conventions (verified), prior ledger, mechanical gates
Phase 1: Scope         — inventory, diff-scope vs last reviewed SHA, derive the lens set
Phase 2: Lens reviews  — parallel agents, one per lens; findings + coverage manifest each
Phase 3: Verify        — adversarial verification of findings
Phase 4: Critique      — completeness check: what did the review itself miss?
Phase 5: Synthesis     — dedup (in-run + vs registers + vs ledger), score, trend
Phase 6: Emit          — reports, ledger update, proposal-only plan stubs, summary

Phase 0: Ground

Conventions — read, then verify

Read .claude/repo-conventions.yaml. Cache: package, backend.* (test_dir, api_main, config_module, events_module, migrations, patterns.*), frontend.*, docs.*, backlog_root, backlog.templates_dir, backlog.feature_size.goal_max_tasks, backlog.altitude_discipline. Pattern checks in the lenses fire only if declared and not false.

Then verify the load-bearing keys against the tree — do not trust them:

backlog_root resolves to a real directory containing epic/feature folders. If not, locate the actual backlog (Glob for **/tech-debt/features/*/proposal.md and epic README patterns), use reality for the rest of the run, and record a conventions-drift finding.
CI claims (ci: keys or comments) vs .github/workflows/ contents. The workflow files are the canonical step list wherever the two disagree.
test_dir and test invocation vs what CI actually runs.

If the conventions file is missing entirely: auto-detect (package manifests, workflow files, docs layout), record the detection in discovery.md, and suggest authoring the file in the final summary. Never block on it.
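A minimal sketch of the "read, then verify" check for backlog_root, assuming the conventions file has already been parsed into a dict. The function name, the finding string, and the guess that the backlog root is the directory above tech-debt/ are illustrative assumptions, not part of the skill.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


def verify_backlog_root(conventions: dict, repo_root: Path) -> tuple[Optional[Path], list[str]]:
    """Return (backlog directory to use, drift findings). A stale claim is reported, never fatal."""
    findings: list[str] = []
    claimed = conventions.get("backlog_root")
    if claimed and (repo_root / claimed).is_dir():
        return repo_root / claimed, findings

    # The claim is missing or stale: record the drift and fall back to the tree.
    findings.append(f"conventions-drift: backlog_root={claimed!r} does not resolve")
    hits = sorted(repo_root.glob("**/tech-debt/features/*/proposal.md"))
    if not hits:
        return None, findings
    # Assumption: proposal.md -> <feature> -> features -> tech-debt -> backlog root.
    return hits[0].parents[3], findings
```

The run continues with whatever directory the tree actually contains; the stale key only adds a finding.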

Prior state — the ledger

Read docs/reviews/ledger.json (schema in ledger.md). Three cases:

Ledger exists — load it. It supplies: last reviewed SHA, per-lens scores, open/accepted/monitor findings for carry-forward, the next_review directives (must-cover lenses, lightly-scrutinised areas).
No ledger, but docs/reviews/<date>/ folders exist — backfill the ledger from the most recent synthesis.md first, per the backfill procedure in ledger.md. One-time migration; then proceed as case 1.
Neither — first review. Note "no baseline" and proceed.
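The three cases can be told apart mechanically. The sketch below assumes review folders are named YYYY-MM-DD (as the Phase 2 report path suggests); the function name and return labels are hypothetical.

```python
import re
from pathlib import Path

# Assumption: dated review folders look like 2026-01-31.
DATE_DIR = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def prior_state(reviews_dir: Path) -> str:
    """Classify the prior state as 'ledger', 'backfill', or 'first-review'."""
    if (reviews_dir / "ledger.json").is_file():
        return "ledger"
    if reviews_dir.is_dir() and any(
        p.is_dir() and DATE_DIR.match(p.name) for p in reviews_dir.iterdir()
    ):
        return "backfill"  # one-time migration, then treat as 'ledger'
    return "first-review"  # no baseline
```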

Prior state — the monitor routing ledger

Also read docs/reviews/monitor-ledger.yaml if present (schema in ledger.md). It is the routing layer over the findings ledger's monitor subset: the findings ledger records whether and when a monitor item recurs (monitor_trigger + carry-forward); the routing ledger records how it fires. An active entry carries a mechanism that keeps it monitored (acceptance | hook | schedule | review), validated in-repo by scripts/check-monitor-ledger.py (pre-commit + CI), so a monitor item cannot sit un-routed between reviews. Graduating an item to its own dedicated PLAN feature closes it instead (see the "Route every MONITOR finding" step in Phase 5); that is an exit from MONITOR, not an active mechanism.

The routing ledger supplies two things to this run: the review-mechanism entries that must be re-opened in Phase 5, and the active acceptance routings to reconcile against reality (an acceptance-guarded item whose fix has silently shipped is the drift this catches). If no routing ledger exists yet, the first run that produces or backfills monitor findings creates it in Phase 6.

Mechanical grounding — run the gates, then audit them

Run the repo's own quality gates and capture results as ground truth for the lens agents (findings about things a gate already catches are noise; findings about things a gate claims to catch but doesn't are gold):

the pre-commit suite (pre-commit run --all-files) if configured
the canonical test invocation — taken from the CI workflow file, not from memory or conventions
dependency audit (pip-audit / npm audit / cargo audit …) where configured
the type-checker at its configured scope

Timebox each; on failure-to-run record SKIPPED with reason — a skipped gate is a coverage note, never a silent omission.
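One way to timebox a gate and record SKIPPED instead of failing silently is sketched below. The result dict shape and the 600-second default are assumptions; the actual commands come from the repo's CI workflow and pre-commit config.

```python
import subprocess
import sys


def run_gate(name: str, cmd: list[str], timeout_s: float = 600) -> dict:
    """Run one quality gate; a gate that cannot run is SKIPPED with a reason."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except OSError as exc:  # e.g. the tool is not installed
        return {"gate": name, "status": "SKIPPED", "reason": f"could not start: {exc}"}
    except subprocess.TimeoutExpired:
        return {"gate": name, "status": "SKIPPED", "reason": f"timed out after {timeout_s}s"}
    status = "PASS" if proc.returncode == 0 else "FAIL"
    return {"gate": name, "status": status, "reason": ""}


if __name__ == "__main__":
    # Hypothetical usage: the interpreter stands in for a real gate command.
    print(run_gate("smoke", [sys.executable, "-c", "pass"]))
```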

While doing this, capture the gate inventory for the operations lens's truthfulness audit: every pre-commit hook's files:/exclude: scope, the CI step list, the type-checker's configured scope, which lockfiles/requirements the dependency audit actually covers, and any env-gated tests that exist but never run in CI.

Phase 1: Scope

Inventory

Build a structured inventory with Glob/Grep (never hardcode file lists). Scan: backend packages (per package), API layer, data layer + migrations, declared pattern directories (workflows/adapters/events), LLM/agent/prompt code (frameworks, providers, prompt templates, MCP servers, agent configs), config/settings, frontend dirs, tests per layer, infrastructure (containers, CI workflows, manifests), backlog (epics, feature counts, statuses), docs.

Diff-scope

If the ledger has a prior sha: git diff --stat <sha>..HEAD to build a churn map.

Changed/new areas get full lens attention.
Unchanged areas get declared sampling: each lens is told which areas it may sample and must list them under Sampled in its coverage manifest.
Every 4th review, or --full, or no prior SHA: no diff-scoping — full sweep. "Every 4th" is mechanical: this review's ordinal is len(ledger.reviews) + 1; ordinal divisible by 4 → full sweep.

Diff-scoping allocates attention; it never silently excludes. Anything neither read nor sampled must appear as Not examined.
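The full-sweep decision reduces to a few lines. This sketch assumes the ledger is a dict with a reviews list whose entries carry a sha key; the real field names are defined in ledger.md.

```python
from typing import Optional


def is_full_sweep(ledger: Optional[dict], full_flag: bool = False) -> bool:
    """True when diff-scoping is disabled: --full, no prior SHA, or every 4th review."""
    reviews = ledger.get("reviews", []) if ledger else []
    prior_sha = reviews[-1].get("sha") if reviews else None  # 'sha' is an assumed key
    ordinal = len(reviews) + 1  # this review's ordinal
    return full_flag or prior_sha is None or ordinal % 4 == 0
```

With three prior reviews the ordinal is 4, so the fourth run sweeps everything regardless of churn.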

Derive the lens set

The six floor lenses always run (charters in lenses.md): Architecture & Structure, Security, Code Quality, Testing, Operations & Infrastructure, Maintenance & Lifecycle.

Then read the repo's risk registers and apply the trigger table in lenses.md to derive additional lenses:

the canon manifest (docs/context/canonical.yaml or equivalent) and its flagged docs' deviation/gap tables
RAID / risk registers (e.g. docs/roadmap/raid.md), threat models (e.g. docs/security.md), ADR deviation tables
the prior ledger's next_review.must_cover — these are mandatory lenses this run
the prior ledger's lightly_scrutinised list — seed these areas into the relevant lens charters

Record the final lens set and the rationale for each derived lens in discovery.md. The lens set is part of the review's contract: next review compares against it.

Phase 2: Lens reviews

Launch one agent per lens, in parallel. Use the Workflow tool when available (per-agent results checkpoint as they complete, and the run is resumable after interruption); otherwise fan out with the Agent tool. Same prompts either way.

Each lens agent receives:

its charter from lenses.md
the discovery inventory + churn map (with its sampling allocation)
the gate results + gate inventory from Phase 0
project context synthesized from conventions + CLAUDE.md (languages, frameworks, multi-tenancy approach, the repo's own named disciplines)
its lens's open/accepted/monitor findings from the ledger (for recurrence checks — not for anchoring new work)
the list of the repo's dispositions registers (RAID, ADR deviation tables, tech-debt backlog) for pre-write dedup

Each agent writes its own report to docs/reviews/YYYY-MM-DD/<lens-key>.md following the report contract in lenses.md, and returns to the orchestrator only a structured summary: per finding {id, title, severity, disposition, files}, plus its coverage manifest. Full report bodies never round-trip through the orchestrator's context.
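The structured summary might look like the following. The per-finding fields are the ones named above; the class names, the coverage-manifest field names, and report_path are assumptions (the authoritative contract is in lenses.md).

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field


@dataclass
class FindingSummary:
    id: str
    title: str
    severity: str      # e.g. "P0".."P3"
    disposition: str   # e.g. "PLAN", "QUICK_FIX", "MONITOR"
    files: list[str]


@dataclass
class CoverageManifest:
    read: list[str] = field(default_factory=list)
    sampled: list[str] = field(default_factory=list)
    not_examined: list[str] = field(default_factory=list)


@dataclass
class LensSummary:
    lens: str
    report_path: str   # the full report stays on disk, not in the summary
    findings: list[FindingSummary]
    coverage: CoverageManifest


def to_payload(summary: LensSummary) -> dict:
    """Plain-dict form for handing back to the orchestrator."""
    return asdict(summary)
```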

Standing instructions inside every lens prompt (spelled out in lenses.md): evidence cites file + symbol; verify any count you assert; a hedged claim is either resolved or tagged UNVERIFIED; check the dispositions registers before writing a finding as new; apply the solution-quality-over-effort rubric (an effort-justified security/authz/contract choice is a finding at P1 minimum, flagged for human decision).

Phase 3: Verify

Adversarial verification by independent agents that did not author the finding:

Mandatory: every P0 and P1, and every PLAN-disposition finding.
Sampled: at least 25% of the remaining P2/P3, biased toward findings whose evidence is a single file read.

The verifier re-reads the cited evidence at HEAD and actively tries to refute the finding. Verdicts: confirmed, refuted (excluded from scoring and the priority tables; retained in the ledger with status refuted for calibration), reprioritised (with the new priority), enriched (evidence corrected/extended).

Back-propagate every verdict into the lens report file — a lens report and the synthesis must never disagree on a finding's priority. Record the run's refutation and reprioritisation rates in the ledger (calibration trend across reviews).
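The selection rule at the top of this phase can be sketched as follows. Findings are assumed to be dicts with the summary fields from Phase 2; putting single-file findings first is only one possible reading of "biased toward", and the seed parameter is an illustrative addition for repeatable runs.

```python
import math
import random


def select_for_verification(findings: list, sample_rate: float = 0.25, seed: int = 0):
    """Return (mandatory, sampled) findings for adversarial verification."""
    mandatory = [
        f for f in findings
        if f["severity"] in ("P0", "P1") or f["disposition"] == "PLAN"
    ]
    mandatory_ids = {f["id"] for f in mandatory}
    rest = [f for f in findings if f["id"] not in mandatory_ids]

    # At least 25% of the remainder, rounded up.
    quota = math.ceil(sample_rate * len(rest))

    # Shuffle, then stable-sort so single-file evidence comes first.
    rng = random.Random(seed)
    rng.shuffle(rest)
    rest.sort(key=lambda f: len(f["files"]) != 1)
    return mandatory, rest[:quota]
```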

Phase 4: Critique

One critic agent, after the lens summaries and coverage manifests are in. Inputs: all summaries + manifests, the discovery inventory, and the repo's own risk claims (threat model, canon manifest, RAID).

Charter:

Coverage reconciliation — inventory vs the union of manifests: what got neither read nor sampled? Produce the covered/sampled/not-examined table.
Self-claimed risks untouched — anything the repo's own threat model or canon declares critical that no finding addresses? (The review's most dangerous failure mode is silence on a declared risk.)
Cross-lens contradictions — disagreeing counts, conflicting recommendations, duplicated findings the lenses missed.

Critic findings join the pool under the critique lens key and P0/P1s among them go through Phase 3 verification like any other. The not-examined list feeds both the synthesis Coverage section and the ledger's next_review.lightly_scrutinised.
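Coverage reconciliation is set arithmetic over the inventory and the union of manifests. In this sketch the manifest keys read and sampled are assumed names, and an area that any lens read in full counts as covered even if another lens only sampled it.

```python
from __future__ import annotations


def reconcile_coverage(inventory: set[str], manifests: dict[str, dict]) -> dict[str, list[str]]:
    """Build the covered / sampled / not-examined table from per-lens manifests."""
    read: set[str] = set()
    sampled: set[str] = set()
    for manifest in manifests.values():
        read |= set(manifest.get("read", ()))
        sampled |= set(manifest.get("sampled", ()))
    sampled -= read  # a full read by any lens outranks sampling
    return {
        "covered": sorted(read & inventory),
        "sampled": sorted(sampled & inventory),
        "not_examined": sorted(inventory - read - sampled),
    }
```

The not_examined list is what would feed both the synthesis Coverage section and next_review.lightly_scrutinised.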

Phase 5: Synthesis

Collect all verified findings.
Dedup in-run — merge multi-lens duplicates into one finding (note all source lenses); fold cleanly, no vestigial rows.
Dedup vs the ledger (rules in ledger.md) — a finding matching an open prior finding is recurring (update last_seen, increment seen_count, keep its original ID); a finding matching a prior accepted/monitor item is auto-suppressed unless the evidence materially changed, in which case escalate with an explicit evidence-change note.
Dedup vs the repo's registers — a finding already recorded in RAID, an ADR deviation table, or an existing backlog entry is reported as known/dispositioned with a pointer to the register row, never as new. Only un-dispositioned findings are new. Disagreement with a recorded disposition is allowed but must cite and argue against the recorded rationale.
Score mechanically — apply the banding rule in ledger.md to each lens's verified finding counts. Record both the lens agent's raw score and the synthesis-computed final score; the scorecard shows the final, with a note where they differ. Overall = median of lens finals, adjustable ±1 with a documented reason.
Trend — compute from the ledger: resolution rate, recurring findings, regressions, new-vs-prior, refutation-rate trend.
Cluster PLAN findings into topics — each topic one independent deliverable (independently mergeable, not a slice that's meaningless alone). Task-count past goal_max_tasks is a soft re-check signal, not a limit. Route a finding to an existing backlog feature when one already covers it. Keep the single review-quick-fixes bundle for QUICK_FIX findings.
Route every MONITOR finding: a finding carried forward as monitor, or newly dispositioned monitor, is not done until it is either kept active under a mechanism that fires on its own or graduated out. Active mechanisms: acceptance (a guard promoted into a milestone feature's Acceptance criteria; it stays monitored until that feature ships), hook (a scoped pre-commit/CI tripwire), schedule (a /schedule poll), or review (time/drift; this skill re-opens it each run, and the entry records the mechanism it should graduate to as target_mechanism). Graduating an item to its own dedicated PLAN feature is the exit: the backlog then owns it, so its routing entry becomes closed (resolution = the feature); "feature" is never an active mechanism. An un-routed monitor finding is the same defect as an un-homed PLAN finding; do not allow one. Then:
- Re-open every review-mechanism entry in the routing ledger: re-verify it against HEAD (the same resolution check as open findings), bump last_checked, and if its monitor_trigger has fired, escalate it to PLAN this run.
- Reconcile active entries against reality: re-verify each at HEAD — an item whose underlying issue has silently shipped is resolved, not carried (the same drift that lets a closed feature read not-started). For acceptance entries, additionally confirm the pointer still resolves and still names the finding id.
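Two of the synthesis steps above are mechanical enough to sketch: classifying a finding against its matching ledger entry, and computing the overall score. How a match is found and how lens scores are banded are defined in ledger.md and are not reproduced here; the return labels, the function names, and the treatment of other prior statuses are assumptions.

```python
from __future__ import annotations

from statistics import median
from typing import Optional


def classify_against_ledger(prior: Optional[dict], today: str, evidence_changed: bool = False) -> str:
    """Classify a finding given its matching prior ledger entry (or None)."""
    if prior is None:
        return "new"
    if prior["status"] == "open":
        # Recurring: keep the original ID, refresh the bookkeeping.
        prior["last_seen"] = today
        prior["seen_count"] = prior.get("seen_count", 1) + 1
        return "recurring"
    if prior["status"] in ("accepted", "monitor"):
        # Escalation needs an explicit evidence-change note from the caller.
        return "escalated" if evidence_changed else "suppressed"
    return "new"  # assumption: other statuses are not covered by this sketch


def overall_score(lens_finals: dict[str, float], adjustment: int = 0, reason: str = "") -> float:
    """Median of lens finals, adjustable by at most 1 with a documented reason."""
    if adjustment not in (-1, 0, 1):
        raise ValueError("adjustment must be -1, 0, or +1")
    if adjustment and not reason:
        raise ValueError("a non-zero adjustment needs a documented reason")
    return median(lens_finals.values()) + adjustment
```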

Phase 6: Emit

Reports — write discovery.md and synthesis.md per report-format.md (lens reports were written in Phase 2 and corrected in Phase 3).
Ledger — update docs/reviews/ledger.json per ledger.md: append this review's entry (date, SHA, lens set, scores, stats, rates, next_review block), add new findings, update carried ones. Never rewrite prior reviews' entries.
Monitor routing ledger — write/refresh docs/reviews/monitor-ledger.yaml (schema in ledger.md): new monitor findings → routed active entries (id = the finding's lens-local id, which joins the findings ledger as CR-<source_review>-<id>); graduated (to a feature) or resolved items → closed with a resolution. The join is 1:1 — every findings-ledger status: monitor entry has exactly one active routing entry and vice versa; an item that graduated or resolved is no longer status: monitor, so it is a closed entry, never active. Reconcile any divergence here (this step owns the join; scripts/check-monitor-ledger.py owns structural integrity + pointer resolution). The synthesis "Items to Monitor" section is a rendered snapshot (from check-monitor-ledger.py --report), never a second source of record.
Plan stubs — proposal-only. For each plan topic, create <tech-debt home>/<topic>/proposal.md using the repo's own proposal template (from backlog.templates_dir; minimal fallback skeleton in report-format.md if the repo has none). Frontmatter per the repo's conventions; body must cite Source: codebase-review YYYY-MM-DD and the finding IDs. Do not author design.md or tasks.md, do not invent decisions or task breakdowns — detail tracks confidence (altitude discipline). Elaboration is /feature-elaborate's job when the topic is scheduled; validation is /feature-review's job before implementation. This skill never inline-validates its own plans.
Final summary to the user per report-format.md, ending with the recommended next-review date and its must-cover list (also persisted in the ledger — the recommendation must survive the session).
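The 1:1 join between the two ledgers can be checked with a set comparison. This is a sketch only: the field names (id, status, source_review) are assumptions about the schemas in ledger.md, and it is not a description of what scripts/check-monitor-ledger.py does.

```python
from __future__ import annotations

from collections import Counter


def join_divergence(findings: list[dict], routing: list[dict]) -> dict[str, list[str]]:
    """Report where monitor findings and active routing entries fail to pair 1:1."""
    monitored = {f["id"] for f in findings if f["status"] == "monitor"}
    # A routing entry's lens-local id joins the findings ledger as CR-<source_review>-<id>.
    active_keys = [
        f"CR-{r['source_review']}-{r['id']}" for r in routing if r["status"] == "active"
    ]
    counts = Counter(active_keys)
    active = set(counts)
    return {
        "unrouted": sorted(monitored - active),    # monitor finding with no active routing
        "orphaned": sorted(active - monitored),    # active routing with no monitor finding
        "duplicated": sorted(k for k, n in counts.items() if n > 1),
    }
```

All three lists empty means the join holds; anything else is a divergence for this step to reconcile.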

Single-lens mode (`--lens <key>`)

Runs Phases 0, 1 (scoped to the named lens), 2 (one agent), 3, 5, 6 with the full ledger/dedup/verification machinery. The scorecard re-bands only the reviewed lens; all others carry their prior score marked not re-reviewed. Use it for cheap between-baseline passes (e.g. --lens canon-drift after a planning push, --lens security after an incident).

The Phase 5 monitor-routing step runs in full regardless of the lens scope — it is lens-independent (re-opening review-mechanism entries and reconciling acceptance routings keeps the ledger's cadence honest even on a single-lens pass). Intentional: a cheap pass should still not let monitored items rot.
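The single-lens scorecard rule amounts to carrying every prior score forward and replacing one. A minimal sketch, with a hypothetical function name and re_reviewed flag:

```python
from __future__ import annotations


def single_lens_scorecard(prior_scores: dict[str, float], lens: str, new_score: float) -> dict[str, dict]:
    """Re-band only the reviewed lens; carry the rest, marked not re-reviewed."""
    card = {
        key: {"score": score, "re_reviewed": False}
        for key, score in prior_scores.items()
    }
    card[lens] = {"score": new_score, "re_reviewed": True}
    return card
```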

Execution constraints

Autonomous completion — all phases without user input; judgment calls documented in synthesis Process notes.
Read actual code — every finding cites file + symbol (function/class/section). Line numbers are allowed as transient extras only; symbols are what survive drift.
No code changes — reports, ledger, and proposal-only stubs are the only writes.
Context economy — lens agents write their own report files and return structured summaries; never round-trip full report bodies.
Verification is not optional — an unverified P0/P1 may not appear in the scorecard or the priority tables.
Ledger integrity — append and update; never rewrite a prior review's entries or re-mint an existing finding's ID.
Track progress — TodoWrite per phase.
Date-stamped output — today's date for the review folder.
