From rune
Reviews code for bugs, security issues, bad patterns, and untested code. Delegates fixes, tests, and security scans to specialized skills.
How this skill is triggered — by the user, by Claude, or both
Slash command
/rune:reviewThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Code quality analysis. Review finds bugs, bad patterns, security issues, and untested code. It does NOT fix anything — it reports findings and delegates: bugs go to rune:fix, untested code goes to rune:test, security-critical code goes to rune:sentinel.
Code quality analysis. Review finds bugs, bad patterns, security issues, and untested code. It does NOT fix anything — it reports findings and delegates: bugs go to rune:fix, untested code goes to rune:test, security-critical code goes to rune:sentinel.
A review that says "LGTM" or "code looks good" without specific file:line references is NOT a review. Every review MUST cite at least one specific concern, suggestion, or explicit approval per file changed.cook Phase 5 REVIEW — after implementation completefix for self-review on complex fixes/rune review — manual code reviewscout (L2): find related code for fuller context during reviewtest (L2): when untested edge cases found — write tests for themfix (L2): when bugs found during review — trigger fixsentinel (L2): when security-critical code detected (auth, input, crypto)docs-seeker (L3): verify API usage is current and correcthallucination-guard (L3): verify imports and API calls in reviewed codedesign (L2): when UI anti-patterns suggest missing design system — recommend design skill invocationperf (L2): when performance patterns detected in frontend diffreview-intake (L2): structured intake for complex multi-file reviewssast (L3): static analysis security scan on reviewed codeneural-memory | After review complete | Capture code quality insightcook (L1): Phase 5 REVIEW — post-implementation quality checkfix (L2): complex fix requests self-review/rune review direct invocationsurgeon (L2): review refactored code qualityrescue (L1): review refactored code qualitydesign (L2): review UI/design implementation qualitygraft (L2): review grafted code integrationreview → test — untested edge case found → test writes itreview → fix — bug found during review → fix applies correctionreview → scout — needs more context → scout finds related codereview → improve-architecture — when reviewer flag mentions "shallow", "wrapper", "indirection", or pass-through patternreview ← fix — complex fix requests self-reviewreview → sentinel — security-critical code → sentinel deep scanDetermine what to review.
Bash with git diff main...HEAD or git diff HEAD~1 to see exactly what changedRead on each named filerune:scout to identify all files touched by the changeFor each modified function/class, estimate its blast radius before reviewing.
Use Grep to count direct callers/importers of each modified symbol:
blast_radius = count(files importing or calling this symbol)
| Blast Radius | Risk | Review Depth |
|---|---|---|
| 1-5 callers | Low | Standard review |
| 6-20 callers | Medium | Check all callers for compatibility |
| 21-50 callers | High | Thorough review + regression test check |
| 50+ callers | Critical | MUST escalate to adversarial analysis (rune:adversary) even in quick triage |
Read each changed file. Prioritize bugs that pass CI but break production — these are the highest-value findings because linters and type checkers already catch the rest.
Read on every file in scopeCommon patterns to flag:
// BAD — missing await causes race condition
async function saveUser(data) {
db.users.create(data); // caller proceeds before save completes
return { success: true };
}
// GOOD
async function saveUser(data) {
await db.users.create(data);
return { success: true };
}
// BAD — null deref crash
function getUsername(user) {
return user.profile.name.toUpperCase(); // crashes if profile or name is null
}
// GOOD — safe access
function getUsername(user) {
return user?.profile?.name?.toUpperCase() ?? 'Anonymous';
}
Check consistency with project conventions.
Grep to sample similar code)any, full type coverage, no non-null assertions without justificationCommon patterns to flag:
// BAD — mutation
function addItem(cart, item) {
cart.items.push(item); // mutates in place
return cart;
}
// GOOD — immutable
function addItem(cart, item) {
return { ...cart, items: [...cart.items, item] };
}
// BAD — any defeats TypeScript's purpose
function process(data: any): any {
return data.items.map((i: any) => i.value);
}
// GOOD — typed
function process(data: { items: Array<{ value: string }> }): string[] {
return data.items.map(i => i.value);
}
Check for security-relevant issues.
rune:sentinel for deep scanOptional cross-model second opinion (security-critical / opus-escalated reviews only): a same-family reviewer shares blind spots with the author. For genuinely irreversible or attacker-facing changes (auth, crypto, payment, data migration), you MAY offer the user a different-architecture second pass via an external CLI (Gemini/Codex). This is opt-in and interactive-only — offer, never auto-invoke; skip in non-interactive runs (CI, /loop, scheduled) and announce the skip. If the user accepts, follow the safe transport in ../adversary/references/cross-model-escalation.md (per-call authorization, read-only sandbox, stdin not inline args), pass the diff + the security contract (not your verdict), and reconcile the reply as data — not a ruling.
For code that exposes APIs, shared utilities, or reusable interfaces, evaluate through 3 adversary personas:
| Adversary | Mindset | What They Reveal |
|---|---|---|
| The Scoundrel | Malicious — controls config, crafts inputs, exploits edge cases | Security holes, privilege escalation, injection surfaces |
| The Lazy Developer | Copy-pastes from docs, skips error handling, uses defaults | Unsafe defaults, missing validation, footgun APIs |
| The Confused Developer | Misunderstands API semantics, passes wrong types, ignores return values | Ambiguous interfaces, poor naming, missing type safety |
Pit-of-Success principle: Secure, correct usage should be the path of least resistance. If the API makes it EASIER to use it wrong than right → WARN.
Check: Does the API have sensible defaults? Does misuse fail loudly (not silently)? Is the happy path obvious from the signature?
Skip if: Code is internal-only (no external consumers), single-use utility, or test-only.
For any change that modifies exported functions, REST endpoints, event schemas, or shared types, check for backward-compatibility violations before proceeding.
Breaking change signals — flag any of these as HIGH:
| Signal | Example | Why it Breaks |
|---|---|---|
| Removed export | export function getUser deleted | Callers crash at import |
| Renamed parameter | id: string → userId: string | Named-argument callers break |
| Narrowed return type | User | null → User (null removed) | Callers that handle null crash |
| Required arg added | fn(a) → fn(a, b: string) | All existing callers missing b |
| Status code changed | 200 → 204 on success | Clients checking for body break |
| Event schema changed | { userId } → { user_id } | Consumers miss the field |
| Endpoint path renamed | /users/:id → /users/:userId | All client URLs broken |
Versioning check:
git diff main...HEAD — list every changed exported symbolCHANGELOG.md found: check that breaking changes are documented in the current version entrySkip if: Change is internal-only (no exports changed, no public API surface affected), or in test files only.
Identify gaps in test coverage.
Bash to check if a test file exists for each changed fileGlob to find test files: **/*.test.ts, **/*.spec.ts, **/__tests__/**rune:test with specific instructions on what to testGo beyond "test file exists" — check coverage at function granularity:
it(, test(, describe( blocks that reference the function)| Function Type | 0 tests | 1 test | 2+ tests |
|---|---|---|---|
| Business logic (money, auth, state) | BLOCK | WARN: "only happy path" | PASS |
| Data transform (parse, format, map) | HIGH | PASS | PASS |
| Event handler (onClick, onSubmit) | MEDIUM | PASS | PASS |
| Pure utility (string, math, date) | MEDIUM | PASS | PASS |
### Test Gap Analysis
| Function | File | Tests Found | Verdict |
|----------|------|-------------|---------|
| calculateTotal | src/billing.ts:42 | 3 (happy, zero, overflow) | PASS |
| processRefund | src/billing.ts:89 | 0 | BLOCK — business logic untested |
| formatCurrency | src/utils.ts:12 | 1 | PASS |
Skip if: Diff only touches config, docs, styles, or test files themselves.
Separate spec compliance from code quality. Most reviews conflate both — this gate forces the distinction.
Stage 1 — Spec Compliance (check FIRST)
Before evaluating code quality, verify the implementation matches what was asked:
requirements.md if availableFlag spec deviations as HIGH — clean code that misses requirements ships broken products.
# Spec Compliance Checklist
[ ] All acceptance criteria from plan/ticket covered
[ ] No stated requirements missing from implementation
[ ] No unrequested features added (scope creep)
[ ] API surface matches what was specified (signatures, endpoints, return types)
[ ] File structure matches plan (no renamed or relocated files without justification)
If spec violations found: document them separately from code quality findings in the report. Label as SPEC-MISS or SPEC-CREEP.
Stage 2 — Code Quality
Proceed to Step 6 only after Stage 1 passes. Code quality findings (bugs, patterns, security, coverage) are the existing Steps 2–5 above.
The review report MUST show both stages: spec compliance verdict first, then code quality findings.
Produce a structured severity-ranked report.
Before reporting, apply confidence filter:
Only report findings with >80% confidence it is a real issue
Consolidate similar issues: "8 functions missing error handling in src/services/" — not 8 separate findings
Skip stylistic preferences unless they violate conventions found in .eslintrc, CLAUDE.md, or CONTRIBUTING.md
Adapt to project type: a console.log in a CLI tool is fine; in a production API handler it is not
Group findings by severity: CRITICAL → HIGH → MEDIUM → LOW
Include file path and line number for every finding
Include a Positive Notes section (good patterns observed)
Include a Verdict: APPROVE | REQUEST CHANGES | NEEDS DISCUSSION
From gstack (garrytan/gstack, 50.9k★): "Reviews that produce 20 findings and delegate all to the user waste everyone's time."
Classify each finding as AUTO-FIX or ASK before reporting:
| Category | Auto-Fix? | Examples |
|---|---|---|
| Dead imports, unused variables | AUTO-FIX | import { foo } from './bar' where foo is never used |
| Missing error handling on obvious paths | AUTO-FIX | await fetch() without try/catch in production code |
| Console.log in production code | AUTO-FIX | Remove console.log from non-CLI production files |
| Architectural concern, trade-off | ASK | "This bypasses the auth middleware — intentional?" |
| Ambiguous intent | ASK | "Is this fallback behavior correct for null users?" |
| Style/convention disagreement | ASK | "Project uses camelCase but this file uses snake_case" |
After classification:
rune:fix — include all in a single batchAskUserQuestion — not 5 separate questionsRationalization prevention: "This looks fine" is NOT acceptable without evidence. If you can't cite a specific file:line or convention that justifies the code, flag it as UNVERIFIED — don't rationalize away uncertainty.
From gstack (garrytan/gstack, 50.9k★): "Intent vs diff catches scope creep that plan-based guards miss."
After reviewing code, compare stated intent vs actual diff:
git diff --stat to see actual file changes| Result | Meaning | Action |
|---|---|---|
| CLEAN | All changed files serve the stated intent | Note in report |
| DRIFT | 1-2 files changed that don't relate to stated intent | WARN — "These files were modified but aren't mentioned in the task: [list]" |
| REQUIREMENTS_MISSING | Stated intent mentions files/features not in the diff | WARN — "Task mentions X but it's not in the diff" |
This is informational, not blocking. Scope drift is common and sometimes intentional — but making it visible prevents silent creep.
After reporting:
rune:fix immediately with the finding detailsrune:fix with the finding detailsrune:test with specific coverage gaps identifiedneural-memory (Capture Mode) to save any novel code quality patterns or recurring issues found.Apply only if the framework is detected in the changed files. Skip if not relevant.
React / Next.js (detect: import React or .tsx files)
useEffect with missing dependencies (stale closure) → flag HIGHkey={i} → flag MEDIUMuseState, useEffect) in Server Components (Next.js App Router) → flag HIGHNode.js / Express (detect: import express or require('express'))
req.body passed directly to DB without validation schema → flag HIGHPython (detect: .py files with django, flask, or fastapi imports)
except: bare catch without specific exception type → flag MEDIUMdef func(items=[]) → flag HIGHApply only when .tsx, .jsx, .svelte, .vue, or .html files are in the diff. Skip for backend-only changes.
These are the "AI UI signature" — patterns that make AI-generated frontends visually identifiable as non-human-designed. Flag each as MEDIUM severity.
Preamble — load design contract first:
If .rune/design-system.md exists, read it first. Pull the project's Scale Minimums block (if authored by rune:design v0.5.0+) and apply those thresholds instead of the defaults below. Missing design-system.md → use defaults and add a LOW finding: "Project has no design-system.md — run rune design to lock visual decisions." Never enforce stale defaults against a project that has already declared stricter/looser minimums.
AI_ANTIPATTERN — Purple/indigo default accent with no domain justification:
// BAD: LLM default color bias — signals "AI-generated" to experienced designers
className="bg-indigo-600 text-white" // every button/CTA is indigo
// GOOD: domain-appropriate — trading → neutral dark, healthcare → trust blue,
// e-commerce → conversion-optimized warm. Purple is only appropriate for
// AI-native tools and creative platforms.
AI_ANTIPATTERN — Card-grid monotony (every section is 3-col cards, zero layout variation):
// BAD: every section uses the same grid pattern
<div className="grid grid-cols-3 gap-6"> // features
<div className="grid grid-cols-3 gap-6"> // testimonials
<div className="grid grid-cols-3 gap-6"> // pricing
// GOOD: mix layouts — split sections, bento grids, full-bleed hero, list+detail
AI_ANTIPATTERN — Centeritis (everything centered, no directional flow):
// BAD: no visual tension, no reading direction
<div className="text-center flex flex-col items-center"> // hero
<div className="text-center"> // every feature section
// GOOD: left-align body copy, use centering intentionally for hero/CTAs only
AI_ANTIPATTERN — Numeric/financial values in non-monospace font:
// BAD: prices, stats, metrics in Inter/Roboto
<span className="text-2xl font-bold">${price}</span>
// GOOD: monospace for all numbers that need alignment
<span className="font-mono text-2xl font-bold">${price}</span>
AI_ANTIPATTERN — Scale Minimum violations (AI boilerplate tell):
// BAD: body text at 14px (AI default) — primary content must be ≥16px
<p className="text-sm">Welcome to the dashboard.</p>
// BAD: hero/display text below 40px — reads as "section heading", not "hero"
<h1 className="text-3xl font-bold">Ship Faster</h1> // 30px
// BAD: touch target below 44×44px on mobile
<button className="w-8 h-8"><XIcon /></button> // 32px — WCAG 2.5.8 failure
// GOOD: hero ≥48px, body ≥16px, touch ≥44×44px
<h1 className="text-5xl md:text-6xl font-bold">Ship Faster</h1> // 48-60px
<p className="text-base">Welcome to the dashboard.</p> // 16px
<button className="w-11 h-11"><XIcon /></button> // 44px
Pull project-specific overrides from .rune/design-system.md § Scale Minimums.
AI_ANTIPATTERN — Hand-rolled SVG for standard iconography:
// BAD: custom <svg> for dashboard/menu/close/chevron — AI geometry almost always malformed
<svg viewBox="0 0 24 24"><path d="M3 3h18v18H3z M3 9h18 M9 3v18"/></svg>
// GOOD: Phosphor Icons (preferred) or Huge Icons
import { House, List, X } from '@phosphor-icons/react';
<House weight="bold" size={24} />
// GOOD: labeled placeholder when no icon library available yet
<span className="icon-placeholder" aria-label="Dashboard icon — design pass needed">
[ ICON: dashboard ]
</span>
Exceptions: inline SVG for project-unique logos, data visualizations (charts/graphs), or decorative illustrations generated by a human designer — these are not "standard iconography."
AI_ANTIPATTERN — Manual hex shading for accent states (oklch() violation):
/* BAD: hand-darkened hex — breaks perceived lightness consistency */
--accent: #3b82f6;
--accent-hover: #2563eb; /* guessed darker */
--accent-pressed: #1d4ed8; /* guessed even darker */
/* GOOD: relative oklch() derivation */
--accent: oklch(62% 0.19 258);
--accent-hover: oklch(from var(--accent) calc(l - 0.08) c h);
--accent-pressed: oklch(from var(--accent) calc(l - 0.15) c h);
--accent-subtle: oklch(from var(--accent) calc(l + 0.3) calc(c * 0.4) h);
Flag any CSS file defining 2+ hover/pressed/active variants with sibling hex literals. Not a finding if accent uses a design-token library (Radix Colors, Tailwind palette) that already ships perceptually-tuned scales.
AI_ANTIPATTERN — Missing UI states (only happy path rendered):
// BAD: data rendering without empty/error/loading states
{data.map(item => <Card key={item.id} {...item} />)}
// GOOD: all 4 states covered
{isLoading && <CardSkeleton />}
{error && <ErrorState message={error.message} />}
{!data.length && <EmptyState />}
{data.map(item => <Card key={item.id} {...item} />)}
Accessibility — flag as HIGH (these are WCAG 2.2 failures):
// BAD: icon button with no accessible name
<button onClick={close}><XIcon /></button>
// GOOD
<button onClick={close} aria-label="Close dialog"><XIcon aria-hidden="true" /></button>
// BAD: placeholder as label
<input placeholder="Email address" type="email" />
// GOOD
<label htmlFor="email">Email address</label>
<input id="email" type="email" />
// BAD: removes focus ring without replacement
className="focus:outline-none"
// GOOD: must have focus-visible replacement
className="focus:outline-none focus-visible:ring-2 focus-visible:ring-blue-500"
// BAD: color as sole information conveyor
<span className="text-red-500">{errorMessage}</span>
// GOOD: icon + color + text
<span className="text-red-500 flex gap-1"><ErrorIcon aria-hidden />Error: {errorMessage}</span>
WCAG 2.2 New Rules — flag as MEDIUM:
position: sticky or position: fixed header/footer without scroll-padding-top → Focus Not Obscured (2.4.11)width < 24px or height < 24px without 8px spacing → Target Size (2.5.8)Platform-Specific — flag as MEDIUM when platform is detectable:
MaterialTheme.colorScheme tokens → not adaptive to dynamic colorWhen a review is part of a recurring quality-gate cycle (e.g., sprint review, pre-release gate), produce a composite quality score alongside the findings list. This makes review output numeric and comparable across runs.
Quality Score = (Correctness × 0.35) + (Security × 0.30) + (Test Coverage × 0.20) + (Conventions × 0.15)
Each dimension is scored 0–100 based on findings count and severity:
| Score | Grade | Verdict |
|---|---|---|
| 90–100 | Excellent | APPROVE |
| 75–89 | Good | APPROVE with notes |
| 60–74 | Fair | REQUEST CHANGES (MEDIUM issues) |
| 40–59 | Poor | REQUEST CHANGES (HIGH issues present) |
| 0–39 | Critical | REQUEST CHANGES (CRITICAL present) |
When to include: Only when mode: "scored" is passed by the caller, or when invoked by audit. Default review output uses the standard severity-ranked report without the score.
CRITICAL — security vulnerability, data loss risk, crash bug
HIGH — logic error, missing validation, broken edge case
MEDIUM — code smell, performance issue, missing error handling
LOW — style inconsistency, naming suggestion, minor refactor opportunity
## Code Review Report
- **Files Reviewed**: [count]
- **Findings**: [count by severity]
- **Review Commit**: [git hash at time of review]
- **Overall**: APPROVE | REQUEST CHANGES | NEEDS DISCUSSION
### Spec Compliance
- [PASS/FAIL]: [acceptance criteria coverage]
### CRITICAL
- `path/to/file.ts:42` — [description of critical issue]
### HIGH
- `path/to/file.ts:85` — [description of high-severity issue]
### MEDIUM
- `path/to/file.ts:120` — [description of medium issue]
### Blast Radius
- [High-impact symbols with caller counts]
### Positive Notes
- [good patterns observed]
### Verdict
[Summary and recommendation]
Track the git commit hash at review time. If code changes after review → review is STALE.
Review commit: abc123 → Code changed to def456 → Review is STALE, re-review required
When cook or ship checks review status: compare review commit hash with current HEAD. If different → WARN: "Review is stale — code changed since last review."
| Artifact | Format | Location |
|---|---|---|
| Code review report | Markdown | inline (chat output) |
| Severity-ranked findings | Markdown table | inline |
| Spec compliance verdict | Markdown | inline |
| Composite quality score | Markdown table | inline (when mode: "scored") |
| Blast radius assessment | Markdown table | inline |
Append to Code Review Report when invoked standalone. Suppress when called as sub-skill inside an L1 orchestrator (cook, team, etc.) — the orchestrator emits a consolidated block. See docs/references/chain-metadata.md.
chain_metadata:
skill: "rune:review"
version: "1.2.0"
status: "[DONE | DONE_WITH_CONCERNS]"
domain: "[area reviewed]"
files_changed: [] # review doesn't change files
exports:
findings_count: { critical: [N], high: [N], medium: [N], low: [N] }
findings:
- { severity: "[level]", file: "[path]", line: [N], message: "[issue]" }
verdict: "[APPROVE | REQUEST_CHANGES | NEEDS_DISCUSSION]"
quality_score: [0-100] # when mode: "scored"
suggested_next:
- skill: "rune:fix"
reason: "[grounded in findings — e.g., '2 HIGH findings in api/users.ts need remediation']"
consumes: ["findings"]
| Failure Mode | Severity | Mitigation |
|---|---|---|
| Finding flood — 20+ findings overwhelm developer | MEDIUM | Confidence filter: only >80% confidence, consolidate similar issues per file |
| "LGTM" without file:line evidence | HIGH | HARD-GATE blocks this — cite at least one specific item per changed file |
| Expanding review scope beyond the diff | MEDIUM | Limit to git diff scope — do not creep into adjacent unchanged files |
| Security finding without sentinel escalation | HIGH | Any auth/crypto/payment code touched → MUST call rune:sentinel |
| Skipping UI anti-pattern checks for frontend changes | MEDIUM | Any .tsx/.jsx/.svelte/.vue in diff → MUST run UI/UX Anti-Pattern Checks section |
| Skipping spec compliance check (Step 5.5 Stage 1) | HIGH | Code quality without spec check ships clean code that does the wrong thing — always load the plan/ticket before reviewing quality |
| Treating purple/indigo accent as "just a color choice" | MEDIUM | It is a documented AI-generated UI signature — always flag for domain justification |
| Suggesting "add X" without checking if X is used | MEDIUM | YAGNI pushback: grep codebase for the suggested feature → if uncalled anywhere → respond "Not called anywhere. Remove? (YAGNI)". Valid pushback, not laziness |
| Adding abstractions "for future flexibility" | MEDIUM | Three similar lines > premature abstraction. Only abstract when there are 3+ concrete callers today |
| Missing cross-phase integration check at phase boundary | MEDIUM | When reviewing a phase completion: check orphaned exports, uncalled routes, auth gaps, E2E flow continuity. Delegate to completion-gate Step 4.5 |
| Review loop exceeds 3 iterations without resolution | MEDIUM | Cap at 3 review loops. After 3rd iteration with unresolved findings → surface to user with "these findings persist after 3 fix attempts — needs human decision" |
| Auto-fixing something that should have been ASK | HIGH | When in doubt, ASK. AUTO-FIX only for mechanical issues (dead imports, console.log). Anything involving intent or trade-offs = ASK |
| Scope drift flagged on intentional refactoring | LOW | Scope drift is informational, not blocking. User can override with "intentional" — don't re-flag after override |
~3000-6000 tokens input, ~1000-2000 tokens output. Sonnet default, opus for security-critical reviews. Runs once per implementation cycle.
npx claudepluginhub rune-kit/rune --plugin @rune/analyticsReviews code changes for correctness, readability, architecture, security, and performance. Checks lint, type safety, test coverage, and security issues. Use for PRs, audits, or pre-merge reviews.
Conducts multi-axis code reviews evaluating correctness, readability, architecture, security, and performance before merging changes.
Dispatches concurrent code reviews from architecture, security, and testing perspectives on paths, modules, PRs, or staged changes before merging or release.