Audit a codebase for architecture and code quality patterns that scale. Use when the user says "code quality", "audit this repo", "check architecture", "refactor assessment", "what needs fixing", "tech debt", "code review the repo", "architecture review", or any variation of reviewing a codebase for structural issues. Produces a scored report with specific file:line references and prioritised fixes.
Slash command: `/dev:code-quality`
Systematic architecture and code quality review. Checks 14 patterns extracted from production-grade codebases. Produces a scored report with specific findings and prioritised refactoring recommendations.
This is an audit skill, not a fix skill. Report findings. Do not modify code unless the user explicitly asks for fixes after reviewing the report.
Save the report to `docs/code-quality/YYYY-MM-DD-HH-MM-code-quality-audit.md` using the current date/time. Create the `docs/code-quality/` directory if it doesn't exist.

For each check, find 2–3 concrete examples (good or bad) with `file:line` references. Skip checks that don't apply to the project's scale or language, and mark them N/A with a reason.
Suggested opening commands:
# Understand the shape of the codebase
find . -type f \( -name "*.ts" -o -name "*.tsx" \) -not -path "*/node_modules/*" | wc -l
find . -type f \( -name "*.ts" -o -name "*.tsx" \) -not -path "*/node_modules/*" | head -60
# Find large files (god modules)
find src/ -name "*.ts" -o -name "*.tsx" | xargs wc -l | sort -rn | head -20
# Find any-type usage
grep -rn ": any\|as any\|catch (e)" src/ --include="*.ts" --include="*.tsx" | grep -v node_modules | grep -v _generated
# Find raw env var reads outside config files
grep -rn "process\.env\|import\.meta\.env" src/ --include="*.ts" --include="*.tsx" | grep -v "lib/env\|config"
# Find circular import candidates
grep -rn "from '\.\." src/ --include="*.ts" --include="*.tsx" | grep -v node_modules | grep -v test
### 1. Dependency Direction

What to check: Do dependencies flow one way (leaves → core → app), or are there circular imports and reverse dependencies? Can you draw the module graph as a DAG?
Signs of quality:
- No circular import chains between modules or packages
- Shared packages (`packages/*`) import only from each other, never from `apps/*`

Red flags:
- `utils/` files importing from `pages/` or `components/`
- `stores/` importing from `hooks/` or `components/`

How to check:
# Utils importing from pages (reverse dependency)
grep -rn "from.*pages/" src/utils/ src/lib/ src/stores/ --include="*.ts" --include="*.tsx"
# Shared packages importing from apps
grep -rn "from.*apps/" packages/ --include="*.ts" --include="*.tsx"
# Stores importing from hooks or components
grep -rn "from.*hooks/\|from.*components/" src/stores/ --include="*.ts"
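Once the import edges are known (from the greps above, or from a parser), answering "can you draw the module graph as a DAG?" is a depth-first search that reports any back edge. A minimal sketch, assuming the graph has already been collected into a plain adjacency map; `ModuleGraph`, `findCycle` and the module paths in the usage are hypothetical.

```typescript
// Adjacency list: module path -> modules it imports (hypothetical shape).
type ModuleGraph = Record<string, string[]>;

// Returns the first import cycle found as a list of modules, or null if the graph is a DAG.
function findCycle(graph: ModuleGraph): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  function visit(node: string): string[] | null {
    if (state.get(node) === "done") return null;
    if (state.get(node) === "visiting") {
      // Back edge: the cycle is the stack from the first occurrence of `node`.
      return [...stack.slice(stack.indexOf(node)), node];
    }
    state.set(node, "visiting");
    stack.push(node);
    for (const dep of graph[node] ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    state.set(node, "done");
    return null;
  }

  for (const node of Object.keys(graph)) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}

// Usage: a utils module importing a store (reverse dependency) closes a cycle.
const graph: ModuleGraph = {
  "src/pages/Home": ["src/stores/budget", "src/utils/format"],
  "src/stores/budget": ["src/utils/format"],
  "src/utils/format": ["src/stores/budget"],
};
console.log(findCycle(graph)?.join(" -> "));
// src/stores/budget -> src/utils/format -> src/stores/budget
```

Tracking nodes as "visiting" versus "done" is what separates a real cycle from two modules that merely share a dependency.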
### 2. Package Boundary Hygiene

What to check: If the repo has shared packages (monorepo or `packages/` structure), are the package boundaries clean? Do packages stand alone, or do they secretly reach into sibling packages or application code?
Signs of quality:
- Each package has its own `package.json` with explicit dependencies declared
- A public entry point in `index.ts`; internals not directly imported
- No imports from `../../apps/something` (breaks publishability)
- `publishConfig` or equivalent present if publishing is intended

Red flags:
- Deep imports into package internals (e.g. `@bh/schemas/src/geo-v1/atterberg`) rather than the barrel export
- No `index.ts` barrel: every file is a public surface
- `package.json` missing `"main"` or `"exports"` field

How to check:
# Packages reaching into app code
grep -rn "from '../../apps" packages/ --include="*.ts"
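# Check each package has an index.ts (at the package root or under src/)
find packages/ -maxdepth 3 -name "index.ts" -not -path "*/node_modules/*"
# Check package.json exports fields (grep prefixes each match with its file name)
grep -A3 '"exports"\|"main"\|"publishConfig"' packages/*/package.json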
Note: Mark N/A if the project has no packages/ structure.
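The deep-import red flag can be checked mechanically from a list of import specifiers: anything that continues past a workspace package's name reaches into its files instead of the barrel. A sketch, assuming all workspace packages share one npm scope (`@bh` is taken from the example above) and that none of them deliberately publishes subpath `exports`; `findDeepImports` is a hypothetical helper.

```typescript
// Flags import specifiers that reach past a scoped package's barrel,
// e.g. "@bh/schemas/src/geo-v1/atterberg" instead of "@bh/schemas".
function findDeepImports(specifiers: string[], scope: string): string[] {
  return specifiers.filter((spec) => {
    if (!spec.startsWith(`${scope}/`)) return false; // not a workspace package
    // "@scope/name" has exactly two segments; anything longer is a deep import.
    return spec.split("/").length > 2;
  });
}

const imports = ["@bh/schemas", "@bh/schemas/src/geo-v1/atterberg", "react", "@bh/ui"];
console.log(findDeepImports(imports, "@bh").join(", "));
// @bh/schemas/src/geo-v1/atterberg
```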
### 3. Interface Segregation

What to check: Are interfaces narrow and focused, or are there "god interfaces" that force implementers to provide methods they don't use?
Signs of quality:
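- Interfaces are narrow and focused; implementers provide only the methods they use
- Utilities declare the interface they need rather than depending on a whole store

Red flags:
- "God interfaces" that force implementers to provide methods they don't use
- `Partial<BigInterface>` used to avoid implementing unused methods

Pattern to look for (good):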
// Utility defines what IT needs, not what the store provides
interface HydrationTarget {
setBudget: (data: BudgetState) => void;
setRisk: (data: RiskState) => void;
}
// Caller wires real store at the boundary — utility stays testable
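Filled out into a runnable sketch, the utility depends only on `HydrationTarget`, so a test can hand it a two-method fake rather than a full store. `BudgetState`, `RiskState`, `Snapshot` and `hydrate` are assumptions for illustration.

```typescript
interface BudgetState { total: number }
interface RiskState { level: "low" | "medium" | "high" }

// Utility defines what IT needs, not what the store provides.
interface HydrationTarget {
  setBudget: (data: BudgetState) => void;
  setRisk: (data: RiskState) => void;
}

// Hypothetical snapshot shape restored from storage.
interface Snapshot { budget: BudgetState; risk: RiskState }

// Knows nothing about the concrete store, only the two setters it calls.
function hydrate(snapshot: Snapshot, target: HydrationTarget): void {
  target.setBudget(snapshot.budget);
  target.setRisk(snapshot.risk);
}

// In a test, a tiny fake satisfies the interface; in the app, the real store is passed here.
const calls: string[] = [];
const fakeTarget: HydrationTarget = {
  setBudget: (b) => { calls.push(`budget:${b.total}`); },
  setRisk: (r) => { calls.push(`risk:${r.level}`); },
};
hydrate({ budget: { total: 1200 }, risk: { level: "low" } }, fakeTarget);
console.log(calls.join(", ")); // budget:1200, risk:low
```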
### 4. Type Safety at Boundaries

What to check: Are external data boundaries (API responses, file parsing, env vars, JSON hydration) validated at entry, or does untyped data flow into the system?
Signs of quality:
- Fallible operations return a result type (`{ ok: true; value: T } | { ok: false; error: E }`)
- `catch (err: unknown)` with explicit narrowing, not `catch (err: any)`
- `JSON.parse()` always followed by schema validation before use

Red flags:
- `catch (err: any)`: bypasses TypeScript's type system on the most dangerous path
- `JSON.parse()` without validation at hydration or API boundaries
- Parsed rows read as `row['field']` without shape validation
- `as SomeType` assertions on external data (casting, not validating)
- `v.any()` in a Convex schema for a field with a known shape

How to check:
# Find catch-any
grep -rn "catch.*: any" src/ convex/ --include="*.ts" --include="*.tsx" | grep -v _generated
# Find raw JSON.parse without nearby validation
grep -rn "JSON\.parse" src/ --include="*.ts" --include="*.tsx"
# Find as-any assertions
grep -rn " as any" src/ --include="*.ts" --include="*.tsx" | grep -v node_modules
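A dependency-free sketch of the quality signals above: `JSON.parse` output is treated as `unknown`, checked by a type guard rather than cast, and returned as a discriminated-union result; the `catch` binding is `unknown` and narrowed before use. In a codebase that already uses Zod, a schema's `safeParse` would replace the hand-written guard. `UserPrefs`, `isUserPrefs` and `parsePrefs` are hypothetical.

```typescript
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Hypothetical shape of data hydrated from storage or an API.
interface UserPrefs { theme: "light" | "dark"; pageSize: number }

// Type guard: checks the runtime shape instead of asserting it with `as UserPrefs`.
function isUserPrefs(x: unknown): x is UserPrefs {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Record<string, unknown>; // only used to read fields for checking
  return (o.theme === "light" || o.theme === "dark") &&
    typeof o.pageSize === "number" && Number.isInteger(o.pageSize) && o.pageSize > 0;
}

function parsePrefs(raw: string): Result<UserPrefs, string> {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (err: unknown) {
    // Narrow before use; never `catch (err: any)`.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `invalid JSON: ${message}` };
  }
  return isUserPrefs(data)
    ? { ok: true, value: data }
    : { ok: false, error: "JSON does not match UserPrefs" };
}

const good = parsePrefs('{"theme":"dark","pageSize":25}');
const bad = parsePrefs('{"theme":"neon"}');
if (good.ok) console.log(good.value.pageSize); // 25
if (!bad.ok) console.log(bad.error); // JSON does not match UserPrefs
```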
### 5. Error Classification

What to check: Does the codebase distinguish between error types that require different handling (retry vs fail vs notify user), or is everything `catch (e) { log(e) }`?
Signs of quality:
- `AppError` or equivalent carries `code` + `retryable` + `userFacing` flags

Red flags:
- `catch` blocks that log and swallow without classification
- `toAppError()` exists but is not consistently used across catch blocks

How to check:
# Find console.error without toAppError or equivalent
grep -rn "console\.error" src/hooks/ src/components/ --include="*.ts" --include="*.tsx"
# Find swallowing catch blocks (catch with no rethrow or error surface)
grep -rn "catch" src/ --include="*.ts" | grep -v "toAppError\|AppError\|console"
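A sketch of what the quality signal describes: an `AppError` carrying `code`, `retryable` and `userFacing`, and a `toAppError()` that every catch block funnels through, so the retry / notify / fail decision lives in one place. The error codes, the constructor shape, and the rule that a fetch-related `TypeError` is a retryable network error are assumptions, not the project's actual implementation.

```typescript
type ErrorCode = "NETWORK" | "VALIDATION" | "UNKNOWN";

interface AppErrorOptions {
  code: ErrorCode;
  retryable: boolean; // safe to retry automatically
  userFacing: boolean; // message is safe and useful to show the user
}

class AppError extends Error {
  readonly code: ErrorCode;
  readonly retryable: boolean;
  readonly userFacing: boolean;

  constructor(message: string, opts: AppErrorOptions) {
    super(message);
    this.name = "AppError";
    this.code = opts.code;
    this.retryable = opts.retryable;
    this.userFacing = opts.userFacing;
  }
}

// Normalises anything thrown into an AppError so handlers branch on flags, not strings.
function toAppError(err: unknown): AppError {
  if (err instanceof AppError) return err;
  const message = err instanceof Error ? err.message : String(err);
  // Hypothetical rule: fetch() network failures surface as TypeError; treat them as transient.
  if (err instanceof TypeError && /fetch/i.test(message)) {
    return new AppError(message, { code: "NETWORK", retryable: true, userFacing: false });
  }
  return new AppError(message, { code: "UNKNOWN", retryable: false, userFacing: false });
}

// One place decides what each class of error means.
function classify(err: unknown): "retry" | "notify" | "fail" {
  const e = toAppError(err);
  if (e.retryable) return "retry";
  if (e.userFacing) return "notify";
  return "fail";
}

console.log(classify(new TypeError("Failed to fetch"))); // retry
console.log(
  classify(new AppError("Email is invalid", { code: "VALIDATION", retryable: false, userFacing: true })),
); // notify
console.log(classify("something odd")); // fail
```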
### 6. State Machine Discipline

What to check: Are stateful entities (sessions, orders, workflows, upload batches) managed through explicit state transitions, or can state be mutated arbitrarily?
Signs of quality:
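- Status fields typed as a closed set (TypeScript union or `z.enum()`), not open strings

Red flags:
- Direct status assignment (`status = 'completed'`) that bypasses transition rules
- Status typed as `string` rather than a union of known values

### 7. Test Quality & Distribution

What to check: Are tests testing the right things at the right level? Do they run independently, or do they leak state into each other?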
Signs of quality:
- Store state reset in `beforeEach` so each test starts clean

Red flags:
- `beforeAll` setting up state consumed by tests in other files
- Tests importing from `../stores/` without resetting state

How to check:
# Count test files vs source files
find src/__tests__ -name "*.test.ts" | wc -l
find src/ -name "*.ts" -not -path "*__tests__*" -not -path "*node_modules*" | wc -l
# Check for beforeEach resets in store tests
grep -rn "beforeEach\|setState" src/__tests__/ --include="*.test.ts"
### 8. Fail-Fast at Boundaries

What to check: Does the system validate inputs early and reject bad data before it propagates, or does it fail deep in the stack with confusing errors?
Signs of quality:
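- Inputs validated at the boundary, so bad data is rejected before it propagates
- Env vars read and checked in one central module

Red flags:
- `process.env.X` scattered through business logic rather than centralised
- Errors surfacing deep in the stack, far from where the bad input entered

How to check: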
# Raw env var reads outside a central config/env file
grep -rn "process\.env\.\|import\.meta\.env\." src/ convex/ --include="*.ts" --include="*.tsx" \
| grep -v "lib/env\|env\.ts\|config\.ts\|node_modules\|_generated"
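A minimal fail-fast sketch: a central env module reads and validates every variable once, so a missing or malformed value produces one clear startup error instead of an `undefined` deep in business logic. The variable names and default are hypothetical, and `loadEnv` takes the raw record as a parameter (pass `process.env` or `import.meta.env` in real code) to keep the example self-contained.

```typescript
interface Env {
  apiUrl: string;
  requestTimeoutMs: number;
}

// Reads and validates everything up front; throws once, listing every problem.
function loadEnv(raw: Record<string, string | undefined>): Env {
  const problems: string[] = [];

  const apiUrl = raw.API_URL;
  if (!apiUrl) problems.push("API_URL is required");
  else if (!/^https?:\/\//.test(apiUrl)) problems.push("API_URL must be an http(s) URL");

  const timeout = Number(raw.REQUEST_TIMEOUT_MS ?? "5000"); // hypothetical default
  if (!Number.isInteger(timeout) || timeout <= 0) {
    problems.push("REQUEST_TIMEOUT_MS must be a positive integer");
  }

  if (problems.length > 0 || !apiUrl) {
    throw new Error(`Invalid environment:\n- ${problems.join("\n- ")}`);
  }
  return { apiUrl, requestTimeoutMs: timeout };
}

// Business logic imports the validated object, never the raw env.
const env = loadEnv({ API_URL: "https://api.example.com" });
console.log(env.requestTimeoutMs); // 5000

try {
  loadEnv({ REQUEST_TIMEOUT_MS: "abc" });
} catch (err: unknown) {
  console.log(err instanceof Error ? err.message : err); // lists both problems
}
```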
### 9. Observable-Only Side Effects

What to check: Are non-critical side effects (logging, metrics, analytics, events) isolated so their failure never blocks the primary operation?
Signs of quality:
- `await` is not used on non-critical side effects in the critical path

Red flags:
- `await logEvent(...)` in the critical path without `try`/`catch`

### 10. Graceful Degradation

What to check: When a dependency is unavailable, does the system degrade gracefully or crash entirely?
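Signs of quality:
- When an optional dependency is unavailable, only the features that need it degrade; the rest keeps working

Red flags:
- One unavailable dependency crashes the whole system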
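One common shape for graceful degradation is a timeout plus a fallback around an optional dependency, so a slow or failing service yields a safe default instead of an exception. A sketch under that assumption; `withFallback`, the recommendations service and the 500 ms timeout are illustrative.

```typescript
// Runs an optional dependency call; on error or timeout, returns the fallback instead.
async function withFallback<T>(
  call: () => Promise<T>,
  fallback: T,
  timeoutMs: number,
): Promise<{ value: T; degraded: boolean }> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), timeoutMs);
  });
  try {
    return { value: await Promise.race([call(), timeout]), degraded: false };
  } catch {
    return { value: fallback, degraded: true };
  } finally {
    clearTimeout(timer); // no stray timer once the race is settled
  }
}

// Usage: the (hypothetical) recommendations service is down; the page gets an empty list.
const fetchRecommendations = (): Promise<string[]> => Promise.reject(new Error("service down"));
withFallback(fetchRecommendations, [], 500).then(({ value, degraded }) => {
  console.log(value.length, degraded); // 0 true
});
```

Returning a `degraded` flag lets the caller show a notice or emit a metric rather than silently hiding the failure.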
### 11. Configuration Hierarchy

What to check: Is configuration layered sensibly (defaults < env vars < runtime), or scattered across env vars, config files, and hardcoded values?
Signs of quality:
- `src/lib/env.ts` (or equivalent) as the only place env vars are read
- Required variables documented in a committed template (`.env.example`)

Red flags:
- `process.env.X` or `import.meta.env.X` scattered throughout business logic
- `.env` committed to version control

How to check:
# Env vars outside the central config file
grep -rn "import\.meta\.env\." src/ --include="*.ts" --include="*.tsx" | grep -v "src/lib/env"
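# Check .env is not tracked by git (no output means it is not committed)
git ls-files .env
# Check .env is covered by .gitignore (prints .env if it is ignored)
git check-ignore .env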
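The defaults < env vars < runtime ordering maps naturally onto successive object spreads, where later layers win only for the keys they set. A sketch with a hypothetical `AppConfig`; stripping `undefined` values before spreading matters, because otherwise an unset env var would overwrite its default with `undefined`.

```typescript
interface AppConfig {
  pageSize: number;
  featureFlags: string[];
  logLevel: "debug" | "info" | "warn";
}

const defaults: AppConfig = { pageSize: 20, featureFlags: [], logLevel: "info" };

// Removes keys whose value is undefined so they cannot clobber lower layers.
function defined(layer: Partial<AppConfig>): Partial<AppConfig> {
  return Object.fromEntries(
    Object.entries(layer).filter(([, v]) => v !== undefined),
  ) as Partial<AppConfig>;
}

// Later layers override earlier ones: defaults < env < runtime.
function resolveConfig(envLayer: Partial<AppConfig>, runtimeLayer: Partial<AppConfig>): AppConfig {
  return { ...defaults, ...defined(envLayer), ...defined(runtimeLayer) };
}

// The env layer would normally come from the central env module.
const config = resolveConfig({ pageSize: 50, logLevel: undefined }, { logLevel: "debug" });
console.log(config.pageSize, config.logLevel, config.featureFlags.length); // 50 debug 0
```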
### 12. Module Cohesion

What to check: Does each file/module/package have a single clear purpose, or are there god modules mixing concerns?
Signs of quality:
- Files named for the one thing they do (e.g. `snapshotSerializer.ts` not `utils.ts`)

Red flags:
- `utils.ts` or `helpers.ts` over 200 lines with unrelated functions
- `hooks/` containing what are effectively stores (no React lifecycle dependency)

How to check:
# Large files
find src/ -name "*.ts" -o -name "*.tsx" | xargs wc -l 2>/dev/null | sort -rn | head -15
# Stores with many concerns
wc -l src/stores/*.ts | sort -rn | head -10
### 13. Cleanup & Resources

What to check: Are resources (connections, file handles, object URLs, background processes, timers) properly cleaned up, including on error paths?
Signs of quality:
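- `URL.createObjectURL()` always paired with `URL.revokeObjectURL()` after use
- `setTimeout`/`setInterval` cleared in cleanup or `finally` blocks
- Subscriptions and listeners released in `useEffect` cleanup

Red flags:
- `createObjectURL` calls with no matching `revokeObjectURL`
- `useEffect` hooks that start subscriptions or timers but return no cleanup function
- Resources released only on the success path, not on error paths

How to check: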
# Object URL creation without revoke nearby
grep -rn "createObjectURL" src/ --include="*.ts" --include="*.tsx"
grep -rn "revokeObjectURL" src/ --include="*.ts" --include="*.tsx"
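# useEffect calls vs cleanup returns: a large gap means effects to inspect manually
grep -rn "useEffect(" src/ --include="*.tsx" | wc -l
grep -rn "useEffect(" src/ --include="*.tsx" -A 10 | grep -c "return () =>"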
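Timers are easy to leak on an error path. A sketch of the `finally` pattern from the quality signals: the interval is cleared whether the wrapped work resolves or throws. `withHeartbeat` is a hypothetical helper; the same `try`/`finally` shape applies to calling `URL.revokeObjectURL()` after a download or closing a connection.

```typescript
// Runs `work` while a heartbeat interval ticks (e.g. progress pings during an upload)
// and always clears the interval, on success and on error.
async function withHeartbeat<T>(
  work: () => Promise<T>,
  onTick: () => void,
  everyMs: number,
): Promise<T> {
  const interval = setInterval(onTick, everyMs);
  try {
    // `return await` (not a bare `return`) keeps `finally` from running before work() settles.
    return await work();
  } finally {
    clearInterval(interval);
  }
}

// Usage: the upload fails, but the interval is still cleared.
let ticks = 0;
withHeartbeat(
  async () => {
    throw new Error("upload failed");
  },
  () => {
    ticks++;
  },
  10,
).catch((err: unknown) => {
  console.log(err instanceof Error ? err.message : err); // upload failed
});
```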
### 14. Design System & Visual Consistency

What to check: Is there a shared visual language across the product surfaces (portal, dropsite, client-facing tools), or does each app accumulate its own ad-hoc colour tokens, spacing scales, and component variants?
This check has two tiers depending on maturity:
Tier 1 — No shared UI package yet (early stage)
The question is whether the foundations are in place to extract one later without a rewrite. Signs you're set up well:
- Shared components written so they could move into `packages/ui` (no app-specific imports baked in)
- `.storybook/` exists even if dormant, signalling intent to document components

Signs of drift beginning:
- No move towards `packages/ui` despite multiple app surfaces existing

Tier 2 — `packages/ui` exists
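- Components exported from a single barrel (`packages/ui/src/index.ts`)
- Tokens defined once in `packages/ui/tokens.ts` and imported everywhere

Red flags (either tier):
- Hardcoded colour values in markup (e.g. `className="text-[#2D6BE4]"`)
- `packages/ui` exists but consuming apps bypass it and implement components locally

How to check: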
# Hardcoded hex colours outside tailwind config
grep -rn "#[0-9a-fA-F]\{3,6\}" src/ apps/ --include="*.tsx" --include="*.ts" \
| grep -v "tailwind.config\|tokens\|node_modules"
# Duplicate component names across apps
find apps/ -name "Button.tsx" -o -name "Card.tsx" -o -name "Footer.tsx" 2>/dev/null
# Check Storybook story count and recency
find . -name "*.stories.tsx" -o -name "*.stories.ts" 2>/dev/null
git log --oneline -- "**/*.stories.*" | head -5
# Check packages/ui exists and has an index
ls packages/ui/src/index.ts 2>/dev/null && echo "exists" || echo "no packages/ui yet"
Scoring guide for this check:
- `packages/ui` with current stories, tokens defined once, no duplication

Note: Score at the Tier 1 ceiling (max 7) if `packages/ui` does not yet exist, regardless of how good the Tailwind config is. The ceiling reflects structural readiness, not current quality.
Write the report using this template:

# Code Quality Audit: {repo name}
**Date:** {date}
**Auditor:** Claude Code
**Scope:** {what was reviewed — full repo / specific packages / recent changes}
**Codebase size:** {approx file count and line count}
---
## Scores
| # | Dimension | Score | Key Finding |
|---|-----------|-------|-------------|
| 1 | Dependency Direction | X/10 | one-line summary |
| 2 | Package Boundary Hygiene | X/10 | one-line summary |
| 3 | Interface Segregation | X/10 | one-line summary |
| 4 | Type Safety at Boundaries | X/10 | one-line summary |
| 5 | Error Classification | X/10 | one-line summary |
| 6 | State Machine Discipline | X/10 | one-line summary |
| 7 | Test Quality & Distribution | X/10 | one-line summary |
| 8 | Fail-Fast at Boundaries | X/10 | one-line summary |
| 9 | Observable-Only Side Effects | X/10 | one-line summary |
| 10 | Graceful Degradation | X/10 | one-line summary |
| 11 | Configuration Hierarchy | X/10 | one-line summary |
| 12 | Module Cohesion | X/10 | one-line summary |
| 13 | Cleanup & Resources | X/10 | one-line summary |
| 14 | Design System & Visual Consistency | X/10 | one-line summary |
**Overall: {average}/10**
**Previous score (if applicable):** {X/10 — note trajectory}
---
## Top Findings
### Critical (fix before next feature)
1. **{short label}** — `file:line` — {why this matters, what goes wrong if left}
2. ...
### Important (fix in next refactor pass)
1. **{short label}** — `file:line` — {specific impact}
2. ...
### Minor (fix opportunistically)
1. **{short label}** — `file:line` — {low-priority note}
2. ...
---
## Positive Patterns Worth Preserving
1. **{pattern name}** — `file:line` — {why it's good, what it enables}
2. ...
---
## Recommended Refactoring Order
Ordered by ROI: highest impact relative to effort first.
| Priority | Change | Files affected | Scope | Rationale |
|----------|--------|---------------|-------|-----------|
| 1 | {change} | {files} | small/medium/large | {why now} |
| 2 | ... | | | |
---
## Known Gaps (out of scope / accepted debt)
1. **{gap}** — {why accepted, what would trigger revisiting it}
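The template's **Overall** line is an average of the check scores. One reasonable reading, assumed here rather than stated by the skill, is to average only the scored checks, leave N/A checks out, and round to one decimal:

```typescript
// Per-check scores out of 10; null marks a check reported as N/A.
type CheckScores = Record<string, number | null>;

function overallScore(scores: CheckScores): number | null {
  const applicable = Object.values(scores).filter((s): s is number => s !== null);
  if (applicable.length === 0) return null;
  const mean = applicable.reduce((sum, s) => sum + s, 0) / applicable.length;
  return Math.round(mean * 10) / 10; // one decimal place
}

const scores: CheckScores = {
  "Dependency Direction": 7,
  "Package Boundary Hygiene": null, // N/A: no packages/ structure
  "Type Safety at Boundaries": 5,
  "Error Classification": 6,
};
console.log(overallScore(scores)); // 6
```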
After writing the initial report, check for past audit reports in the project's docs/code-quality/ folder:
ls docs/code-quality/*.md 2>/dev/null | sort
If past reports exist, add a trajectory section to the report:
## Trajectory
| Date | Overall | Notable Changes |
|------|---------|-----------------|
| {date} | {score}/10 | {1-line summary of what changed} |
| {date} | {score}/10 | {1-line summary} |
**Trend:** {Improving / Stable / Declining} — {one sentence explaining the direction}

Annotate score changes against the previous audit, for example:
- `8/10 (was 5/10) — AppError now consistently adopted`
- `Regression from {X}/10 → {Y}/10 since {date} audit`

If no past reports exist, note in the report header: **Previous score:** N/A (first audit)
This step runs after the initial report is written and saved. Update the saved report file in-place with the trajectory data.
| Score | Meaning |
|---|---|
| 9–10 | Exemplary. Could be used as a teaching reference. |
| 7–8 | Solid. Minor issues, patterns are generally correct. |
| 5–6 | Mixed. Some good patterns but significant gaps. |
| 3–4 | Weak. Systematic issues that will compound as the codebase grows. |
| 1–2 | Critical. Fundamental patterns missing or actively harmful. |
Score based on what you actually find, not what documentation claims. Evidence over assertions. A 5/10 with accurate findings is more useful than a flattering 8/10.
- Every finding needs a specific `file:line` reference. "Error handling could be better" is useless.
- Mark Package Boundary Hygiene N/A if the project has no `packages/` structure.
- Cap Design System & Visual Consistency at 7/10 if `packages/ui` does not exist. A single-app repo with no multi-surface deployment can score max 6/10 on this check.