From joesys-skills
Use when the user invokes /codebase-audit to run a language-agnostic codebase quality audit measuring up to 12 quality criteria + development velocity with industry benchmarks, grading, and actionable recommendations.
Slash command: /joesys-skills:codebase-audit
Run a comprehensive, language-agnostic codebase quality audit. Measures up to 12 core quality criteria + development velocity across 6 parallel collection agents, displays graded metrics on console, and optionally writes a full analysis report with industry benchmarks and actionable recommendations.
- benchmarks/cpp.md
- benchmarks/csharp.md
- benchmarks/gdscript.md
- benchmarks/general.md
- benchmarks/go.md
- benchmarks/javascript.md
- benchmarks/python.md
- benchmarks/rust.md
- benchmarks/typescript.md
- helpers/compute_churn.py
- helpers/compute_complexity.py
- helpers/compute_structure.py
- helpers/test_compute_churn.py
- helpers/test_compute_complexity.py
- helpers/test_compute_structure.py
- principles/consistency.md
- principles/correctness.md
- principles/evolvability.md
- principles/maintainability.md
- principles/modularity.md
This skill MUST NOT:
- --fix, linter --fix, code generators) as part of measurement. Read-only invocations only. If a tool only runs in fix mode, skip it.
- /codebase-audit metrics, live commands need approval.

This skill uses progressive disclosure — read reference files only when needed:
| File | Contents | When to read |
|---|---|---|
| references/agent-prompts.md | Full prompt templates for all 6 collection agents + Phase 4 author agent | Before dispatching agents in Phase 1 or Phase 4 |
| references/output-schemas.md | metrics.json schema, metrics.md template, codebase-audit.md preferences template, execution flows | Before writing output files in Phase 5 |
| references/detection-defaults.md | Language marker files, language defaults, path auto-detection, polyglot rules, config file format | During Phase 0 detection |
Parse the user's /codebase-audit arguments:
| Invocation | Mode |
|---|---|
| /codebase-audit | Full pipeline (all 12 criteria + velocity) |
| /codebase-audit metrics | Collect + display only, write metrics.json + metrics.md |
| /codebase-audit analysis | Re-analyze from most recent metrics.json |
| /codebase-audit delta | Compare two most recent audits |
| /codebase-audit maintainability performance | Only specified criteria |
| /codebase-audit velocity | Just development velocity |
| /codebase-audit --static-only | No live commands (no test run, no dep audit) |
- metrics, analysis, delta
- --static-only — skip all live commands

| Argument | Criterion | Category |
|---|---|---|
| maintainability | 1. Maintainability | Core |
| evolvability | 2. Evolvability | Core |
| correctness | 3. Correctness | Core |
| testability | 4. Testability | Core |
| reliability | 5. Reliability | Core |
| performance | 6. Performance | Core |
| readability | 7. Readability | Core |
| modularity | 8. Modularity | Core |
| consistency | 9. Consistency | Core |
| operability | 10. Operability | Core |
| security | 11. Security | Core |
| story-readability | 12. Story Readability | Core |
| velocity | 13. Development Velocity | Extended |
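The invocation forms and criterion arguments above could be resolved roughly as follows. This is a minimal sketch, not the skill's own implementation: `AuditRequest` and `parse_audit_args` are hypothetical names, and the handling of a subcommand combined with criterion names is an assumption, since the source does not specify it.

```python
from dataclasses import dataclass, field

# Subcommands and criterion names taken from the tables above.
MODES = {"metrics", "analysis", "delta"}
CRITERIA = [
    "maintainability", "evolvability", "correctness", "testability",
    "reliability", "performance", "readability", "modularity",
    "consistency", "operability", "security", "story-readability",
    "velocity",
]


@dataclass
class AuditRequest:
    """Hypothetical container for a parsed /codebase-audit invocation."""
    mode: str = "full"
    criteria: list = field(default_factory=list)
    static_only: bool = False


def parse_audit_args(args):
    """Resolve raw argument tokens into a mode, criteria list, and flag."""
    request = AuditRequest()
    for arg in args:
        if arg == "--static-only":
            request.static_only = True
        elif arg in MODES:
            request.mode = arg
        elif arg in CRITERIA:
            request.criteria.append(arg)
        else:
            # Invalid criterion name: print valid names, stop.
            raise ValueError(
                f"Unknown argument {arg!r}. Valid criteria: {', '.join(CRITERIA)}"
            )
    # Assumption: criteria without an explicit subcommand mean a scoped run.
    if request.criteria and request.mode == "full":
        request.mode = "scoped"
    return request
```

For example, `parse_audit_args(["maintainability", "performance"])` yields a scoped request for those two criteria, while an empty argument list yields the full pipeline.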
Read references/detection-defaults.md for language marker files, language defaults, path detection rules, and config file format.
- full, metrics, analysis, delta, or scoped
- shared/skill-context.md for the full protocol. Load .claude/skill-context/preferences.md (shared) and .claude/skill-context/codebase-audit.md (skill-specific). If no shared preferences exist, invoke /preferences (streamlined mode). Shared preferences supply project phase, team size, and business priority — Phase 3 will skip questions already answered here.
- .claude/audit.yaml if it exists (all fields optional)
- shared/tooling-registry.md and per-language profiles from shared/tooling/. Classify tools as available, configured-but-unavailable, or absent. Build gap recommendations for absent tools.
- large tier at >1000 source files. Override via .claude/skill-context/codebase-audit.md key large_tier_threshold_files. Large tier activates Phase 1 module decomposition (§ Large Repo Decomposition) and Phase 2 heat-map-driven deep dive (§ Heat-Map-Driven Deep Dive). Below threshold: current whole-repo behavior.

A project context block passed to all agents:
Language: {primary} (+{additional})
Source paths: {paths}
Test paths: {paths}
Exclude: {patterns}
Test runner: {runner}
Domain: {summary}
Engine: {if detected}
If --static-only was passed, skip tool execution in Phase 1 but still detect and classify tools.
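The large-tier cutoff described in the detection steps (more than 1000 source files, overridable through `large_tier_threshold_files`) can be pictured as a single comparison. The sketch below assumes the preferences file has already been parsed into a dict; `classify_tier` and the tier name `"standard"` are hypothetical, since the source only names the large tier.

```python
DEFAULT_LARGE_TIER_THRESHOLD = 1000  # ">1000 source files" per the detection rules


def classify_tier(source_file_count, preferences=None):
    """Return "large" above the threshold, otherwise "standard" (hypothetical name)."""
    prefs = preferences or {}
    threshold = prefs.get("large_tier_threshold_files", DEFAULT_LARGE_TIER_THRESHOLD)
    # Strictly greater than: a repo with exactly `threshold` files stays whole-repo.
    return "large" if source_file_count > threshold else "standard"
```

With the default threshold, 1000 files stays in the whole-repo path and 1001 files triggers module decomposition.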
MUST spawn 6 measurement agents in parallel via the Agent tool — all 6 in a single response. Each uses model: "opus". Sequential dispatch is a defect. Read references/agent-prompts.md for the full prompt template for each agent.
| # | Agent | Key Metrics | Helper Script |
|---|---|---|---|
| 1 | Structural | LOC, file/function lengths, nesting, comment density | helpers/compute_structure.py |
| 2 | Quality | Cyclomatic complexity, naming, magic numbers, duplication, secrets | helpers/compute_complexity.py |
| 3 | Architecture | Coupling, circular deps, CI/CD, dependency health, tooling adoption | — (Grep/Read) |
| 4 | Git/Velocity | Churn, commit frequency, bus factor, knowledge concentration | helpers/compute_churn.py |
| 5 | Performance | Algorithm issues, N+1, blocking I/O, memory leaks | — (Grep/Read) |
| 6 | Tests | Pass rate, test ratio, assertion density, test quality | Test runner (if approved) |
For scoped criteria, launch only the required agents:
| Criterion | Required Agents |
|---|---|
| Maintainability | Structural, Quality |
| Evolvability | Structural, Architecture |
| Correctness | Tests, Structural |
| Testability | Tests, Structural, Architecture |
| Reliability | Architecture, Structural |
| Performance | Performance, Architecture |
| Readability | Structural, Quality |
| Modularity | Architecture |
| Consistency | Quality, Architecture |
| Operability | Architecture, Structural |
| Security | Architecture, Quality |
| Story Readability | Structural, Quality |
| Velocity | Git/Velocity |
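For a scoped run, the set of agents to launch is the union of the rows above for the requested criteria. A minimal sketch, assuming criteria arrive as the lowercase argument names; `agents_for` is a hypothetical helper name.

```python
# Criterion -> required agents, copied from the table above.
REQUIRED_AGENTS = {
    "maintainability": {"Structural", "Quality"},
    "evolvability": {"Structural", "Architecture"},
    "correctness": {"Tests", "Structural"},
    "testability": {"Tests", "Structural", "Architecture"},
    "reliability": {"Architecture", "Structural"},
    "performance": {"Performance", "Architecture"},
    "readability": {"Structural", "Quality"},
    "modularity": {"Architecture"},
    "consistency": {"Quality", "Architecture"},
    "operability": {"Architecture", "Structural"},
    "security": {"Architecture", "Quality"},
    "story-readability": {"Structural", "Quality"},
    "velocity": {"Git/Velocity"},
}


def agents_for(criteria):
    """Return the deduplicated, sorted list of agents needed for the criteria."""
    needed = set()
    for criterion in criteria:
        needed |= REQUIRED_AGENTS[criterion]
    return sorted(needed)
```

`/codebase-audit maintainability performance` would therefore need four agents: Architecture, Performance, Quality, and Structural.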
MUST present all live commands for approval before dispatching agents:
The following live commands will be executed during collection:
- {test_runner} (Tests agent)
- {audit_command} (Architecture agent)
- {tool_command} (Tooling — {tool_name})

Options: Run all | Static only | Select
Read-only commands (helper scripts, git log, Glob/Grep/Read, tool detection) do not need approval.
Activates when Phase 0 step 12 classified the repo as large tier.
When the repo exceeds the large-tier threshold, the three qualitative agents (Architecture, Performance, Security) switch from whole-repo to per-module dispatch. Statistical agents (Structural, Quality, Git/Velocity, Tests) continue to run whole-repo — their helper scripts aggregate without reading files, so repo size doesn't degrade them.
Step 1 — Module detection. Treat each top-level directory under the detected source paths as a module. Example: src/api/, src/payment/, src/ui/, src/worker/ → four modules.
- misc module to avoid agent spam.
- src/core/ → src/core/auth/, src/core/data/).

Step 2 — Per-module dispatch. For each module, dispatch one Architecture + one Performance + one Security agent in parallel. MUST fire all modules × 3 agents in the same parallel batch, alongside the statistical agents that run whole-repo.
Each qualitative agent receives only its module's files plus the shared project context block.
Step 3 — Roll-up. Per-module findings carry a module tag so the heat map and console display can reference them. The Phase 2 grade for Architecture/Performance/Security is the weighted average across modules (weighted by source file count).
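The roll-up in Step 3 is a file-count-weighted mean. The sketch below assumes each module's grade has already been converted to a numeric score; the source does not give that conversion, so the scores in the usage note are illustrative only. `roll_up` is a hypothetical name.

```python
def roll_up(module_scores):
    """Weighted average of per-module scores, weighted by source file count.

    module_scores: iterable of (numeric_score, source_file_count) pairs.
    """
    pairs = list(module_scores)
    total_files = sum(count for _, count in pairs)
    if total_files == 0:
        raise ValueError("no source files to weight by")
    return sum(score * count for score, count in pairs) / total_files
```

A 100-file module scoring 4.0 and a 300-file module scoring 2.0 roll up to 2.5, so the larger module dominates the result.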
This is the "structural breadth" half of large-tier analysis. The "risk depth" half runs in Phase 2 (see § Heat-Map-Driven Deep Dive).
Collect structured JSON from each agent. For each criterion, compute a grade using the principle file rubric + benchmark data.
Audit Confidence Model: Each criterion gets a confidence level (high, medium, low). Append ~ to grades with low confidence (e.g., "B~"). Overall confidence = lowest among all criteria.
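A minimal sketch of the confidence rules just described: low-confidence grades gain a `~` suffix, and the overall confidence is the lowest level seen. The function names are hypothetical.

```python
# Ordering used to find the lowest confidence level.
CONFIDENCE_ORDER = {"low": 0, "medium": 1, "high": 2}


def display_grade(grade, confidence):
    """Append "~" to a grade whose confidence is low."""
    return f"{grade}~" if confidence == "low" else grade


def overall_confidence(levels):
    """Overall confidence is the lowest level among all criteria."""
    return min(levels, key=CONFIDENCE_ORDER.__getitem__)
```

One low-confidence criterion is enough to make the whole audit low confidence, which keeps the headline grade from overstating certainty.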
Tooling Impact on Grades:
| Criterion | Positive Signal | Negative Signal |
|---|---|---|
| Security | Scanner present + clean | No scanner, or vulnerabilities found |
| Consistency | Formatter + linter clean | Violations, or no formatter/linter |
| Operability | Analysis tooling present, CI-integrated | No tooling at all |
| Maintainability | Static analyzer clean | Analyzer found issues |
| Correctness | Type checker clean | Type errors found |
Cross-reference complexity (Quality agent) with churn (Git/Velocity agent):
             High Churn
                  │
   ┌──────────────┼──────────────┐
   │  Refactor    │  Danger Zone │
   │  candidates  │  (act now)   │
───┼──────────────┼──────────────┼─── High Complexity
   │  Stable      │  Monitor     │
   │  (leave it)  │  (watch)     │
   └──────────────┼──────────────┘
             Low Churn
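The quadrants in the diagram amount to a two-threshold classification. The sketch below is one way to express it; the source does not state what counts as "high" complexity or churn, so both thresholds are parameters, and the values and file paths in the example are purely illustrative.

```python
def quadrant(complexity, churn, cc_threshold, churn_threshold):
    """Place one file in the churn/complexity heat map."""
    high_complexity = complexity >= cc_threshold
    high_churn = churn >= churn_threshold
    if high_complexity and high_churn:
        return "Danger Zone"
    if high_churn:
        return "Refactor candidates"
    if high_complexity:
        return "Monitor"
    return "Stable"


def danger_zone_files(files, cc_threshold, churn_threshold):
    """Return the paths that fall in the Danger Zone quadrant."""
    return [
        path
        for path, (complexity, churn) in files.items()
        if quadrant(complexity, churn, cc_threshold, churn_threshold) == "Danger Zone"
    ]


# Hypothetical (complexity, change count) pairs per file.
example_files = {
    "src/api/router.py": (24, 31),
    "src/ui/theme.py": (3, 40),
    "src/worker/legacy.py": (30, 1),
}
print(danger_zone_files(example_files, cc_threshold=10, churn_threshold=20))
```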
"Danger Zone" files (high complexity + high churn) MUST be named explicitly.
Activates when Phase 0 step 12 classified the repo as large tier.
Runs after the heat map is computed, in addition to the Phase 1 per-module dispatch. Module decomposition gave breadth — every top-level dir got a reviewer. Heat-map deep dive gives depth on actual risk.
Risk clusters to deep-dive:
Dispatch. For each risk cluster, dispatch one Architecture + one Performance + one Security agent in parallel. Each agent receives:
Deep-dive findings merge into their criteria grades. Mark each deep-dive finding with source: heat-map-deep-dive so the methodology section can cite the two-pass structure.
Skip condition: if the heat map is clean (no Danger Zone files) and every module graded B+ or above in Phase 1, skip this step — no risk to dive into.
| Grade | Meaning |
|---|---|
| A+ | Exceeds industry best practice |
| A | Meets best practice |
| B | Acceptable, minor improvements possible |
| C | Below average, attention needed |
| D | Significant issues, action required |
| F | Critical deficiencies |
Grading is relative to resolved benchmarks (language-specific → general fallback).
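Benchmark resolution can be sketched as a lookup with a fallback. This assumes the detected language is already normalized to a benchmark file stem (for example `cpp` or `csharp`); how the skill performs that normalization is not described in the source, and `resolve_benchmark` is a hypothetical name.

```python
def resolve_benchmark(language, available_files):
    """Prefer the language-specific benchmark file, else fall back to general."""
    candidate = f"benchmarks/{language.lower()}.md"
    if candidate in available_files:
        return candidate
    return "benchmarks/general.md"
```

Given the shipped benchmark files, `resolve_benchmark("rust", files)` would return `benchmarks/rust.md`, while a language with no dedicated file resolves to `benchmarks/general.md`.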
Print a summary table directly in the conversation:
╔══════════════════════════════════════════════════════════════╗
║ CODEBASE AUDIT — {Project Name} ║
║ {Domain Summary} · {Language} · {Date} ║
╠══════════════════════════════════════════════════════════════╣
║ Overall Grade: {GRADE} (confidence: {CONFIDENCE}) ║
╠══════════════════════════════════════════════════════════════╣
║ # Criterion Grade Key Metric Benchmark ║
║ ── ──────────────── ────── ────────────────── ─────────── ║
║ 1 Maintainability B CC avg: 8.2 ≤ 10 ║
║ 2 Evolvability B+ Fan-out avg: 3.1 ≤ 5 ║
║ ... ║
║ 12 Story Readability B+ Narr: 8, Chunk: 6 ≥ 7 avg ║
║ ── ──────────────── ────── ────────────────── ─────────── ║
║ 13 Velocity — +2.1k lines/30d — ║
╠══════════════════════════════════════════════════════════════╣
║ Top Risk: {criterion} ({grade}) — {reason} ║
║ Top Strength: {criterion} ({grade}) — {reason} ║
║ Danger Zone: {file} (CC:{N}, {N} changes) ║
╚══════════════════════════════════════════════════════════════╝
Dynamically generated — only measured criteria appear. Failed/skipped agents show "—" with a note.
After displaying the table:
Metrics collected. What would you like to do?
- Write both — metrics.json + metrics.md + full analysis
- Metrics only — write metrics.json + metrics.md (numbers, no commentary)
- Done — just the console display, no files
Routing rules:
- /codebase-audit metrics → skip gate, write metrics files directly
- /codebase-audit analysis → skip Phase 1, load most recent metrics.json, proceed to Phase 3

Gathers context the code alone can't reveal. Uses the shared preferences system (shared/skill-context.md) to avoid re-asking questions.

- .claude/skill-context/preferences.md) — loaded in Phase 0 step 2. Contains project phase, team size, business priority.
- .claude/skill-context/codebase-audit.md) — deployment cadence, known trade-offs.
- docs/reports/codebase-audit/project-context.md) — if this exists but no shared preferences file does, migrate its contents into the shared system.

Check what's already known from shared preferences. MUST only ask questions whose answers are not already captured:
| Question | Skip if already in... |
|---|---|
| Project phase | shared preferences → "Project phase" |
| Team size | shared preferences → "Team size" |
| Deployment cadence | shared preferences → "Deployment cadence" or audit-specific preferences |
| Business priority | shared preferences → "Business priority" |
| Known trade-offs | audit-specific preferences → "Known trade-offs" |
If shared preferences exist and cover project phase, team size, and business priority, the only new questions are deployment cadence (if missing), known trade-offs, and informed questions based on Phase 1 findings (e.g., "I noticed zero tests — intentional for now?").
If no shared preferences exist at all, /preferences was already invoked in Phase 0 step 2 — those answers are now available. Ask only the audit-specific questions: deployment cadence, known trade-offs, and informed questions.
Save audit-specific answers to .claude/skill-context/codebase-audit.md.
Check for existing audit-specific preferences at .claude/skill-context/codebase-audit.md. If found, present the combined profile (shared + audit-specific) and ask if anything has changed. Quick on repeat audits.
If docs/reports/codebase-audit/project-context.md exists but .claude/skill-context/preferences.md does not:
- .claude/skill-context/preferences.md
- .claude/skill-context/codebase-audit.md

| User Context | Analysis Effect |
|---|---|
| Solo + Prototype | Lighter on process, heavier on "what to invest in first" |
| Team of 10 + Mature | Heavier on consistency, modularity, onboarding friction |
| "Speed to market" priority | Recommendations framed as "do this now" vs. "before scaling" |
| "Low test coverage intentional" | Testability acknowledges trade-off rather than flagging as surprise |
A single author agent writes the full analysis in one pass. MUST use model: "opus". Read references/agent-prompts.md for the full author agent prompt.
The author receives: assembled metrics JSON, project context, user context, risk heat map, and previous audit data (if any).
The author assigns a priority rank (1–12) and weight (High/Medium/Low) to each criterion based on language + domain expertise. This affects priority order, overall grade, analysis depth, and recommended actions. Users can override via criteria_priority in audit.yaml.
Read references/output-schemas.md for the full schemas and templates.
| File | Content | When written |
|---|---|---|
| metrics.json | Machine-readable metrics with grades, benchmarks, methodology | Always (options 1 & 2) |
| metrics.md | Human-readable metrics table | Always (options 1 & 2) |
| analysis.md | Full qualitative report per templates/analysis-template.md | Option 1 only |
| .claude/skill-context/codebase-audit.md | Audit-specific preferences (trade-offs, cadence, history) | Updated each audit |
Output directory: docs/reports/codebase-audit/YYYYMMDD/
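The dated output directory can be derived from the audit date. A minimal sketch, with `audit_output_dir` as a hypothetical helper name:

```python
from datetime import date
from pathlib import PurePosixPath

REPORT_ROOT = PurePosixPath("docs/reports/codebase-audit")


def audit_output_dir(day):
    """Build docs/reports/codebase-audit/YYYYMMDD for the given date."""
    return REPORT_ROOT / day.strftime("%Y%m%d")


print(audit_output_dir(date.today()))
```

Zero-padded `YYYYMMDD` names sort lexicographically in date order, which keeps "most recent audit" lookups simple.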
Remove temp files. Report output paths:
Audit complete.
- Metrics: docs/reports/codebase-audit/{DATE}/metrics.md
- Analysis: docs/reports/codebase-audit/{DATE}/analysis.md
- Overall grade: {GRADE}
Notify completion (cross-platform):
if command -v powershell.exe &>/dev/null; then
powershell.exe -c "[Console]::Beep(800, 300)"
elif command -v afplay &>/dev/null; then
afplay /System/Library/Sounds/Glass.aiff &
elif command -v paplay &>/dev/null; then
paplay /usr/share/sounds/freedesktop/stereo/complete.oga &
else
printf '\a'
fi
| Situation | Behavior |
|---|---|
| 1–2 agents fail or time out | Proceed with available data. Note missing agents. Offer to retry. |
| All agents fail | Report failure. Suggest retrying or narrowing scope. |
| No git history | Git/Velocity agent skips. Churn/bus factor marked "No git history." |
| No test runner detected | Tests agent does static analysis only. |
| Helper script fails | Agent falls back to qualitative-only. Metrics marked "Not measured." |
| No internet (WebSearch unavailable) | Use cached benchmarks. Note "cached benchmarks only" in methodology. |
| Unknown language | Use general benchmarks. Extension-count fallback. |
| Massive repo (large tier) | Phase 1 module decomposition (per top-level dir) + Phase 2 heat-map-driven deep dive fire automatically. See § Large Repo Decomposition and § Heat-Map-Driven Deep Dive. |
| No .claude/audit.yaml | Fully auto-detected. Note in methodology. |
| Python not available | Qualitative-only for helper-dependent metrics. |
| No previous audit for delta | "Need at least 2 audits for delta comparison." |
| Live commands declined | Static analysis fallback. Mark as "Skipped (live commands declined)." |
| No static analysis tools detected | Gap recommendations included. Criteria graded without tool input. |
| Tool configured but not installed | Graded as absent. Config noted in analysis. |
| Tool execution fails | Skip tool, proceed with remaining tools. Note failure. |
| --static-only with tools detected | Tools detected and classified but not executed. |

| Error | Behavior |
|---|---|
| Invalid criterion name | Print valid names, stop |
| .claude/audit.yaml is malformed | Report parse error, proceed with auto-detection |
| No source files found | "No source files found in detected paths. Check project structure." |
| metrics.json not found for analysis mode | "No previous metrics found. Run /codebase-audit first." |
| <2 metrics.json for delta mode | "Need at least 2 audits for delta comparison. Found {N}." |
| Output directory creation fails | Report error, suggest alternative path |
| Agent returns malformed JSON | Use what's parseable, note the issue |
| Tool binary not found | Classify as configured-but-unavailable, skip, continue |
| Tool output unparseable | Report raw summary, skip structured parsing, continue |
| Tool timeout | Kill process, skip tool, continue with remaining tools |
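The delta-mode check in the table above can be sketched as a selection over dated audit directories. This assumes each audit lives in a `YYYYMMDD` directory containing a metrics.json, and that the caller passes in only the directory names that do; `two_most_recent` is a hypothetical helper name.

```python
def two_most_recent(audit_dirs):
    """Return (previous, latest) YYYYMMDD directory names for delta mode."""
    # Keep only names that look like YYYYMMDD; these sort in date order.
    dated = sorted(d for d in audit_dirs if len(d) == 8 and d.isdigit())
    if len(dated) < 2:
        raise LookupError(
            f"Need at least 2 audits for delta comparison. Found {len(dated)}."
        )
    return dated[-2], dated[-1]
```

With audits from `20250101`, `20250201`, and `20250301`, the comparison runs between the last two; a single audit raises the error message from the table.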
npx claudepluginhub joesys/joesys-skills --plugin joesys-skills