From codebase-audit
Produce a thorough software quality assessment report for a git repository — covering code architecture, security, database design, observability, testing, frontend quality, deployment, disaster recovery, data privacy (GDPR), dependency management, frontend bundle performance, and CI/CD execution speed, plus summary stats (LOC, test LOC, test coverage, commit activity). Use this skill whenever the user asks for a "quality report", "quality assessment", "code audit", "codebase review", "technical due diligence", "production readiness review", "health check", "grade my codebase", or anything along the lines of "how good is this project", "what are the weak spots", "is this safe to deploy", or "assess/audit this repo". Works on any git repo regardless of primary language (Python, TypeScript/JavaScript, Go, Rust, Java, Ruby, PHP, C#, etc.). Writes the report to `reports/{project}_quality_assessment_{YYYY-MM-DD}.md`.
Slash command: `/codebase-audit:report`
Generate a production-readiness quality assessment for a git repository. The output is a markdown report with graded dimensions, concrete evidence cited to file:line, and an actionable "what to do next" list.
The report is meant to be read top to bottom by a technical stakeholder (a maintainer, a reviewer, or a potential contributor) in 15-20 minutes. The reader should come away with a calibrated picture of the codebase: what's strong, what's weak, what a reviewer would flag in due diligence, and what's worth fixing first.
The value of this report is that every grade and every finding is backed by a concrete pointer into the codebase (app/routers/auth.py:85-88, docker-compose.yml:12, .github/workflows/ci.yml:40). A reviewer should be able to verify any claim in under a minute. If you can't cite evidence, say "not verified" — do not guess.
If you find yourself writing "seems to" or "probably has", that is a signal to go read the file. The difference between a good report and a generic one is whether every claim lands somewhere specific in the codebase.
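To spot-check a citation in the `path:start-end` form used above, printing just those lines is usually enough. A minimal bash sketch follows. `show_cite` is a hypothetical helper, not part of the skill, and it assumes the path itself contains no colon.

```bash
#!/usr/bin/env bash
# show_cite: print the lines a citation points at.
# Accepts "path:N" or "path:N-M". Hypothetical helper; assumes no ':' in the path.
show_cite() {
  local cite="$1" path range start end
  path="${cite%%:*}"      # everything before the first ':'
  range="${cite#*:}"      # everything after it, e.g. "85-88" or "12"
  start="${range%-*}"     # "85" (or "12" when there is no '-')
  end="${range#*-}"       # "88" (or "12" when there is no '-')
  sed -n "${start},${end}p" "$path"
}
```

Run it from the repo root, e.g. `show_cite app/routers/auth.py:85-88`.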
Follow these steps in order. Do not skip the orient/stats phases — they calibrate the rest.
Identify the project before assessing it. Read these in parallel:
- `README.md` / `README.rst`: what does the project do?
- `pyproject.toml`, `package.json`, `go.mod`, `Cargo.toml`, `pom.xml`, `build.gradle`, `Gemfile`, `composer.json`, `*.csproj`
- `CLAUDE.md`, `AGENTS.md`, `CONTRIBUTING.md`, `ARCHITECTURE.md`, `docs/`: authoritative context the project wrote about itself
- `Dockerfile`, `docker-compose*.yml`, `deploy/`, `.github/workflows/`: deployment and CI shape

From this, form a one-paragraph operational picture you can test against as you read code. Example: "FastAPI + React + PostgreSQL chess analyzer deployed to a single Hetzner box via GitHub Actions, Sentry on both ends, single-maintainer open source."
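Checking which of these files exist before reading them can be sketched as below. `list_orientation_files` is a hypothetical helper, not part of the skill. It only looks at the repo root, so files nested inside subprojects are not reported.

```bash
#!/usr/bin/env bash
# list_orientation_files: print which orientation files exist at the repo root.
# Hypothetical helper; only top-level paths are checked.
list_orientation_files() {
  local repo="$1" p
  for p in README.md README.rst \
           pyproject.toml package.json go.mod Cargo.toml pom.xml build.gradle \
           Gemfile composer.json \
           CLAUDE.md AGENTS.md CONTRIBUTING.md ARCHITECTURE.md docs \
           Dockerfile deploy .github/workflows; do
    if [ -e "$repo/$p" ]; then
      echo "$p"
    fi
  done
  # Glob patterns: an unmatched glob stays literal, so the -e test filters it out.
  for p in "$repo"/docker-compose*.yml "$repo"/*.csproj; do
    if [ -e "$p" ]; then
      basename "$p"
    fi
  done
}
```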
Run the stats script. It produces a snapshot of LOC, test LOC, coverage artifacts, git activity, dependency files, and CI timing — all the numeric inputs for the summary table at the top of the report.
bash "${CLAUDE_PLUGIN_ROOT}/skills/report/scripts/collect_stats.sh" /absolute/path/to/repo
The script writes a machine-readable summary to /tmp/quality-assessment-stats.txt and also prints it. Read the output and keep key numbers handy for the report.
If tokei is not installed, the script will ask whether to install it. Tokei gives accurate LOC with code/comment/blank split across languages; the fallback (git ls-files + wc -l) only gives a rough total. If the user declines, report stats as "approximate".
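The fallback described above amounts to counting newlines in every tracked file. A minimal sketch of that idea, assuming bash and git are available; `count_lines` and `approx_loc` are hypothetical names, not functions from `collect_stats.sh`.

```bash
#!/usr/bin/env bash
# count_lines: sum newline counts over a NUL-separated list of paths on stdin.
# Files without a trailing newline undercount by one, hence "approximate".
count_lines() {
  local total=0 f n
  while IFS= read -r -d '' f; do
    [ -f "$f" ] || continue          # skip submodules, files deleted from the worktree, etc.
    n=$(wc -l < "$f")
    total=$((total + n))
  done
  echo "$total"
}

# approx_loc: rough LOC for a git repo (every tracked file, docs and tests included).
approx_loc() {
  (cd "$1" && git ls-files -z | count_lines)
}
```

Because it counts every tracked file, lockfiles and generated assets included, the result is only a rough total and should be labeled "approximate" in the report.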
Test coverage: the script looks for existing coverage artifacts (.coverage, coverage.xml, lcov.info, coverage/coverage-summary.json, htmlcov/, Go coverprofile, etc.). It does not run tests — if no artifact exists, report coverage as "not measured" and recommend running coverage as a follow-up. Do not silently omit the coverage row.
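The artifact check described above can be approximated with a few existence tests. In this sketch, `find_coverage_artifacts` is a hypothetical name and `coverage.out` is an assumed Go coverprofile filename (the real name is whatever was passed to `go test -coverprofile`); the actual logic in `collect_stats.sh` may differ.

```bash
#!/usr/bin/env bash
# find_coverage_artifacts: list coverage artifacts at the repo root, or print
# "not measured" when none exist. coverage.out is an assumed Go profile name.
find_coverage_artifacts() {
  local repo="$1" found=0 p
  for p in .coverage coverage.xml lcov.info coverage/coverage-summary.json htmlcov coverage.out; do
    if [ -e "$repo/$p" ]; then
      echo "$p"
      found=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "not measured"
  fi
}
```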
Before grading dimensions, build a mental map of how the system actually runs. Read:
- Entry point (`main.py`, `app.py`, `server.ts`, `main.go`, etc.)
- Migrations: whether `downgrade()` / rollback is implemented
- `.env.example` / config loader: how secrets and config are supplied
- `deploy/entrypoint.sh`, `Procfile`, systemd unit, `k8s/` manifests

Capture the data flow in 3-5 numbered steps. This becomes section 2 of the report and anchors the rest.
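One way to shortlist entry-point candidates is to filter the tracked file list by name. A minimal sketch, assuming the four filenames listed above are the ones worth matching; `find_entry_points` is hypothetical.

```bash
#!/usr/bin/env bash
# find_entry_points: filter a newline-separated file list (e.g. from `git ls-files`)
# down to common entry-point names. The name list is an illustrative assumption.
find_entry_points() {
  grep -E '(^|/)(main\.py|app\.py|server\.ts|main\.go)$' || true
}
```

Use it as `git ls-files | find_entry_points`. Frameworks that boot from a config file or a `Procfile` will not show up, so an empty result means "look elsewhere", not "no entry point".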
Work through every dimension in references/dimensions.md. For each one:
Grade honestly. Not every project is an A. A codebase that genuinely hasn't implemented backups, or has no dependency automation, or ships 2 MB of unused JS to mobile users, is a B or C in that dimension. The report is most useful when grades are calibrated, not uniformly inflated.
Per-language probes (what to grep for, which config files, which tools are idiomatic) live in references/languages.md. Read only the sections for languages actually present.
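Deciding which language sections to read can start from the manifest files listed in the orientation step. A rough sketch: `detect_languages` is a hypothetical helper that only inspects the repo root, so nested monorepo packages and projects without one of these manifests are missed.

```bash
#!/usr/bin/env bash
# detect_languages: map top-level manifest files to language names, to pick
# which sections of references/languages.md to read. Hypothetical helper.
detect_languages() {
  local repo="$1"
  if [ -e "$repo/pyproject.toml" ]; then echo "Python"; fi
  if [ -e "$repo/package.json" ]; then echo "TypeScript/JavaScript"; fi
  if [ -e "$repo/go.mod" ]; then echo "Go"; fi
  if [ -e "$repo/Cargo.toml" ]; then echo "Rust"; fi
  if [ -e "$repo/pom.xml" ] || [ -e "$repo/build.gradle" ]; then echo "Java"; fi
  if [ -e "$repo/Gemfile" ]; then echo "Ruby"; fi
  if [ -e "$repo/composer.json" ]; then echo "PHP"; fi
  # compgen -G succeeds only if the glob matches at least one file
  if compgen -G "$repo/*.csproj" > /dev/null; then echo "C#"; fi
}
```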
Use the exact structure in `references/report-template.md`. Fill each section with findings from the dimension-by-dimension assessment, citing file:line. Do not invent sections the template does not have, and do not skip sections (if a section doesn't apply, say so explicitly in one sentence).
Save to reports/{project-name}_quality_assessment_{YYYY-MM-DD}.md (create the reports/ directory if needed). Use the repo's directory name as {project-name}, lowercased and kebab-cased. Use today's date for {YYYY-MM-DD} so each run produces a timestamped, side-by-side report instead of overwriting the previous one.
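The naming rule can be expressed as a small function. This is a sketch under one assumption: that "kebab-cased" should also split camelCase names (`chessAnalyzer` becomes `chess-analyzer`). `report_path` is a hypothetical name; `date +%F` prints today's date as `YYYY-MM-DD`.

```bash
#!/usr/bin/env bash
# report_path: build reports/{project-name}_quality_assessment_{YYYY-MM-DD}.md
# from a repo directory. Second argument overrides the date (defaults to today).
report_path() {
  local repo="$1" day="${2:-$(date +%F)}" name
  name=$(basename "$repo" \
    | sed -E 's/([a-z0-9])([A-Z])/\1-\2/g' \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')
  printf 'reports/%s_quality_assessment_%s.md\n' "$name" "$day"
}
```

Create the directory first with `mkdir -p reports`, then write the report to `"$(report_path "$PWD")"`.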
At the end of your turn, tell the user:
Dimensions (see `references/dimensions.md` for what to check in each):

- Security: … `.env` checked in
- Database design: … `ondelete`, unique/natural-key constraints, deliberate column types, index strategy, migration rollback coverage
- Frontend quality: … `aria-label`, semantic HTML, dead-export detection
- Observability: … `before_send` fingerprinting, slow-query logs, metrics endpoints
- Data privacy (GDPR): … `ondelete=CASCADE` actually wired to an API action
- Dependency management: … `npm audit` / `pip-audit` / equivalent, pinned base images in `Dockerfile`
- CI/CD execution speed: … parallel tests (`pytest-xdist`, `vitest --shard`, `go test -parallel`), caching of deps/build artifacts, deploy automation

Dimensions 12-16 are the ones most commonly missed in ad-hoc code reviews. Treat them as first-class and always include them in the report, even when the answer is "not applicable" (e.g. no frontend → say so in the frontend bundle row).
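For the security check on a checked-in `.env`, a first pass is to filter the tracked file list. A minimal sketch; `find_committed_env` is hypothetical, and treating `.example`, `.sample`, and `.template` suffixes as harmless templates is an assumption.

```bash
#!/usr/bin/env bash
# find_committed_env: from a file list on stdin (e.g. `git ls-files`), print
# tracked .env files, ignoring template variants. The suffix list is an assumption.
find_committed_env() {
  grep -E '(^|/)\.env(\.[^/]+)?$' | grep -vE '\.(example|sample|template)$' || true
}
```

Use it as `git ls-files | find_committed_env`. Any output is a lead to open and verify, not a finding by itself.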
Use a standard five-tier scale. Attach + or − to a letter for half-steps.
| Grade | Meaning |
|---|---|
| A | Genuinely best-practice. Nothing a reviewer would flag. |
| B | Solid, with known small gaps. "Would ship, would note the gap in the PR." |
| C | Works but has real rough edges. "Ship with ticket to fix." |
| D | Risky. "Don't ship until this is fixed." |
| F | Broken or absent in a way that blocks production use. |
Calibration anchors:
- "… `ondelete`" is A only if every FK has it. One bare FK → A−. Three or more → B+.

Do not grade any dimension you couldn't gather evidence for. Mark it `—` and explain in one sentence why you couldn't assess it.
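Applied to a Python project using SQLAlchemy (an assumption; other ORMs need other probes), the `ondelete` anchor can be checked mechanically. `count_bare_fks` and `fk_grade` are hypothetical helpers. The grep is a line-based heuristic that misses calls spanning several lines, and the anchor says nothing about exactly two bare FKs, so the sketch leaves that case to judgment.

```bash
#!/usr/bin/env bash
# count_bare_fks: count SQLAlchemy ForeignKey(...) calls without ondelete= under
# a directory. Heuristic: only sees single-line calls without nested parentheses.
count_bare_fks() {
  grep -rhoE --include='*.py' 'ForeignKey\([^)]*\)' "$1" | grep -vc 'ondelete' || true
}

# fk_grade: map the bare-FK count onto the calibration anchor.
fk_grade() {
  local bare="$1"
  if [ "$bare" -eq 0 ]; then
    echo "A"
  elif [ "$bare" -eq 1 ]; then
    echo "A−"
  elif [ "$bare" -ge 3 ]; then
    echo "B+"
  else
    echo "A− or B+ (judgment call)"
  fi
}
```

Typical use: `fk_grade "$(count_bare_fks app/models)"`, then open each bare FK before committing to the grade.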
Full template: references/report-template.md. At a high level:
- "… `app/repositories/query_utils.py:12`" beats "shared filter utilities are used consistently".
- … `auth.py:85-96` is precise enough to verify.
- "… `capture_exception()` sites across 8,600 LOC", not "Sentry coverage is thin".
- … `—` with a one-line explanation.

- Monorepos (… `packages/*`, `apps/*`, separate `package.json` trees): either grade the subprojects separately (sections 3-5 per subproject) or focus on the primary one and note which you focused on. Do not silently blend numbers from multiple subprojects.
- … `— N/A: library does not store user data` and keep going.
- … `references/languages.md` for that language and follow its probes literally.
- … `<REDACTED: looks like a real key>` and note that the secret was found.

Files in this skill:

- `SKILL.md`: this file, the entry point and workflow
- `references/report-template.md`: the exact markdown template to fill in
- `references/dimensions.md`: what to check and how to grade per dimension
- `references/languages.md`: per-language probes, idioms, and tool pointers
- `scripts/collect_stats.sh`: bash script to gather LOC, coverage, git activity, CI timing

Read the reference files when you reach the relevant step; you don't need them loaded up front.
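For the redaction rule, a sed filter can mask one well-known secret shape before anything is quoted in the report. This sketch only handles AWS access key IDs (`AKIA` followed by 16 uppercase letters or digits); `redact_aws_keys` is hypothetical, and other secret formats need their own patterns or a dedicated scanner.

```bash
#!/usr/bin/env bash
# redact_aws_keys: replace AWS access key IDs on stdin with the report placeholder.
# Covers only this one key format.
redact_aws_keys() {
  sed -E 's/AKIA[0-9A-Z]{16}/<REDACTED: looks like a real key>/g'
}
```

Pipe any excerpt through it before pasting, e.g. `sed -n '10,14p' config/settings.py | redact_aws_keys`.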
`npx claudepluginhub aimfeld/claude-plugins --plugin codebase-audit`