From fleet
Honest brownfield codebase audit. Decomposes a repo into assessable modules, checks each for real implementation vs stubs/mocks/TODOs, grades test quality, scans infrastructure, and produces a truthful assessment report. Requires fleet-discover manifest.
How this skill is triggered — by the user, by Claude, or both
Slash command
/fleet:fleet-assess
This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are performing a rigorous, honest audit of a brownfield codebase. Your job is to determine what actually works, what is faked, what is broken, and what is missing. You produce a machine-readable assessment that downstream skills (fleet-specgen, fleet-infra, fleet-sync) consume.
This skill requires _fleet/manifest.json produced by fleet-discover. If it does not exist, stop immediately and tell the user to run fleet-discover first.
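The precondition above can be sketched as a small guard. This is only an illustration: `load_manifest` and its `repo_root` parameter are hypothetical names, not part of the skill.

```python
import json
from pathlib import Path

# Relative location of the manifest, as stated in the text above.
MANIFEST_PATH = Path("_fleet/manifest.json")


def load_manifest(repo_root: Path) -> dict:
    """Return the parsed manifest, or raise if fleet-discover has not run yet.

    Hypothetical helper: the skill itself only says to stop and tell the user.
    """
    manifest_file = repo_root / MANIFEST_PATH
    if not manifest_file.is_file():
        raise FileNotFoundError(
            f"{manifest_file} not found: run fleet-discover first."
        )
    with manifest_file.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Raising before any other work keeps the audit from running against guessed stack details.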
Read _fleet/manifest.json before doing anything else. It contains:
- language, framework, packageManager: stack details
- modules: discovered directories, entry points, route groups
- testFramework: detected test tooling (if any)
- ciPipeline: detected CI configuration (if any)
- existingSpecs: any planning artifacts already found
- fileCount, monorepo, packages: scale indicators

- // DONE, status properties in config files, README claims of coverage: verify everything against actual code.
- TODO comments, uses mock objects instead of real service calls, or has unused parameters prefixed with _, it is NOT complete. Period.

| Argument | Default | Description |
|---|---|---|
| --modules <list> | all | Comma-separated module names to audit (subset mode) |
| --skip-tests | false | Skip test quality analysis (faster, less complete) |
| --skip-security | false | Skip security scan phase |
| --depth shallow | deep | shallow checks file existence and top-level patterns only; deep reads function bodies |
| --output-dir <path> | _fleet/ | Directory for assessment output files |
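To make the defaults in the table concrete, here is a minimal sketch that models the arguments with Python's `argparse`. The skill is invoked as a slash command, so no such parser is claimed to exist. `build_parser` and `selected_modules` are hypothetical names, and treating `--depth` as a choice between `shallow` and `deep` with `deep` as the default is an assumption read from the table.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the argument table (illustration only)."""
    parser = argparse.ArgumentParser(prog="fleet-assess")
    parser.add_argument("--modules", default="all",
                        help="comma-separated module names to audit")
    parser.add_argument("--skip-tests", action="store_true",
                        help="skip test quality analysis")
    parser.add_argument("--skip-security", action="store_true",
                        help="skip the security scan phase")
    # Assumption: two depth levels, deep by default.
    parser.add_argument("--depth", choices=["shallow", "deep"], default="deep")
    parser.add_argument("--output-dir", default="_fleet/",
                        help="directory for assessment output files")
    return parser


def selected_modules(raw: str) -> Optional[List[str]]:
    """Return None for 'all', otherwise the cleaned list of module names."""
    if raw == "all":
        return None
    return [name.strip() for name in raw.split(",") if name.strip()]
```

For example, `build_parser().parse_args(["--modules", "api-auth, db-schema"])` followed by `selected_modules(args.modules)` yields the two module ids with whitespace removed.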
Break the codebase into assessable units. The decomposition strategy depends on what the manifest tells you.
If the manifest shows a web framework (Next.js, Express, FastAPI, Rails, etc.):
If the manifest shows a monorepo (Turborepo, Nx, Lerna, Cargo workspace):
If the manifest shows multiple services:
If existingSpecs in the manifest points to story/requirement files:
Build a module list. For each module record:
- id: short kebab-case identifier (e.g., api-auth, db-schema, web-dashboard)
- name: human-readable name
- paths: list of file globs this module covers
- entryPoints: main files (route handler, service export, package index)
- specRef: link to existing spec if Strategy D applies, otherwise null

For EACH module, perform the following checks. Use Agent tool to parallelize across modules when the codebase is large.
For every entry point and key source file in the module:
# Direct stub/mock indicators
TODO:|FIXME:|HACK:|XXX:|PLACEHOLDER
mock|Mock|MOCK|fake|Fake|stub|Stub|placeholder|dummy|simulated|hardcoded
getMock|createMock|buildMock|fake|fixture
# Structural stub indicators
_entityId|_userId|_req|_res|_ctx (unused params with _ prefix)
void someImport; (imported but unused, type-check only)
return \[\]|return \{\}|return null (empty returns in functions that should return data)
throw new Error\('not implemented (explicit not-implemented markers)
console\.log\('TODO (logged TODOs)
# Commented-out real code
// In production|// TODO: replace|// Will be implemented|// Real implementation
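The indicators above lend themselves to a line-by-line regex pass. The sketch below uses a subset of them. `STUB_PATTERNS` and `scan_source` are hypothetical names, and the `empty-return` label is an assumption that does not appear in the skill's output schema. A hit is only a candidate: the function body still has to be read to decide whether it is a stub.

```python
import re
from typing import Dict, List

# Subset of the indicators listed above, keyed by a short label.
STUB_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "TODO": re.compile(r"TODO:|FIXME:|HACK:|XXX:|PLACEHOLDER"),
    "mock": re.compile(r"mock|fake|stub|placeholder|dummy|simulated|hardcoded",
                       re.IGNORECASE),
    "unused-param": re.compile(r"\b_(?:entityId|userId|req|res|ctx)\b"),
    "not-implemented": re.compile(r"throw new Error\('not implemented",
                                  re.IGNORECASE),
    # Assumed label for empty returns; not part of the schema's pattern list.
    "empty-return": re.compile(r"return \[\]|return \{\}|return null"),
}


def scan_source(text: str) -> List[dict]:
    """Return one record per (line, pattern) match, in file order."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in STUB_PATTERNS.items():
            if pattern.search(line):
                hits.append({"line": line_number,
                             "pattern": label,
                             "text": line.strip()})
    return hits
```

Keyword patterns such as `mock` and `fixture` also match legitimate test helpers, so results from test directories need separate handling.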
A function named createUser that returns { id: 1, name: 'test' } is a stub. A function named createUser that calls db.insert(users).values(...) is real.

For each module, find associated test files and grade them:
| Grade | Meaning | Detection |
|---|---|---|
| real | Tests exercise actual business logic with meaningful assertions | Imports source code, calls real functions, asserts on computed output |
| structural | Tests verify files/configs exist but not runtime behavior | existsSync, typeof, schema shape checks only |
| mock-only | Tests create mock data and assert on that same mock data | Never imports or calls the real module under test |
| no-op | Assertions are trivially true regardless of code | expect(true).toBe(true), expect(1+1).toBe(2) |
| none | No test files found for this module | No matching *.test.* or *.spec.* files |
Count: total tests, real assertions, skipped tests (describe.skip, it.skip, test.skip), no-op assertions.
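As a rough illustration of the detection column, the sketch below grades a single test file from its text. It is a coarse heuristic under stated assumptions, not the skill's method: it assumes Jest-style `expect(...)` assertions, treats any relative import as "imports source code", and the names `count_signals` and `grade_test_file` are hypothetical. The `none` grade is not handled because it applies when no test file exists at all.

```python
import re

ASSERTION = re.compile(r"\bexpect\(")
NO_OP = re.compile(r"expect\(true\)\.toBe\(true\)|expect\(1\s*\+\s*1\)\.toBe\(2\)")
SKIPPED = re.compile(r"\b(?:describe|it|test)\.skip\(")
STRUCTURAL = re.compile(r"\bexistsSync\b|\btypeof\b")
# Assumption: a relative import means the test pulls in real source code.
RELATIVE_IMPORT = re.compile(r"""(?:from\s+|require\(\s*)['"]\.{1,2}/""")


def count_signals(test_source: str) -> dict:
    """Count assertions, no-op assertions and skipped cases in one file."""
    return {
        "assertions": len(ASSERTION.findall(test_source)),
        "noOpAssertions": len(NO_OP.findall(test_source)),
        "skippedCases": len(SKIPPED.findall(test_source)),
    }


def grade_test_file(test_source: str) -> str:
    """Map one test file to a grade from the table above (heuristic)."""
    signals = count_signals(test_source)
    meaningful = signals["assertions"] - signals["noOpAssertions"]
    if meaningful <= 0:
        return "no-op"        # nothing but trivially true assertions
    if RELATIVE_IMPORT.search(test_source):
        return "real"         # imports something from the repo and asserts on it
    if STRUCTURAL.search(test_source):
        return "structural"   # only file-existence or typeof style checks
    return "mock-only"        # asserts without touching the module under test
```

A file that imports only fixtures through a relative path would be graded `real` by this sketch, so the result is a starting point for reading the tests rather than a verdict.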
Quick check for common vulnerabilities in the module:
sk-, key=, secret=, password=)

Assign ONE classification:
| Classification | Meaning |
|---|---|
| complete | All functionality works with real implementations. No stubs. Has real tests. |
| mostly-complete | Core functionality works, but 1-2 minor pieces are missing (e.g., edge case handler, one integration not wired up). List what is missing. |
| partial | Some parts have real implementations, others are stubbed or missing. List what works and what does not. |
| stub | Files exist but implementation is fake: mock data, TODO comments, no real service/DB calls. Looks complete at a glance but is not. |
| missing | Module should exist based on manifest or specs but no source files found. |
| broken | Code exists but does not compile, has runtime errors, or has failing tests that indicate fundamental breakage. |
Assess the health of shared infrastructure that is not tied to a single module.
<test-command> --help or a dry run.
npm test but project uses pnpm)

Produce TWO output files.

_fleet/assessment.json
Full machine-readable assessment following this schema:
{
"$schema": "fleet-assessment-v1",
"timestamp": "ISO-8601",
"repoRoot": "/absolute/path",
"manifestRef": "_fleet/manifest.json",
"summary": {
"totalModules": 0,
"classifications": { "complete": 0, "mostly-complete": 0, "partial": 0, "stub": 0, "missing": 0, "broken": 0 },
"honestCompletionRate": 0.0,
"totalTestFiles": 0,
"totalTestCases": 0,
"testQuality": { "real": 0, "structural": 0, "mock-only": 0, "no-op": 0, "none": 0 },
"securityIssues": 0,
"infrastructureScore": {
"testFramework": "configured | partial | missing",
"ciPipeline": "complete | partial | missing",
"devExperience": "good | acceptable | poor",
"database": "healthy | partial | missing | n/a"
}
},
"modules": [{
"id": "module-id",
"name": "Human-Readable Name",
"paths": ["src/modules/foo/**"],
"entryPoints": ["src/modules/foo/index.ts"],
"specRef": "path/to/spec.md | null",
"classification": "complete | mostly-complete | partial | stub | missing | broken",
"confidence": "high | medium | low",
"implementation": {
"realFiles": 0, "stubFiles": 0, "missingFiles": 0,
"stubs": [{ "file": "path", "line": 42, "pattern": "description", "description": "what it does vs should do" }],
"todos": [{ "file": "path", "line": 15, "text": "TODO text" }]
},
"tests": {
"grade": "real | structural | mock-only | no-op | none",
"files": ["path/to/test.ts"],
"totalCases": 0, "realAssertions": 0, "skippedCases": 0, "noOpAssertions": 0,
"coverage": "unknown"
},
"security": {
"issues": [{
"severity": "critical | high | medium | low",
"type": "hardcoded-secret | sql-injection | missing-auth | missing-validation | exposed-debug | insecure-cors | unencrypted-data",
"file": "path", "line": 8, "description": "what was found"
}]
},
"notes": "Free-text notes about this module"
}],
"infrastructure": {
"testFramework": {
"status": "configured | partial | missing", "tool": "name | null",
"configFile": "path | null", "works": true, "hasE2E": false, "e2eTool": "name | null", "coverageConfigured": false
},
"ciPipeline": {
"status": "complete | partial | missing", "provider": "name | null", "configFile": "path | null",
"gates": { "lint": true, "typecheck": true, "test": true, "build": true, "deploy": false },
"missingGates": ["e2e", "security-scan"]
},
"devExperience": {
"status": "good | acceptable | poor",
"buildWorks": true, "devServerWorks": true, "linterConfigured": true, "linterPasses": false,
"formatterConfigured": true, "lockfileConsistent": true, "envTemplate": true
},
"database": {
"status": "healthy | partial | missing | n/a",
"hasMigrations": true, "migrationCount": 0, "hasSeed": false, "hasRLS": false,
"schemaTyped": true, "orm": "name | null"
}
},
"stubInventory": [{
"file": "path", "line": 42, "module": "module-id",
"pattern": "mock | TODO | hardcoded | unused-param | not-implemented",
"current": "Returns static array of 3 items",
"expected": "Should query users table with pagination"
}],
"recommendations": [{
"priority": 0,
"type": "broken-fix | infra | security | stub-upgrade | new-feature | test-gap",
"module": "module-id", "description": "What needs to happen", "effort": "trivial | moderate | significant"
}]
}
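Part of the summary block can be derived mechanically from the modules array. The sketch below assumes the formula this skill states for honestCompletionRate, (complete + mostly-complete) / totalModules * 100. The function name `summarize` is hypothetical, and counting every entry of the array in the denominator is an assumption, since the skill does not say how modules marked "not-assessed" affect the rate.

```python
from collections import Counter
from typing import List

CLASSIFICATIONS = ["complete", "mostly-complete", "partial",
                   "stub", "missing", "broken"]


def summarize(modules: List[dict]) -> dict:
    """Build the classification counts and honest completion rate."""
    counts = Counter(module["classification"] for module in modules)
    total = len(modules)
    done = counts["complete"] + counts["mostly-complete"]
    # Guard against an empty module list instead of dividing by zero.
    rate = round(done / total * 100, 1) if total else 0.0
    return {
        "totalModules": total,
        "classifications": {name: counts[name] for name in CLASSIFICATIONS},
        "honestCompletionRate": rate,
    }
```

Deriving the rate from the per-module classifications, rather than entering it by hand, keeps the headline number consistent with the detailed findings.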
_fleet/assessment.md
Human-readable summary mirroring the JSON data. Must include these sections:
Adapt analysis depth to codebase size (use fileCount from manifest):
low for all modules and recommend targeted re-assessment

When using Agent tool for subagent analysis:
If BMAD stories exist in _bmad-output/implementation-artifacts/, reconcile assessment findings against them. This is the critical bridge between "what BMAD planned" and "what the code actually is."
Read every .md file in _bmad-output/implementation-artifacts/. For each story, extract:
For each acceptance criterion in each BMAD story:
- verified: code exists, is real, has a meaningful test
- implemented-untested: code exists, is real, but no test covers it
- stubbed: code exists but is a stub/mock/placeholder
- missing: no code found that satisfies this AC

For each BMAD story, compute the honest status:
| AC Results | Correct Status |
|---|---|
| All ACs verified | complete |
| All ACs verified or implemented-untested | implemented (needs tests) |
| Mix of real and stubbed ACs | partial |
| All ACs stubbed | stub |
| Any ACs missing | incomplete |
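The table can be read as a small decision function. The sketch below is one possible reading: the table does not state a precedence between rows, so checking for missing ACs first is an assumption, and `honest_status` is a hypothetical name.

```python
from typing import List


def honest_status(ac_results: List[str]) -> str:
    """Compute a story status from its per-AC results.

    Each result is one of: verified, implemented-untested, stubbed, missing.
    """
    results = set(ac_results)
    if not results:
        raise ValueError("story has no acceptance criteria")
    if "missing" in results:          # assumed to take precedence over all rows
        return "incomplete"
    if results == {"verified"}:
        return "complete"
    if results <= {"verified", "implemented-untested"}:
        return "implemented (needs tests)"
    if results == {"stubbed"}:
        return "stub"
    return "partial"                  # real and stubbed ACs mixed
```

For example, a story with ACs graded verified, implemented-untested and stubbed comes out as partial, while one stubbed AC next to a missing one comes out as incomplete.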
If the story's current status disagrees with the computed status:
## Fleet Reconciliation ({date})
- Status changed: {old} → {new}
- ACs verified: {N}/{total}
- ACs needing tests: {list}
- ACs still stubbed: {list}
- ACs missing: {list}
{N+1}. Given the implementation of AC {ref}, when tests are run, then all behavior described in AC {ref} is verified by at least one test with meaningful assertions
After reconciling all BMAD stories, check if the assessment found issues that NO BMAD story covers:
For each unplanned gap, record it in the assessment output under a new field:
"unplannedGaps": [{
"type": "infra | security | orphaned-code",
"description": "What was found",
"files": ["paths"],
"recommendation": "What BMAD story should be created"
}]
These will be handled by fleet-specgen, which creates new BMAD stories for them.
Save to _fleet/reconciliation.json:
{
"timestamp": "ISO-8601",
"stories_checked": 0,
"stories_accurate": 0,
"stories_corrected": 0,
"corrections": [{
"story": "path/to/story.md",
"old_status": "complete",
"new_status": "partial",
"acs_verified": 3,
"acs_untested": 2,
"acs_stubbed": 1,
"acs_missing": 0,
"test_acs_added": 2
}],
"unplanned_gaps": 0
}
- honestCompletionRate is calculated as: (complete + mostly-complete) / totalModules * 100. This is the number that matters. Do not inflate it.
- confidence on each module should be high if you read the actual code, medium if you relied on grep patterns, low if you sampled or estimated.
- The recommendations array feeds directly into fleet-specgen. Each recommendation becomes a candidate for BMAD story creation or update. Order them by priority (0 = fix broken things first, 5 = nice-to-have test gaps last).
- If --modules is specified, only assess those modules but still produce the full JSON structure (other modules get classification: "not-assessed").
- _fleet/ directory if it does not exist.
- _bmad-output/implementation-artifacts/. This is intentional: BMAD is the source of truth for specs, and Fleet's job is to keep it honest.

npx claudepluginhub g6xai/claude-plugins --plugin fleet