Skill

fleet-assess

Honest brownfield codebase audit. Decomposes a repo into assessable modules, checks each for real implementation vs stubs/mocks/TODOs, grades test quality, scans infrastructure, and produces a truthful assessment report. Requires fleet-discover manifest.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/fleet:fleet-assess

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read, Grep, Glob, Bash, Agent

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are performing a rigorous, honest audit of a brownfield codebase. Your job is to determine what actually works, what is faked, what is broken, and what is missing. You produce a machine-readable assessment that downstream skills (fleet-specgen, fleet-infra, fleet-sync) consume.

SKILL.md

422 lines · ~5k tokens

Stats

Parent stars: 0

Maintenance: Excellent

Last Commit: Mar 29, 2026

Fleet Assess — Honest Brownfield Audit

PREREQUISITES

This skill requires _fleet/manifest.json produced by fleet-discover. If it does not exist, stop immediately and tell the user to run fleet-discover first.

Read _fleet/manifest.json before doing anything else. It contains:

language, framework, packageManager — stack details
modules — discovered directories, entry points, route groups
testFramework — detected test tooling (if any)
ciPipeline — detected CI configuration (if any)
existingSpecs — any planning artifacts already found
fileCount, monorepo, packages — scale indicators
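
A minimal sketch of the prerequisite check, assuming the manifest is plain JSON at `_fleet/manifest.json` under the repo root. The `load_manifest` helper and the choice of which keys to treat as required are illustrative assumptions, not part of fleet-discover's documented contract.

```python
import json
from pathlib import Path

# Keys taken from the field list above; treating these five as mandatory is an assumption.
EXPECTED_KEYS = ("language", "framework", "packageManager", "modules", "fileCount")


def load_manifest(repo_root: str) -> dict:
    """Load _fleet/manifest.json, or raise so the audit stops before any analysis."""
    path = Path(repo_root) / "_fleet" / "manifest.json"
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found. Run fleet-discover first.")
    manifest = json.loads(path.read_text(encoding="utf-8"))
    missing = [key for key in EXPECTED_KEYS if key not in manifest]
    if missing:
        raise ValueError(f"manifest is missing expected keys: {missing}")
    return manifest
```

Raising instead of returning an empty manifest keeps the "stop immediately" rule hard to bypass by accident.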

CRITICAL RULES

Never trust status fields. Comments like // DONE, status properties in config files, README claims of coverage — verify everything against actual code.
A stub is not an implementation. If a function returns hardcoded data, has TODO comments, uses mock objects instead of real service calls, or has unused parameters prefixed with _, it is NOT complete. Period.
An empty query result is not a stub. If code makes a real database call or API request that returns empty because no data exists yet, that IS a real implementation.
Log progress regularly. After completing analysis of every 3-5 modules, output a progress update so the user can see what is happening.
Be specific. Never say "partially implemented." Say exactly WHAT is implemented and WHAT is missing, with file paths and line numbers.
Do not hallucinate file contents. If you cannot find a file, say so. If you are unsure about a classification, explain your uncertainty.

ARGUMENTS

Argument	Default	Description
`--modules <list>`	all	Comma-separated module names to audit (subset mode)
`--skip-tests`	false	Skip test quality analysis (faster, less complete)
`--skip-security`	false	Skip security scan phase
`--depth <shallow|deep>`	`deep`	`shallow` checks file existence and top-level patterns only; `deep` reads function bodies
`--output-dir <path>`	`_fleet/`	Directory for assessment output files

PHASE 1: Module Decomposition

Break the codebase into assessable units. The decomposition strategy depends on what the manifest tells you.

Strategy A: Route/Endpoint Based (web apps, APIs)

If the manifest shows a web framework (Next.js, Express, FastAPI, Rails, etc.):

Each route group or API endpoint group becomes a module
Shared services, utilities, and middleware are separate modules
Database layer (schema, migrations, ORM models) is its own module
Background jobs / workers are separate modules

Strategy B: Package Based (monorepos)

If the manifest shows a monorepo (Turborepo, Nx, Lerna, Cargo workspace):

Each package is a module
Shared packages get individual assessment
App packages are further decomposed using Strategy A

Strategy C: Service Based (microservices)

If the manifest shows multiple services:

Each service is a module
Shared libraries are separate modules
Infrastructure-as-code is its own module

Strategy D: Spec Based (existing planning artifacts)

If existingSpecs in the manifest points to story/requirement files:

Use the spec boundaries as module boundaries where possible
This produces the most useful output for fleet-specgen downstream
Cross-reference spec modules against actual directory structure

Output

Build a module list. For each module record:

id — short kebab-case identifier (e.g., api-auth, db-schema, web-dashboard)
name — human-readable name
paths — list of file globs this module covers
entryPoints — main files (route handler, service export, package index)
specRef — link to existing spec if Strategy D applies, otherwise null
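
The module record above could be modeled as follows. This is a sketch only: the `ModuleRecord` class, its snake_case attribute names, and the kebab-case regex are assumptions introduced for illustration. Only the five output keys come from the list above.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Assumed definition of "short kebab-case": lowercase alphanumerics joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


@dataclass
class ModuleRecord:
    id: str
    name: str
    paths: list[str]
    entry_points: list[str] = field(default_factory=list)
    spec_ref: Optional[str] = None  # stays None unless Strategy D applies

    def __post_init__(self) -> None:
        if not KEBAB_CASE.match(self.id):
            raise ValueError(f"module id must be kebab-case: {self.id!r}")

    def to_json(self) -> dict:
        """Emit the record using the key names listed above."""
        return {
            "id": self.id,
            "name": self.name,
            "paths": self.paths,
            "entryPoints": self.entry_points,
            "specRef": self.spec_ref,
        }
```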

PHASE 2: Per-Module Deep Analysis

For EACH module, perform the following checks. Use the Agent tool to parallelize across modules when the codebase is large.

Step A: Implementation Reality Check

For every entry point and key source file in the module:

Does the file exist? Glob for it.
Does it have real implementation? Grep for these patterns:

# Direct stub/mock indicators
TODO:|FIXME:|HACK:|XXX:|PLACEHOLDER
mock|Mock|MOCK|fake|Fake|stub|Stub|placeholder|dummy|simulated|hardcoded
getMock|createMock|buildMock|fixture

# Structural stub indicators
_entityId|_userId|_req|_res|_ctx    (unused params with _ prefix)
void someImport;                    (imported but unused, type-check only)
return \[\]|return \{\}|return null  (empty returns in functions that should return data)
throw new Error\('not implemented   (explicit not-implemented markers)
console\.log\('TODO                 (logged TODOs)

# Commented-out real code
// In production|// TODO: replace|// Will be implemented|// Real implementation

Does it do what it claims? Read the function bodies. A function named createUser that returns { id: 1, name: 'test' } is a stub. A function named createUser that calls db.insert(users).values(...) is real.
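
The grep patterns above can be approximated in a small scanner. This is a sketch under stated assumptions: the `STUB_PATTERNS` grouping, the pattern names, and the word-boundary tightening on the mock family are illustrative choices, not the skill's actual implementation. A hit is only a candidate, and the function body still has to be read before anything is classified as a stub.

```python
import re

# Pattern families adapted from the grep list above; names and grouping are assumptions.
STUB_PATTERNS = {
    "todo": re.compile(r"TODO:|FIXME:|HACK:|XXX:|PLACEHOLDER"),
    "mock": re.compile(
        r"\b(?:mock|fake|stub|placeholder|dummy|simulated|hardcoded)", re.IGNORECASE
    ),
    "not-implemented": re.compile(r"throw new Error\('not implemented", re.IGNORECASE),
    "empty-return": re.compile(r"return \[\]|return \{\}|return null"),
    "unused-param": re.compile(r"\b_(?:entityId|userId|req|res|ctx)\b"),
}


def scan_source(text: str) -> list[dict]:
    """Return one hit per (line, pattern family) that matches a stub indicator."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in STUB_PATTERNS.items():
            if pattern.search(line):
                hits.append({"line": line_no, "pattern": name, "text": line.strip()})
    return hits
```

Note that the hardcoded `createUser` stub from the example above would be flagged only through its unused parameter or TODO comment. A literal return such as `{ id: 1, name: 'test' }` matches none of the patterns, which is why the read-the-body step cannot be skipped.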

Step B: Test Quality Assessment

For each module, find associated test files and grade them:

Grade	Meaning	Detection
`real`	Tests exercise actual business logic with meaningful assertions	Imports source code, calls real functions, asserts on computed output
`structural`	Tests verify files/configs exist but not runtime behavior	`existsSync`, `typeof`, schema shape checks only
`mock-only`	Tests create mock data and assert on that same mock data	Never imports or calls the real module under test
`no-op`	Assertions are trivially true regardless of code	`expect(true).toBe(true)`, `expect(1+1).toBe(2)`
`none`	No test files found for this module	No matching `.test.` or `.spec.` files

Count: total tests, real assertions, skipped tests (describe.skip, it.skip, test.skip), no-op assertions.
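
Some of these counts can be gathered mechanically. The sketch below assumes Jest/Vitest-style test files. The regexes and the `count_test_signals` helper are illustrative assumptions. Counting real assertions is deliberately left out because it requires reading what each test actually exercises.

```python
import re

# Heuristics for Jest/Vitest-style syntax; other test frameworks need different patterns.
TEST_CASE = re.compile(r"\b(?:it|test)(?:\.skip)?\s*\(")
SKIPPED = re.compile(r"\b(?:describe|it|test)\.skip\s*\(")
NO_OP = re.compile(r"expect\(true\)\.toBe\(true\)|expect\(1\s*\+\s*1\)\.toBe\(2\)")


def count_test_signals(text: str) -> dict:
    """Count test cases, skip markers, and trivially true assertions in one test file."""
    return {
        "totalCases": len(TEST_CASE.findall(text)),
        "skippedCases": len(SKIPPED.findall(text)),
        "noOpAssertions": len(NO_OP.findall(text)),
    }
```

A `describe.skip` counts as one skip marker here even though it may disable several cases, so the skipped figure is a lower bound.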

Step C: Security Scan

Quick check for common vulnerabilities in the module:

Hardcoded secrets, API keys, tokens (grep for patterns like sk-, key=, secret=, password=)
SQL injection vectors (string concatenation in queries)
Missing auth checks on protected routes
Missing input validation / sanitization
Exposed debug endpoints or verbose error messages
Missing CORS configuration or overly permissive CORS
Unencrypted sensitive data storage
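
The hardcoded-secret check could be sketched as below. The regex, the minimum token length after `sk-`, and the fixed `high` severity are all assumptions for illustration. Real findings should be triaged by hand, and the matched value is intentionally not copied into the issue record.

```python
import re

# Matches an sk- style token, or key=/secret=/password= assigned a quoted literal.
SECRET_PATTERN = re.compile(
    r"sk-[A-Za-z0-9]{8,}"
    r"|(?:key|secret|password)\s*=\s*['\"][^'\"]+['\"]",
    re.IGNORECASE,
)


def find_hardcoded_secrets(path: str, text: str) -> list[dict]:
    """Return issue records shaped like the security.issues entries in the report."""
    issues = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if SECRET_PATTERN.search(line):
            issues.append({
                "severity": "high",  # assumed default; adjust after manual review
                "type": "hardcoded-secret",
                "file": path,
                "line": line_no,
                "description": "possible hardcoded credential",
            })
    return issues
```

Requiring a quoted literal after `=` keeps reads from environment variables, such as `apiKey = process.env.API_KEY`, out of the results.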

Step D: Classify the Module

Assign ONE classification:

Classification	Meaning
`complete`	All functionality works with real implementations. No stubs. Has real tests.
`mostly-complete`	Core functionality works, but 1-2 minor pieces are missing (e.g., edge case handler, one integration not wired up). List what is missing.
`partial`	Some parts have real implementations, others are stubbed or missing. List what works and what does not.
`stub`	Files exist but implementation is fake — mock data, TODO comments, no real service/DB calls. Looks complete at a glance but is not.
`missing`	Module should exist based on manifest or specs but no source files found.
`broken`	Code exists but does not compile, has runtime errors, or has failing tests that indicate fundamental breakage.

PHASE 3: Infrastructure Assessment

Assess the health of shared infrastructure that is not tied to a single module.

Test Framework

Is a test runner configured? (Vitest, Jest, Pytest, Go test, etc.)
Does the config actually work? Try running <test-command> --help or a dry run.
Are there test utilities, fixtures, or helpers?
Is there a coverage configuration?
Is there an E2E framework? (Playwright, Cypress, Selenium)

CI Pipeline

Does a CI config exist? (.github/workflows, .gitlab-ci.yml, Jenkinsfile, etc.)
What gates are present? (lint, typecheck, test, build, deploy)
What gates are MISSING? (compare against what the stack requires)
Do the CI commands actually match the project? (e.g., CI runs npm test but project uses pnpm)
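
Two of these checks reduce to simple comparisons. The sketch assumes the CI commands have already been extracted from the pipeline config as plain strings. The helper names and the list of recognized package managers are hypothetical.

```python
# JavaScript package managers recognized by this sketch; extend for other stacks.
PACKAGE_MANAGERS = ("npm", "pnpm", "yarn", "bun")


def mismatched_ci_commands(ci_commands: list[str], package_manager: str) -> list[str]:
    """Return CI commands that invoke a different package manager than the project uses."""
    mismatches = []
    for command in ci_commands:
        tokens = command.split()
        if tokens and tokens[0] in PACKAGE_MANAGERS and tokens[0] != package_manager:
            mismatches.append(command)
    return mismatches


def missing_gates(present: dict[str, bool], required: list[str]) -> list[str]:
    """Return required gates that are absent or disabled, preserving the required order."""
    return [gate for gate in required if not present.get(gate, False)]
```

The `required` list is an input because what the stack requires depends on the manifest. The sketch does not try to infer it.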

Dev Experience

Is there a working build command?
Is there a working dev/serve command?
Is there a linter configured? Does it pass?
Is there a formatter configured?
Is there a lockfile? Is it consistent with the package manager?
Are there environment variable templates? (.env.example, .env.template)

Database (if applicable)

Are there migrations? Do they form a coherent sequence?
Is there a seed script?
Are RLS policies or access controls defined?
Is the schema documented or typed? (Prisma schema, Drizzle schema, SQLAlchemy models)

PHASE 4: Assessment Report

Produce TWO output files.

`_fleet/assessment.json`

Full machine-readable assessment following this schema:

{
  "$schema": "fleet-assessment-v1",
  "timestamp": "ISO-8601",
  "repoRoot": "/absolute/path",
  "manifestRef": "_fleet/manifest.json",
  "summary": {
    "totalModules": 0,
    "classifications": { "complete": 0, "mostly-complete": 0, "partial": 0, "stub": 0, "missing": 0, "broken": 0 },
    "honestCompletionRate": 0.0,
    "totalTestFiles": 0,
    "totalTestCases": 0,
    "testQuality": { "real": 0, "structural": 0, "mock-only": 0, "no-op": 0, "none": 0 },
    "securityIssues": 0,
    "infrastructureScore": {
      "testFramework": "configured | partial | missing",
      "ciPipeline": "complete | partial | missing",
      "devExperience": "good | acceptable | poor",
      "database": "healthy | partial | missing | n/a"
    }
  },
  "modules": [{
    "id": "module-id",
    "name": "Human-Readable Name",
    "paths": ["src/modules/foo/**"],
    "entryPoints": ["src/modules/foo/index.ts"],
    "specRef": "path/to/spec.md | null",
    "classification": "complete | mostly-complete | partial | stub | missing | broken",
    "confidence": "high | medium | low",
    "implementation": {
      "realFiles": 0, "stubFiles": 0, "missingFiles": 0,
      "stubs": [{ "file": "path", "line": 42, "pattern": "description", "description": "what it does vs should do" }],
      "todos": [{ "file": "path", "line": 15, "text": "TODO text" }]
    },
    "tests": {
      "grade": "real | structural | mock-only | no-op | none",
      "files": ["path/to/test.ts"],
      "totalCases": 0, "realAssertions": 0, "skippedCases": 0, "noOpAssertions": 0,
      "coverage": "unknown"
    },
    "security": {
      "issues": [{
        "severity": "critical | high | medium | low",
        "type": "hardcoded-secret | sql-injection | missing-auth | missing-validation | exposed-debug | insecure-cors | unencrypted-data",
        "file": "path", "line": 8, "description": "what was found"
      }]
    },
    "notes": "Free-text notes about this module"
  }],
  "infrastructure": {
    "testFramework": {
      "status": "configured | partial | missing", "tool": "name | null",
      "configFile": "path | null", "works": true, "hasE2E": false, "e2eTool": "name | null", "coverageConfigured": false
    },
    "ciPipeline": {
      "status": "complete | partial | missing", "provider": "name | null", "configFile": "path | null",
      "gates": { "lint": true, "typecheck": true, "test": true, "build": true, "deploy": false },
      "missingGates": ["e2e", "security-scan"]
    },
    "devExperience": {
      "status": "good | acceptable | poor",
      "buildWorks": true, "devServerWorks": true, "linterConfigured": true, "linterPasses": false,
      "formatterConfigured": true, "lockfileConsistent": true, "envTemplate": true
    },
    "database": {
      "status": "healthy | partial | missing | n/a",
      "hasMigrations": true, "migrationCount": 0, "hasSeed": false, "hasRLS": false,
      "schemaTyped": true, "orm": "name | null"
    }
  },
  "stubInventory": [{
    "file": "path", "line": 42, "module": "module-id",
    "pattern": "mock | TODO | hardcoded | unused-param | not-implemented",
    "current": "Returns static array of 3 items",
    "expected": "Should query users table with pagination"
  }],
  "recommendations": [{
    "priority": 0,
    "type": "broken-fix | infra | security | stub-upgrade | new-feature | test-gap",
    "module": "module-id", "description": "What needs to happen", "effort": "trivial | moderate | significant"
  }]
}

`_fleet/assessment.md`

Human-readable summary mirroring the JSON data. Must include these sections:

Executive Summary — total modules, classification counts, honest completion rate, test/security stats
Module Assessment — one subsection per module with classification, paths, test grade, what works, what is stubbed (file:line), what is missing, security issues
Infrastructure — test framework status, CI pipeline gates (present and missing), dev experience, database health
Stub Inventory — table with columns: File, Line, Module, Pattern, Current Behavior, Expected Behavior
Security Issues — table with columns: Severity, Type, File, Line, Description
Recommendations — ordered by priority (0=broken first, 5=test gaps last), each with module, description, effort

SCALING STRATEGY

Adapt analysis depth to codebase size (use fileCount from manifest):

Small (< 500 files)

Full deep analysis of every file
Read every test file completely
No parallelization needed

Medium (500 - 5,000 files)

Deep analysis of entry points and key files per module
Sample long test files: read every test file, but in files longer than 200 lines grade assertions only from the first 100 and last 100 lines
Parallelize with Agent tool: 3-5 modules per subagent batch

Large (5,000 - 50,000 files)

Shallow pass first: grep-based pattern detection across all files
Deep dive only on modules flagged by grep (stubs detected, no tests, security patterns)
Parallelize with Agent tool: one subagent per package (monorepo) or per service (microservices)
Sample 30% of test files per module, extrapolate grades

Very Large (50,000+ files)

Package-level or service-level assessment only (do not decompose further)
Grep-based classification: count stub patterns vs real implementation patterns per package
Test assessment by config and coverage reports only (do not read individual test files)
Report confidence as low for all modules and recommend targeted re-assessment
Log a warning: "Codebase exceeds 50K files. Assessment is approximate. Run with --modules to deep-assess specific areas."
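
Tier selection from `fileCount` could look like the sketch below. The tier labels are hypothetical names. Because the ranges above share their endpoints, the sketch assumes each boundary value (500, 5,000, 50,000) belongs to the larger tier.

```python
def scaling_tier(file_count: int) -> str:
    """Map the manifest's fileCount to an analysis tier (boundary handling is an assumption)."""
    if file_count < 0:
        raise ValueError("file_count must be non-negative")
    if file_count < 500:
        return "small"
    if file_count < 5_000:
        return "medium"
    if file_count < 50_000:
        return "large"
    return "very-large"
```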

PARALLELIZATION

When using the Agent tool for subagent analysis:

Each subagent gets 1-5 modules depending on size
Subagent prompt includes: module definition, file globs, classification rubric, stub detection patterns
Subagent returns: per-module classification, stub list, test grade, security issues
Main agent aggregates results, resolves cross-module dependencies, writes final report
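
Batching modules for subagents could be sketched as follows. The `batch_modules` helper and its default batch size of 4 are assumptions. The 1 to 5 bound comes from the list above. Sizing batches by module weight rather than by count is left out.

```python
def batch_modules(module_ids: list[str], batch_size: int = 4) -> list[list[str]]:
    """Split module ids into consecutive batches of at most batch_size (1 to 5) each."""
    if not 1 <= batch_size <= 5:
        raise ValueError("batch_size must be between 1 and 5")
    return [
        module_ids[start:start + batch_size]
        for start in range(0, len(module_ids), batch_size)
    ]
```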

PHASE 5: BMAD Reconciliation

If BMAD stories exist in _bmad-output/implementation-artifacts/, reconcile assessment findings against them. This is the critical bridge between "what BMAD planned" and "what the code actually is."

5A: Load All BMAD Stories

Read every .md file in _bmad-output/implementation-artifacts/. For each story, extract:

Story ID and title (from filename and H1)
Status field
Acceptance Criteria (the Given/When/Then list)
Source files referenced in Tasks/Technical Notes
Test Coverage section (if present)

5B: Verify Each AC Against Code

For each acceptance criterion in each BMAD story:

Identify the code that should satisfy this AC — use source files from the story, grep for function names, route paths, or DB queries mentioned in the AC
Check if the code exists and is real:
- Does the file exist?
- Does the function/route/handler exist?
- Is it a stub (matches stub patterns from Phase 2) or real?
Check if a test covers this AC:
- Search for test files that import the relevant source
- Does any test exercise this specific AC's behavior?
Classify the AC:
- verified — code exists, is real, has a meaningful test
- implemented-untested — code exists, is real, but no test covers it
- stubbed — code exists but is a stub/mock/placeholder
- missing — no code found that satisfies this AC
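
The four-way AC classification above is a short decision chain. The sketch below assumes the three checks have already been answered as booleans. The `classify_ac` name and signature are illustrative.

```python
def classify_ac(code_exists: bool, is_stub: bool, has_meaningful_test: bool) -> str:
    """Classify one acceptance criterion from the results of the checks above."""
    if not code_exists:
        return "missing"
    if is_stub:
        return "stubbed"  # a test over stubbed code does not make the AC verified
    if has_meaningful_test:
        return "verified"
    return "implemented-untested"
```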

5C: Reconcile Story Status

For each BMAD story, compute the honest status:

AC Results	Correct Status
All ACs `verified`	`complete`
All ACs `verified` or `implemented-untested`	`implemented` (needs tests)
Mix of real and stubbed ACs	`partial`
All ACs `stubbed`	`stub`
Any ACs `missing`	`incomplete`
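
The table can be read as a function over a story's AC classifications. The table does not state which row wins when several apply, so this sketch assumes that any `missing` AC takes precedence and that every remaining mix counts as `partial`. The `story_status` helper is hypothetical.

```python
def story_status(ac_results: list[str]) -> str:
    """Compute the honest story status from per-AC classifications."""
    if not ac_results:
        raise ValueError("story has no acceptance criteria to reconcile")
    results = set(ac_results)
    if "missing" in results:  # assumed precedence: a missing AC overrides every other row
        return "incomplete"
    if results == {"verified"}:
        return "complete"
    if results <= {"verified", "implemented-untested"}:
        return "implemented"
    if results == {"stubbed"}:
        return "stub"
    return "partial"  # a mix of real and stubbed ACs
```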

If the story's current status disagrees with the computed status:

Update the story file — change the Status field to the computed value

Add a reconciliation note — append to Technical Notes:

## Fleet Reconciliation ({date})
- Status changed: {old} → {new}
- ACs verified: {N}/{total}
- ACs needing tests: {list}
- ACs still stubbed: {list}
- ACs missing: {list}

Add test requirement ACs if the story has implemented code with no tests:

{N+1}. Given the implementation of AC {ref}, when tests are run, then all behavior described in AC {ref} is verified by at least one test with meaningful assertions

5D: Identify Unplanned Gaps

After reconciling all BMAD stories, check if the assessment found issues that NO BMAD story covers:

Infrastructure gaps (no BMAD story for missing test framework)
Security findings (no BMAD story for hardcoded secrets)
Code that exists but has no corresponding BMAD story

For each unplanned gap, record it in the assessment output under a new field:

"unplannedGaps": [{
  "type": "infra | security | orphaned-code",
  "description": "What was found",
  "files": ["paths"],
  "recommendation": "What BMAD story should be created"
}]

These will be handled by fleet-specgen, which creates new BMAD stories for them.

5E: Write Reconciliation Report

Save to _fleet/reconciliation.json:

{
  "timestamp": "ISO-8601",
  "stories_checked": 0,
  "stories_accurate": 0,
  "stories_corrected": 0,
  "corrections": [{
    "story": "path/to/story.md",
    "old_status": "complete",
    "new_status": "partial",
    "acs_verified": 3,
    "acs_untested": 2,
    "acs_stubbed": 1,
    "acs_missing": 0,
    "test_acs_added": 2
  }],
  "unplanned_gaps": 0
}
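
The top-level counters in this report follow directly from the per-story results. The sketch below assumes each story result is a dict with `story`, `old_status`, and `new_status` keys, which is a hypothetical input shape. It omits the per-AC counts in each correction for brevity.

```python
def build_reconciliation_report(
    stories: list[dict], unplanned_gap_count: int, timestamp: str
) -> dict:
    """Assemble the reconciliation report counters from per-story status comparisons."""
    corrections = [
        {
            "story": story["story"],
            "old_status": story["old_status"],
            "new_status": story["new_status"],
        }
        for story in stories
        if story["old_status"] != story["new_status"]
    ]
    return {
        "timestamp": timestamp,
        "stories_checked": len(stories),
        "stories_accurate": len(stories) - len(corrections),
        "stories_corrected": len(corrections),
        "corrections": corrections,
        "unplanned_gaps": unplanned_gap_count,
    }
```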

IMPORTANT NOTES

The honestCompletionRate is calculated as: (complete + mostly-complete) / totalModules * 100. This is the number that matters. Do not inflate it.
confidence on each module should be high if you read the actual code, medium if you relied on grep patterns, low if you sampled or estimated.
The recommendations array feeds directly into fleet-specgen. Each recommendation becomes a candidate for BMAD story creation or update. Order them by priority (0 = fix broken things first, 5 = nice-to-have test gaps last).
If --modules is specified, assess only those modules but still produce the full JSON structure; every other module gets classification: "not-assessed", a value used only in subset mode and not one of the six classifications from Phase 2.
Create the _fleet/ directory if it does not exist.
BMAD reconciliation (Phase 5) MODIFIES files in _bmad-output/implementation-artifacts/. This is intentional — BMAD is the source of truth for specs, and Fleet's job is to keep it honest.
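
The completion-rate formula above can be sketched as a summary over the `modules` array. The `summarize` helper, the rounding to one decimal place, and the choice to count every listed module in `totalModules` (including any `not-assessed` ones) are assumptions.

```python
from collections import Counter

CLASSIFICATIONS = ("complete", "mostly-complete", "partial", "stub", "missing", "broken")


def summarize(modules: list[dict]) -> dict:
    """Compute classification counts and honestCompletionRate as a percentage."""
    counts = Counter(module["classification"] for module in modules)
    total = len(modules)
    done = counts["complete"] + counts["mostly-complete"]
    rate = round(done / total * 100, 1) if total else 0.0
    return {
        "totalModules": total,
        "classifications": {name: counts.get(name, 0) for name in CLASSIFICATIONS},
        "honestCompletionRate": rate,
    }
```

Only `complete` and `mostly-complete` enter the numerator, so a repo full of convincing stubs scores 0.0 regardless of how many files exist.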
