From foundry
Relentless spec-to-code verification with fresh eyes. Reads the spec line by line, reads the actual code, and does not stop until every requirement is provably implemented — not just present, but correct. Catches intent gaps, systemic issues, and spec drift that mechanical audits miss.
How this skill is triggered — by the user, by Claude, or both
Slash command
/foundry:proveopusThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
> **Foundry Integration:** This skill is used as the ASSAY phase (Phase F4) in `/foundry`. When invoked standalone, it works as before. When invoked by Foundry, output goes to `foundry/verdicts.json` and defects feed the GRIND loop. The assayer agent (`agents/assayer.md`) wraps this methodology for Foundry's standalone agent pattern.
Foundry Integration: This skill is used as the ASSAY phase (Phase F4) in
/foundry. When invoked standalone, it works as before. When invoked by Foundry, output goes tofoundry/verdicts.jsonand defects feed the GRIND loop. The assayer agent (agents/assayer.md) wraps this methodology for Foundry's standalone agent pattern.
Read the spec BEFORE the code. Form expectations of what the implementation SHOULD look like based on the spec alone. Then read the code and compare. The order is always: Spec -> Expectations -> Code -> Verdict. Never Code -> Spec.
Invocation: /critic <spec_path> or /critic <spec_path> --focus US-1,US-3
Your default assumption is that the code is broken. You are not here to confirm it works — you are here to find where it doesn't. If you find zero non-VERIFIED items, you are almost certainly wrong. Go back and read the hardest functions again.
Read the spec BEFORE reading any code. This prevents rationalization bias.
Read the spec fresh — load the file. Do NOT read source code yet.
Extract EVERY verifiable requirement — not summaries. Every single thing the spec says the system should do:
Extract implicit requirements — things the spec clearly implies:
Number every item — VC-1, VC-2, ... The Critic does not stop until every
item has a verdict.
Write your EXPECTATION per item — based on spec text alone, what function/ endpoint/component should exist? What should it do? What inputs/outputs? Example:
Derive OBSERVABLE TRUTHS per item (3-5 each) — goal-backward from a USER's perspective. Not "handler exists" but "a user can do X and see Y." Example:
Observable truths are harder to rubber-stamp than code existence checks. They require reading actual logic, not just seeing an import.
For EACH checklist item, in order:
Locate the implementation — grep, glob, read. If Serena MCP is available,
use find_symbol / find_referencing_symbols for deterministic wiring checks.
Read the actual function body — not the signature, not the file name. THE BODY.
Compare against your expectation — mismatches are findings even if the code "works." Trust pre-code expectations over post-code rationalizations.
Mental execution — trace concrete inputs through the function line by line.
Then try a bad input. Bug Hunter's Checklist: see rules/audit-reference.md.
Verdict — one of (defined in rules/audit-reference.md):
Evidence required for each verdict: spec text (quoted), your pre-code expectation, file:line, what the code actually does, the gap.
Do not batch-verify. Each requirement gets individual verification with its own evidence. Do not stop or summarize early — verify EVERY item.
For each major feature, enumerate reasonable scenarios a real user would expect:
Document as "Scenario Coverage" in the report:
After verifying what the spec requires, flip the question: what code exists that the spec does NOT justify? This is the surgeon's eye — ruthless identification of code that should be cut.
For each file touched by the implementation:
List every function/type/route in the file
For each one, find its spec justification — which VC-N item requires it?
No justification = DISPLACED — it's either:
find_referencing_symbols)Report as DX-N findings alongside CR-N findings:
old_auth_handler in auth.go — superseded by new middleware, 0 referencesLegacyUserType in models.go — old type, new User type replaces itutils/format_date.go — entire file unused, no importsVerdict additions:
In foundry PROVE/ASSAY mode: DX-N findings become defects with fix direction "DELETE — no spec justification, N references." GRIND teammates remove the dead code as part of their fix cycle.
The surgeon's rule: If you can't point to a spec requirement that needs this code, it shouldn't exist. New features should REPLACE old code, not pile on top.
Look across ALL non-VERIFIED items for systemic issues:
Read audit reports AFTER completing your own verification (fresh eyes first):
Create quality_reports/critic-{timestamp}.md. Required sections:
Standalone (/critic): present report. Done.
Foundry ASSAY phase (Phase F4): return report path, finding counts, verification %
via foundry_add_verdict. Four parallel ASSAY agents each verify a domain slice with
effort: max. SP-N patterns become single fix items (fix root cause, not instances).
HOLLOW verdicts are highest priority. Fix direction for HOLLOW/PARTIAL must be "FILL
OUT" — stubs exist because something belongs there.
After generating the report, if the codsworth MCP server is available:
validate_report with schema_name: "critic" on the report file to validate
the appended JSON block against the built-in schema.verify_citations with the spec path and report path to verify traceability —
every spec requirement should have a verdict, every non-VERIFIED verdict should cite
spec text.These are advisory — warn on failures but do not block the report.
Critic runs every INSPECT→GRIND iteration. Each time: re-read the spec fresh from disk, rebuild the full checklist, re-verify every item (even previously VERIFIED — regressions happen). THIN counts as non-verified. The foundry loop continues until everything is VERIFIED or max cycles are reached.
Recommended effort: max (Opus only). Exhaustive spec-to-code verification demands
maximum reasoning depth. When building API requests, use effort: "max". On Sonnet,
fall back to effort: "high".
When the spec is loaded as a document source, enable citations so every verdict traces back to exact spec text:
{"type": "document", "source": {...}, "citations": {"enabled": True}}
Every VERIFIED/HOLLOW/PARTIAL/MISSING/WRONG verdict MUST cite the specific spec text it
verifies against. The cited_text does not count toward output tokens.
Important: Citations cannot be combined with structured JSON output (json_schema
format). When citations are enabled, use the markdown report format with the JSON block
appended separately (not as the response format).
Graceful degradation: If the API does not support citations (e.g., older model versions), fall back to manual spec references (section/line numbers). The verdict quality is the same — citations just make traceability automatic.
When outputting verdicts (especially in foundry ASSAY mode or CI), append a JSON block
at the end of the markdown report for machine-parseable consumption. This JSON feeds
the foundry_add_verdict MCP tool for defect tracking.
{
"type": "object",
"properties": {
"findings": {
"type": "array",
"items": {
"type": "object",
"properties": {
"id": {"type": "string", "description": "Finding ID (CR-N, SP-N)"},
"severity": {"type": "string", "enum": ["critical", "high", "medium", "low"]},
"category": {"type": "string", "description": "missing|hollow|partial|thin|letter-only|wrong|systemic"},
"file": {"type": "string", "description": "Primary file path"},
"line": {"type": "integer", "description": "Line number (if applicable)"},
"description": {"type": "string", "description": "What's wrong, with spec text quoted"},
"spec_reference": {"type": "string", "description": "VC-N item or spec section cited"},
"suggested_fix": {"type": "string", "description": "Concrete fix direction"}
},
"required": ["id", "severity", "category", "file", "description"]
}
},
"summary": {
"type": "object",
"properties": {
"total": {"type": "integer"},
"by_severity": {
"type": "object",
"properties": {
"critical": {"type": "integer"},
"high": {"type": "integer"},
"medium": {"type": "integer"},
"low": {"type": "integer"}
}
},
"verdict": {"type": "string", "enum": ["PASS", "WARN", "FAIL"]}
}
}
}
}
Verdict rules:
Every verdict MUST cite the exact spec section it verifies against. Use the format
[SPEC:section_id] to create traceable links from code back to requirements.
Format: [SPEC:US-1.AC-2], [SPEC:FR-3], [SPEC:Section 4.2]
Example:
VC-7: "Users can filter credentials by type" [SPEC:US-3.AC-1]
Expectation: GET /credentials?type=postgres returns filtered list
Code: handler.go:52 — ListCredentials reads query param, passes to repo filter
Verdict: VERIFIED — filter works for valid types, returns empty array for unknown types
Every VC-N item in the verification checklist must have a [SPEC:...] reference. If a
requirement cannot be traced to a specific spec section, flag it as [SPEC:implicit]
and document the inference.
When no spec is provided: Note [SPEC:none — no spec available for citation] on
each verdict and base verification on observable behavior and code intent.
Future API integration: When building API calls that include spec documents, enable citations for automatic traceability:
{"type": "document", "source": {"type": "text", "data": spec_text}, "citations": {"enabled": True}}
The cited_text in responses doesn't count toward output tokens (free), and citations
guarantee valid pointers into the provided document.
[SPEC:...] citationsnpx claudepluginhub alphabravocompany/codsworth-marketplace --plugin foundryVerifies implementation completion by running tests, code hygiene review, spec compliance validation, and drift checks; blocks claims on failures. Use before commits or merges.
Verifies implementation against a spec with evidence-based checks and three independent self-consistency passes. Ensures every requirement is backed by verbatim evidence before merge.
Reviews implementation against task file requirements, checking every spec scenario and Done When criterion to identify gaps before shipping.