Skill

ground-truth-audit

Use when auditing a repository or project against a goal — launch/release readiness, security posture, pre-merge or pre-cutover health, dependency or compliance state — and you need a prioritized, severity-ranked findings report you can actually trust. Use when asked "is this ready to ship", "what's blocking launch", "audit this repo", "what's the real state of X", or before any release, cutover, or handoff decision. Use especially when a project's docs, READMEs, changelogs, status trackers, or memory might overstate what is actually true, and findings must be grounded in evidence rather than claims.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/proving-ground:ground-truth-audit

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

An audit is only as good as its evidence. Docs, READMEs, changelogs, status trackers, and memory describe what someone *intended* or *believed at the time* — not what is *true now*. A trustworthy audit grounds every finding in verifiable evidence, openly tags what it could not verify, and ranks findings by impact on a stated goal.

SKILL.md

139 lines · ~2.2k tokens

Stats

Stars0

MaintenanceGood

Last CommitJun 5, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Ground-Truth Audit

Overview

An audit is only as good as its evidence. Docs, READMEs, changelogs, status trackers, and memory describe what someone intended or believed at the time — not what is true now. A trustworthy audit grounds every finding in verifiable evidence, openly tags what it could not verify, and ranks findings by impact on a stated goal.

Core principle: One unverified "it's done" that turns out false is worse than ten honest "couldn't confirm." The most dangerous thing an audit can do is repeat a claim from a doc as if it had been verified.

The recurring real-world failure this skill prevents: repo state ≠ deployed reality. A fly.toml in the repo does not mean the service is running. A migration file does not mean it was applied to prod. A README saying "X is live" is a claim, not a fact.

When to Use

Readiness audits ("is this ready to launch / ship / cut over / hand off")
Security or compliance posture ("what's our exposure", "audit this for X regulation")
Pre-merge or pre-release health checks on a branch or repo
"What's the actual state of X" investigations where claims may be stale

Not for:

Reviewing a single diff or PR for bugs — use a code-review skill instead.
A goal you can't name. If you don't know what you're auditing for, define that first (step 0) or the findings are unrankable.

The Method

0. Anchor the goal — before reading any code

State what you are auditing for and what "ready" means for this goal. The same repo audited for "launch readiness" vs "security posture" vs "pre-merge health" surfaces different blockers. Write the goal and the scope (what's in, what's deliberately out) at the top of the report.

If the user didn't give a goal, ask — or state the goal you're assuming, explicitly, so they can correct it.

1. Map coverage first

List the areas the goal implies before diving in — e.g. for launch readiness: application code, infra/deploy, data/migrations, payments, auth, legal/compliance, customer-facing truthfulness, ops/monitoring. You will report which areas you examined and which you skipped. Silent truncation reads as "all clear" when it isn't.

2. Gather evidence, and tag how you know

For every candidate finding, record how you know it. Assign one of three evidence tiers (below). This tag travels with the finding into the report. Tagging is not optional — an untagged finding is an unverified claim wearing a confident face.

3. Separate verifiable from unverifiable

Anything you cannot confirm from the repo alone — is the service actually deployed? is the secret set in the prod environment? did the webhook get registered in the dashboard? is the IAM user real? — goes into a dedicated "Cannot verify from repo — needs live check" section. Do not blend these into the confident findings. They are the audit's blind spots, and naming them is more valuable than hiding them.

4. Adversarially verify each blocker before reporting it

For every BLOCKER and HIGH finding, try to disprove it before you write it down. Is that function really missing, or is it under a different name or path? Re-grep. Re-read. Check the obvious alternative location. A false blocker burns the reader's trust and their time — assume each one is wrong until you've failed to refute it.

5. Rank by a defined rubric, not by feel

Severity is a function of: impact on the goal × likelihood it bites × how hard it is to reverse. Use the tiers below so the ranking is reproducible by someone else (or by you next week), not a vibe.

6. Write actions that respect uncertainty

If a finding is 📄 Claimed or ❓ Unverifiable, its action must start with verification: "Confirm X via <check>; if true, do Y." Don't prescribe a fix for a problem you haven't confirmed exists.

Evidence Tiers

Tag every finding with exactly one:

Tag	Meaning	How to cite
✅ Verified	Confirmed from code or command output you actually read	`file:line`, or the command + its result
📄 Claimed	Asserted by a doc, README, changelog, comment, or memory — not independently confirmed	name the source + the word "unverified"
❓ Unverifiable from repo	Depends on live or external state (deployment, prod env vars, third-party dashboards)	name the live check that would confirm it

Never present 📄 or ❓ as ✅. When a doc claims something important, either verify it (and promote to ✅) or tag it 📄 and move it toward the blind-spot section.

The tag describes the claim, not the act of reading. Reading a doc, comment, or checklist that asserts something is 📄 — even though you read it with your own eyes. ✅ means you confirmed the underlying fact: saw the code, ran the command, read the output. A checklist line [ ] counsel letter sent is ✅ evidence that the checklist says it's unsent, but only 📄/❓ on whether it was actually sent (which may have happened offline). The absence of a /privacy route, by contrast, you can confirm directly from the repo — that's ✅.

Mixed findings: tag the load-bearing claim, or split it. A finding often blends a code-fact with a live-state claim — e.g. "this worker is the only process that runs extraction (✅ from code) and it may not be deployed (❓ live)." Tag each part, or lead with the tag of the claim the severity actually rests on. Don't let a ✅ sub-fact launder a ❓ conclusion.

Severity Rubric

Tier	Test
BLOCKER	Goal cannot be met as-is. Product non-functional, or legal/criminal exposure. Shipping means broken or illegal.
HIGH	Real harm at the goal — user-facing failure, misleading conduct, data risk — but not total failure.
MEDIUM	Material correctness or quality issue. Degrades the outcome; doesn't break it.
LOW	Polish, ops hygiene, latent future risk.

For readiness / release audits, add one more class:

Tier	Test
HARD GATE	A blocker whose resolution has external lead time you cannot compress by working harder — legal sign-off, third-party review, certification, a counterparty's turnaround. Surface these first; they dominate the timeline.

Output Template

Always use this structure:

# [Goal] Audit — [date]

## Summary
3–5 sentences. Is the goal met? What is the single most important thing
blocking it? No padding.

## Coverage
Areas examined: ...
Areas skipped (and why): ...

## Findings
Grouped by severity. Each finding:
- [SEVERITY] [evidence tag] Title
- Evidence: file:line / command output / named source
- Impact: what it does to the goal
- Action: respecting the evidence tier (verify-then-fix if not ✅)

## Cannot verify from repo — needs live check
The ❓ bucket. Each item: what you suspect, and the live check that
would confirm or refute it.

## What's working
Brief. Acknowledge solid ground; do not pad to seem balanced.

Common Mistakes

Mistake	Fix
Repeating a doc/memory claim as if verified	Tag it 📄, or actually verify it and tag ✅
Mixing "couldn't check" into confident findings	Move it to the ❓ blind-spot section
Assigning severity by feel	Apply the rubric: impact × likelihood × reversibility
Reporting a blocker without trying to disprove it	Try to refute every BLOCKER/HIGH first; kill false positives
Leaving coverage gaps silent	State what you examined and what you skipped
Prescribing a fix for an unconfirmed problem	"Verify X; if true, do Y"
Padding "what's working" to seem balanced	Keep it short — the reader came for the blockers

Red Flags — stop and re-check

"The README says it's done, so it's done" → that's 📄, not ✅.
"It's probably deployed / configured / applied" → ❓ bucket, name the live check.
"This is obviously a blocker" → did you try to disprove it?
"I think I covered everything" → then write down what you skipped.
Every finding in your draft reads equally confident → you forgot the evidence tags.

ground-truth-audit

Invocation

Context Preview

SKILL.md

ground-truth-audit

Invocation

Context Preview

SKILL.md

Ground-Truth Audit

Overview

When to Use

The Method

0. Anchor the goal — before reading any code

1. Map coverage first

2. Gather evidence, and tag how you know

3. Separate verifiable from unverifiable

4. Adversarially verify each blocker before reporting it

5. Rank by a defined rubric, not by feel

6. Write actions that respect uncertainty

Evidence Tiers

Severity Rubric

Output Template

Common Mistakes

Red Flags — stop and re-check

Similar Skills

Ground-Truth Audit

Overview

When to Use

The Method

0. Anchor the goal — before reading any code

1. Map coverage first

2. Gather evidence, and tag how you know

3. Separate verifiable from unverifiable

4. Adversarially verify each blocker before reporting it

5. Rank by a defined rubric, not by feel

6. Write actions that respect uncertainty

Evidence Tiers

Severity Rubric

Output Template

Common Mistakes

Red Flags — stop and re-check

Similar Skills