From proving-ground
Use when auditing a repository or project against a goal — launch/release readiness, security posture, pre-merge or pre-cutover health, dependency or compliance state — and you need a prioritized, severity-ranked findings report you can actually trust. Use when asked "is this ready to ship", "what's blocking launch", "audit this repo", "what's the real state of X", or before any release, cutover, or handoff decision. Use especially when a project's docs, READMEs, changelogs, status trackers, or memory might overstate what is actually true, and findings must be grounded in evidence rather than claims.
Slash command: /proving-ground:ground-truth-audit
An audit is only as good as its evidence. Docs, READMEs, changelogs, status trackers, and memory describe what someone *intended* or *believed at the time* — not what is *true now*. A trustworthy audit grounds every finding in verifiable evidence, openly tags what it could not verify, and ranks findings by impact on a stated goal.
Core principle: One unverified "it's done" that turns out false is worse than ten honest "couldn't confirm." The most dangerous thing an audit can do is repeat a claim from a doc as if it had been verified.
The recurring real-world failure this skill prevents: repo state ≠ deployed reality. A fly.toml in the repo does not mean the service is running. A migration file does not mean it was applied to prod. A README saying "X is live" is a claim, not a fact.
Not for:
State what you are auditing for and what "ready" means for this goal. The same repo audited for "launch readiness" vs "security posture" vs "pre-merge health" surfaces different blockers. Write the goal and the scope (what's in, what's deliberately out) at the top of the report.
If the user didn't give a goal, ask — or state the goal you're assuming, explicitly, so they can correct it.
List the areas the goal implies before diving in — e.g. for launch readiness: application code, infra/deploy, data/migrations, payments, auth, legal/compliance, customer-facing truthfulness, ops/monitoring. You will report which areas you examined and which you skipped. Silent truncation reads as "all clear" when it isn't.
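A minimal sketch of coverage tracking, so that no planned area can drop out of the report silently. The `Coverage` class, its method names, and the skip reason are hypothetical, introduced here only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Coverage:
    """Tracks which planned audit areas were examined and which were skipped."""
    planned: list[str]
    examined: set[str] = field(default_factory=set)
    skipped: dict[str, str] = field(default_factory=dict)  # area -> reason

    def mark_examined(self, area: str) -> None:
        self.examined.add(area)

    def mark_skipped(self, area: str, reason: str) -> None:
        self.skipped[area] = reason

    def unaccounted(self) -> list[str]:
        """Planned areas that are neither examined nor explicitly skipped."""
        return [a for a in self.planned
                if a not in self.examined and a not in self.skipped]


coverage = Coverage(planned=["application code", "infra/deploy", "payments", "auth"])
coverage.mark_examined("application code")
coverage.mark_examined("auth")
coverage.mark_skipped("payments", "no sandbox credentials available")

# Anything listed here would otherwise read as "all clear" in the report.
print(coverage.unaccounted())  # ['infra/deploy']
```

An empty `unaccounted()` list before writing the Coverage section is one way to confirm that every planned area is either examined or skipped with a stated reason.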
For every candidate finding, record how you know it. Assign one of three evidence tiers (below). This tag travels with the finding into the report. Tagging is not optional — an untagged finding is an unverified claim wearing a confident face.
Anything you cannot confirm from the repo alone — is the service actually deployed? is the secret set in the prod environment? did the webhook get registered in the dashboard? is the IAM user real? — goes into a dedicated "Cannot verify from repo — needs live check" section. Do not blend these into the confident findings. They are the audit's blind spots, and naming them is more valuable than hiding them.
For every BLOCKER and HIGH finding, try to disprove it before you write it down. Is that function really missing, or is it under a different name or path? Re-grep. Re-read. Check the obvious alternative location. A false blocker burns the reader's trust and their time — assume each one is wrong until you've failed to refute it.
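One way to make the "re-grep under a different name" step repeatable is a small search helper. This is a sketch under stated assumptions: `find_mentions`, the file suffixes, and the two handler names are hypothetical, and the demo runs against a throwaway directory rather than a real repository.

```python
import tempfile
from pathlib import Path


def find_mentions(root: Path, candidates: list[str],
                  suffixes: tuple[str, ...] = (".py", ".ts", ".js")) -> dict[str, list[str]]:
    """Return {candidate: ["relative/path:line", ...]} for each name found under root."""
    hits: dict[str, list[str]] = {name: [] for name in candidates}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name in candidates:
                if name in line:
                    hits[name].append(f"{path.relative_to(root).as_posix()}:{lineno}")
    return hits


# Demo: the "missing" handler exists, just under a different name.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "billing.py").write_text("def on_payment_event(evt):\n    return evt\n")
    hits = find_mentions(root, ["handle_webhook", "on_payment_event"])

print(hits)  # {'handle_webhook': [], 'on_payment_event': ['src/billing.py:1']}
```

A plain substring search like this only narrows the question. A hit still has to be read in context before the finding is dropped or kept.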
Severity is a function of: impact on the goal × likelihood it bites × how hard it is to reverse. Use the tiers below so the ranking is reproducible by someone else (or by you next week), not a vibe.
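The product can be made explicit with a small scoring function. The 1 to 3 scale and the numeric cut-offs below are assumptions chosen for illustration, not part of the rubric. The tier tests remain the authority, and a score is only a consistency check on them.

```python
def severity(impact: int, likelihood: int, irreversibility: int) -> str:
    """Map three 1-3 ratings to a tier. Thresholds are illustrative assumptions."""
    ratings = {"impact": impact, "likelihood": likelihood, "irreversibility": irreversibility}
    for name, value in ratings.items():
        if value not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2, or 3, got {value!r}")

    score = impact * likelihood * irreversibility  # ranges from 1 to 27
    if score >= 18:
        return "BLOCKER"
    if score >= 8:
        return "HIGH"
    if score >= 4:
        return "MEDIUM"
    return "LOW"


print(severity(3, 3, 3))  # BLOCKER (score 27)
print(severity(3, 2, 2))  # HIGH (score 12)
print(severity(2, 2, 1))  # MEDIUM (score 4)
print(severity(1, 2, 1))  # LOW (score 2)
```

Recording the three ratings next to each finding lets a second reader see why a tier was chosen and challenge a single factor instead of the whole ranking.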
If a finding is 📄 Claimed or ❓ Unverifiable, its action must start with verification: "Confirm X via <check>; if true, do Y." Don't prescribe a fix for a problem you haven't confirmed exists.
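The verify-then-fix rule can be sketched as a helper that refuses to emit a bare fix for an unconfirmed finding. The function name `build_action` and the example claim, check, and fix strings are hypothetical.

```python
from typing import Optional

VERIFIED = "✅ Verified"
CLAIMED = "📄 Claimed"
UNVERIFIABLE = "❓ Unverifiable from repo"


def build_action(tag: str, claim: str, fix: str, check: Optional[str] = None) -> str:
    """Return the action text; unconfirmed findings must lead with a verification step."""
    if tag == VERIFIED:
        return fix
    if tag in (CLAIMED, UNVERIFIABLE):
        if not check:
            raise ValueError("an unconfirmed finding needs a named check before any fix")
        return f"Confirm {claim} via {check}; if true, {fix}"
    raise ValueError(f"unknown evidence tag: {tag!r}")


print(build_action(VERIFIED, "the /privacy route is missing", "add the route"))
# add the route

print(build_action(
    UNVERIFIABLE,
    "the worker is not deployed",
    "deploy it before launch",
    check="a live status check on the hosting platform",
))
# Confirm the worker is not deployed via a live status check on the hosting platform; if true, deploy it before launch
```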
Tag every finding with exactly one:
| Tag | Meaning | How to cite |
|---|---|---|
| ✅ Verified | Confirmed from code or command output you actually read | file:line, or the command + its result |
| 📄 Claimed | Asserted by a doc, README, changelog, comment, or memory — not independently confirmed | name the source + the word "unverified" |
| ❓ Unverifiable from repo | Depends on live or external state (deployment, prod env vars, third-party dashboards) | name the live check that would confirm it |
Never present 📄 or ❓ as ✅. When a doc claims something important, either verify it (and promote to ✅) or tag it 📄 and move it toward the blind-spot section.
The tag describes the claim, not the act of reading. Reading a doc, comment, or checklist that asserts something is 📄 — even though you read it with your own eyes. ✅ means you confirmed the underlying fact: saw the code, ran the command, read the output. A checklist line [ ] counsel letter sent is ✅ evidence that the checklist says it's unsent, but only 📄/❓ on whether it was actually sent (which may have happened offline). The absence of a /privacy route, by contrast, you can confirm directly from the repo — that's ✅.
Mixed findings: tag the load-bearing claim, or split it. A finding often blends a code-fact with a live-state claim — e.g. "this worker is the only process that runs extraction (✅ from code) and it may not be deployed (❓ live)." Tag each part, or lead with the tag of the claim the severity actually rests on. Don't let a ✅ sub-fact launder a ❓ conclusion.
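A sketch of the mixed-finding rule: each part of a finding carries its own tag, and the headline tag comes from the load-bearing claim. The `Claim` type and `headline_tag` function are hypothetical, and ranking ❓ below 📄 when several load-bearing claims disagree is an assumption made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    VERIFIED = "✅ Verified"
    CLAIMED = "📄 Claimed"
    UNVERIFIABLE = "❓ Unverifiable from repo"


@dataclass(frozen=True)
class Claim:
    text: str
    evidence: Evidence
    load_bearing: bool = False  # True if the severity rests on this claim


def headline_tag(claims: list[Claim]) -> Evidence:
    """Tag a mixed finding by the weakest claim its severity rests on."""
    bearing = [c for c in claims if c.load_bearing]
    if not bearing:
        raise ValueError("mark at least one claim as load-bearing")
    # An unconfirmed load-bearing claim means the finding cannot lead with ✅.
    for weak in (Evidence.UNVERIFIABLE, Evidence.CLAIMED):
        if any(c.evidence is weak for c in bearing):
            return weak
    return Evidence.VERIFIED


finding = [
    Claim("this worker is the only process that runs extraction", Evidence.VERIFIED),
    Claim("the worker may not be deployed", Evidence.UNVERIFIABLE, load_bearing=True),
]
print(headline_tag(finding).value)  # ❓ Unverifiable from repo
```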
| Tier | Test |
|---|---|
| BLOCKER | Goal cannot be met as-is. Product non-functional, or legal/criminal exposure. Shipping means broken or illegal. |
| HIGH | Real harm at the goal — user-facing failure, misleading conduct, data risk — but not total failure. |
| MEDIUM | Material correctness or quality issue. Degrades the outcome; doesn't break it. |
| LOW | Polish, ops hygiene, latent future risk. |
For readiness / release audits, add one more class:
| Tier | Test |
|---|---|
| HARD GATE | A blocker whose resolution has external lead time you cannot compress by working harder — legal sign-off, third-party review, certification, a counterparty's turnaround. Surface these first; they dominate the timeline. |
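Surfacing hard gates first amounts to a fixed sort order over the tiers. A minimal sketch, with hypothetical finding titles:

```python
# Report order: hard gates lead because their lead time cannot be compressed.
TIER_ORDER = ["HARD GATE", "BLOCKER", "HIGH", "MEDIUM", "LOW"]
RANK = {tier: position for position, tier in enumerate(TIER_ORDER)}

findings = [
    ("MEDIUM", "Retry logic drops errors"),
    ("HARD GATE", "Counsel sign-off pending"),
    ("LOW", "Stale TODO comments"),
    ("BLOCKER", "Checkout fails"),
]

# sorted() is stable, so findings within one tier keep their original order.
ordered = sorted(findings, key=lambda finding: RANK[finding[0]])

for tier, title in ordered:
    print(f"[{tier}] {title}")
```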
Always use this structure:
# [Goal] Audit — [date]
## Summary
3–5 sentences. Is the goal met? What is the single most important thing
blocking it? No padding.
## Coverage
Areas examined: ...
Areas skipped (and why): ...
## Findings
Grouped by severity. Each finding:
- [SEVERITY] [evidence tag] Title
  - Evidence: file:line / command output / named source
  - Impact: what it does to the goal
  - Action: respecting the evidence tier (verify-then-fix if not ✅)
## Cannot verify from repo — needs live check
The ❓ bucket. Each item: what you suspect, and the live check that
would confirm or refute it.
## What's working
Brief. Acknowledge solid ground; do not pad to seem balanced.
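A sketch of rendering one finding entry in the shape above while rejecting untagged findings. The function `render_finding`, the nesting of the three fields under the title line, and the example evidence text are assumptions for illustration.

```python
ALLOWED_SEVERITIES = ("HARD GATE", "BLOCKER", "HIGH", "MEDIUM", "LOW")
ALLOWED_TAGS = ("✅ Verified", "📄 Claimed", "❓ Unverifiable from repo")


def render_finding(severity: str, tag: str, title: str,
                   evidence: str, impact: str, action: str) -> str:
    """Render one finding as Markdown; refuse unknown tiers and missing tags."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if tag not in ALLOWED_TAGS:
        raise ValueError("every finding needs exactly one evidence tag")
    return "\n".join([
        f"- [{severity}] [{tag}] {title}",
        f"  - Evidence: {evidence}",
        f"  - Impact: {impact}",
        f"  - Action: {action}",
    ])


entry = render_finding(
    severity="HIGH",
    tag="✅ Verified",
    title="No /privacy route",
    evidence="search of the routes directory for 'privacy' returned no matches",
    impact="customer-facing privacy link would lead nowhere",
    action="add the route and page before launch",
)
print(entry)
```

Validating the tag at render time keeps an untagged finding from reaching the report at all.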
| Mistake | Fix |
|---|---|
| Repeating a doc/memory claim as if verified | Tag it 📄, or actually verify it and tag ✅ |
| Mixing "couldn't check" into confident findings | Move it to the ❓ blind-spot section |
| Assigning severity by feel | Apply the rubric: impact × likelihood × reversibility |
| Reporting a blocker without trying to disprove it | Try to refute every BLOCKER/HIGH first; kill false positives |
| Leaving coverage gaps silent | State what you examined and what you skipped |
| Prescribing a fix for an unconfirmed problem | "Verify X; if true, do Y" |
| Padding "what's working" to seem balanced | Keep it short — the reader came for the blockers |
npx claudepluginhub aidenhiew/proving-ground --plugin proving-ground