From claude-commands
Pre-report gate that verifies all planned evidence artifacts exist before writing comparison or validation reports. Prevents making quantitative claims without backing data.
How this skill is triggered — by the user, by Claude, or both
Slash command
/claude-commands:validation-gateThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Before writing ANY evidence report, comparison report, or validation summary that makes quantitative claims.
Before writing ANY evidence report, comparison report, or validation summary that makes quantitative claims.
Before executing ANY validation plan, the agent MUST present this to the user and wait for approval.
CRITICAL DEFAULT: Start with the ideal test as your actual plan. Do NOT start with the easiest/fastest approach and list what it can't prove. Instead, design the real test first, then only downgrade specific steps when you can name a concrete, non-hypothetical blocker. "It might be slow" or "it could use API budget" are NOT blockers — ask the user and let them decide.
Anti-pattern to avoid: LLMs have a training bias toward synthetic/safe experiments. They default to "run a command and capture metrics" when the right answer is often "create real test data and measure real operations." Always ask: "Can I just do the real thing?" before reaching for a synthetic substitute. Common real-test resources that are often available but overlooked:
ao spawn for creating real worker sessions## Validation Intent Check
**What you asked**: <restate the user's actual question in plain language>
**What this plan measures**: <what the experiment will actually test>
**Gap**: <if these differ, say so explicitly>
### Plan (default: the real test)
<describe the REAL test — real PRs, real CI, real sessions, real measurement>
<this should be your actual plan, not an aspirational ideal>
### Compromises (only if a specific blocker exists)
| Downgrade | Specific blocker | Impact on validity | User approved? |
|---|---|---|---|
| <only list if you MUST downgrade> | <name the exact blocker, not "might be slow"> | <what this loses> | <wait for yes> |
If the compromise table is empty, good — you're doing the real test.
### What the results CAN prove
<honest list>
### What the results CANNOT prove
<honest list — this is the most important section>
If the user sees the "CANNOT prove" list and says "that's not good enough", redesign the experiment before executing. Do not execute a plan the user hasn't explicitly approved after seeing the compromises.
Locate the validation/test plan that specifies what artifacts should be produced. Common locations:
roadmap/*.md (validation plans)/tmp/*/ (evidence directories)Parse the plan for every file path listed under "Evidence artifacts", "Output", or "Artifacts" sections. Build a checklist.
For each required artifact:
# Check exists and has content
for f in <artifact_list>; do
if [ ! -f "$f" ]; then
echo "MISSING: $f"
elif [ $(wc -c < "$f") -lt 50 ]; then
echo "EMPTY/TRIVIAL: $f ($(wc -c < "$f") bytes)"
else
echo "OK: $f ($(wc -c < "$f") bytes)"
fi
done
| Claim class | Artifact must contain |
|---|---|
| Runtime measurement | Timestamps, process output, actual command invocations |
| Code analysis | Source file references, line numbers, function names |
| Integration test | Real I/O logs, API call evidence, timing data |
| Projection | Stated assumptions, formula, sensitivity analysis |
If an artifact contains code analysis but the claim is "runtime measurement", flag it as MISMATCH.
Before finalizing any report:
npx claudepluginhub jleechanorg/claude-commands --plugin claude-commandsValidates AI agent claims against evidence trail in coding workflows. Catches unsubstantiated 'done', 'tests pass', 'fixed' without proof like outputs, diffs, or logs. Auto-triggers on completion keywords.
Validates implementation against spec using 6 gates (coverage, proof artifacts, credential safety) and generates a coverage matrix report.
Validates execution of an implementation plan by verifying success criteria, checking git history, and identifying deviations. Use after implementing a plan to ensure correctness.