Skill

Proof Verify

Freezes acceptance criteria before building, then uses independent agents to verify each criterion against the plan. Prevents self-verification bias and ensures build-to-spec traceability.

testing

automation

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/claude-code-config:proof-verify

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Plan-based verification: freeze acceptance criteria BEFORE building, verify AFTER with independent agents.

Supporting Files

references/kb-aware-verification.md

SKILL.md

237 lines · ~1.7k tokens

Stats

Language: Python

Stars: 126

Forks: 19

Maintenance: Excellent

Last Commit: Jun 15, 2026

Proof Verify

Plan-based verification: freeze acceptance criteria BEFORE building, verify AFTER with independent agents.

When to Use

- After completing a feature/fix that was built from a plan
- When you need independent confirmation that work meets spec
- When the builder should NOT verify their own work
- Trigger phrases: "verify against plan", "check the implementation", "proof check", "independent review"

The Pattern

PHASE 1: PLAN (before any code)
  Create .proof/PLAN.md with numbered acceptance criteria
  Each AC: testable, specific, has a verification command or check
  Plan is FROZEN - no changes during build

PHASE 2: BUILD (normal work)
  Implement against the plan
  Mark progress in .proof/PROGRESS.md
  Builder does NOT self-verify

PHASE 3: VERIFY (after build, independent agent)
  Fresh agent reads PLAN.md (never saw the build process)
  Walks through each AC, runs verification commands
  Writes .proof/VERDICT.md with PASS/FAIL per criterion
  If any FAIL → .proof/PROBLEMS.md with specific fixes

PHASE 4: FIX (if needed)
  Builder reads PROBLEMS.md, makes minimal fixes
  Back to PHASE 3 (re-verify)
  Loop until all PASS

Phase 1: Create Plan

Create .proof/PLAN.md in the project root:

# Verification Plan

**Created:** YYYY-MM-DD HH:MM
**Task:** [one-line description]
**Builder:** [session ID or "current"]
**Status:** FROZEN

## Acceptance Criteria

### AC1: [short name]
**Description:** [what must be true]
**Verify:** [exact command or check to run]
**Expected:** [what success looks like]

### AC2: [short name]
**Description:** [what must be true]
**Verify:** [exact command or check to run]
**Expected:** [what success looks like]

### AC3: [short name]
...

## Out of Scope
- [explicitly what this plan does NOT cover]

## Constraints
- [time, resource, or technical constraints]

Rules for good ACs:

- Testable - there is a command or check that produces PASS/FAIL
- Specific - "function returns correct value", not "code works"
- Independent - each AC can be verified without the others
- 3-8 ACs - fewer than 3 leaves loopholes; more than 8 invites checklist gaming
- Frozen - once written, do not modify during build

Phase 2: Build

Normal implementation. The only additions:

Create .proof/PROGRESS.md as you work:

# Build Progress

### AC1: [name]
- [x] Implemented in `src/foo.py:42`
- Files changed: `src/foo.py`, `tests/test_foo.py`

### AC2: [name]
- [x] Implemented in `src/bar.py:18`
- Files changed: `src/bar.py`
- Note: chose approach B because [reason]

After build is complete, write .proof/EVIDENCE.md:

# Evidence

### AC1: [name]
**Command:** `pytest tests/test_foo.py -v`
**Output:**
\```
tests/test_foo.py::test_returns_correct PASSED
tests/test_foo.py::test_handles_edge PASSED
\```
**Result:** PASS

### AC2: [name]
**Command:** `grep -c "TODO" src/bar.py`
**Output:** `0`
**Result:** PASS

Builder collects evidence but does NOT write the verdict. That is the verifier's job.
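
Evidence entries like the ones above can be generated rather than typed by hand. This is a sketch under assumptions: `collect_evidence` is a hypothetical helper, and it decides PASS/FAIL by comparing the command's stdout to the AC's Expected value rather than by exit status. Exit status alone can mislead: `grep -c` prints `0` and exits with status 1 when nothing matches, which is the passing case in the AC2 example.

```python
import shlex
import subprocess

# Built at runtime so this sketch can itself sit inside a fenced block.
FENCE = "`" * 3


def collect_evidence(ac_id: str, name: str, command: list[str], expected: str) -> str:
    """Run one verification command and format an EVIDENCE.md entry."""
    proc = subprocess.run(command, capture_output=True, text=True)
    output = proc.stdout.strip()
    # Compare against the Expected field, not the exit code (assumption of this sketch).
    result = "PASS" if output == expected else "FAIL"
    return "\n".join([
        f"### {ac_id}: {name}",
        f"**Command:** `{shlex.join(command)}`",
        "**Output:**",
        FENCE,
        output,
        FENCE,
        f"**Result:** {result}",
    ])
```

The returned string is the builder's claim only; the verdict still comes from the verifier.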

Phase 3: Verify (Independent Agent)

This is the critical phase. The verifier MUST be:

- A fresh agent (new session or subagent) that never saw the build
- Given ONLY: PLAN.md + access to the codebase
- NOT given: PROGRESS.md, EVIDENCE.md, or any build context
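
One way to enforce the "NOT given" rule mechanically, instead of relying on the verifier to look away, is to hand it a copy of the project with the builder's files left out. The sketch below assumes this approach; `stage_verifier_workspace` and `BUILDER_ONLY` are hypothetical names, and the skill itself does not prescribe a staging step.

```python
import shutil
from pathlib import Path

# Files in .proof/ that hold the builder's claims.
BUILDER_ONLY = {"PROGRESS.md", "EVIDENCE.md"}


def stage_verifier_workspace(project_root: str, dest: str) -> Path:
    """Copy the project to dest, omitting the builder's notes from .proof/."""

    def ignore(directory: str, names: list[str]) -> set[str]:
        # Only filter inside .proof/; same-named files elsewhere are kept.
        if Path(directory).name == ".proof":
            return BUILDER_ONLY & set(names)
        return set()

    # copytree requires that dest does not exist yet.
    return Path(shutil.copytree(project_root, dest, ignore=ignore))
```

The verifier is then started in the staged directory, where PLAN.md and the codebase are present but the builder's evidence is not.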

Verifier prompt template

You are an independent verifier. Your job is to check whether
the implementation meets the acceptance criteria in .proof/PLAN.md.

Rules:
1. Read .proof/PLAN.md first. This is your ONLY specification.
2. For each AC, run the verification command yourself.
3. Do NOT read .proof/PROGRESS.md or .proof/EVIDENCE.md
   (those are the builder's claims - you verify independently).
4. Write your verdict to .proof/VERDICT.md in this format:

# Verification Verdict

**Verifier:** [your session ID]
**Date:** YYYY-MM-DD HH:MM
**Plan hash:** [first 8 chars of md5 of PLAN.md]

## Results

### AC1: [name]
**Status:** PASS | FAIL
**Evidence:** [what you saw when you ran the check]
**Notes:** [any observations]

### AC2: [name]
...

## Summary
- Total: N criteria
- Passed: X
- Failed: Y
- **Overall:** PASS | FAIL

5. If any AC fails, also create .proof/PROBLEMS.md:

# Problems

### AC2: [name]
**Expected:** [from PLAN.md]
**Actual:** [what you found]
**Suggested fix:** [smallest change that would fix it]
**Affected files:** [list]

6. Do NOT fix anything. You are read-only. Report only.
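
The Summary section of the verdict follows directly from the per-AC statuses. As a sketch, assuming the results have already been gathered into simple records (`ACResult`, `normalize`, and `render_summary` are hypothetical names), any status that is not exactly PASS is counted as FAIL, so a soft pass cannot slip through:

```python
from dataclasses import dataclass


@dataclass
class ACResult:
    ac_id: str
    status: str  # what the verifier wrote, e.g. "PASS" or "FAIL"


def normalize(status: str) -> str:
    """Anything other than an exact PASS counts as FAIL."""
    return "PASS" if status.strip() == "PASS" else "FAIL"


def render_summary(results: list[ACResult]) -> str:
    """Build the '## Summary' section of VERDICT.md."""
    statuses = [normalize(r.status) for r in results]
    passed = statuses.count("PASS")
    failed = len(statuses) - passed
    # An empty result list is not a pass: nothing was verified.
    overall = "PASS" if statuses and failed == 0 else "FAIL"
    return "\n".join([
        "## Summary",
        f"- Total: {len(statuses)} criteria",
        f"- Passed: {passed}",
        f"- Failed: {failed}",
        f"- **Overall:** {overall}",
    ])
```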

How to spawn the verifier

Option A: Subagent (same session)

Agent({
  description: "Independent verification against plan",
  prompt: "[verifier prompt above]",
  mode: "plan"  // read-only first
})

Option B: Fresh session (stronger isolation)

Write a handoff with the instruction: "Start by reading .proof/PLAN.md and running verification."

Option C: Multiple verifiers (highest confidence)

Spawn 2-3 verifiers independently. If they disagree on any AC, that AC needs investigation.
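
For Option C, the disagreement check can be sketched as below. It assumes each verifier's verdict has already been reduced to a mapping from AC id to status; `find_disagreements` is a hypothetical helper, and parsing VERDICT.md into that mapping is left out.

```python
def find_disagreements(verdicts: list[dict[str, str]]) -> dict[str, list[str]]:
    """Map each AC whose status differs across verifiers to the statuses seen."""
    ac_ids = sorted(set().union(*verdicts)) if verdicts else []
    disagreements = {}
    for ac_id in ac_ids:
        # A verifier that skipped an AC counts as a disagreement too.
        statuses = [verdict.get(ac_id, "MISSING") for verdict in verdicts]
        if len(set(statuses)) > 1:
            disagreements[ac_id] = statuses
    return disagreements
```

An empty result means the verifiers agree on every AC; any key in the result names an AC that needs investigation.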

Phase 4: Fix Loop

If VERDICT.md shows any FAIL:

1. Builder reads PROBLEMS.md
2. Makes minimal fixes (not refactoring, not "while I'm here")
3. Updates EVIDENCE.md with new evidence for failed ACs
4. Verifier runs again (Phase 3)
5. Loop until all PASS

Typical: 1-2 fix rounds. If the same AC needs 3+ rounds, the AC itself might be wrong; revisit PLAN.md.
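
The loop and its escape hatch can be sketched as follows. The callables `verify` and `fix` stand in for the verifier and builder agents and are assumptions of this sketch, as is reading "3+ rounds" to mean: allow two fix rounds per AC, then stop when a third would be needed.

```python
from collections import Counter
from typing import Callable

MAX_FIX_ROUNDS = 2  # a third round on the same AC triggers a plan review


def run_fix_loop(
    verify: Callable[[], dict[str, str]],
    fix: Callable[[list[str]], None],
) -> str:
    """Alternate verify and fix until all ACs pass or one AC looks unfixable."""
    fix_rounds = Counter()
    while True:
        verdict = verify()  # fresh verification every round
        failed = [ac for ac, status in verdict.items() if status != "PASS"]
        if not failed:
            return "PASS"
        stuck = [ac for ac in failed if fix_rounds[ac] >= MAX_FIX_ROUNDS]
        if stuck:
            # The AC itself may be wrong or untestable.
            return "REVISIT_PLAN: " + ", ".join(stuck)
        fix_rounds.update(failed)
        fix(failed)  # builder makes minimal fixes for the failed ACs only
```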

File Structure

.proof/
  PLAN.md        # frozen acceptance criteria (Phase 1)
  PROGRESS.md    # builder's notes (Phase 2)
  EVIDENCE.md    # builder's evidence (Phase 2)
  VERDICT.md     # verifier's verdict (Phase 3)
  PROBLEMS.md    # verifier's findings (Phase 3, if failures)

Gotchas

- Builder reads VERDICT.md, not the reverse. The verifier never sees the builder's evidence. This prevents confirmation bias.
- "PASS with concerns" is FAIL. Either it passes or it doesn't. No soft passes.
- Plan hash in verdict. The verdict records a hash of PLAN.md as the verifier saw it. If someone edited PLAN.md mid-build, that hash won't match the hash of the frozen plan, which catches the edit. This only works if the frozen plan's hash was recorded at freeze time.
- Time limit. If verification takes >30 min, the ACs are too vague. Rewrite them.
- Don't verify style. ACs should be functional ("function returns X"), not stylistic ("code is clean"). Style is for code review, not the proof loop.
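
The plan hash used in VERDICT.md (first 8 characters of the MD5 of PLAN.md) can be computed with the standard library. In this sketch, `plan_hash` and `plan_unchanged` are hypothetical helper names, and the frozen hash is assumed to have been written down somewhere when the plan was frozen; the skill does not say where.

```python
import hashlib
from pathlib import Path


def plan_hash(plan_path: str) -> str:
    """First 8 hex characters of the MD5 of the plan file's bytes."""
    return hashlib.md5(Path(plan_path).read_bytes()).hexdigest()[:8]


def plan_unchanged(plan_path: str, frozen_hash: str) -> bool:
    """True if the plan still matches the hash recorded at freeze time."""
    return plan_hash(plan_path) == frozen_hash
```

MD5 is used here only as a change detector, matching the verdict template; it is not a defence against deliberate tampering.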

Troubleshooting

| Symptom | Cause | Fix |
| --- | --- | --- |
| Verifier passes everything | ACs too vague | Rewrite with specific commands |
| 3+ fix rounds on same AC | AC is wrong or untestable | Revisit PLAN.md |
| Verifier disagrees with builder's evidence | Different env or stale state | Both run from clean state |
| Builder keeps editing PLAN.md | Not frozen | Hash check catches this |

Sources

Proof Loop (Principle 02) - the theoretical foundation
OpenClaw-RL - spec freeze → build → fresh verify
Agent-R - failed-then-fixed trajectories
oh-my-claudecode Ralph - PRD-driven persistence (practical inspiration)
