From claude-commands
Defines and runs skeptic exit criteria for non-trivial tasks with an independent verification agent that has inverted incentive to find gaps.
Slash command
/claude-commands:skeptic-agent
**Proactively activate when:**
- The user types /skeptic or asks for skeptic verification
Before starting the task, ask:
"This looks non-trivial. Want to define skeptic exit criteria? A separate agent will independently verify these when you think you're done. What does 'actually done' look like for this task?"
If the user declines, proceed without. If they define criteria, save them to specs/exit-criteria.md in the workspace.
# specs/exit-criteria.md
## Task: [task name]
### Criterion A: [name]
What to verify: [natural language description]
Command to run (if applicable): [exact command]
What PASS looks like: [expected output/state]
What FAIL looks like: [common proxy substitutions to watch for]
### Criterion B: [name]
...
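A minimal sketch of how a harness might load these criteria, assuming the file follows the template above literally (one `### Criterion X: name` heading per criterion, followed by `Label: value` lines). The `Criterion` dataclass and `parse_exit_criteria` function are hypothetical names used for illustration; they are not part of the skill.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Criterion:
    """One criterion from specs/exit-criteria.md (hypothetical structure)."""
    key: str
    name: str
    fields: dict = field(default_factory=dict)


# Matches headings such as "### Criterion A: Login works".
HEADING = re.compile(r"^###\s+Criterion\s+(\S+):\s*(.*)$")


def parse_exit_criteria(text: str) -> list:
    """Parse the template into Criterion records, ignoring unrelated lines."""
    criteria = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADING.match(line)
        if match:
            # A new criterion heading starts a new record.
            current = Criterion(key=match.group(1), name=match.group(2))
            criteria.append(current)
        elif current is not None and ":" in line and not line.startswith("#"):
            # "Label: value" lines attach to the most recent criterion.
            label, value = line.split(":", 1)
            current.fields[label.strip()] = value.strip()
    return criteria
```

Keeping the labels as free-form dictionary keys means the sketch does not break if a criterion omits the optional "Command to run" line.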
The coding agent works normally. It does NOT see specs/exit-criteria.md.
It signals readiness by stating "I believe the task is complete" or similar.
When the coder signals completion, spawn or switch to a skeptic session. The skeptic's system instructions:
You are a QA Skeptic. Your job is to FIND GAPS in the implementation.
INVERTED INCENTIVE: You are rewarded for finding missing evidence.
A false PASS is YOUR failure. A thorough FAIL report is success.
Rules:
1. Read specs/exit-criteria.md
2. For each criterion, run the EXACT verification specified
3. Do NOT accept the coder's claims — verify independently
4. Unit tests do NOT satisfy E2E criteria
5. Manual tool calls do NOT satisfy pipeline criteria
6. "Code compiles" does NOT mean "feature works"
7. EVIDENCE MUST BE VIDEO: For UI and interactive terminal criteria, ONLY accept .gif, .mp4, or .cast videos tied to a commit SHA. Reject static screenshots unless explicitly justified.
Output format per criterion:
CRITERION: [quote verbatim from specs/exit-criteria.md]
EVIDENCE FOUND: [what you actually observed — commands run, output seen]
EVIDENCE MISSING: [what should exist but doesn't]
VERDICT: PASS | FAIL | INSUFFICIENT
REASON: [specific gap or confirmation]
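Because the output format above puts each verdict on its own `VERDICT:` line, the skeptic's response can be checked mechanically. The following is a sketch only: the function names are hypothetical, and treating INSUFFICIENT (or an output with no verdicts at all) as a non-pass is an assumption, not something the skill spells out.

```python
import re

# One verdict per criterion, on a line of its own.
VERDICT_LINE = re.compile(r"^VERDICT:\s*(PASS|FAIL|INSUFFICIENT)\s*$", re.MULTILINE)


def extract_verdicts(skeptic_output: str) -> list:
    """Return the verdicts in the order they appear in the skeptic's output."""
    return VERDICT_LINE.findall(skeptic_output)


def overall_verdict(verdicts: list) -> str:
    """PASS only if at least one verdict exists and every verdict is PASS."""
    if verdicts and all(v == "PASS" for v in verdicts):
        return "PASS"
    return "FAIL"
```

Returning FAIL on an empty list keeps a malformed or truncated skeptic response from being mistaken for approval.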
If ANY criterion is FAIL or INSUFFICIENT:
## Skeptic Verification Report
Task: [name]
Date: [date]
Coder model: [model]
Skeptic model: [model]
Iterations: [N]
| Criterion | Verdict | Evidence |
|---|---|---|
| A | PASS | [brief evidence] |
| B | FAIL | [what's missing] |
Overall: PASS / FAIL
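The report layout above could be generated from per-criterion results along these lines. This is an illustrative sketch: `render_report` and its parameters are hypothetical, and the rule that any non-PASS verdict makes the overall result FAIL is an assumption.

```python
def render_report(task, date, coder_model, skeptic_model, iterations, results):
    """Build the report text.

    results is a list of (criterion_key, verdict, evidence) tuples.
    """
    all_pass = bool(results) and all(verdict == "PASS" for _, verdict, _ in results)
    lines = [
        "## Skeptic Verification Report",
        f"Task: {task}",
        f"Date: {date}",
        f"Coder model: {coder_model}",
        f"Skeptic model: {skeptic_model}",
        f"Iterations: {iterations}",
        "| Criterion | Verdict | Evidence |",
        "|---|---|---|",
    ]
    for key, verdict, evidence in results:
        # Escape pipes so free-text evidence cannot break the table row.
        safe_evidence = evidence.replace("|", "\\|")
        lines.append(f"| {key} | {verdict} | {safe_evidence} |")
    lines.append("Overall: " + ("PASS" if all_pass else "FAIL"))
    return "\n".join(lines)
```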
- Agent tool with subagent_type="pair-verifier" and skeptic system prompt
- codex exec with task prompt
- codex exec with skeptic prompt + workspace access
- ao spawn worker session
- ao spawn --skeptic (new flag, spawns with skeptic agentRules)
- /pair verifier phase to use skeptic LLM instead of verifyCommand bash
- verifyCommand first (fast, deterministic), then skeptic for nuanced criteria

RLHF makes agents want to complete tasks. The Skeptic's "task" IS finding gaps. Its RLHF bias pushes it toward thoroughness in criticism, not toward premature approval. This turns RLHF from a bug into a feature.
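The "verifyCommand first, then skeptic" ordering mentioned above can be sketched as a two-stage gate. This is an assumption-laden illustration rather than the skill's actual integration: `two_stage_verify` is a hypothetical helper, and `run_skeptic` stands in for whatever mechanism spawns the skeptic session and returns its overall verdict.

```python
import subprocess
import sys
from typing import Callable


def two_stage_verify(verify_command: list, run_skeptic: Callable[[], str]) -> str:
    """Run the deterministic check first; consult the skeptic only if it passes."""
    result = subprocess.run(verify_command, capture_output=True, text=True)
    if result.returncode != 0:
        # Cheap, deterministic failure: no need to spend a skeptic session.
        return "FAIL"
    # run_skeptic is a hypothetical callable returning "PASS", "FAIL",
    # or "INSUFFICIENT"; anything other than "PASS" is treated as a failure.
    verdict = run_skeptic()
    return "PASS" if verdict == "PASS" else "FAIL"


if __name__ == "__main__":
    # Stand-in verify command that always succeeds, and a stub skeptic.
    command = [sys.executable, "-c", "pass"]
    print(two_stage_verify(command, lambda: "INSUFFICIENT"))
```

Running the command first keeps the expensive, non-deterministic skeptic pass for cases where the fast check has nothing left to object to.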
npx claudepluginhub jleechanorg/claude-commands --plugin claude-commands