From evaluator
Draft Verification critique: self-review + git history comparison. Run before marking any non-trivial task complete.
Slash command: /evaluator:challenge
Two-stage review inspired by Meta-Harness's Draft Verification pattern (arXiv 2603.28052). Stage 1 is your self-critique (the draft). Stage 2 retrieves historical evidence — both confirmers and challengers — to verify or refute the draft assessment.
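Review the code you just wrote. Answer honestly, covering each field of the Stage 1 report: whether the code can be simplified, where the highest risk lies, what the weakest part is, whether the change matches the requested scope, and whether it would pass staff-level review.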
For each file you changed, retrieve evidence from git history. This stage verifies your Stage 1 assessment against real project history.
For each modified file, run:
git log --oneline --diff-filter=M -10 -- <file>
Then for the most relevant commits, check what patterns they used:
git show <commit> -- <file>
Ask: Do your changes follow the patterns that have worked before in these files?
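As a minimal sketch, the per-file history lookup above can be wrapped in a loop. The function name `recent_history` is hypothetical and not part of the skill; it only repeats the `git log` command shown above for each path it receives.

```shell
# Hypothetical helper (not part of the skill): print the last 10 modifying
# commits for every file passed as an argument.
recent_history() {
  for file in "$@"; do
    printf '== %s ==\n' "$file"
    git log --oneline --diff-filter=M -10 -- "$file"
  done
}

# Example call, assuming no changed path contains whitespace:
# recent_history $(git diff --name-only HEAD)
```

Follow up with `git show <commit> -- <file>` on whichever commits look most relevant.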
Search for reverted or fix-up commits touching the same files:
git log --oneline --all --grep="revert\|fix\|hotfix\|rollback" -- <file>
List recent modifications, then check whether any of them were reverted within a week:
git log --oneline --diff-filter=M --since="3 months ago" -- <file> | head -20
Ask: Have similar changes to these files caused issues before? Are you repeating a known anti-pattern?
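The revert and fix-up search above can be reduced to a rough number for the Challengers field of the report. The sketch below assumes one-line commit summaries on standard input; `count_challengers` is a hypothetical name, and the keyword match is only a heuristic (it also counts subjects such as "prefix" or "fixture").

```shell
# Hypothetical helper: count commit subject lines that mention a revert,
# fix, hotfix, or rollback (case-insensitive). Reads `git log --oneline`
# style output from standard input.
count_challengers() {
  # grep -c exits non-zero when the count is 0; keep the exit status clean.
  grep -Eic 'revert|fix|hotfix|rollback' || true
}

# Example call:
# git log --oneline --all -- path/to/file | count_challengers
```

A non-zero count is a prompt to read those commits, not a verdict on its own.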
If git history is available, also check:
git log --oneline --diff-filter=M --since="1 month ago" -- <file>
Is this file a hotspot? High churn means high risk.
If no git history is available (new repo, new files), skip Stage 2 and note it in the report.
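One way to turn the hotspot check into the Low/Medium/High value used for Churn Risk in the report is to bucket the number of commits from the last month. The thresholds below (2 and 5) and the name `churn_level` are assumptions for illustration; the skill itself does not define cut-offs.

```shell
# Hypothetical helper: map a commit count for the last month to a churn level.
# The thresholds are illustrative assumptions, not values defined by the skill.
churn_level() {
  count=$1
  if [ "$count" -le 2 ]; then
    echo Low
  elif [ "$count" -le 5 ]; then
    echo Medium
  else
    echo High
  fi
}

# Example call:
# churn_level "$(git log --oneline --diff-filter=M --since="1 month ago" -- path/to/file | wc -l | tr -d ' ')"
```

Adjust the thresholds to the project's normal commit rate; a busy monorepo and a small library will not share the same baseline.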
CHALLENGE REPORT (Draft Verification)
======================================
Stage 1 — Self-Critique
Simplification: [Can/Cannot be simplified. If can: how]
Risk: [Highest risk area and why]
Weakest Part: [What and why]
Scope Match: [Yes/No. If no: what was added beyond scope]
Staff Approval: [Yes/Likely/No. If no: what needs to change]
Stage 2 — Historical Verification
Confirmers: [N past changes to same files followed similar patterns / no history]
Challengers: [N past issues found in same files / none found]
Churn Risk: [Low/Medium/High — based on recent change frequency]
Verdict: [CONFIRMED / CAUTION / REFUTED — does history support this change?]
Be ruthlessly honest. The point is to catch issues BEFORE the user does.
npx claudepluginhub artmin96/forge-studio --plugin evaluator