Skill

challenge

From evaluator

Draft Verification critique: self-review + git history comparison. Run before marking any non-trivial task complete.


Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/evaluator:challenge

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read, Grep, Glob, Bash

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Two-stage review inspired by Meta-Harness's Draft Verification pattern (arXiv 2603.28052). Stage 1 is your self-critique (the draft). Stage 2 retrieves historical evidence — both confirmers and challengers — to verify or refute the draft assessment.

SKILL.md

108 lines · ~974 tokens

Stats

Language: Shell

Parent stars: 2

Maintenance: Good

Last Commit: Apr 3, 2026


Challenge: Draft Verification Critique

Stage 1: Self-Critique (Draft)

Review the code you just wrote. Answer honestly:

1.1 Could This Be Simpler?

Is there a shorter, cleaner way to achieve the same result?
Did you add any abstraction that's used only once?
Are there any "just in case" additions that aren't needed?
Count the lines changed. Could you achieve the same result in half as many?
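One way to get the line count for the last question is to sum git's per-file numstat. This is a minimal sketch, assuming you are in a git repository with at least one commit and want uncommitted work measured against HEAD; the function name `lines_changed` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total lines added + deleted versus HEAD
# (staged and unstaged). Untracked files are not counted; mark them
# with `git add -N <file>` first if they should be included.
# Binary files report "-" in --numstat and are skipped.
lines_changed() {
  git diff --numstat HEAD -- "$@" |
    awk '$1 != "-" { total += $1 + $2 } END { print total + 0 }'
}
```

Run `lines_changed` for the whole tree, or `lines_changed src/` for one path, then ask whether half that number would do.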

1.2 What Breaks?

What happens with null/empty/unexpected input?
What happens under concurrent access?
What if this runs twice? Is it idempotent?
What's the failure mode — silent corruption or loud error?
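For the "runs twice" question, a quick empirical check is to run the operation twice and compare state. The sketch below assumes the state you care about is the set of files under one directory; `snapshot` and `check_idempotent` are hypothetical names, and a database or remote service would need its own snapshot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one "checksum size path" line per file under a
# directory, sorted so the listing is stable between runs.
snapshot() {
  (cd "$1" && find . -type f -exec cksum {} + | sort)
}

# Hypothetical helper: run a command twice and report whether the
# second run changed anything under <dir>.
# Usage: check_idempotent <dir> <command> [args...]
# Returns 0 if idempotent, 1 if not, 2 if either run failed.
check_idempotent() {
  local dir=$1; shift
  local first second
  "$@" || return 2
  first=$(snapshot "$dir")
  "$@" || return 2
  second=$(snapshot "$dir")
  if [ "$first" = "$second" ]; then
    echo "idempotent"
  else
    echo "NOT idempotent: second run changed $dir"
    return 1
  fi
}
```

For example, `check_idempotent ./out ./build.sh` (with `./build.sh` standing in for your command) reports whether a second run left `./out` unchanged. Passing this check does not cover concurrent runs.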

1.3 What's the Weakest Part?

Which section of the code are you least confident about?
Where did you make assumptions without verifying?
Is there any part where you're hoping it works rather than knowing it works?
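Unverified assumptions often leave traces in the diff itself. A minimal sketch that surfaces added lines containing hedge words, assuming a git repository and an illustrative marker list you should adapt; `hedge_markers` is a hypothetical name, and an empty result proves nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print added lines (versus HEAD) that contain
# words suggesting an unverified assumption. The marker list is an
# illustrative assumption; extend it with your team's conventions.
hedge_markers() {
  git diff -U0 HEAD -- "$@" |
    grep '^+' |
    grep -vE '^\+\+\+ (b/|/dev/null)' |
    grep -iE 'TODO|FIXME|XXX|HACK|assum|should work|probably|hopefully'
}
```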

1.4 Does This Match the Request?

Re-read the original task. Did you do ONLY what was asked?
Did you add anything beyond scope? Remove it.
Did you change anything you weren't asked to change? Revert it.
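To compare what you touched against what was asked, it helps to list every changed path in one place. A minimal sketch, assuming a git repository with at least one commit; `changed_files` is a hypothetical name, and the revert command in the comment needs Git 2.23 or later for `git restore`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: every path changed since HEAD, including staged,
# unstaged, and untracked (but not ignored) files, one per line.
changed_files() {
  {
    git diff --name-only HEAD
    git ls-files --others --exclude-standard
  } | sort -u
}

# Reverting an out-of-scope tracked file back to HEAD (both index and
# working tree):   git restore --staged --worktree -- <file>
# An out-of-scope untracked file has no HEAD version; delete it instead.
```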

1.5 Would a Staff Engineer Approve This?

Is the code readable without comments?
Does it follow existing patterns in the codebase?
Would it survive a code review without changes?

Stage 2: Historical Verification

For each file you changed, retrieve evidence from git history. This stage verifies your Stage 1 assessment against real project history.

2.1 Retrieve Confirmers (similar changes that succeeded)

For each modified file, run:

git log --oneline --diff-filter=M -10 -- <file>

Then for the most relevant commits, check what patterns they used:

git show <commit> -- <file>

Ask: Do your changes follow the patterns that have worked before in these files?
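The per-file query can be looped over every changed file. A minimal sketch, assuming "changed" means modified relative to HEAD; `confirmers` is a hypothetical name, and choosing which commits are most relevant still takes judgment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each file modified since HEAD, list up to 10
# recent commits that modified it (additions and deletions excluded by
# --diff-filter=M). Reading one path per line keeps spaces intact.
confirmers() {
  git diff --name-only HEAD | while IFS= read -r f; do
    echo "== $f"
    git log --oneline --diff-filter=M -10 -- "$f"
  done
}

# Then inspect a promising commit:   git show <commit> -- <file>
```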

2.2 Retrieve Challengers (similar changes that caused problems)

Search for reverted or fix-up commits touching the same files:

git log --oneline --all -i -E --grep="revert|fix|hotfix|rollback" -- <file>

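Look for changes that were reverted or fixed soon after landing. List recent changes with their dates, then scan for a change followed within about a week by a revert or fix to the same file:

git log --date=short --format="%h %ad %s" --since="3 months ago" -- <file> | head -20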

Ask: Have similar changes to these files caused issues before? Are you repeating a known anti-pattern?
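Quick reverts can also be found directly. A minimal sketch, assuming reverts were made with `git revert` and kept its default message ("This reverts commit <sha>."); reverts with rewritten messages and plain fix-up commits are missed. `quick_reverts` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list commits touching <file> that were reverted
# within <max_age> seconds (default 604800 = 7 days) of being committed.
# Usage: quick_reverts <file> [max_age_seconds]
quick_reverts() {
  local file=$1 max_age=${2:-604800}
  local revert_sha revert_time orig_sha orig_time
  git log --format='%H %ct' --grep='This reverts commit' -- "$file" |
    while read -r revert_sha revert_time; do
      # Extract the reverted SHA from the revert commit's message.
      orig_sha=$(git log -1 --format=%B "$revert_sha" |
        sed -n 's/.*This reverts commit \([0-9a-f]\{7,40\}\).*/\1/p' | head -n 1)
      [ -n "$orig_sha" ] || continue
      orig_time=$(git log -1 --format=%ct "$orig_sha" 2>/dev/null) || continue
      # Compare committer timestamps of the revert and the original.
      if [ $((revert_time - orig_time)) -le "$max_age" ]; then
        echo "$(git log -1 --format='%h %s' "$orig_sha") (reverted by ${revert_sha:0:7})"
      fi
    done
}
```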

2.3 Pattern Check

If git history is available, also check:

git log --oneline --since="1 month ago" -- <file> | wc -l: is this file a hotspot? High churn means high risk.
Look for test files associated with modified files. Do tests exist? Are they passing?
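The commit count can be mapped onto the report's Churn Risk field. A minimal sketch; the thresholds below are arbitrary assumptions, not values from this skill, so calibrate them against how often files in your repository normally change. `churn_risk` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify churn for one file from the number of
# commits touching it since a cutoff (default "1 month ago").
# Thresholds (<=2 Low, <=5 Medium, otherwise High) are assumptions.
churn_risk() {
  local file=$1
  local since="${2:-1 month ago}"
  local n
  n=$(git log --oneline --since="$since" -- "$file" | wc -l | tr -d ' ')
  if   [ "$n" -le 2 ]; then echo "Low ($n commit(s) since $since)"
  elif [ "$n" -le 5 ]; then echo "Medium ($n commit(s) since $since)"
  else                      echo "High ($n commit(s) since $since)"
  fi
}
```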

If no git history is available (new repo, new files), skip Stage 2 and note it in the report.
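Whether Stage 2 applies can be decided per file. A minimal sketch, assuming "history" means at least one commit reachable from HEAD touches the file; `has_history` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "yes" if <file> appears in at least one
# commit reachable from HEAD, otherwise "no" (not a repo, no commits yet,
# or a file that has never been committed). Exit status matches.
has_history() {
  if ! git rev-parse --verify -q HEAD >/dev/null 2>&1; then
    echo no
    return 1
  fi
  if [ -n "$(git log -1 --format=%h -- "$1" 2>/dev/null)" ]; then
    echo yes
  else
    echo no
    return 1
  fi
}
```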

Output

CHALLENGE REPORT (Draft Verification)
======================================

Stage 1 — Self-Critique
  Simplification:  [Can/Cannot be simplified. If can: how]
  Risk:            [Highest risk area and why]
  Weakest Part:    [What and why]
  Scope Match:     [Yes/No. If no: what was added beyond scope]
  Staff Approval:  [Yes/Likely/No. If no: what needs to change]

Stage 2 — Historical Verification
  Confirmers:     [N past changes to same files followed similar patterns / no history]
  Challengers:    [N past issues found in same files / none found]
  Churn Risk:     [Low/Medium/High — based on recent change frequency]
  Verdict:        [CONFIRMED / CAUTION / REFUTED — does history support this change?]

Be ruthlessly honest. The point is to catch issues BEFORE the user does.
