From eval-first
Bakes evaluation thinking into model output work. Every plan names what success looks like, where things can go wrong, and proposes simple checks validating key assumptions.
How this skill is triggered — by the user, by Claude, or both
Slash command
/eval-first:eval-firstThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
You invoked this skill because the change is non-trivial. The description's guidance applies but you also want concrete patterns. Here's the full protocol.
You invoked this skill because the change is non-trivial. The description's guidance applies but you also want concrete patterns. Here's the full protocol.
Run the evals as part of doing the work. Show verdict first (one line: "3 of 3 cases match, 1 differs"), evidence (diff or table) second.
When grading subjective output (tone, coherence, "did the change land?"), run two passes:
Single-pass judges confabulate — they pattern-match the rubric instead of reading the content.
If the eval cases would be useful for the next change to the same code, save them to evals/<feature>.md in the current project. Format in references/comparison-templates.md. Opportunistic, not mandatory.
Provides a checklist for code reviews covering functionality, security, performance, maintainability, tests, and quality. Use for pull requests, audits, team standards, and developer training.
npx claudepluginhub robertnowell/eval-first --plugin eval-first