From agentic-development-workflow
Reusable generator/evaluator pattern for honest artifact validation. Provides scoring framework, agent contracts, eval protocol, and findings format. Use directly for gen/eval loops or reference from other skills.
How this skill is triggered — by the user, by Claude, or both
Slash command
/agentic-development-workflow:gen-eval
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
A reusable design pattern for honest evaluation of agent-produced artifacts. Separates the agent that creates work (generator) from the agent that evaluates it (evaluator), because agents consistently praise their own work.
"When asked to evaluate work they've produced, agents tend to respond by confidently praising the work — even when, to a human observer, the quality is obviously mediocre." — Anthropic, "Harness Design for Long-Running Application Development"
This skill is both a utility library and a standalone skill:
references/ files for scoring, prompts, protocol, and findings format.
| Skill | What it uses | Reference files |
|---|---|---|
| /aep-build Phase 5 | Scoring framework + eval protocol | scoring-framework.md, eval-protocol.md |
| /aep-launch | Dimension presets for brainstorming | scoring-framework.md (presets section) |
| /aep-validate | Agent prompts + findings format | agent-contracts.md, findings-format.md, scoring-framework.md |
After sync with aep- prefix, reference files are at:
.claude/skills/aep-gen-eval/references/scoring-framework.md
.claude/skills/aep-gen-eval/references/agent-contracts.md
.claude/skills/aep-gen-eval/references/eval-protocol.md
.claude/skills/aep-gen-eval/references/findings-format.md
Generator and evaluator must be separate agents. This is not optional — it is the single most impactful quality improvement in agentic workflows.
Why:
Read these files for detailed specifications. Each file is self-contained.
| File | Contents | When to read |
|---|---|---|
| references/scoring-framework.md | Dimension definitions (1-5 scale), hard failure thresholds, dimension presets (UI, API, security, data, mixed), few-shot examples, anti-patterns | Setting up evaluation criteria, scoring work, calibrating evaluators |
| references/agent-contracts.md | Generator/evaluator role separation, prompt templates (generator, evaluator, protocol checker), context assembly rules | Spawning evaluation agents, assembling prompts |
| references/eval-protocol.md | Eval request/response format, verification JSON schema, the eval loop (request → response → fix → re-evaluate), execution contexts (Task subagent, codex exec, tmux, workflow), the needs-human gate record | Running the evaluation loop, tracking verification state |
| references/findings-format.md | Severity categorization (blocking/important/minor), deduplication protocol, presentation format, changelog entry format | Consolidating findings from multiple agents, presenting results |
When invoked directly, this skill runs a gen/eval loop on any artifact.
What is being evaluated? Options:
| Mode | Agents | When to use |
|---|---|---|
| Parallel | Generator + Evaluator spawned simultaneously | Documents, designs, product context — agents work independently |
| Sequential | Generator first, then Evaluator reads generator's work | Code review — evaluator needs to see the implementation |
| Loop | Generator → Evaluator → fix → repeat (max 5 rounds) | Active development — generator can fix issues between rounds |
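The Loop mode in the table above can be sketched roughly as follows. This is a minimal illustration, not the skill's implementation: `run_loop`, `EvalResult`, and the stub agents are hypothetical names, and in practice the two callables would each spawn a separate agent. Only the five-round cap comes from the table.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

MAX_ROUNDS = 5  # the Loop mode caps iteration at 5 rounds


@dataclass
class EvalResult:
    """Hypothetical evaluator response: a verdict plus findings to fix."""
    passed: bool
    findings: List[str] = field(default_factory=list)


def run_loop(
    generate: Callable[[List[str]], str],
    evaluate: Callable[[str], EvalResult],
    max_rounds: int = MAX_ROUNDS,
) -> Tuple[str, EvalResult, int]:
    """Generator -> Evaluator -> fix -> repeat, stopping on pass or at the cap.

    Returns (artifact, last_result, rounds_used).
    """
    findings: List[str] = []
    artifact = ""
    result = EvalResult(passed=False)
    for round_no in range(1, max_rounds + 1):
        artifact = generate(findings)  # generator receives the previous findings
        result = evaluate(artifact)    # evaluator receives only the artifact
        if result.passed:
            return artifact, result, round_no
        findings = result.findings
    return artifact, result, max_rounds


# Stub agents standing in for real generator/evaluator agents (assumptions).
def stub_generator(findings: List[str]) -> str:
    return "draft v%d" % (len(findings) + 1)


def stub_evaluator(artifact: str) -> EvalResult:
    if artifact == "draft v1":
        return EvalResult(False, ["missing error handling"])
    return EvalResult(True)


artifact, result, rounds = run_loop(stub_generator, stub_evaluator)
print(artifact, result.passed, rounds)  # prints: draft v2 True 2
```

Keeping the generator and evaluator as separate callables mirrors the separation the pattern requires: neither function sees the other's internal state, only the artifact or the findings.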
Read references/scoring-framework.md and select the appropriate preset:
Or define custom dimensions for the specific artifact.
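As an illustration of custom dimensions, the sketch below checks scores on the 1-5 scale against per-dimension hard failure thresholds. The dimension names, the threshold values, and the `Dimension` and `evaluate_scores` helpers are all hypothetical; the actual definitions and thresholds are in references/scoring-framework.md.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Dimension:
    name: str
    hard_fail_below: int  # assumed rule: a score under this value fails outright


def evaluate_scores(dimensions: List[Dimension], scores: Dict[str, int]) -> dict:
    """Validate 1-5 scores and report which dimensions hit a hard failure."""
    hard_failures = []
    for dim in dimensions:
        score = scores[dim.name]
        if not 1 <= score <= 5:
            raise ValueError(f"{dim.name}: score {score} is outside the 1-5 scale")
        if score < dim.hard_fail_below:
            hard_failures.append(dim.name)
    return {"passed": not hard_failures, "hard_failures": hard_failures}


# Hypothetical custom dimensions for a single artifact.
custom = [Dimension("correctness", 3), Dimension("clarity", 2)]
verdict = evaluate_scores(custom, {"correctness": 2, "clarity": 4})
print(verdict)  # prints: {'passed': False, 'hard_failures': ['correctness']}
```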
Read references/agent-contracts.md for prompt templates. Customize the templates with:
Read references/findings-format.md for how to consolidate, categorize, and present findings. Apply fixes to the artifact.
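One possible shape for the consolidation step is sketched below: findings from several agents are deduplicated and ordered by severity. The three severity names come from the findings format; the dictionary fields, the normalized-summary dedup key, and the `consolidate` helper are assumptions, since the actual deduplication protocol is specified in references/findings-format.md.

```python
from typing import Dict, List

SEVERITY_ORDER = {"blocking": 0, "important": 1, "minor": 2}


def consolidate(findings: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge duplicate findings, keep the highest severity, sort by severity."""
    merged: Dict[str, Dict[str, str]] = {}
    for finding in findings:
        # Assumed dedup key: the summary, lowercased with whitespace collapsed.
        key = " ".join(finding["summary"].lower().split())
        kept = merged.get(key)
        if kept is None or (
            SEVERITY_ORDER[finding["severity"]] < SEVERITY_ORDER[kept["severity"]]
        ):
            merged[key] = finding
    return sorted(merged.values(), key=lambda f: SEVERITY_ORDER[f["severity"]])


# Hypothetical findings reported by two evaluator agents.
raw = [
    {"severity": "minor", "summary": "Typo in README"},
    {"severity": "important", "summary": "typo in  readme"},
    {"severity": "blocking", "summary": "No auth check on /admin"},
]
consolidated = consolidate(raw)
print([f["severity"] for f in consolidated])  # prints: ['blocking', 'important']
```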
Why a utility skill, not just reference files:
/aep-gen-eval) for ad-hoc validation
references/ directory is still accessible to other skills via path
Why not merge with /aep-validate:
/aep-validate is a product-context skill with 4 specific modes (product, design, code, document)
/aep-validate consumes the gen/eval pattern; it is not the pattern itself
Why not keep in /aep-launch:
After running gen/eval, proceed based on what was evaluated:
/aep-dispatch
/aep-launch
/aep-build
npx claudepluginhub memorysaver/agentic-engineering-patterns --plugin agentic-development-workflow