From stdd-agents
Use when creating or updating agent evaluation suites. Defines eval structure, rubrics, and validation patterns.
Slash command: /stdd-agents:evaluation
Guidelines for creating comprehensive evaluation suites.
Use this skill when creating or updating agent evaluation suites.
All evaluations in evals/ follow a consistent structure with both code-based and LLM-as-judge validations.
Use this template for all eval spec.yaml files:
feature:
  name: "[Feature Name] Evaluation"
  as_a: evaluator
  i_want: validate feature behavior
  solutions:
    - Ground truth validation
    - Code-based validation
    - LLM-as-judge validation
  requirements:
    - id: REQ-EVAL-XX-001
      eval: G
      description: Description of ground truth requirement
    - id: REQ-EVAL-XX-002
      eval: C
      description: Description of code-based requirement
    - id: REQ-EVAL-XX-003
      eval: L
      description: Description of LLM-judged requirement
    - id: REQ-EVAL-XX-004
      eval: O
      description: Description of planned requirement
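The ID and eval-type conventions in the Template Rules that follow can be checked deterministically. The sketch below is a minimal Python example and makes several assumptions:

- spec.yaml has already been parsed into a list of requirement dicts, for example with a YAML library's loader.
- Requirements appear in ID order.
- One spec covers one eval, so all IDs share a single abbreviation.

`check_requirements` and the sample data are hypothetical. They are not part of stdd-agents.

```python
import re

# Requirement IDs: REQ-EVAL-<2-3 letter eval abbreviation>-<3-digit number>
REQ_ID = re.compile(r"^REQ-EVAL-([A-Z]{2,3})-(\d{3})$")
# Allowed values of the `eval` field: ground truth, code, LLM judge, planned
EVAL_TYPES = {"G", "C", "L", "O"}


def check_requirements(requirements: list[dict]) -> list[str]:
    """Return the problems found in a parsed `requirements` list (empty if none)."""
    problems: list[str] = []
    abbrevs: set[str] = set()
    # Assumes requirements are listed in order, so entry N should be numbered N.
    for position, req in enumerate(requirements, start=1):
        req_id = str(req.get("id", ""))
        match = REQ_ID.match(req_id)
        if match is None:
            problems.append(f"{req_id!r}: id does not match REQ-EVAL-XX-NNN")
        else:
            abbrevs.add(match.group(1))
            if int(match.group(2)) != position:
                problems.append(f"{req_id}: expected number {position:03d}")
        if req.get("eval") not in EVAL_TYPES:
            problems.append(f"{req_id}: eval must be one of G, C, L, O")
        if not req.get("description"):
            problems.append(f"{req_id}: missing description")
    # Assumption: one spec describes one eval, so all IDs share one abbreviation.
    if len(abbrevs) > 1:
        problems.append(f"mixed abbreviations: {sorted(abbrevs)}")
    return problems


# Hypothetical requirements for an action_generation (AG) eval.
spec_requirements = [
    {"id": "REQ-EVAL-AG-001", "eval": "G", "description": "Output matches the expected actions"},
    {"id": "REQ-EVAL-AG-002", "eval": "C", "description": "Output parses as valid JSON"},
    {"id": "REQ-EVAL-AG-03", "eval": "X", "description": "Reasoning is coherent"},
]
for problem in check_requirements(spec_requirements):
    print(problem)  # flags the two-digit number and the unknown eval type
```

Running this as a code-based check before any LLM judging keeps malformed specs from reaching the more expensive validation step.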
Template Rules:
ID format: REQ-EVAL-XX-NNN
- XX = 2-3 letter eval abbreviation (e.g., AG for action_generation, AS for action_scenarios)
- NNN = sequential 3-digit number starting at 001
- [G] = Ground truth validation (matches expected output)
- [C] = Code-based validation (deterministic checks)
- [L] = LLM-as-judge validation (quality assessment)
- [O] = Not yet implemented (planned for future)
Use this template for all rubric.md files:
# [Feature Name] Reasoning Trace Rubric
## Format
`[PASS/FAIL] RUBRIC-ID: Criterion description`
## Based on: [Concrete example with specific values]
### [Category Name]
- [ ] RUB-XX-001: Specific, objective criterion
- [ ] RUB-XX-002: Another specific criterion
Template Rules:
- ID format: RUB-XX-NNN (XX matches the spec.yaml abbreviation)
- Criteria use the - [ ] checkbox format so the LLM judge can mark each one pass/fail
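The next sketch shows how an eval harness might consume a rubric. It works in three steps:

1. Collect the RUB-XX-NNN items from rubric.md.
2. Confirm those items use the spec's abbreviation.
3. Map judge lines written in the rubric's `[PASS/FAIL] RUBRIC-ID: Criterion description` format to pass/fail results.

Several parts are assumptions for illustration, not documented stdd-agents behavior:

- the helper names `rubric_ids` and `score_verdicts`
- the sample rubric and judge output
- the rule that a criterion the judge never answers counts as FAIL

```python
import re

# Rubric items as written in rubric.md: "- [ ] RUB-XX-NNN: criterion"
RUBRIC_ITEM = re.compile(r"^- \[ \] (RUB-[A-Z]{2,3}-\d{3}): (.+)$")
# Judge verdict lines, assumed to follow the rubric's Format line
VERDICT = re.compile(r"^\[(PASS|FAIL)\] (RUB-[A-Z]{2,3}-\d{3}): (.+)$")


def rubric_ids(rubric_md: str, abbrev: str) -> list[str]:
    """Collect rubric IDs, rejecting any whose abbreviation differs from the spec's."""
    ids = []
    for line in rubric_md.splitlines():
        match = RUBRIC_ITEM.match(line.strip())
        if match:
            rubric_id = match.group(1)
            if rubric_id.split("-")[1] != abbrev:
                raise ValueError(f"{rubric_id} does not use spec abbreviation {abbrev}")
            ids.append(rubric_id)
    return ids


def score_verdicts(judge_output: str, expected_ids: list[str]) -> dict[str, bool]:
    """Map each rubric ID to True (PASS) or False (FAIL); unanswered IDs count as FAIL."""
    verdicts = {}
    for line in judge_output.splitlines():
        match = VERDICT.match(line.strip())
        if match and match.group(2) in expected_ids:
            verdicts[match.group(2)] = match.group(1) == "PASS"
    return {rubric_id: verdicts.get(rubric_id, False) for rubric_id in expected_ids}


# Hypothetical rubric and judge output for an action_generation (AG) eval.
rubric = """# Action Generation Reasoning Trace Rubric
### Correctness
- [ ] RUB-AG-001: Identifies the target file
- [ ] RUB-AG-002: Explains why the action is needed
"""
judge = "[PASS] RUB-AG-001: Identifies the target file\n"
ids = rubric_ids(rubric, "AG")
print(score_verdicts(judge, ids))  # {'RUB-AG-001': True, 'RUB-AG-002': False}
```

Treating a missing verdict as FAIL is one conservative choice. A harness could instead report the missing verdict as an error, so that a truncated judge response is easier to spot.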
npx claudepluginhub craigtkhill/stdd-agents --plugin stdd-agents