From speculator
Runs the eval authoring phase for a spec — presents each acceptance criterion and guides the author to write an intent-capturing eval, scores the eval set via the eval-intent-scorer agent, runs a SYSTEM-SPEC.md compatibility check and prior-spec regression check, and iterates until the configured quality threshold is met. Use when the user says "/sdlc eval", "author evals", "write evals for this spec", "eval phase", or when the sdlc-run pipeline reaches Phase 2a.
Slash command
/speculator:eval-authoring
/sdlc eval — Eval Authoring Phase
You are running the eval authoring phase. Evals are intent artifacts: markdown files that describe observable user outcomes for each acceptance criterion, independent of implementation.
You will receive:
.claude/sdlc.local.md)
interactive or full_auto (from sdlc-run context, default interactive)
${CLAUDE_PLUGIN_ROOT}/lib/spec-resolution.md to find the active spec.
.claude/sdlc.local.md for:
gates.eval-intent.threshold (default 6.5)
gates.eval-intent.max_eval_retries (default 3)
gates.eval-intent.per_dimension_minimum (default 4)
amends frontmatter.
{spec_dir}/{spec_name}/evidence/.eval-session-partial:
If SYSTEM-SPEC.md exists at {spec_dir}/SYSTEM-SPEC.md:
${CLAUDE_PLUGIN_ROOT}/lib/system-spec-layout.md.
Single-file: read SYSTEM-SPEC.md as before, unchanged.
Split (index with a valid Domains table and/or SYSTEM-SPEC-*.md siblings): apply the subset-read rule. Read the index plus only the domain file(s) matching the spec's declared domain: frontmatter (SYSTEM-SPEC-<domain>.md); read the index plus all domain files when the spec declares no domain (the conservative default).
Crystallized behaviors come from the domain file(s); the index is navigation only.
amends frontmatter to find relevant sections.
📋 SYSTEM-SPEC.md context — behaviors this spec amends:
Section: {section name}
Current behavior: {behavior text}
Declared change: {amends.change from spec frontmatter}
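The subset-read rule above can be sketched as a small file-selection function. This is a minimal sketch under stated assumptions, not the plugin's implementation: the name `files_to_read` is hypothetical, and a split layout is detected here only by the presence of SYSTEM-SPEC-*.md siblings, whereas the rule in lib/system-spec-layout.md also considers a valid Domains table in the index.

```python
from pathlib import Path
from typing import Optional


def files_to_read(spec_dir: Path, domain: Optional[str]) -> list[Path]:
    """Hypothetical helper: pick the SYSTEM-SPEC files relevant to one spec."""
    index = spec_dir / "SYSTEM-SPEC.md"
    if not index.exists():
        return []  # no system spec, so there is nothing to check against

    # Assumption: sibling domain files alone signal a split layout.
    domain_files = sorted(spec_dir.glob("SYSTEM-SPEC-*.md"))
    if not domain_files:
        return [index]  # single-file layout: read the one file

    if domain is None:
        # Conservative default: no declared domain means read everything.
        return [index, *domain_files]

    matching = [p for p in domain_files if p.name == f"SYSTEM-SPEC-{domain}.md"]
    return [index, *matching]
```

The index is always returned first because it is navigation only; crystallized behaviors would then be read from the domain files that follow it.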
For each acceptance criterion (starting from last partial, or AC1):
Interactive mode: Display the AC:
─────────────────────────────────────────
AC{N}: {full AC text}
─────────────────────────────────────────
Write an eval for this AC. Describe the observable outcome a user would experience
if this AC is satisfied — without referencing source code, function names, or
implementation details.
Suggested structure:
Observable success: [what the user sees/experiences]
Anti-patterns this catches: [which spec anti-patterns would cause failure]
Would fail if: [concrete failure conditions in user-visible terms]
Accept the author's input and write it to docs/specs/{feature}/evals/ac-{N}.md.
Write the partial session marker after each AC: echo "{N}" > {spec_dir}/{spec_name}/evidence/.eval-session-partial
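The per-AC step above (store the eval, then record progress) might look like the following. It is a rough sketch: `write_eval` and its parameters are hypothetical, and `spec_root` stands in for both docs/specs/{feature} and {spec_dir}/{spec_name}, which are assumed here to name the same directory.

```python
from pathlib import Path


def write_eval(spec_root: Path, n: int, success: str,
               anti_patterns: str, fails_if: str) -> Path:
    """Hypothetical helper: store the eval for AC n, then record progress."""
    # Body follows the suggested three-part structure.
    body = (
        f"Observable success: {success}\n"
        f"Anti-patterns this catches: {anti_patterns}\n"
        f"Would fail if: {fails_if}\n"
    )
    eval_path = spec_root / "evals" / f"ac-{n}.md"
    eval_path.parent.mkdir(parents=True, exist_ok=True)
    eval_path.write_text(body, encoding="utf-8")

    # Update the marker only after the eval is on disk, so an interrupted
    # session never claims an AC that was not actually written.
    marker = spec_root / "evidence" / ".eval-session-partial"
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text(f"{n}\n", encoding="utf-8")
    return eval_path
```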
Full auto mode: Generate the eval autonomously by:
docs/specs/{feature}/evals/ac-{N}.md with the structure above
Write the partial session marker after each eval.
Dispatch the eval-intent-scorer agent with:
docs/specs/{feature}/evals/)
domain: frontmatter (if any), written inline. In split-layout projects the scorer reads the index plus only that domain's file(s) for its conflict check, and all domain files when undeclared (the subset-read rule in lib/system-spec-layout.md)
max_eval_retries attempts)
If result: fail in the scorecard:
Interactive mode: Present the blocking and recommended flags:
❌ Eval quality score: {overall} (threshold: {threshold})
Blocking issues to fix:
{blocking flags}
Suggested improvements:
{recommended flags}
Which eval files would you like to revise? (Enter AC numbers, e.g. "1 3" or "all")
Accept revisions, overwrite the eval files, re-dispatch the scorer.
Full auto mode: Read the flags. For each blocking flag:
If retries are exhausted and the gate is still failing:
⚠️ Eval quality gate failed after {N} attempts.
Lowest score: {dimension} ({score}) — below minimum {per_dimension_minimum}
Blocking issues:
{blocking flags}
Options:
1. Override with written justification (recorded in evidence, does not affect trust score)
2. Stop here and manually revise evals, then re-run /sdlc eval
In full_auto mode: escalate to human. Stop the pipeline.
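The quality gate can be pictured with a short check. The defaults (6.5, 3, 4) come from the configuration keys named earlier; everything else is an assumption for illustration. In particular, the pass/fail result actually comes from the eval-intent-scorer scorecard, and the rule below (overall score and every dimension must clear their bars) is only a plausible reading of the failure message. The dimension names in the usage note are made up.

```python
from dataclasses import dataclass


@dataclass
class GateConfig:
    # Defaults mirror the gates.eval-intent.* keys in .claude/sdlc.local.md.
    threshold: float = 6.5
    max_eval_retries: int = 3
    per_dimension_minimum: float = 4


def gate_passes(overall: float, dimensions: dict[str, float], cfg: GateConfig) -> bool:
    """Assumed rule: the overall score and every dimension must clear their bars."""
    return overall >= cfg.threshold and all(
        score >= cfg.per_dimension_minimum for score in dimensions.values()
    )


def lowest_dimension(dimensions: dict[str, float]) -> tuple[str, float]:
    """Return the (name, score) pair reported as 'Lowest score' on failure."""
    return min(dimensions.items(), key=lambda item: item[1])
```

For example, `gate_passes(7.0, {"clarity": 5, "coverage": 3.5}, GateConfig())` is false under this rule, and `lowest_dimension` would report `coverage`.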
If the scorecard contains system_spec_conflicts entries:
If the user chooses to override:
gate-2a-eval-intent.yml with:
override:
  overridden: true
  justification: "{user's text}"
  overridden_by: "{user identity if known, else 'manual override'}"
  override_date: "{today}"
result: fail to result: override-pass
When called from sdlc-run in guided mode, after authoring is complete but BEFORE scoring:
AskUserQuestion: "Review the authored evals above. Options: (a) approve all, (e) edit specific AC number, (r) reject specific AC number — your choice?"
On success:
git add docs/specs/{feature}/evals/ docs/specs/{feature}/evidence/gate-2a-eval-intent.yml
git commit -m "chore(sdlc): phase 2a — eval authoring complete ({score}/10)"
Remove the partial session marker:
rm -f docs/specs/{feature}/evidence/.eval-session-partial
Report to the user:
✅ Gate 2a passed: eval quality {score} (threshold: {threshold})
Evals authored: {N} (one per AC)
Stored: docs/specs/{feature}/evals/
Evidence: docs/specs/{feature}/evidence/gate-2a-eval-intent.yml
Storage failure: If any eval file cannot be written:
❌ Storage error: cannot write to {path}
Error: {error message}
Recovery: Fix permissions with `chmod 755 docs/specs/{feature}/evals/`, then re-run /sdlc eval
Do not silently continue with unwritten evals. Exit with error.
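One way to honor the "do not silently continue" rule is to turn any write failure into an immediate exit carrying the message shown above. This is a sketch only; `write_eval_or_exit` is a hypothetical name and the plugin may handle the error differently.

```python
from pathlib import Path


def write_eval_or_exit(path: Path, text: str) -> None:
    """Hypothetical helper: write an eval file or stop with the storage error."""
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
    except OSError as err:
        # Exit instead of continuing with an unwritten eval.
        raise SystemExit(
            f"❌ Storage error: cannot write to {path}\n"
            f"Error: {err}\n"
            f"Recovery: Fix permissions with `chmod 755 {path.parent}/`, "
            "then re-run /sdlc eval"
        ) from err
```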
Interrupted session recovery: The .eval-session-partial marker file contains the last completed AC number. On next invocation, check for this file and resume from last_completed + 1.
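The resume rule can be sketched in a few lines. The function name `next_ac_to_author` is hypothetical, and treating a missing or unreadable marker as "start from AC1" is an assumption rather than documented behavior.

```python
from pathlib import Path


def next_ac_to_author(marker: Path) -> int:
    """Return the AC number to resume from (1 when no session is in progress)."""
    try:
        last_completed = int(marker.read_text(encoding="utf-8").strip())
    except (FileNotFoundError, ValueError):
        # Assumption: a missing or unparseable marker means start over at AC1.
        return 1
    return last_completed + 1
```

With a marker containing `3`, authoring would pick up again at AC4.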
npx claudepluginhub dmokong/claude-plugins --plugin speculator