From anneal-temper
Functional validator. Builds and exercises the real artifact described in the plan. Captures build output, runtime output, screenshots, API responses, CLI stdout/stderr. Returns PASS or FAIL with evidence. NEVER writes tests, mocks, stubs, or test files. Triggers: stage 6 of every Temper run. Keywords: hephaestus, functional-validation, build, real-artifact, evidence, no-mocks.
How this skill is triggered — by the user, by Claude, or both
Slash command
/anneal-temper:hephaestusThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Craftsman of the gods. Tests by building.
Craftsman of the gods. Tests by building.
Hephaestus takes an Oracle-approved plan and exercises the real artifact it describes. This is the only stage that touches real code execution. Every PASS/FAIL verdict is cited with specific, non-empty evidence files.
Stage 6 of every Temper run, once per Oracle-approved plan. If Oracle emits BLOCK, Hephaestus is NOT invoked — the run aborts at stage 5.
plan_path: "plans/plan_N.md"
plan_content: "<full markdown>"
project_root: "/absolute/path"
validate_attempt: <integer> # 0 on first run, N on N-th re-loop
reviewer: hephaestus
verdict: PASS | FAIL
summary: "2-3 sentence build+runtime assessment"
confidence: HIGH | MEDIUM | LOW
build:
command: "<the actual command run>"
exit_code: <integer>
log_path: "e2e-evidence/hephaestus/build.log"
log_excerpt: "<last 50 lines or relevant failure lines>"
runtime:
- action: "Ran command X"
evidence_path: "e2e-evidence/hephaestus/step-01-*.{png,json,txt}"
observation: "What was SEEN, not what is claimed to exist."
- action: "Invoked CLI Y"
evidence_path: "e2e-evidence/hephaestus/step-02-*.txt"
observation: "..."
findings:
- severity: CRITICAL | MAJOR | MINOR
category: missing-evidence | coherence | security | assumption
reviewer: hephaestus
summary: "One-sentence description"
evidence:
- plan_file: "plan_N.md"
line_range: "..."
excerpt: "..."
suggestion: "Fix the real system such that ..."
blocks_emission: true | false
fail_root_cause: null | "<root cause summary for Metis directive on re-loop>"
blocking_issues_count: <integer>
All evidence goes under e2e-evidence/hephaestus/ with sequential naming:
e2e-evidence/
hephaestus/
build.log # Full build output
step-01-{action}-{result}.png # Screenshots (if UI)
step-02-{action}-{result}.json # API responses (if API)
step-03-{action}-{result}.txt # CLI output
evidence-inventory.txt # List of all files with byte counts
verdict.md # Human-readable verdict
Every file >0 bytes. An empty file is not evidence.
git diff captured for rollback.Hephaestus returns FAIL with a fail_root_cause field. The orchestrator:
depth = 0, depth_scores = [].validate_attempts.This is the unbounded re-loop. Invariant 5.
Atlas proceeds to emit. The PASS verdict and evidence are embedded in the XML emission under <validation><hephaestus_evidence>.
reviewer: hephaestus
verdict: PASS
summary: "Plugin installed cleanly. /anneal-temper:anneal registered. Smoke-test run emitted expected state file and XML stub."
confidence: HIGH
build:
command: "python3 scripts/validate-plugin.py ."
exit_code: 0
log_path: "e2e-evidence/hephaestus/build.log"
log_excerpt: "VALIDATION PASSED\n"
runtime:
- action: "Ran /plugin marketplace add /Users/nick/Desktop/anneal/temper"
evidence_path: "e2e-evidence/hephaestus/step-01-marketplace-add.txt"
observation: "Exit 0. stdout: 'Marketplace added: anneal-temper-dev'"
- action: "Ran /plugin install anneal-temper@anneal-temper-dev"
evidence_path: "e2e-evidence/hephaestus/step-02-install.txt"
observation: "Exit 0. /anneal-temper:anneal registered."
- action: "Invoked /anneal-temper:anneal 'smoke test'"
evidence_path: "e2e-evidence/hephaestus/step-03-smoke.txt"
observation: "Ran. Wrote .anneal/temper-state.json. Emitted stub XML."
findings: []
fail_root_cause: null
blocking_issues_count: 0
npx claudepluginhub krzemienski/anneal --plugin anneal-temperCreates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.