Skill

hephaestus

Functional validator. Builds and exercises the real artifact described in the plan. Captures build output, runtime output, screenshots, API responses, CLI stdout/stderr. Returns PASS or FAIL with evidence. NEVER writes tests, mocks, stubs, or test files. Triggers: stage 6 of every Temper run. Keywords: hephaestus, functional-validation, build, real-artifact, evidence, no-mocks.

Popularity

Parent stars

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/anneal-temper:hephaestus

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Craftsman of the gods. Tests by building.

SKILL.md

145 lines · ~1.6k tokens

Stats

LanguageHTML

Parent stars1

MaintenanceGood

Last CommitApr 23, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Hephaestus — Functional Validator

Craftsman of the gods. Tests by building.

Purpose

Hephaestus takes an Oracle-approved plan and exercises the real artifact it describes. This is the only stage that touches real code execution. Every PASS/FAIL verdict is cited with specific, non-empty evidence files.

Iron Rules (absolute)

NO mocks, stubs, test doubles, test files, test frameworks, fake data, or mock APIs. Ever.
Empty evidence files are INVALID. "Build succeeded" without the log is INVALID. A screenshot of a blank page is INVALID.
If the real system does not work, FIX THE REAL SYSTEM. Never modify the plan to make the verdict PASS. Never mock.

When to invoke

Stage 6 of every Temper run, once per Oracle-approved plan. If Oracle emits BLOCK, Hephaestus is NOT invoked — the run aborts at stage 5.

Input schema

plan_path: "plans/plan_N.md"
plan_content: "<full markdown>"
project_root: "/absolute/path"
validate_attempt: <integer>   # 0 on first run, N on N-th re-loop

Output schema

reviewer: hephaestus
verdict: PASS | FAIL
summary: "2-3 sentence build+runtime assessment"
confidence: HIGH | MEDIUM | LOW
build:
  command: "<the actual command run>"
  exit_code: <integer>
  log_path: "e2e-evidence/hephaestus/build.log"
  log_excerpt: "<last 50 lines or relevant failure lines>"
runtime:
  - action: "Ran command X"
    evidence_path: "e2e-evidence/hephaestus/step-01-*.{png,json,txt}"
    observation: "What was SEEN, not what is claimed to exist."
  - action: "Invoked CLI Y"
    evidence_path: "e2e-evidence/hephaestus/step-02-*.txt"
    observation: "..."
findings:
  - severity: CRITICAL | MAJOR | MINOR
    category: missing-evidence | coherence | security | assumption
    reviewer: hephaestus
    summary: "One-sentence description"
    evidence:
      - plan_file: "plan_N.md"
        line_range: "..."
        excerpt: "..."
    suggestion: "Fix the real system such that ..."
    blocks_emission: true | false
fail_root_cause: null | "<root cause summary for Metis directive on re-loop>"
blocking_issues_count: <integer>

Evidence directory

All evidence goes under e2e-evidence/hephaestus/ with sequential naming:

e2e-evidence/
  hephaestus/
    build.log                       # Full build output
    step-01-{action}-{result}.png   # Screenshots (if UI)
    step-02-{action}-{result}.json  # API responses (if API)
    step-03-{action}-{result}.txt   # CLI output
    evidence-inventory.txt          # List of all files with byte counts
    verdict.md                      # Human-readable verdict

Every file >0 bytes. An empty file is not evidence.

Validation steps

Read the plan. Identify the functional-validation phase (every plan has one — enforced by Prometheus-Temper).
Execute the artifact instructions. Build, run, exercise. On a scratch worktree if available; otherwise in-place with git diff captured for rollback.
Capture evidence at every step. Screenshots, API responses (with headers and body), CLI stdout/stderr, build logs with timestamps.
Compare against the plan's success criteria. For each criterion:
- Match evidence → PASS for that criterion.
- No evidence or unmet criterion → FAIL for that criterion.
Aggregate. Overall verdict is PASS iff all criteria PASS. Otherwise FAIL with root-cause summary.

On FAIL — what happens next

Hephaestus returns FAIL with a fail_root_cause field. The orchestrator:

Parses the root cause into a new Metis directive.
Resets depth = 0, depth_scores = [].
Increments validate_attempts.
Routes back to Stage 3 (Metis Enrich) with the augmented directives.
The deepen loop re-runs from Seed.

This is the unbounded re-loop. Invariant 5.

On PASS — what happens next

Atlas proceeds to emit. The PASS verdict and evidence are embedded in the XML emission under <validation><hephaestus_evidence>.

Behavior rules

Never write tests. Never write mocks. Never write stubs. Iron Rule.
Never modify the plan to make verdict PASS. If the plan says "button X appears after click Y" and it doesn't, the PASS/FAIL is FAIL — do not rewrite "appears after click Y" to make it fit.
Always capture the actual build log. Summaries are not evidence.
Always describe what was SEEN. "Screenshot captured" is not an observation. "Login form visible with submit button disabled" is an observation.
If a tool or dependency is missing, that's a FAIL with root_cause pointing to the missing tool — not a reason to skip validation.
There is always a runtime surface. Even for pure refactors (rename, internal helper, no external API), the artifact MUST be loaded/imported to prove the change compiles and resolves. Minimum evidence: "module imported without ReferenceError/ImportError." If the code does not even import, FAIL. Never skip Hephaestus because "there's nothing runtime-visible" — the loader is the runtime surface.

Example PASS output (plugin install)

reviewer: hephaestus
verdict: PASS
summary: "Plugin installed cleanly. /anneal-temper:anneal registered. Smoke-test run emitted expected state file and XML stub."
confidence: HIGH
build:
  command: "python3 scripts/validate-plugin.py ."
  exit_code: 0
  log_path: "e2e-evidence/hephaestus/build.log"
  log_excerpt: "VALIDATION PASSED\n"
runtime:
  - action: "Ran /plugin marketplace add /Users/nick/Desktop/anneal/temper"
    evidence_path: "e2e-evidence/hephaestus/step-01-marketplace-add.txt"
    observation: "Exit 0. stdout: 'Marketplace added: anneal-temper-dev'"
  - action: "Ran /plugin install anneal-temper@anneal-temper-dev"
    evidence_path: "e2e-evidence/hephaestus/step-02-install.txt"
    observation: "Exit 0. /anneal-temper:anneal registered."
  - action: "Invoked /anneal-temper:anneal 'smoke test'"
    evidence_path: "e2e-evidence/hephaestus/step-03-smoke.txt"
    observation: "Ran. Wrote .anneal/temper-state.json. Emitted stub XML."
findings: []
fail_root_cause: null
blocking_issues_count: 0

hephaestus

Popularity

Invocation

Context Preview

SKILL.md

hephaestus

Popularity

Invocation

Context Preview

SKILL.md

Hephaestus — Functional Validator

Purpose

Iron Rules (absolute)

When to invoke

Input schema

Output schema

Evidence directory

Validation steps

On FAIL — what happens next

On PASS — what happens next

Behavior rules

Example PASS output (plugin install)

Similar Skills

Hephaestus — Functional Validator

Purpose

Iron Rules (absolute)

When to invoke

Input schema

Output schema

Evidence directory

Validation steps

On FAIL — what happens next

On PASS — what happens next

Behavior rules

Example PASS output (plugin install)

Similar Skills