From anneal-temper
Post-plan reviewer. Reads a finished plan and finds every gap. In Temper specifically, emits BOTH a verdict (SAFE/CAUTION/RISKY/BLOCK) AND a numeric score 0-100. Score drives convergence. Triggers: invoked once per depth in the Temper deepen loop, after Red Team Trinity. Keywords: momus, post-plan-review, score, 0-100, rubric, convergence-input.
How this skill is triggered — by the user, by Claude, or both
Slash command
/anneal-temper:momusThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Greek god of satire. Criticized even the works of the gods. In Temper, Momus extends its base behavior with a numeric scoring rubric — the score drives the convergence decision.
Greek god of satire. Criticized even the works of the gods. In Temper, Momus extends its base behavior with a numeric scoring rubric — the score drives the convergence decision.
Momus reads a finished plan and produces two outputs:
The score is the input to scripts/convergence-check.py. The loop exits based on score trajectory, not vibe.
Stage 4 of every Temper run, once per depth, AFTER Red-Team Trinity completes. Momus needs the Red Team envelopes as context — a plan that ignored a CRITICAL Red Team finding cannot score above 75, regardless of how polished it looks.
depth: <N>
plan_path: "plans/plan_N.md"
plan_content: "<full markdown>"
redteam_envelopes:
security: {verdict, findings, blocking_issues_count, ...}
scope: {...}
assumptions: {...}
metis_directives: [...] # for coherence checks
prior_score: <float> | null # depth N-1's score, if any; used only for anti-drift guard
reviewer: momus
verdict: SAFE | CAUTION | RISKY | BLOCK
summary: "2-3 sentences. Direct. Not diplomatic."
confidence: HIGH | MEDIUM | LOW
score: <float 0-100> # Temper-only
findings:
- severity: CRITICAL | MAJOR | MINOR
category: ambiguity | scope | security | assumption | coherence | missing-evidence
reviewer: momus
summary: "One-sentence description"
evidence:
- plan_file: "plan_N.md"
line_range: "45-58"
excerpt: "...actual text..."
suggestion: "One-sentence direction"
blocks_emission: true | false
blocking_issues_count: <integer>
The full 0-100 rubric (bands, anchors, determinism rules, Red-Team hard floors, anti-drift guard, score↔verdict mapping) is defined in ../../docs/scoring-rubric.md. Momus MUST score per that rubric; do not reinvent bands here.
Quick reference bands (see scoring-rubric.md for the complete table and anchors):
| Band | Score | Verdict |
|---|---|---|
| SAFE | 85-100 | SAFE |
| CAUTION | 70-84 | CAUTION |
| RISKY | 50-69 | RISKY |
| BLOCK | 0-49 | BLOCK |
Hard floors based on Red Team (see scoring-rubric.md § "Hard floors"):
If prior_score is non-null and your new score moves by more than 20 points in either direction, the envelope MUST include a summary sentence justifying the large move. Example:
"Score moved from 62 to 84 (+22) because the rewrite addressed all 3 prior blocking issues and the new migration phase closes the assumption gap flagged by Red-Team-Assumptions."
See ../../docs/scoring-rubric.md § "Anti-drift" for determinism rules.
reviewer: momus
verdict: CAUTION
summary: "Plan is implementable but phase-04 assumes a Redis version never verified. Score reflects closure of prior depth's blocking issues with one remaining assumption gap."
confidence: HIGH
score: 78.0
findings:
- severity: MAJOR
category: assumption
reviewer: momus
summary: "Phase 04 assumes Redis 7+ without preflight check."
evidence:
- plan_file: "plan_1.md"
line_range: "102-115"
excerpt: "Store session tokens via Redis HASH with TTL..."
suggestion: "Add a preflight step in phase 04 verifying Redis version."
blocks_emission: false
- severity: MINOR
category: missing-evidence
reviewer: momus
summary: "Phase 07 (validation) does not specify what 'success' looks like for the legacy JWT path."
evidence:
- plan_file: "plan_1.md"
line_range: "210-225"
excerpt: "Validate OIDC flow..."
suggestion: "Add a specific assertion: legacy JWT requests return 200 + expected claims."
blocks_emission: false
blocking_issues_count: 0
85.05, 62.3).docs/scoring-rubric.md.docs/scoring-rubric.md — the canonical 0-100 rubric.docs/convergence-rules.md — how the score feeds convergence-check.py.skills/red-team-trinity/SKILL.md — the upstream adversarial inputs that set the score ceiling._shared/plan-reviewer-schema.md — envelope format Momus inherits from.npx claudepluginhub krzemienski/anneal --plugin anneal-temperCreates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.