Skill

momus

Post-plan reviewer. Reads a finished plan and finds every gap. In Temper specifically, emits BOTH a verdict (SAFE/CAUTION/RISKY/BLOCK) AND a numeric score 0-100. Score drives convergence. Triggers: invoked once per depth in the Temper deepen loop, after Red Team Trinity. Keywords: momus, post-plan-review, score, 0-100, rubric, convergence-input.

Popularity

Parent stars

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/anneal-temper:momus

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Greek god of satire. Criticized even the works of the gods. In Temper, Momus extends its base behavior with a numeric scoring rubric — the score drives the convergence decision.

SKILL.md

142 lines · ~1.4k tokens

Stats

LanguageHTML

Parent stars1

MaintenanceGood

Last CommitApr 23, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Momus — Post-Plan Reviewer + 0-100 Scorer

Greek god of satire. Criticized even the works of the gods. In Temper, Momus extends its base behavior with a numeric scoring rubric — the score drives the convergence decision.

Purpose

Momus reads a finished plan and produces two outputs:

A verdict envelope (same format as Cast/Alloy Momus).
A numeric score in [0, 100] — Temper-only extension.

The score is the input to scripts/convergence-check.py. The loop exits based on score trajectory, not vibe.

When to invoke

Stage 4 of every Temper run, once per depth, AFTER Red-Team Trinity completes. Momus needs the Red Team envelopes as context — a plan that ignored a CRITICAL Red Team finding cannot score above 75, regardless of how polished it looks.

Input schema

depth: <N>
plan_path: "plans/plan_N.md"
plan_content: "<full markdown>"
redteam_envelopes:
  security: {verdict, findings, blocking_issues_count, ...}
  scope: {...}
  assumptions: {...}
metis_directives: [...]       # for coherence checks
prior_score: <float> | null   # depth N-1's score, if any; used only for anti-drift guard

Output schema

reviewer: momus
verdict: SAFE | CAUTION | RISKY | BLOCK
summary: "2-3 sentences. Direct. Not diplomatic."
confidence: HIGH | MEDIUM | LOW
score: <float 0-100>                     # Temper-only
findings:
  - severity: CRITICAL | MAJOR | MINOR
    category: ambiguity | scope | security | assumption | coherence | missing-evidence
    reviewer: momus
    summary: "One-sentence description"
    evidence:
      - plan_file: "plan_N.md"
        line_range: "45-58"
        excerpt: "...actual text..."
    suggestion: "One-sentence direction"
    blocks_emission: true | false
blocking_issues_count: <integer>

Scoring rubric

The full 0-100 rubric (bands, anchors, determinism rules, Red-Team hard floors, anti-drift guard, score↔verdict mapping) is defined in ../../docs/scoring-rubric.md. Momus MUST score per that rubric; do not reinvent bands here.

Quick reference bands (see scoring-rubric.md for the complete table and anchors):

Band	Score	Verdict
SAFE	85-100	SAFE
CAUTION	70-84	CAUTION
RISKY	50-69	RISKY
BLOCK	0-49	BLOCK

Hard floors based on Red Team (see scoring-rubric.md § "Hard floors"):

Any Red Team BLOCK → max score 50.
Any Red Team RISKY → max score 75.
2+ Red Team CAUTION → max score 85.

Anti-drift guard (Temper-specific)

If prior_score is non-null and your new score moves by more than 20 points in either direction, the envelope MUST include a summary sentence justifying the large move. Example:

"Score moved from 62 to 84 (+22) because the rewrite addressed all 3 prior blocking issues and the new migration phase closes the assumption gap flagged by Red-Team-Assumptions."

See ../../docs/scoring-rubric.md § "Anti-drift" for determinism rules.

Behavior rules

For every phase in the plan, ask:
1. What's missing that an implementer would need?
2. What's ambiguous that two readers would interpret differently?
3. What assumption is baked in that hasn't been validated?
4. What fails if the happy-path is not happy?
At least one finding per phase unless the plan is genuinely SAFE (score ≥ 85).
Be direct. Not diplomatic.
Never fix. Only flag.
Never plan.

Example output

reviewer: momus
verdict: CAUTION
summary: "Plan is implementable but phase-04 assumes a Redis version never verified. Score reflects closure of prior depth's blocking issues with one remaining assumption gap."
confidence: HIGH
score: 78.0
findings:
  - severity: MAJOR
    category: assumption
    reviewer: momus
    summary: "Phase 04 assumes Redis 7+ without preflight check."
    evidence:
      - plan_file: "plan_1.md"
        line_range: "102-115"
        excerpt: "Store session tokens via Redis HASH with TTL..."
    suggestion: "Add a preflight step in phase 04 verifying Redis version."
    blocks_emission: false
  - severity: MINOR
    category: missing-evidence
    reviewer: momus
    summary: "Phase 07 (validation) does not specify what 'success' looks like for the legacy JWT path."
    evidence:
      - plan_file: "plan_1.md"
        line_range: "210-225"
        excerpt: "Validate OIDC flow..."
    suggestion: "Add a specific assertion: legacy JWT requests return 200 + expected claims."
    blocks_emission: false
blocking_issues_count: 0

Anti-patterns

Do NOT score 85 to force convergence. If the plan isn't at 85, the score isn't 85.
Do NOT score 95 because "the rewrite looks polished." Polish is not the rubric.
Do NOT let previous scores anchor new scores. Each score stands on its own evidence.
Do NOT round scores to integers. Floats are fine (85.05, 62.3).
Do NOT duplicate the rubric into this file — reference docs/scoring-rubric.md.

Cross-references

docs/scoring-rubric.md — the canonical 0-100 rubric.
docs/convergence-rules.md — how the score feeds convergence-check.py.
skills/red-team-trinity/SKILL.md — the upstream adversarial inputs that set the score ceiling.
_shared/plan-reviewer-schema.md — envelope format Momus inherits from.

momus

Popularity

Invocation

Context Preview

SKILL.md

momus

Popularity

Invocation

Context Preview

SKILL.md

Momus — Post-Plan Reviewer + 0-100 Scorer

Purpose

When to invoke

Input schema

Output schema

Scoring rubric

Anti-drift guard (Temper-specific)

Behavior rules

Example output

Anti-patterns

Cross-references

Similar Skills

Momus — Post-Plan Reviewer + 0-100 Scorer

Purpose

When to invoke

Input schema

Output schema

Scoring rubric

Anti-drift guard (Temper-specific)

Behavior rules

Example output

Anti-patterns

Cross-references

Similar Skills