Skill

gen-eval-pair

Enforce four-role separation (Planner / Writer / Evaluator / Orchestrator) when drafting + judging completed work. Runs in two modes — interactive `/gen-eval-pair <prompt>` (current session reviews planner draft + implements as writer) and ralph-loop (headless writer dispatches planner; automatic P3 acts as gate instead of human review). Both modes share the same 5-phase pipeline: planner drafts contract → evaluator reviews contract → evaluator proposes rubric additions; orchestrator applies → writer implements → evaluator scores against rubric. The project's `.harness/rubric.md` is the only rule file evaluators read. Use when setting up autonomous coding loops, when the user mentions "evaluator-rubric", "sprint contract", "球員兼裁判", "player-referee", or wants to gate work behind structured review.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/harness:gen-eval-pair

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Writer/evaluator separation for autonomous coding loops. Closes the player-referee gap: the agent that writes the code MUST NOT be the agent that judges it.

SKILL.md

311 lines · ~5.7k tokens (exceeds 5k compaction limit)

Stats

Language: HTML

Parent stars: 0

Maintenance: Excellent

Last Commit: May 26, 2026


gen-eval-pair

Writer/evaluator separation for autonomous coding loops. Closes the player-referee gap: the agent that writes the code MUST NOT be the agent that judges it.

$ARGUMENTS parsing

tokens = split $ARGUMENTS by whitespace
first_token = first non-flag token

if first_token == "lint" → lint sub-command (contract structure check)
   path = second non-flag token = contract path
if first_token == "eval" → eval sub-command (re-score an existing contract)
   path = second non-flag token = contract path
otherwise → default invocation (interactive full pipeline; runs the 5-phase flow)
   prompt = entire $ARGUMENTS, used as the user task description
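
The parsing rules above can be sketched in TypeScript. This is a minimal sketch, not the skill's actual parser: the `Invocation` type and the `parseArguments` name are hypothetical, and it assumes that a "flag" is any token beginning with `-`.

```typescript
// Hypothetical result shape; the names are illustrative only.
type Invocation =
  | { kind: "lint"; contractPath: string | undefined }
  | { kind: "eval"; contractPath: string | undefined }
  | { kind: "default"; prompt: string };

function parseArguments(args: string): Invocation {
  const tokens = args.trim().split(/\s+/).filter((t) => t.length > 0);
  // Assumption: a flag is any token that starts with "-".
  const nonFlags = tokens.filter((t) => t.charAt(0) !== "-");
  const first = nonFlags[0];

  if (first === "lint") {
    return { kind: "lint", contractPath: nonFlags[1] };
  }
  if (first === "eval") {
    return { kind: "eval", contractPath: nonFlags[1] };
  }
  // Default invocation: the entire argument string is the task description.
  return { kind: "default", prompt: args.trim() };
}
```

Because only the first non-flag token is inspected, a prompt that merely contains the word "lint" later in the sentence still resolves to the default invocation.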

Default invocation · full 5-phase pipeline

Usage: /gen-eval-pair <prompt>

Drives the 5-phase pipeline (P2 through P6; P1 is the user's input):

P1 — User feeds a feature prompt
P2 — Planner subagent drafts .harness/contract/<task-id>/contract.md
P3 — Evaluator subagent reviews contract (Accept / Revise / Insufficient context); Revise re-dispatches the Planner, for at most N rounds
P4 — Evaluator subagent proposes rubric additions from contract → Orchestrator applies to .harness/rubric.md
P5 — Writer (current session) implements per contract
P6 — Evaluator subagent scores implementation against .harness/rubric.md

The contract path is generated automatically under .harness/contract/<task-id>/ (task-id is either sequential or a timestamp).

Sub-command: lint

Usage: /gen-eval-pair lint <contract-path>

Lint only: checks contract structure and completeness; does not run the writer or evaluator scoring.

Internally calls ${CLAUDE_PLUGIN_ROOT}/scripts/run.sh --contract=<path> --phase=lint.

Sub-command: eval

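Usage: /gen-eval-pair eval <contract-path>

Eval only: runs the evaluator against an existing contract plus completed work, skipping the lint and planner phases.

Internally calls ${CLAUDE_PLUGIN_ROOT}/scripts/run.sh --contract=<path> --phase=eval.

Naming note: Mode A / Mode B below refer to pipeline topology (interactive with a human present vs. automated ralph-loop). That is a different dimension from the lint / eval sub-commands in the $ARGUMENTS parsing above. The topology names Mode A/B are already referenced consistently in REFERENCE.md and in downstream personas and templates.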

The 5-phase pipeline (both modes)

P1  user / driver provides feature description
P2  planner drafts contract                                  (subagent in both modes)
    Mode A: .harness/contract/<task-id>/contract.md
    Mode B: .ralph/sprints/<US-id>-contract.v<n>.md
P3  evaluator reviews contract                   → Accept | Revise | Insufficient
       └─ Revise/Insufficient: Mode A = re-prompt planner in parent session;
                               Mode B = write defects to .ralph/prompt.md, retry next iteration
P4  evaluator proposes rubric additions;
    orchestrator applies (only if rubric missing fields the contract needs)
P5  writer implements per contract
P6  evaluator scores per rubric                  → Accept | Revise | Block | Incomplete
        ├─ verdict has suggestedRubricAdditions[] → orchestrator applies → re-run P6 (max 3 rounds)
        └─ no additions → final verdict, end

The hard separation — four roles, none plays another's part:

Planner drafts the contract from user prompt + project rubric. Subagent persona (agents/planner.md), read-only tools. Never writes the file directly — returns markdown via JSON for the orchestrator to persist. Avoids the "writer-is-also-contract-author" bias where contract AC drift toward the implementation already half-formed in the writer's head.
Writer implements code against the contract. NEVER touches .harness/rubric.md. NEVER produces a verdict. NEVER drafts the contract directly (the Planner subagent does that at P2).
Evaluator reads rubric + contract + screenshot + code. NEVER writes files (read-only tools). Returns structured JSON {verdict, defects[], suggestedRubricAdditions[]} — proposes only.
Orchestrator (run.sh / this skill in interactive topology) is a deterministic script. The sole author of rubric.md — applies evaluator's suggestedRubricAdditions in a separate chore(harness): extend rubric ... commit. Re-invokes evaluator.
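
One way to picture the separation is as a capability table in which each responsibility belongs to exactly one role. The sketch below is illustrative only: `ROLE_CAPABILITIES` and `rolesWith` are hypothetical names, and the table encodes just the four responsibilities stated above, not the plugin's actual enforcement mechanism.

```typescript
type Role = "planner" | "writer" | "evaluator" | "orchestrator";

interface Capabilities {
  draftsContract: boolean;
  implementsCode: boolean;
  producesVerdict: boolean;
  writesRubric: boolean;
}

// Each column has exactly one `true`: no role plays another's part.
const ROLE_CAPABILITIES: Record<Role, Capabilities> = {
  planner: { draftsContract: true, implementsCode: false, producesVerdict: false, writesRubric: false },
  writer: { draftsContract: false, implementsCode: true, producesVerdict: false, writesRubric: false },
  evaluator: { draftsContract: false, implementsCode: false, producesVerdict: true, writesRubric: false },
  orchestrator: { draftsContract: false, implementsCode: false, producesVerdict: false, writesRubric: true },
};

// List the roles that hold a given capability.
function rolesWith(capability: keyof Capabilities): Role[] {
  const roles = Object.keys(ROLE_CAPABILITIES) as Role[];
  return roles.filter((role) => ROLE_CAPABILITIES[role][capability]);
}
```

Asking `rolesWith("writesRubric")` yields only the orchestrator, which is the invariant the later tamper checks defend.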

Mode A — interactive `/gen-eval-pair <prompt>`

Use when a human is in the loop (you're working in a Claude Code session and want gated review on the work you're about to do).

| Aspect | Details |
| --- | --- |
| Planner | Spawned via Task tool with `subagent_type: harness:planner` at P2 — fresh context. Returns contract markdown + story_id + open questions; you review in this session before persisting. |
| Writer | The current Claude session (you, reading this skill) — implements at P5 against the persisted contract |
| Evaluator | Spawned via Task tool with `subagent_type: harness:evaluator` — fresh context every phase. The same persona is dispatched for P3 / P4 / P6, distinguished by `Phase=...` in the prompt (per-phase output schemas tracked in `agents/evaluator.md`). |
| Pause points | After Planner returns (you review draft); after P3 if Revise; after P6 if Revise/Block — you ask the user how to proceed |

Workflow you (assistant) follow when `/gen-eval-pair <prompt>` is invoked

Treat <prompt> as the feature description for P1.
P2 — Planner drafts contract. Read <project>/.harness/rubric.md if it exists (path passed to planner). Pick <US-id> from prd.json if integrated with ralph; otherwise slug from <prompt>. Dispatch:
```
Task({
  subagent_type: "harness:planner",
  description: "Draft sprint contract",
  prompt: "Phase=draft-contract. user_prompt=<prompt>. rubric=<path>. suggested_story_id=<US-id>. Return JSON {story_id, version, title, contract_md, rubric_coverage_notes, open_questions}."
})
```
Review the returned contract_md in this session (you, parent agent). If open_questions is non-empty, surface them to the user before persisting. Make any tightening edits, then write to .harness/contract/<task-id>/contract.md. Versioning: subsequent revisions overwrite the same file; if you need to preserve drafts, suffix as contract.v2.md, contract.v3.md in the same directory.

P3 — evaluator reviews contract. Dispatch:

Task({
  subagent_type: "harness:evaluator",
  description: "Review sprint contract",
  prompt: "Phase=contract-review. Read <contract path>. Read <rubric path if exists>. Verify every AC is falsifiable + measurable + traceable to rubric rules. Return JSON {verdict, defects[]}."
})

Accept → continue to P4.
Revise → show defects to user, revise contract, re-dispatch.

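P4 — propose rubric additions. The evaluator first detects the situation using the table below, then branches accordingly. The orchestrator (the assistant acting as orchestrator) is the only rubric writer and commits each change separately; the Writer never touches the rubric.

Situation detection: the criterion is whether .harness/rubric.md exists and is non-empty.

| Situation | Detection | Evaluator behavior (`Phase=propose-rubric`) | Orchestrator output |
| --- | --- | --- | --- |
| Greenfield | `.harness/rubric.md` does not exist or is empty | Derives a complete initial rubric from the contract, following the eight sections of `templates/rubric.global.example.md` (Design References / Page Type Wrappers / Forbidden Patterns / Composition Assertions / Scoring Dimensions / Process Rules / Verdict Rules / Evolution Log) | Creates `.harness/rubric.md` in its own commit: `chore(harness): seed rubric — from <task-id>` |
| Brownfield | `.harness/rubric.md` exists and is non-empty | Compares the contract's requirements against the existing rubric's coverage and returns the gaps as `suggestedRubricAdditions[]` | Appends each addition to the existing `.harness/rubric.md`, one commit per addition: `chore(harness): extend rubric — <reason>` |

Both situations share the same evaluator dispatch (Phase=propose-rubric); the evaluator picks the branch itself, based on whether the rubric path it receives is readable and non-empty. Greenfield seeding commits and brownfield extension commits must both be kept separate from the writer's story commit.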
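
The greenfield/brownfield check described above amounts to "readable and non-empty". A minimal sketch follows, assuming that whitespace-only content counts as empty and that an unreadable path is treated as greenfield; the function names are hypothetical and this is not the evaluator's actual logic.

```typescript
import { readFileSync } from "node:fs";

type RubricSituation = "greenfield" | "brownfield";

// Pure classification: `null` stands for "file missing or unreadable".
function classifyRubric(content: string | null): RubricSituation {
  // Assumption: a file containing only whitespace is treated as empty.
  if (content === null || content.trim().length === 0) {
    return "greenfield";
  }
  return "brownfield";
}

// Read the rubric if possible; any read failure maps to `null`.
function readOrNull(path: string): string | null {
  try {
    return readFileSync(path, "utf8");
  } catch {
    return null;
  }
}

function detectRubricSituation(rubricPath: string): RubricSituation {
  return classifyRubric(readOrNull(rubricPath));
}
```

Keeping the classification separate from the file read makes the rule easy to test without touching the filesystem.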

P5 — implement. Write the code per contract. Run project Quality Requirements (pnpm svelte-check, lint, tests, etc.).

P6 — evaluator scores. Dispatch:

Task({
  subagent_type: "harness:evaluator",
  description: "Score implementation",
  prompt: "Phase=score. Read .harness/rubric.md (your ONLY rule file). Read <contract>. Read <screenshot>. Return JSON {verdict, defects[], suggestedRubricAdditions[]}."
})

suggestedRubricAdditions non-empty → orchestrator applies → re-dispatch (max 3 rounds).
Final verdict Accept → done. Revise → show, fix, re-dispatch. Block / Incomplete → halt + ask user.
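
The P6 re-score loop and the verdict handling above can be sketched as follows. This is a simplified, synchronous sketch under stated assumptions: "max 3 rounds" is read as at most three evaluator runs in total, the real dispatch goes through Task() or `run.sh`, and `runScoreLoop`, `nextStep`, and the callback parameters are hypothetical names.

```typescript
type Verdict = "Accept" | "Revise" | "Block" | "Incomplete";

interface Addition {
  id: string;
}

interface ScoreResult {
  verdict: Verdict;
  defects: string[];
  suggestedRubricAdditions: Addition[];
}

// Assumption: the cap counts evaluator runs, including the first one.
const MAX_ROUNDS = 3;

function runScoreLoop(
  score: (round: number) => ScoreResult,
  applyAdditions: (additions: Addition[]) => void,
): { result: ScoreResult; rounds: number } {
  let round = 1;
  let result = score(round);
  // Re-dispatch only while the evaluator keeps proposing rubric additions.
  while (result.suggestedRubricAdditions.length > 0 && round < MAX_ROUNDS) {
    applyAdditions(result.suggestedRubricAdditions); // orchestrator-only write
    round += 1;
    result = score(round);
  }
  return { result, rounds: round };
}

// Map the final verdict to the follow-up described in the workflow.
function nextStep(verdict: Verdict): "done" | "fix-and-redispatch" | "halt-and-ask-user" {
  if (verdict === "Accept") return "done";
  if (verdict === "Revise") return "fix-and-redispatch";
  return "halt-and-ask-user"; // Block or Incomplete
}
```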

Mode B — ralph-loop (`bun run .ralph/ralph.ts`)

Use when fully autonomous. The driver spawns a fresh writer agent each iteration; the writer dispatches the planner + evaluator (subagents OR external CLI).

Path convention: Mode B uses .ralph/sprints/ and .ralph/prompt.md exclusively. Mode A artefacts live under .harness/.

| Aspect | Details |
| --- | --- |
| Planner | Dispatched from within writer's process via Task — `subagent_type: harness:planner`. No human review in this mode — P3 contract-review (run automatically) is the gate. |
| Writer | Headless agent spawned by `.ralph/ralph.ts` (`claude -p` / `copilot --yolo` / `gemini -p`) — fresh process per iteration; implements at P5 |
| Evaluator | Dispatched from within writer's process — subagent via Task or external CLI (e.g. `copilot --model gpt-5.4`) per project config |
| Pause points | None — Revise leaves story `in_progress` for next iteration; Block flips story to `blocked` |

Mode B P2 flow (no human in the loop)

The writer's prompt (.ralph/prompt.md) directs it through:

Read user-story prompt from prd.json (or wherever ralph stores it).
Dispatch Planner subagent (Task with subagent_type: harness:planner). Receive contract draft.
Write contract to .ralph/sprints/<story_id>-contract.v<n>.md without human review.
Call <plugin-target>/run.sh --contract=<path> --phase=contract-review → automatic P3 gate.
P3 verdict:
- Accept → continue to P4/P5/P6 (call run.sh --phase=all or equivalent).
- Revise / Insufficient → write the P3 defects + planner's open_questions into .ralph/prompt.md (or a story-specific feedback file), mark the story needs-revision, terminate this iteration. Next ralph iteration sees the feedback and re-dispatches Planner.
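
The automatic P3 gate in the flow above reduces to a small decision function. The sketch below is hypothetical: `gateContract` and the feedback layout are illustrative assumptions, since the actual format written to `.ralph/prompt.md` is not specified here.

```typescript
type ContractVerdict = "Accept" | "Revise" | "Insufficient";

type GateOutcome =
  | { action: "continue" }
  | { action: "terminate"; storyStatus: "needs-revision"; feedback: string };

function gateContract(
  verdict: ContractVerdict,
  defects: string[],
  openQuestions: string[],
): GateOutcome {
  if (verdict === "Accept") {
    return { action: "continue" }; // proceed to P4/P5/P6
  }
  // Hypothetical feedback layout for the next iteration's Planner dispatch.
  const feedback = [
    `P3 verdict: ${verdict}`,
    "Defects:",
    ...defects.map((d) => `- ${d}`),
    "Open questions:",
    ...openQuestions.map((q) => `- ${q}`),
  ].join("\n");
  return { action: "terminate", storyStatus: "needs-revision", feedback };
}
```

The writer would persist `feedback` and end the iteration, so that the next ralph iteration starts with the defects in view.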

# Mode B example
<plugin-target>/run.sh --contract=.ralph/sprints/<US-id>-contract.v<n>.md

run.sh orchestrates P3→P6. Phase flags (--phase=contract-review|propose-rubric|score|all) let the writer call subsets; the default, all, runs the full pipeline.

Why automatic P3 substitutes for human review

In Mode A, the human reviews the Planner's draft to catch: vague AC, missing verification, scope creep, contract-evaluator-injection. These are the exact things P3 contract-review (already implemented) is designed to catch. Mode B just makes the gate explicit and automatic instead of relying on a human pause. See REFERENCE.md "Planner dispatch — Mode A vs Mode B topology" for the full failure-mode mapping.

Rule file: default single `rubric.md`

The plugin does NOT ship runtime rules — only schema and examples. Each project owns its own rule file at:

<project>/.harness/
└── rubric.md          ← single file, mutates in place; git history is the audit trail

Why single-layer by default: at any given moment the evaluator only sees one rubric. Deciding whether a rule is cross-story or story-specific requires N stories of evidence; classifying prematurely at story 1 produces guess-based labels that skew toward task-level, so the global layer never grows. Start single; opt into the two-layer split when you have evidence it's needed (see Advanced below).

Example shipped by plugin (copy into project + customize):

templates/rubric.global.example.md — sections: Design References / Page Type Wrappers / Forbidden Patterns / Composition Assertions / Scoring Dimensions / Process Rules / Verdict Rules / Evolution Log. Use as the seed for .harness/rubric.md.

suggestedRubricAdditions schema (returned by evaluator at P4 / P6):

{
  "id": "<short-id>",
  "section": "Forbidden Patterns" | "Composition Assertions" | "Page Type Wrappers" | "Required Components" | "Process Rules" | "Scoring Dimensions" | "Verdict Rules",
  "row": { "<col>": "<val>" },
  "reasonShort": "<≤100 chars>"
}

Orchestrator appends each addition to rubric.md in place. Each write is a separate chore(harness): extend rubric — <reason> commit, never bundled with writer's story commit.
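
For illustration, the schema above can be expressed as a TypeScript type with a small validator. This is a sketch based only on the fields shown; `validateAddition` is a hypothetical helper, not part of `update-rubric.ts`.

```typescript
const SECTIONS = [
  "Forbidden Patterns",
  "Composition Assertions",
  "Page Type Wrappers",
  "Required Components",
  "Process Rules",
  "Scoring Dimensions",
  "Verdict Rules",
] as const;

type Section = (typeof SECTIONS)[number];

interface RubricAddition {
  id: string;
  section: Section;
  row: Record<string, string>;
  reasonShort: string; // at most 100 characters
}

// Returns a list of problems; an empty list means the value fits the schema.
function validateAddition(value: unknown): string[] {
  if (typeof value !== "object" || value === null) {
    return ["addition must be an object"];
  }
  const { id, section, row, reasonShort } = value as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof id !== "string" || id.length === 0) {
    errors.push("id must be a non-empty string");
  }
  if (typeof section !== "string" || !SECTIONS.some((s) => s === section)) {
    errors.push("section is not one of the allowed section names");
  }
  if (typeof row !== "object" || row === null || Array.isArray(row)) {
    errors.push("row must be an object of column/value pairs");
  }
  if (typeof reasonShort !== "string" || reasonShort.length > 100) {
    errors.push("reasonShort must be a string of at most 100 characters");
  }
  return errors;
}
```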

The hard rule

Writer never modifies .harness/rubric.md. Enforced via six layers:

Subagent mode: evaluator persona (agents/evaluator.md) uses Read/Grep/Glob/WebFetch — no Write/Edit. It proposes additions, doesn't apply them.
External mode: external CLI runs read-only via copilot --model <X> --allow-tool Read. Same restriction.
Orchestrator-only writes: only run.sh (or this skill in interactive topology) applies suggestedRubricAdditions, in its own commit.
Commit-topology tamper detection: rubric.ts startup checks (a) writer's HEAD commit doesn't touch rubric.md, (b) writer's working tree doesn't dirty rubric.md, (c) contract's Expected files touched doesn't list rubric.md. Any violation → Block.
Nonce + run-trace tamper detection: each evaluator spawn generates a random nonce, embeds it in the prompt, requires the evaluator to echo it, and pairs it with a <contract>.run-trace.json sidecar (exit code, timestamps, stdout hash). rubric.ts rejects visual reviews whose latest section doesn't match the trace nonce. Defends against synthetic visual-review.md files. See REFERENCE.md "Tamper hardening — nonce + run-trace".
Addition gate: update-rubric.ts runs each addition's verification_command against <fixtures-dir>/known-good (must exit 0) and known-broken (must exit non-0). Rules that don't discriminate are rejected before commit. Modes: --gate=off|lenient|strict (default lenient). See REFERENCE.md "Addition gate".
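
The addition gate's core test is whether a rule's command discriminates between the two fixtures. The sketch below shows only that check; `exitCodeIn` and `discriminates` are hypothetical names, the behaviour of the `--gate` modes is not modelled, and running the command through a shell is an assumption.

```typescript
import { spawnSync } from "node:child_process";

// Run a command inside a fixture directory and return its exit code.
function exitCodeIn(dir: string, command: string): number {
  const result = spawnSync(command, { cwd: dir, shell: true });
  // `status` is null if the process failed to spawn or was killed by a
  // signal; this sketch treats both as a non-zero failure.
  return result.status ?? 1;
}

// A rule discriminates when it passes on known-good and fails on known-broken.
function discriminates(knownGoodExit: number, knownBrokenExit: number): boolean {
  return knownGoodExit === 0 && knownBrokenExit !== 0;
}

function gateAddition(fixturesDir: string, verificationCommand: string): boolean {
  const good = exitCodeIn(`${fixturesDir}/known-good`, verificationCommand);
  const broken = exitCodeIn(`${fixturesDir}/known-broken`, verificationCommand);
  return discriminates(good, broken);
}
```

A command that exits 0 on both fixtures (or fails on both) carries no signal, which is why such rules are rejected before commit.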

Plus the global model rule:

External mode rejects --model containing claude (writer is assumed Claude). Subagent mode is read-only by tool restriction.
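
The model rule can be read as a simple substring check. A minimal sketch, assuming the match is case-insensitive (the source only says the `--model` value must not contain `claude`); `isAllowedExternalModel` is a hypothetical name.

```typescript
// Reject any external evaluator model whose name contains "claude",
// because the writer is assumed to be Claude.
function isAllowedExternalModel(model: string): boolean {
  // Assumption: comparison ignores case.
  return model.toLowerCase().indexOf("claude") === -1;
}
```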

Current implementation status

⚠️ The 5-phase pipeline above is the target architecture. Current scripts implement P3 (lint-contract.ts, regex-only) and P6 (rubric.ts + visual-review.ts). Pending work:

| Phase | Status |
| --- | --- |
| P3 contract-review | ✏️ Deterministic lint shipped (`scripts/lint-contract.ts`); LLM-semantic review prompt available at `templates/prompts/contract-review-prompt.md` but no dispatcher script — caller-driven |
| P4 propose-rubric | ✏️ Apply step shipped (`scripts/update-rubric.ts`); evaluator dispatch is caller-driven (drop JSON at `<contract>.propose-rubric.json`, then `run.sh --phase=propose-rubric`) |
| `agents/evaluator.md` per-phase schemas | ✅ Complete (P3 / P4 / P6 each with own input + output schema) |
| `.harness/rubric.md` parser | ✅ Shipped (`scripts/parse-rubric.ts`); `rubric.ts` reads project rubric's `## Scoring Dimensions` and `## Verdict Rules` to drive dim list + thresholds. Falls back to built-in 7-dim if neither section present. See REFERENCE.md "Scoring Dimensions + Verdict Rules — parser spec". |
| `run.sh --phase=` separation | ✅ Complete (`contract-review` / `propose-rubric` / `score` / `all`; `lint` aliased with deprecation warning) |
| Re-score loop (P6 iteration) | ✅ Shipped in `run.sh` `run_p6` — up to 3 rounds; each iteration runs visual-review + rubric.ts, applies `suggestedRubricAdditions[]` via `update-rubric.ts`, exits early on convergence (all-duplicates → exit code 3) or hard cap. See REFERENCE.md "Orchestrator algorithm". |
| Smoke test | ✅ Shipped (`scripts/smoke-test.ts`); three modes (`baseline` / `project` / `both`). `init.ts` runs baseline at install; re-run with `--mode=project` after editing rubric.md or adding fixtures. See REFERENCE.md "Smoke test". |
| P2 Planner role | ✅ Shipped — persona `agents/planner.md` + prompt template `templates/prompts/planner-prompt.md`. Mode A: assistant dispatches Planner subagent and reviews draft in parent session before persisting. Mode B (ralph): writer dispatches Planner, persists without human review, automatic P3 contract-review acts as gate; Revise/Insufficient → defects + open_questions written to `.ralph/prompt.md` for next iteration. See REFERENCE.md "Planner dispatch — Mode A vs Mode B topology". |
| split-rubric CLI | ✅ Shipped (`scripts/split-rubric.ts`); single → two-layer migration in one commit. Classifies rules by Evolution Log recurrence (0 stories → global; 1 story → task; 2+ → global). `--dry-run` / `--yes` / interactive. See REFERENCE.md "Migration: single → two-layer". |
| Data-driven promotion | ✅ Shipped (`scripts/rubric-stats.ts`); `update-rubric.ts` auto-records every task-layer apply to `<target>/.rubric-stats.json`. Run `rubric-stats.ts --propose` to surface rules in ≥N stories as promotion candidates. See REFERENCE.md "Data-driven promotion". |
| `uncheckedStates[]` in P6 verdict | ✅ Shipped (variant B). Evaluator MUST declare every (state, viewport) it failed to cover. `rubric.ts` surfaces gaps in `score.json`. `run.sh --strict-unchecked` downgrades Accept → Incomplete when non-empty; default lenient mode preserves verdict. See REFERENCE.md "uncheckedStates handling". |
| `Tested across:` per AC | ✅ Shipped (variant C). Planner persona + prompt template + sprint-contract template all require each AC to declare `Tested across: viewports=[..], states=[..]`. `lint-contract.ts` enforces; `--allow-default-tested-across` demotes to warning during migration. |
| Interactive evaluator | ✅ Shipped (variant A). `agents/evaluator.md` tools extended with `Bash` / `Write` / `mcp__playwright__*` plus strict path + command whitelists. `run.sh --evaluator-probe=passive\|interactive` (default passive) gates the capability; subagent dispatcher renders mode-specific prompt section; external (copilot) dispatcher silently downgrades interactive → passive. See REFERENCE.md "Probe mode". |
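
Two of the rules in the table are compact enough to state as code. The sketch below restates the split-rubric recurrence rule and the `--strict-unchecked` downgrade exactly as the table words them; the function names are hypothetical and the real scripts may differ in detail.

```typescript
type Layer = "global" | "task";
type Verdict = "Accept" | "Revise" | "Block" | "Incomplete";

// Recurrence rule as stated: 0 stories -> global, 1 story -> task, 2+ -> global.
function classifyByRecurrence(storyCount: number): Layer {
  return storyCount === 1 ? "task" : "global";
}

// With strict mode on, an Accept that left states uncovered becomes Incomplete.
// The default lenient mode preserves the verdict.
function applyUncheckedPolicy(
  verdict: Verdict,
  uncheckedStateCount: number,
  strict: boolean,
): Verdict {
  if (strict && verdict === "Accept" && uncheckedStateCount > 0) {
    return "Incomplete";
  }
  return verdict;
}
```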

Use this skill as documentation of the target flow while building it out story by story.

Advanced

See REFERENCE.md for rubric.md grammar specifics, parsing rules, orchestrator algorithm, Planner dispatch topology (Mode A vs Mode B), smoke test modes, tamper hardening (nonce + run-trace), addition gate (verification_command + fixtures), troubleshooting (Windows MSYS / bun vs Node / timeout tuning), the rubric extension audit-log format, migration single → two-layer (split-rubric.ts), data-driven promotion (rubric-stats.ts), and the opt-in two-layer rubric (rubric.global.md + tasks/<US-id>-rubric.task.v<n>.md) for projects that have outgrown the single-file default and want cross-story vs per-story rule isolation.

Appendix · Prompt Templates

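The following are the canonical Task() dispatch structures for each phase. The Task() examples in the Mode A workflow section (steps 2/3/4/6) are condensed to stay readable inline; when you need to decide the granularity of the description, which context the prompt should carry, or how to re-dispatch during a multi-round review, this appendix is authoritative. The full JSON output schemas are not repeated here; they are defined in agents/planner.md and agents/evaluator.md. This appendix covers Mode A subagent dispatch only. Mode B goes through run.sh --phase= and this section does not apply to it (its prompts are handled by the variable-injection templates in templates/prompts/).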

P3 contract-review dispatch

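Description examples (one line for the Task() description field):

First round: Review sprint contract
Second round onward: Review sprint contract (revise round <n>)

Prompt structure:

Phase=contract-review
contract: .harness/contract/<task-id>/contract.md
rubric: .harness/rubric.md      # if it does not exist, state "not yet created" explicitly
previous_verdict: <omitted; attach only when re-dispatching after Revise>
  - verdict: Revise
  - defects: [<full JSON of the previous round's defects[]>]
  - revised_sections: [<names of the contract sections revised this round>]
Return JSON per agents/evaluator.md Phase: contract-review schema.

Multi-round dispatch notes: when P3 is re-dispatched, the contract should already have been written back (either overwriting the same file or saved as contract.v2.md, contract.v3.md). The prompt must include the full defects[] JSON from the previous verdict plus a one-line revised_sections list, so the evaluator can focus on verifying whether the previous round's defects were actually fixed instead of reviewing the new contract from scratch. On the first-round dispatch, omit the entire previous_verdict block.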
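
The prompt structure above can be assembled programmatically. The following is a minimal sketch with hypothetical helper names (`reviewDescription`, `buildReviewPrompt`); it follows the field layout shown above and omits `previous_verdict` on the first round.

```typescript
interface PreviousVerdict {
  defects: unknown[]; // full defects[] JSON from the previous round
  revisedSections: string[];
}

function reviewDescription(round: number): string {
  return round <= 1
    ? "Review sprint contract"
    : `Review sprint contract (revise round ${round})`;
}

function buildReviewPrompt(
  contractPath: string,
  rubricPath: string | null,
  previous?: PreviousVerdict,
): string {
  const lines: string[] = [
    "Phase=contract-review",
    `contract: ${contractPath}`,
    // State the absence explicitly rather than passing an empty path.
    `rubric: ${rubricPath === null ? "not yet created" : rubricPath}`,
  ];
  if (previous !== undefined) {
    lines.push(
      "previous_verdict:",
      "  - verdict: Revise",
      `  - defects: ${JSON.stringify(previous.defects)}`,
      `  - revised_sections: ${JSON.stringify(previous.revisedSections)}`,
    );
  }
  lines.push("Return JSON per agents/evaluator.md Phase: contract-review schema.");
  return lines.join("\n");
}
```

The returned string would be passed as the `prompt` field of the Task() call, with `reviewDescription(round)` as its `description`.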

P4 propose-rubric dispatch

Description examples:

Greenfield (.harness/rubric.md does not exist or is empty): Seed rubric from contract
Brownfield (.harness/rubric.md exists and is non-empty): Propose rubric additions

Prompt structure:

Phase=propose-rubric
contract: .harness/contract/<task-id>/contract.md
rubric: .harness/rubric.md                                # path is required; the evaluator decides greenfield/brownfield itself
mode: single                                              # or two-layer (see Advanced)
template_reference: templates/rubric.global.example.md    # always included; greenfield uses it to derive the full rubric, brownfield to align section names
Return JSON per agents/evaluator.md Phase: propose-rubric schema (suggestedRubricAdditions[]).

Multi-round dispatch notes: P4 is normally single-round (the evaluator returns suggestedRubricAdditions[] in one pass, and the orchestrator moves on to P5 once they are applied). Re-score feedback triggered by P6 returns to P6, not P4: if a P6 verdict carries suggestedRubricAdditions[], the orchestrator applies them directly and re-dispatches P6 (max 3 rounds) without going back to P4. The greenfield/brownfield branch is chosen by the evaluator, based on whether the rubric path is readable and non-empty (see the table in Mode A step 4); the dispatching side is not responsible for switching the phase token.

P6 score dispatch

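Description examples:

First round: Score implementation
Second round onward (re-dispatch after additions are applied): Re-score after rubric extension (round <n>)

Prompt structure:

Phase=score
contract: .harness/contract/<task-id>/contract.md
rubric: .harness/rubric.md
screenshot: .harness/contract/<task-id>/screenshot.png   # if available; when both fullPage and viewport shots are attached, list one per line
design_system: [DESIGN.md, MAPPING.md, src/lib/layout.css]
source: [<list of files the writer changed this round>]
previous_round: <omitted; attach only when re-scoring>
  - applied_additions: [<id list of the previous round's suggestedRubricAdditions[] that were applied>]
  - prior_verdict: <previous round's verdict>
Return JSON per agents/evaluator.md Phase: score schema (verdict, defects[], suggestedRubricAdditions[]).

Multi-round dispatch notes: a P6 re-score is triggered after the orchestrator applies suggestedRubricAdditions[] (max 3 rounds; it converges when all additions are duplicates; see REFERENCE.md "Orchestrator algorithm"). On re-dispatch, the prompt must carry the id list in previous_round.applied_additions so the evaluator knows which rules (X / Y / Z) were just added to the rubric and does not propose the same additions again, wasting a round trip. Attach prior_verdict as well to help the evaluator align on the direction of improvement, but do not attach the previous round's full defects: P6 is an independent score against the latest rubric and the latest screenshot, and earlier defects may no longer apply.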
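
The re-score bookkeeping described above can be sketched as two small helpers. The names are hypothetical, and matching duplicates by `id` is an assumption; REFERENCE.md "Orchestrator algorithm" remains the authority on the actual convergence check.

```typescript
type Verdict = "Accept" | "Revise" | "Block" | "Incomplete";

// Convergence: every proposed addition has already been applied.
// Assumption: duplicates are identified by their `id`.
function isConverged(proposedIds: string[], appliedIds: string[]): boolean {
  const applied = new Set(appliedIds);
  return proposedIds.every((id) => applied.has(id));
}

// Build the previous_round block for a re-score prompt. The previous
// round's full defect list is deliberately left out.
function previousRoundBlock(appliedIds: string[], priorVerdict: Verdict): string {
  return [
    "previous_round:",
    `  - applied_additions: ${JSON.stringify(appliedIds)}`,
    `  - prior_verdict: ${priorVerdict}`,
  ].join("\n");
}
```

An empty proposal list also counts as converged here, which matches the "no additions → final verdict" branch of the pipeline.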
