From anzchy-skills
Tournament-based meta-prompting skill that iteratively refines a prompt across multiple rounds using parallel Claude + Codex candidate generation, an LLM-as-judge with binary 7-criterion rubric and quoted evidence, and a synthesizer that composes the next version from the best parts of each. Use this skill when the user wants to elevate a rough or high-stakes prompt for downstream coding tasks. Trigger keywords - "tournament prompt", "iteratively refine prompt", "meta-prompting", "/jack-prompt-master", "improve this prompt with multiple rounds". Distinct from one-shot /prompt-enhance.
How this skill is triggered — by the user, by Claude, or both
Slash command
/anzchy-skills:jack-prompt-masterThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Tournament-based meta-prompting for high-stakes coding prompts. Runs **inline** within a single skill invocation (not via `/loop`).
Tournament-based meta-prompting for high-stakes coding prompts. Runs inline within a single skill invocation (not via /loop).
/jack-prompt-master <draft> or asks for a "tournament", "multi-round refinement", "meta-prompting" pass on a prompt./prompt-enhance (one-shot) instead — this skill costs ~40k–125k tokens per run.This skill produces a copy-paste prompt block. It does NOT auto-execute the prompt downstream, does NOT pipe to another sub-agent, does NOT continue the conversation as the refined prompt. The user pastes the final prompt into a fresh chat or another skill themselves.
| Knob | Default | Env override |
|---|---|---|
| MAX_ITER | heuristic (2/3/5) | JPM_MAX_ITER |
| PASS_THRESHOLD | 6 of 7 | JPM_PASS_THRESHOLD (5/6/7) |
| CONTEXT_INGEST | ON if ./CLAUDE.md exists | JPM_CONTEXT (on/off) |
| CONTEXT_BYTE_CAP | 6000 bytes | JPM_CONTEXT_CAP |
| CODEX_MODEL | gpt-5.4 | JPM_CODEX_MODEL |
| CODEX_EFFORT | medium | JPM_CODEX_EFFORT |
| CODEX_TIMEOUT_SEC | 300 | JPM_CODEX_TIMEOUT |
| PROMPTS_DIR | ./.prompts/ | JPM_PROMPTS_DIR |
| GITIGNORE_PROMPT | ask once per project | JPM_GITIGNORE |
| DRAFT_MAX_BYTES | 50000 | JPM_DRAFT_MAX |
| JUDGE_RETRY_COUNT | 1 | JPM_JUDGE_RETRY |
| SYNTH_RETRY_COUNT | 1 | JPM_SYNTH_RETRY |
AskUserQuestion answers always override env. Validate at Phase 0; abort on out-of-range.
Run these BEFORE Phase 0a. Abort cleanly on any failure with a clear message — do NOT silently degrade.
Dependency probe (Bash, one call):
command -v jq >/dev/null || { echo "ABORT: jq missing — brew install jq / apt install jq"; exit 1; }
command -v sha256sum >/dev/null || command -v shasum >/dev/null || { echo "ABORT: neither sha256sum nor shasum on PATH"; exit 1; }
command -v codex >/dev/null || echo "WARN: codex not on PATH — fallback voice will be used every round"
command -v gtimeout >/dev/null || command -v timeout >/dev/null || echo "WARN: no timeout wrapper — relying on codex internal timeout"
Task tool probe: dispatch one Task call with subagent_type: general-purpose asking the agent to reply with the single word ok. If the call errors or returns empty, abort: "ABORT: Task tool / general-purpose sub-agent unavailable."
Config validation: read all JPM_* env vars; range-check each against the Configuration table above. Out-of-range → abort with the offending knob name.
If $ARGUMENTS is empty:
AskUserQuestion with a single question: "Paste your draft prompt:" (Header: "Draft"). Accept user-supplied text.Usage: /jack-prompt-master "<draft prompt>" and exit.If $ARGUMENTS is non-empty:
DRAFT_FILE=$(mktemp /tmp/jpm-draft-XXXXXXXX.txt) and printf '%s' "$ARGUMENTS" > "$DRAFT_FILE". Never inline a multi-line draft into shell args (it shell-mangles).wc -c < "$DRAFT_FILE" > DRAFT_MAX_BYTES, warn and offer truncation at 10k chars via AUQ.bash references/heuristic.sh "$DRAFT_FILE". Capture stdout as HEURISTIC_JSON.recommended_max_iter with jq. On any parse failure: default to 3, warn the user, continue.AskUserQuestion (one call, two questions):
2 / 3 / 5 / 7. The option matching recommended_max_iter gets the (recommended) label.5 of 7 / 6 of 7 (recommended) / 7 of 7../CLAUDE.md exists, run an additional AUQ for CONTEXT_INGEST: "Adapt this prompt to the current project? Detected: ./CLAUDE.md (N lines)." Options: Yes, adapt (recommended) / No, keep portable / Yes but show me what's ingested first. If user picks "show me first", cat the ingest preview then re-ask.~/.claude/skills/prompt-master/SKILL.md against $DRAFT_FILE. Produce a structured intent block over the 9 dimensions (task / target_tool / output_format / constraints / input / context / audience / success_criteria / examples). The Phase 1 keyword heuristic is purely for bucketing — Phase 2 is the real extraction.bash references/context-ingest.sh > $CONTEXT_FILE. The script reads ./CLAUDE.md then ./AGENTS.md (priority order), truncates at 6000 bytes on the last newline within the byte window, emits a <project-context>...</project-context> fenced block. If both files missing, emit empty file and flip scope: portable. Compute CONTEXT_SHA256=$(sha256sum "$CONTEXT_FILE" | cut -d' ' -f1); if context is empty, set CONTEXT_SHA256=none.DRAFT_SHA256=$(printf '%s' "$NORM_DRAFT" | sha256sum | cut -d' ' -f1). The resume key is the tuple (DRAFT_SHA256, scope, CONTEXT_SHA256).ls .prompts/*.json 2>/dev/null and parse each sidecar with jq to find a tuple match.
terminal: false: AUQ "Resume from latest? Found <filename> — round k winner scored N/7. Continue to round k+1 / Start fresh (archive existing) / Show me the latest file first."terminal: true: AUQ "Found a completed run for this exact draft (round k, score N/7). Show the final prompt / Start fresh (archive existing) / Open the file." NEVER offer "continue from round k+1" on a terminal checkpoint..prompts/*.{md,json} to .prompts/archive/ (create archive dir; on filename collision append .dup-$(date +%s)).<intent block>\n<context block (if any)>\n<draft>. On resume from round k, seed = <intent>\n<context>\n<synth output from .prompts/round-k.md> — but only read from disk at Phase 2 startup. During an active run, in-memory state is the source of truth.For each k from 1..MAX_ITER:
Round k MUST emit a single assistant message containing two tool_use blocks:
Task (subagent_type: general-purpose) — Candidate A (Claude "rigorous engineer" voice)Bash — Codex exec (Candidate B); OR a second Task with references/fallback-voice.md as system prompt if codex is not on PATH or known broken from a prior round.Splitting A and B across two assistant messages serializes them and doubles round latency. At MAX_ITER=7 this turns a 42-min budget into ~70+ min. The judge dispatch goes in the next assistant message, after both candidate results return.
Concrete shape:
[round-k assistant message]
tool_use: Task (Candidate A — Claude sub-agent)
tool_use: Bash (Candidate B — codex exec)
[round-k results return together]
[next assistant message]
tool_use: Task (Judge — rates both)
[judge result returns]
[if not early-stop, next assistant message]
tool_use: Task (Synthesizer)
Dispatch via Task with subagent_type: general-purpose. Self-contained prompt: paste seed + voice instructions + "Output the refined prompt only. Do not start with 'Sure', 'Here's', 'Okay', or any preamble. Do not emit a scope: line."
After receiving the result, strip preamble regex ^(Sure|Here'?s|Okay|Got it)[^\n]*\n from the start.
If empty: retry once with same prompt; on second empty, mark candidate as <empty> and let the judge see it (it will FAIL all criteria).
Use the pattern in references/codex-call.md. Critical bits:
PROMPT_FILE via mktemp.codex exec -), not argv — avoids ARG_MAX.gtimeout 300 (macOS) or timeout 300 (Linux); detect at runtime.Task dispatch using references/fallback-voice.md ("contrarian senior engineer") and mark this round's B_source = claude-fallback.Apply the same preamble strip to B's output.
Dispatch one Task with the system prompt at references/judge-prompt.md and pass: rubric, both candidates, threshold. The judge must emit strict JSON per the schema in references/judge-prompt.md.
Validate every judge response with jq -e:
jq -e '
(.verdicts | length == 14)
and (.verdicts | map(select(.quote == "" or .quote == null)) | length == 0)
and (.verdicts | map(.verdict) | all(. == "PASS" or . == "FAIL"))
and (.score_A | type == "number") and (.score_A >= 0) and (.score_A <= 7)
and (.score_B | type == "number") and (.score_B >= 0) and (.score_B <= 7)
and (.winner == "A" or .winner == "B" or .winner == "tie")
' < judge_output.json > /dev/null
On jq -e non-zero: retry ONCE with reminder "Emit valid JSON only, matching the schema exactly. No commentary." On second failure: regex-extract score_A / score_B / winner and flag the round's confidence as "degraded" in the output.
If JPM_JUDGE_FIXTURE env var is set, skip the Task call and read the JSON from that path verbatim (test harness only).
max(score_A, score_B) >= PASS_THRESHOLD, the winner becomes the final prompt. Write the checkpoint with terminal: true and early_stop: true. Exit loop.max(score_A, score_B) == 0, do NOT call the synthesizer. Return the higher-scoring candidate (A on tie) with a caveat banner: ⚠️ Both candidates failed all criteria. Returning best of bad options. Write checkpoint with terminal: true. Exit loop.Task with references/synthesizer-prompt.md). Inputs: candidate A, candidate B, judge JSON. Output: v(k+1) prompt only. Apply preamble strip + length check.
Call bash references/prompts-persist.sh with the round artifact via stdin (markdown body) + sidecar JSON args. The script writes BOTH a human-readable .md and a sidecar .json — you must pass the markdown body to its stdin.
The markdown body fed to stdin MUST contain:
## Original draft
<verbatim draft>
## Project context (ingested)
<truncated CLAUDE.md / AGENTS.md content, or "none">
## Candidate A (Claude)
<full A output>
## Candidate B (codex or claude-fallback)
<full B output>
## Judge verdicts
<judge JSON, pretty-printed>
## Synthesized v(k+1) (seed for next round)
<synth output, or "n/a — early stop" or "n/a — both failed">
## Final prompt (this round's winner or synth)
<the actual final-prompt text — winner of early-stop OR synth of continuing round>
The "Final prompt" section is mandatory on every checkpoint. On early-stop or terminal rounds, this is the prompt the user receives. The body is NOT optional — never invoke prompts-persist.sh with empty stdin.
Other persist-script behavior:
.prompts/ if missing; on EROFS or chmod, falls back to $TMPDIR/.prompts-$$/ and informs the user (skip resume next time)..prompts/ to .gitignore. Remember the answer in ~/.gstack/projects/<slug>/.prompts-gitignore-prompted.YYYY-MM-DD_HHMMSS_round-k.md (human, with body above) + YYYY-MM-DD_HHMMSS_round-k.json (sidecar with frontmatter fields for resume).In-memory state remains the source of truth for the next round's seed. Disk is checkpoint-only.
If the loop completes MAX_ITER rounds without score >= PASS_THRESHOLD:
terminal: true, early_stop: false.Step 1 — Persist the final prompt to disk (mandatory, before printing).
Write a dedicated final-prompt markdown file alongside the round checkpoints. This is the canonical retrievable artifact — users should be able to cat .prompts/<timestamp>_FINAL.md after the skill exits without scanning a multi-section round file.
FINAL_PATH="${JPM_PROMPTS_DIR:-./.prompts}/$(date -u +%Y-%m-%d_%H%M%S)_FINAL.md"
cat > "$FINAL_PATH" <<EOF
---
kind: final_prompt
date: $(date -u +%Y-%m-%dT%H:%M:%SZ)
draft_sha256: $DRAFT_SHA256
scope: $SCOPE
rounds_run: $ROUNDS_RUN
final_score: $FINAL_SCORE
exit_reason: $EXIT_REASON # early_stop | max_iter | both_failed
---
# Final prompt — jack-prompt-master tournament
$FINAL_PROMPT_BODY
---
## Score history
| round | score_A | score_B | B_source | winner | synth_score |
|-------|---------|---------|----------|--------|-------------|
$SCORE_HISTORY_ROWS
## Criteria flips
$CRITERIA_FLIPS
## Caveats
$CAVEAT_BANNER_OR_NONE
EOF
echo "Final prompt saved to: $FINAL_PATH"
If .prompts/ was fallback-tmpdir'd at checkpoint time, the FINAL.md goes to the same fallback path — never silently split locations.
Step 2 — Print to the conversation:
Final prompt as a fenced markdown block (copy-paste ready). No auto-execution.
Saved-to path — one line: ✅ Saved to <FINAL_PATH> (so the user knows the file exists without scrolling).
Score history table:
| round | score_A | score_B | B_source | winner | synth_score |
|-------|---------|---------|------------------|--------|-------------|
| 1 | 4 | 5 | codex | B | 5 |
| 2 | 6 | 6 | codex | A | — |
B_source ∈ {codex, claude-fallback}. synth_score is the score the synthesizer's output would get if scored (skip column if early-stop).
Criteria flips — one line per criterion that flipped PASS↔FAIL across rounds (auditability).
Caveat banner (only if any round's B_source == claude-fallback):
⚠️ Cross-model hedge was degraded in rounds {X, Y} — Codex unavailable.
Re-run when Codex is available for a stronger tournament.
Scope tag (parent-owned, not generated by sub-agents): if scope: project, print:
ℹ️ This prompt cites <repo-name> conventions; remove the <project-context> reference if reusing elsewhere.
Load on demand:
references/rubric.md — 7 binary criteria with PASS/FAIL examples per criterion.references/judge-prompt.md — judge sub-agent system prompt + JSON schema + retry instructions.references/synthesizer-prompt.md — synthesizer sub-agent system prompt + worked example.references/codex-call.md — codex exec bash invocation pattern (stdin, gtimeout, exit codes).references/fallback-voice.md — "contrarian senior engineer" voice for when Codex is unavailable.references/heuristic.sh — Phase 1 bash draft-complexity scorer (word_count, dims_present, ambiguity_markers, recommended_max_iter).references/context-ingest.sh — reads CLAUDE.md / AGENTS.md, truncates to 6000 bytes on line boundary.references/prompts-persist.sh — writes round artifact + sidecar JSON to .prompts/; handles archive, gitignore prompt, read-only fallback./prompt-enhance (legacy, in ~/.claude/CLAUDE.md): one-shot enhancement, quick polish.prompt-master (skill at ~/.claude/skills/prompt-master/): one-shot 9-dim intent extraction → single prompt.jack-prompt-master (this skill): multi-round tournament with Codex co-author, judged with rubric, synthesized between rounds.All three coexist. No auto-deprecation, no auto-redirect. Pick based on stakes.
npx claudepluginhub anzchy/skills --plugin anzchy-skillsCreates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.