From arbiter
Multi-AI orchestrator — dispatch to Codex CLI and Gemini CLI for review, ask, implement, panel, quorum, continue, diverge, and doctor modes. Diverge generates 3 independent implementations in isolated worktrees with different strategy hints, then compares and selects the best solution. Doctor checks provider health and reports partial success when some providers are unavailable. Use when user says "/arbiter", "/codex", "ask codex", "ask gemini", "codex review", "gemini review", "panel mode", "compare AIs", "second opinion from codex", "second opinion from gemini", "quorum", "go/no-go", "consensus vote", "verify", "fact-check", "cross-check", "is this true", or wants to delegate work to an external AI CLI (Codex, Gemini). Arbiter dispatches to multiple providers and presents compared results.
How this skill is triggered — by the user, by Claude, or both
Slash command
/arbiter:arbiterThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
- **User**: domain knowledge, final decisions
| Mode | Codex Command | Gemini Command | Default |
|---|---|---|---|
| ask | codex exec --ephemeral -C "$PWD" $CODEX_PROFILE_FLAG --output-last-message "$OUT" "<prompt>" | gemini -p "<prompt>" -o text | codex |
| review | codex review [flags] | git diff | gemini -p "<review_prompt>" -o text | codex |
| implement | codex exec --full-auto -C "<worktree>" "<spec>" | (cd "<worktree>" && gemini -p "<spec>" -y -o text) | codex |
| continue | Not supported in v1 | gemini -r latest -p "<follow-up>" -o text | gemini |
| diverge | 3 worktrees + codex/gemini | 3 worktrees + codex/gemini | all |
| doctor | N/A | N/A | N/A |
Note: quorum and verify are not in this table — they are multi-provider protocols, not single-provider dispatch. Quorum always runs both providers (see quorum mode). Verify always runs both providers (see verify mode).
Before provider CLI calls, resolve model from config if available:
_PROVIDERS_CONFIG="${EMPORIUM_PROVIDERS_CONFIG:-$HOME/.claude/emporium-providers.local.md}"
_resolve_model() {
local task="$1" provider="$2"
[ -r "$_PROVIDERS_CONFIG" ] || { echo ""; return 0; }
python3 -c "
import re, sys, os
config_path = os.environ.get('EMPORIUM_PROVIDERS_CONFIG', os.path.expanduser('~/.claude/emporium-providers.local.md'))
try:
text = open(config_path).read()
except Exception:
sys.exit(0)
if text.startswith('\ufeff'):
text = text[1:]
m = re.match(r'^---\s*\n(.*?)\n---', text, re.DOTALL)
if not m:
sys.exit(0)
def parse_yaml(yaml_text):
result = {}
stack = [(-1, result)]
for line in yaml_text.splitlines():
if not line.strip() or line.strip().startswith('#'):
continue
indent = len(line) - len(line.lstrip())
kv = line.strip()
while len(stack) > 1 and indent <= stack[-1][0]:
stack.pop()
if ':' not in kv:
if kv.startswith('- '):
v = kv[2:].strip().strip('\"')
d = stack[-1][1]
for k in reversed(list(d.keys())):
if isinstance(d[k], list):
d[k].append(v)
break
continue
key, _, value = kv.partition(':')
key = key.strip()
value = value.strip().strip('\"')
d = stack[-1][1]
if not value:
nd = {}
d[key] = nd
stack.append((indent, nd))
else:
try: d[key] = int(value)
except ValueError:
if value.lower() in ('true','false'): d[key] = value.lower() == 'true'
else: d[key] = value
return result
try:
import yaml
cfg = yaml.safe_load(m.group(1)) or {}
except ImportError:
cfg = parse_yaml(m.group(1))
task = sys.argv[1]
provider = sys.argv[2]
route = cfg.get('routing', {})
model = ''
if task in route and isinstance(route[task], dict) and provider in route[task]:
model = str(route[task][provider])
elif 'default' in route and isinstance(route['default'], dict) and provider in route['default']:
model = str(route['default'][provider])
else:
pd = cfg.get('defaults', {}).get(provider, {})
if isinstance(pd, dict): model = str(pd.get('model', ''))
if model and not re.match(r'^[A-Za-z0-9._:-]+$', model): model = ''
print(model, end='')
" "$task" "$provider" 2>/dev/null
}
Usage pattern:
MODEL=$(_resolve_model "<task>" "<provider>")
MODEL_FLAG=""
[ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
Codex -p (profile) is optional. If ARBITER_CODEX_PROFILE is set, it is passed to all codex exec calls. If unset, codex uses its default profile.
CODEX_PROFILE_FLAG=""
[ -n "${ARBITER_CODEX_PROFILE:-}" ] && CODEX_PROFILE_FLAG="-p $ARBITER_CODEX_PROFILE"
NOTE: This version passes task/provider as sys.argv instead of bash string interpolation (addresses the security concern from signum code review).
For Gemini CLI, verified headless model strings in this environment are auto-gemini-3, gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-2.5-pro, gemini-2.5-flash, pro, and flash. If Gemini 3 preview is not enabled for the account, fall back to pro / flash or gemini-2.5-*.
--model with aliases like haiku, sonnet, and opus. For nested Claude CLI calls, always sanitize the environment first: env -u ANTHROPIC_API_KEY -u CLAUDECODE claude ... so subscription auth is not shadowed by an unrelated API-key env var.-p for profiles (fast, default, full). Profile is optional — set ARBITER_CODEX_PROFILE env var to enable. -c model_reasoning_effort=high for reasoning override.-o text always (not -o json — that wraps in transport envelope). -y for auto-accept in implement.--model. For Gemini 3 routing, prefer verified headless strings like auto-gemini-3, gemini-3.1-pro-preview, and gemini-3-flash-preview. Use pro / flash or gemini-2.5-* as fallback when preview access is unavailable./arbiter review → codex review (uncommitted)
/arbiter review --via gemini → gemini review via diff pipe
/arbiter review --base main → PR review
/arbiter review --commit abc123 → specific commit review
/arbiter review "focus on SQL injection" → custom instructions
/arbiter ask "question" → default provider (codex)
/arbiter ask --via gemini "question" → explicit provider
/arbiter ask --with-diff "question" → include git diff as context
/arbiter implement "spec" → codex (default)
/arbiter implement --via gemini "spec" → gemini in yolo mode
/arbiter panel "question" → ask BOTH providers, compare
/arbiter quorum "question or proposal" → 2-provider vote + Claude synthesis
/arbiter quorum --with-diff "question" → include git diff as context
/arbiter verify "question or claim" → 3-round cross-verification
/arbiter verify --with-diff "question" → include git diff as context
/arbiter continue "follow-up" → gemini only (codex not supported in v1)
/arbiter → show help + provider status
/arbiter diverge "spec" → 3 providers, default strategies
/arbiter diverge --lite "spec" → design-level only, no code
/arbiter diverge --strategies "a,b,c" "spec" → custom strategy hints
/arbiter diverge --providers "claude,codex" → specific providers
/arbiter diverge --timeout 600 "spec" → custom timeout (default: 300s)
/arbiter diverge --run-tests "spec" → test all 3 before evaluation
/arbiter doctor → health check (binary + smoke API)
Parse the user's /arbiter invocation by matching these patterns in order:
/arbiter → review|ask|implement|panel|quorum|verify|continue|diverge|doctor--via <provider>: if present, set provider (codex|gemini). Not applicable to quorum (always both).--base <branch>, --commit <sha>--with-diff--with-diff--with-diff--lite, --strategies <csv>, --providers <csv>, --timeout <seconds>, --run-testsConflict rules:
--base + --commit → error: "Choose one: --base or --commit"--via unknown → error: "Unknown provider. Available: codex, gemini"quorum --via <any> → error: "Quorum always runs both providers. No --via flag."verify --via <any> → error: "Verify always runs both providers. No --via flag."continue --via codex → error: "Continue not supported for codex in v1. Use --via gemini."diverge --providers with only 1 provider → warn: "Only 1 provider. Results will show 3 strategies from 1 model."diverge --lite --run-tests → error: "Cannot run tests in lite mode (no code)."Purpose: Independent code review from an external AI.
Flow:
git diff --stat + git diff --cached --stat)--base <branch>: PR review (git diff <branch>...HEAD)--commit <sha>: specific commit (git diff <sha>~1..<sha>)Codex (default):
# NOTE: codex review may not support --model. If unsupported, omit MODEL_FLAG.
MODEL=$(_resolve_model "review" "codex")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
codex review $MODEL_FLAG 2>&1 # uncommitted
codex review $MODEL_FLAG --base main 2>&1 # PR review
codex review $MODEL_FLAG --commit abc123 2>&1 # specific commit
codex review $MODEL_FLAG "focus on auth boundaries" 2>&1 # custom instructions
codex review $MODEL_FLAG -c model_reasoning_effort=high 2>&1 # override reasoning
NOTE: codex review does NOT support -p (profile). Use -c for overrides. --model support is unconfirmed — test with codex review --help and remove MODEL_FLAG if unsupported.
Gemini (--via gemini):
# Determine diff source — include both unstaged and staged changes
# Validate refs before use to prevent bad-ref failures showing as empty diffs
if [ -n "$BASE" ]; then
git rev-parse --verify "$BASE" >/dev/null 2>&1 || { echo "Error: invalid base ref '$BASE'"; exit 1; }
DIFF_ARGS=("${BASE}...HEAD")
elif [ -n "$COMMIT" ]; then
git rev-parse --verify "$COMMIT" >/dev/null 2>&1 || { echo "Error: invalid commit '$COMMIT'"; exit 1; }
DIFF_ARGS=("${COMMIT}~1..${COMMIT}")
else
# Default: unstaged + staged combined
DIFF_ARGS=("HEAD")
fi
DIFF_LINES=$(git diff "${DIFF_ARGS[@]}" | wc -l)
if [ "$DIFF_LINES" -eq 0 ]; then echo "No changes to review."; exit 0; fi
MODEL=$(_resolve_model "review" "gemini")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
ERR=$(mktemp)
# Small diff (<300 lines) — single call
RESP=$(git diff "${DIFF_ARGS[@]}" | gemini $MODEL_FLAG -p "You are a code reviewer. Review the diff on stdin. \
Report ONLY: bugs, security issues, performance problems. Skip style. \
Format: list each finding as '- [file:line] severity: description'" -o text 2>"$ERR")
# Standard Gemini error handling (see Error Handling §7) — exit 1 on failure
# Large diff (>300 lines) — MUST chunk by file
ERR_CHUNK=$(mktemp)
git diff "${DIFF_ARGS[@]}" --name-only -z | while IFS= read -r -d '' FILE; do
RESP=$(git diff "${DIFF_ARGS[@]}" -- "$FILE" | gemini $MODEL_FLAG -p "Review this diff for $FILE. Report bugs, security, performance only." -o text 2>"$ERR_CHUNK")
# Standard Gemini error handling per chunk
# collect per-file findings
done
rm -f "$ERR_CHUNK"
## Review Findings (<provider>)
### Bugs
- [file:line] description
### Security
- [file:line] description
### Performance
- [file:line] description
### Summary
X issues found. Style issues filtered out.
Purpose: Get an external AI's opinion for debate or second perspective.
Flow:
Codex (default):
MODEL=$(_resolve_model "ask" "codex")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
OUT=$(mktemp) || { echo "Cannot create temp file"; return 1; }
codex exec --ephemeral -C "$PWD" $CODEX_PROFILE_FLAG $MODEL_FLAG \
--output-last-message "$OUT" "<prompt>" 2>&1
ANSWER=$(cat "$OUT"); rm -f "$OUT"
Codex with --with-diff:
MODEL=$(_resolve_model "ask" "codex")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
OUT=$(mktemp)
{ echo "Context (current changes):"; git diff HEAD; echo ""; echo "Question: <prompt>"; } | \
codex exec --ephemeral -C "$PWD" $CODEX_PROFILE_FLAG $MODEL_FLAG \
--output-last-message "$OUT" - 2>&1
ANSWER=$(cat "$OUT"); rm -f "$OUT"
Gemini (--via gemini):
MODEL=$(_resolve_model "ask" "gemini")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
ERR=$(mktemp)
RESP=$(gemini $MODEL_FLAG -p "<prompt>" -o text 2>"$ERR")
# Standard Gemini error handling (see Error Handling §7) — return 1 on failure
Gemini with --with-diff:
MODEL=$(_resolve_model "ask" "gemini")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
ERR=$(mktemp)
RESP=$(git diff | gemini $MODEL_FLAG -p "<prompt>" -o text 2>"$ERR")
# Standard Gemini error handling (see Error Handling §7) — return 1 on failure
## Two Perspectives
### <Provider> (<model>)
[Provider's answer, summarized]
### Claude — Own Analysis
[Your own independent analysis of the same question]
### Key Differences
[Where the two perspectives diverge]
### Recommendation
[Your synthesis — what you'd recommend and why]
Purpose: Delegate implementation to an external AI in an isolated worktree.
Flow:
BRANCH="arbiter-$(date +%s)-$$"
WORKTREE=$(mktemp -d "/tmp/arbiter-XXXXXX")
git worktree add "$WORKTREE" -b "$BRANCH" HEAD
Codex (default):
MODEL=$(_resolve_model "implement" "codex")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
codex exec --full-auto -C "$WORKTREE" $MODEL_FLAG "<spec>" 2>&1
Gemini (--via gemini):
MODEL=$(_resolve_model "implement" "gemini")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
ERR=$(mktemp)
# IMPORTANT: subshell to avoid changing Claude's CWD
(cd "$WORKTREE" && gemini $MODEL_FLAG -p "<spec>" -y -o text 2>"$ERR")
GEMINI_EXIT=$?
if [ $GEMINI_EXIT -ne 0 ]; then
if grep -qi "auth\|login\|403\|401\|credentials" "$ERR"; then
echo "GEMINI_ERROR: auth — run: gemini login"
else
echo "GEMINI_ERROR: exit $GEMINI_EXIT"; cat "$ERR"
fi
fi
rm -f "$ERR"
git -C "$WORKTREE" diff HEAD
## Implementation (<provider>)
### Files Changed
[list of files with +/- lines]
### Diff
[key changes, summarized — full diff available on request]
### Claude's Assessment
[Your review: correctness, style, completeness]
### Options
1. Merge into current branch
2. Cherry-pick specific files
3. Discard
git -C "$WORKTREE" add -A && git -C "$WORKTREE" commit -m "arbiter: <spec summary>"
git merge "$BRANCH"
git worktree remove "$WORKTREE"
git worktree remove --force "$WORKTREE"
git branch -D "$BRANCH"
git worktree list | grep arbiter/ for manual cleanup.Purpose: Ask BOTH providers the same question, compare answers.
IMPORTANT: Run codex and gemini in TRUE PARALLEL using run_in_background: true. Normal parallel Bash tool calls are serialized by Claude Code. Background execution is the only way to get real concurrency.
Flow:
run_in_background: true:Bash call 1 (codex) — run_in_background: true:
MODEL=$(_resolve_model "ask" "codex")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
OUT=$(mktemp)
codex exec --ephemeral -C "$PWD" $CODEX_PROFILE_FLAG $MODEL_FLAG --output-last-message "$OUT" "<prompt>" 2>&1
echo "---CODEX_ANSWER---"
cat "$OUT"; rm -f "$OUT"
Bash call 2 (gemini) — run_in_background: true:
MODEL=$(_resolve_model "ask" "gemini")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
ERR=$(mktemp)
RESP=$(gemini $MODEL_FLAG -p "<prompt>" -o text 2>"$ERR")
# Standard Gemini error handling (see Error Handling §7) — report GEMINI_ERROR on failure
echo "---GEMINI_ANSWER---"
echo "$RESP"
TaskOutput tool (block: true) for each background task ID. Form your own independent analysis.CONFIDENCE: line (e.g., from quorum-style prompts or self-assessed), extract and display it. Otherwise omit the confidence annotation.## Panel: "<question>"
### Codex [Confidence: 0.85]
[response summary]
### Gemini [Confidence: 0.70]
[response summary]
### Claude — Own Analysis
[Claude's independent analysis]
### Consensus (N/3 agree)
[What all or majority agree on]
### Split Decision
[Where they diverge — with Claude's recommendation, noting confidence asymmetry if present]
Partial failure OK — if one provider fails, report failure and show the others.
Purpose: Formal go/no-go gate for high-stakes decisions. Returns APPROVE/BLOCK/NEEDS_INFO.
When to use: Architecture choices, schema migrations, destructive operations, security decisions. For open-ended questions, use panel instead.
Quorum always runs both providers. No --via flag.
--with-diff: prepend git diff HEAD output (includes staged changes)git status --short + git log --oneline -3 (equalize context across providers)You are reviewing a proposal for a high-stakes decision.
Respond in EXACTLY this format (no other text):
VERDICT: APPROVE | BLOCK | NEEDS_INFO
CONFIDENCE: (0.0–1.0, where 1.0 = absolute certainty in your verdict)
BLOCKING_ISSUES: (numbered list or "none")
ASSUMPTIONS: (numbered list)
EVIDENCE_NEEDED: (what facts would change your verdict)
RATIONALE: (2-3 sentences)
Rules:
- BLOCK if: data loss risk, security vulnerability, irreversible without rollback
- NEEDS_INFO if: ambiguous, missing context, cannot evaluate
- APPROVE if: sound, safe, reversible or has rollback plan
- CONFIDENCE: 0.9+ = very certain, 0.7-0.9 = confident, 0.5-0.7 = uncertain, <0.5 = low confidence
run_in_background: true (Bash tool timeout: 120000) — same patterns as panel mode:Bash call 1 (codex) — run_in_background: true:
MODEL=$(_resolve_model "review" "codex")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
OUT=$(mktemp) || { echo "CODEX_ERROR: cannot create temp file"; exit 1; }
CONTEXT=""
# If --with-diff: CONTEXT="Context (current changes):\n$(git diff HEAD)\n\n"
STATE="State:\n$(git status --short)\n$(git log --oneline -3)\n\n"
PROMPT="${CONTEXT}${STATE}<contract_prompt>\n\n---\n\n<question>"
echo "$PROMPT" | codex exec --ephemeral -C "$PWD" $CODEX_PROFILE_FLAG $MODEL_FLAG \
--output-last-message "$OUT" - 2>&1
echo "---CODEX_ANSWER---"
cat "$OUT"; rm -f "$OUT"
Bash call 2 (gemini) — run_in_background: true:
MODEL=$(_resolve_model "review" "gemini")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
CONTEXT=""
# If --with-diff: CONTEXT="Context (current changes):\n$(git diff HEAD)\n\n"
STATE="State:\n$(git status --short)\n$(git log --oneline -3)\n\n"
PROMPT="${CONTEXT}${STATE}<contract_prompt>\n\n---\n\n<question>"
ERR=$(mktemp)
RESP=$(gemini $MODEL_FLAG -p "$PROMPT" -o text 2>"$ERR")
EXIT=$?
# Standard Gemini error handling (see Error Handling §7)
if [ $EXIT -ne 0 ]; then
if grep -qi "auth\|login\|403\|401\|credentials" "$ERR"; then echo "GEMINI_ERROR: auth"
else echo "GEMINI_ERROR: exit $EXIT"; cat "$ERR"
fi
fi
rm -f "$ERR"
echo "---GEMINI_ANSWER---"
echo "$RESP"
TaskOutput tool (block: true) for each background task ID. Show Run Summary (see Run Summary) before presenting quorum results. PARTIAL_SUCCESS handling subsumes the degraded quorum warning (Error #10).VERDICT: line → mark as PARSE_ERROR (not APPROVE, not N/A)TIMEOUTCONFIDENCE: value from each response. Parse as float, clamp to [0.0, 1.0]. If CONFIDENCE: line is missing or unparseable → default to 0.5 (neutral confidence)Claude sees both responses (including their confidence scores) and applies deterministic policy FIRST, then adds analysis.
Decision policy (deterministic, applied before LLM synthesis):
For split decisions (case 4) — conservative resolution: A BLOCK is a safety signal — confidence weighting must not auto-override it:
When Claude acts as tiebreaker, use adversarial prompt enriched with confidence context: "The providers disagree. Codex: (confidence: ). Gemini: (confidence: ). Your job: first, state the strongest case FOR the BLOCK. Then state the strongest case AGAINST it. Consider that the is more certain about their assessment. Only then give your tiebreaker verdict with rationale." This mitigates confirmation bias by forcing devil's advocate while surfacing confidence asymmetry.
## Quorum: "<question>"
### Votes
| Provider | Verdict | Confidence | Key Issue |
|----------|---------|------------|-----------|
| Codex | BLOCK | 0.85 | SQL injection in query builder |
| Gemini | APPROVE | 0.40 | — |
### Analysis
[Claude's synthesis: why they disagree, strength of the BLOCK argument, missed risks. Note confidence asymmetry if significant (≥0.3 gap)]
### Verdict: BLOCKED | APPROVED | NEEDS_INFO
[Rationale based on deterministic policy + confidence weighting + Claude analysis]
[If confidence-weighted resolution applied: "Resolved by confidence weighting: <provider> (<verdict>, confidence <score>) overrides <provider> (<verdict>, confidence <score>)"]
### Action Items
[If BLOCK: what to fix. If NEEDS_INFO: what to answer first]
Quorum does not persist any data to disk. Prompt content and verdicts remain in the conversation context only. This is consistent with Arbiter's trust boundary: "No data is stored by Arbiter."
Purpose: Cross-model verification with claim decomposition. Catches hallucinations, agreement bias, and unsupported claims that panel/quorum miss.
When to use: Factual questions, technical claims, "is X true?", any answer where correctness matters more than speed.
verify always runs both providers. No --via flag.
Same pattern as panel: two Bash calls with run_in_background: true.
Each provider answers the question independently, with NO shared context.
Bash call 1 (codex) — run_in_background: true:
MODEL=$(_resolve_model "review" "codex")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
OUT=$(mktemp) || { echo "CODEX_ERROR: cannot create temp file"; exit 1; }
CONTEXT=""
# If --with-diff: CONTEXT="Context (current changes):\n$(git diff HEAD)\n\n"
codex exec --ephemeral -C "$PWD" $CODEX_PROFILE_FLAG $MODEL_FLAG \
--output-last-message "$OUT" "<prompt>" 2>&1
echo "---CODEX_ANSWER---"
cat "$OUT"; rm -f "$OUT"
Bash call 2 (gemini) — run_in_background: true:
MODEL=$(_resolve_model "review" "gemini")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
CONTEXT=""
# If --with-diff: CONTEXT="Context (current changes):\n$(git diff HEAD)\n\n"
ERR=$(mktemp)
RESP=$(gemini $MODEL_FLAG -p "${CONTEXT}<prompt>" -o text 2>"$ERR")
EXIT=$?
# Standard Gemini error handling (see Error Handling §7)
if [ $EXIT -ne 0 ]; then
if grep -qi "auth\|login\|403\|401\|credentials" "$ERR"; then echo "GEMINI_ERROR: auth"
else echo "GEMINI_ERROR: exit $EXIT"; cat "$ERR"
fi
fi
rm -f "$ERR"
echo "---GEMINI_ANSWER---"
echo "$RESP"
Collect both results using TaskOutput tool (block: true) for each background task ID. Show Run Summary (see Run Summary) before presenting verify results.
Claude performs (NO external calls — this is Claude's own analysis):
Decompose BOTH answers into numbered atomic claims
Compare claims between answers:
If zero CONTESTED claims → skip Round 3, go to synthesis
Only runs if Round 2 found CONTESTED claims.
Build adversarial verification prompt with ANONYMOUS contested claims (Claude strips which model said what):
You are a skeptical fact-checker. Your goal is to FIND FLAWS, not confirm.
For each claim below, respond:
- PASS: claim is correct (explain why with supporting evidence)
- FAIL: claim is wrong (provide specific counterexample or contradiction)
- UNKNOWN: cannot determine (explain what evidence is needed)
Do NOT agree by default. If you cannot verify, say UNKNOWN.
Claims:
1. [anonymous claim]
2. [anonymous claim]
...
Two Bash calls with run_in_background: true:
Bash call 1 (codex) — run_in_background: true:
MODEL=$(_resolve_model "review" "codex")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
OUT=$(mktemp) || { echo "CODEX_ERROR: cannot create temp file"; exit 1; }
echo "$ADVERSARIAL_PROMPT" | codex exec --ephemeral -C "$PWD" $CODEX_PROFILE_FLAG $MODEL_FLAG \
--output-last-message "$OUT" - 2>&1
echo "---CODEX_VERIFY---"
cat "$OUT"; rm -f "$OUT"
Bash call 2 (gemini) — run_in_background: true:
MODEL=$(_resolve_model "review" "gemini")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
ERR=$(mktemp)
RESP=$(gemini $MODEL_FLAG -p "$ADVERSARIAL_PROMPT" -o text 2>"$ERR")
EXIT=$?
if [ $EXIT -ne 0 ]; then
if grep -qi "auth\|login\|403\|401\|credentials" "$ERR"; then echo "GEMINI_ERROR: auth"
else echo "GEMINI_ERROR: exit $EXIT"; cat "$ERR"
fi
fi
rm -f "$ERR"
echo "---GEMINI_VERIFY---"
echo "$RESP"
Collect both results using TaskOutput tool (block: true) for each background task ID.
For each claim, determine status:
## Verify: "<question>"
### Answer (synthesized)
[Merged answer using only VERIFIED claims. Contested claims presented as "X, though this is disputed because Y"]
### Claim Analysis
| # | Claim | Status | Detail |
|---|-------|--------|--------|
| 1 | ... | VERIFIED | Both providers agree |
| 2 | ... | CONTESTED | Codex: PASS, Gemini: FAIL — [counterexample] |
| 3 | ... | REJECTED | Both found errors — [correction] |
### Contested Claims (detail)
[For each CONTESTED claim: both sides of the argument]
### Confidence
X/Y claims verified, Z contested, W rejected.
[HIGH / MEDIUM / LOW / NEEDS HUMAN REVIEW]
Thresholds:
- HIGH: >80% VERIFIED, 0 REJECTED
- MEDIUM: >60% VERIFIED, ≤1 REJECTED
- LOW: <60% VERIFIED or >1 REJECTED
- NEEDS HUMAN REVIEW: >30% CONTESTED+REJECTED
Verify does not persist any data to disk. Same policy as quorum — no data stored.
Purpose: Resume the last Gemini session with a follow-up.
Codex continue is NOT supported in v1 (codex exec resume --last lacks --output-last-message).
Flow:
--via codex → error: "Continue not supported for codex in v1. Use --via gemini."MODEL=$(_resolve_model "ask" "gemini")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
ERR=$(mktemp)
RESP=$(gemini $MODEL_FLAG -r latest -p "<follow-up>" -o text 2>"$ERR")
# Standard Gemini error handling (see Error Handling §7) — return 1 on failure
Note: Gemini sessions are CWD-scoped.
Purpose: Health check — verify all providers are available and authenticated.
Flow:
CLAUDE_BIN=$(which claude 2>/dev/null) && CLAUDE_VER=$(claude --version 2>&1 | head -1)
CODEX_BIN=$(which codex 2>/dev/null) && CODEX_VER=$(codex --version 2>&1 | head -1)
GEMINI_BIN=$(which gemini 2>/dev/null) && GEMINI_VER=$(gemini --version 2>&1 | head -1)
JQ_BIN=$(which jq 2>/dev/null) && JQ_VER=$(jq --version 2>&1 | head -1)
run_in_background: true). Note: do NOT use timeout command (unavailable on macOS). Use Bash tool's built-in timeout parameter instead:Claude:
CLAUDE_MODEL=$(_resolve_model "doctor" "claude")
[ -n "$CLAUDE_MODEL" ] || CLAUDE_MODEL=$(_resolve_model "default" "claude")
[ -n "$CLAUDE_MODEL" ] || CLAUDE_MODEL="sonnet"
OUT=$(mktemp); ERR=$(mktemp)
START=$(date +%s)
CLAUDE_AUTH_JSON=$(env -u ANTHROPIC_API_KEY -u CLAUDECODE claude auth status 2>/dev/null || true)
env -u ANTHROPIC_API_KEY -u CLAUDECODE \
claude -p --output-format text --model "$CLAUDE_MODEL" \
"Reply with exactly: ARBITER_HEALTH_OK" >"$OUT" 2>"$ERR"
CLAUDE_API_EXIT=$?
CLAUDE_LATENCY=$(( $(date +%s) - START ))
CLAUDE_API_RESP=$(cat "$OUT")
if [ $CLAUDE_API_EXIT -ne 0 ]; then
if grep -qi "auth\|login\|403\|401\|credentials\|token\|Invalid API key" "$ERR"; then
CLAUDE_API_STATUS="auth_expired"
else
CLAUDE_API_STATUS="error"
fi
else
CLAUDE_API_STATUS="ok"
fi
rm -f "$OUT" "$ERR"
Codex:
CODEX_MODEL=$(_resolve_model "doctor" "codex")
CODEX_MODEL_FLAG=""; [ -n "$CODEX_MODEL" ] && CODEX_MODEL_FLAG="--model $CODEX_MODEL"
OUT=$(mktemp); ERR=$(mktemp)
START=$(date +%s)
codex exec --ephemeral $CODEX_PROFILE_FLAG $CODEX_MODEL_FLAG \
--output-last-message "$OUT" "Reply with exactly: ARBITER_HEALTH_OK" 2>"$ERR"
CODEX_API_EXIT=$?
CODEX_LATENCY=$(( $(date +%s) - START ))
CODEX_API_RESP=$(cat "$OUT"); rm -f "$OUT" "$ERR"
Gemini:
GEMINI_MODEL=$(_resolve_model "doctor" "gemini")
GEMINI_MODEL_FLAG=""; [ -n "$GEMINI_MODEL" ] && GEMINI_MODEL_FLAG="--model $GEMINI_MODEL"
ERR=$(mktemp)
START=$(date +%s)
GEMINI_API_RESP=$(gemini $GEMINI_MODEL_FLAG -p "Reply with exactly: ARBITER_HEALTH_OK" -o text 2>"$ERR")
GEMINI_API_EXIT=$?
GEMINI_LATENCY=$(( $(date +%s) - START ))
if [ $GEMINI_API_EXIT -ne 0 ]; then
if grep -qi "auth\|login\|403\|401\|credentials" "$ERR"; then
GEMINI_API_STATUS="auth_expired"
else
GEMINI_API_STATUS="error"
fi
else
GEMINI_API_STATUS="ok"
fi
rm -f "$ERR"
Determine per-provider status:
missing (suggest install URL)timeoutauth_expired (suggest codex auth / gemini login)errorhealthyConfig check:
_PROVIDERS_CONFIG="${EMPORIUM_PROVIDERS_CONFIG:-$HOME/.claude/emporium-providers.local.md}"
if [ -r "$_PROVIDERS_CONFIG" ]; then
CONFIG_STATUS="found"
else
CONFIG_STATUS="not found"
fi
Arbiter Doctor — Health Check
| Provider | Binary | Version | Model | API | Latency |
|----------|------------|---------|--------|--------|---------|
| Claude | ✓ found | 2.1.71 | sonnet | ✓ ok | 3.1s |
| Codex | ✓ found | 0.104 | gpt-5.3-codex | ✓ ok | 2.3s |
| Gemini | ✓ found | 0.30 | auto-gemini-3 | ✗ auth | — |
Tools: jq (<version>|missing — required for diverge scoring)
Config: ~/.claude/emporium-providers.local.md (found|not found)
Issues:
- Gemini: auth expired. Run: gemini login
- Claude: nested CLI auth is sanitized automatically by unsetting `ANTHROPIC_API_KEY` and `CLAUDECODE`
Overall: PARTIAL (2/3 providers healthy)
If all healthy: "All providers healthy. Ready to orchestrate." If issues: list each with actionable fix (install URL or auth command). If both missing: "No external providers found. Install codex or gemini to use multi-provider modes."
Purpose: Generate 3 independent implementations in isolated worktrees using different models and strategy hints, then compare and select the best solution. Use when a non-trivial spec has multiple valid approaches and you want to explore the solution space in parallel before committing.
All three agents run as background tasks for true parallelism.
Flow summary:
sonnet), Codex, and Gemini in parallelBefore starting, check whether diverge is warranted:
Trivial heuristic: spec < 50 words AND no architecture keywords AND context_files < 2
If ALL three conditions are true → warn:
> This spec looks straightforward. Diverge costs 4-6× a normal implement.
> Proceed? (yes / switch to implement)
If ANY condition is false, skip the warning and proceed.
Verify the working tree is clean:
if [ -n "$(git status --porcelain)" ]; then
echo "ERROR: Working tree has uncommitted changes."
echo "Diverge requires a clean tree so all worktrees start from identical state."
echo "Options: stash / commit / abort"
fi
git stash push -m "diverge: pre-run stash", proceedThen prune stale worktree registrations from prior crashes:
git worktree prune
Freeze all inputs into a single immutable artifact stored outside the repo:
RUN_ID="$(date -u +%Y%m%dT%H%M%SZ)-$$"
BASE_COMMIT="$(git rev-parse HEAD)"
RUN_DIR="$(mktemp -d "${TMPDIR:-/tmp}/arbiter-diverge.XXXXXX")"
chmod 700 "$RUN_DIR" # secure: owner-only access
Write $RUN_DIR/diverge-pack.json:
{
"run_id": "20260301T100000Z-12345",
"spec": "user's implementation spec (verbatim)",
"context_files": ["list of files read for context"],
"context_contents": {"path/to/file.py": "frozen file content..."},
"constraints": "extracted from exploration.md if in signum flow",
"base_commit": "abc123def",
"strategies": ["minimal", "refactor", "redesign"],
"providers": ["claude", "codex", "gemini"],
"timeout_seconds": 300,
"lite": false,
"created_at": "2026-03-01T10:00:00Z"
}
Copy the pack into each worktree after creation (untracked files are not inherited by worktrees):
for LABEL in A B C; do
mkdir -p "${WT[$LABEL]}/.dev"
cp "$RUN_DIR/diverge-pack.json" "${WT[$LABEL]}/.dev/diverge-pack.json"
done
Default strategy set (rotate across providers to avoid provider-strategy bias):
| Strategy | Hint injected into prompt | Default Provider |
|---|---|---|
minimal | "Solve with the smallest possible change. Minimize lines changed, new files, and new dependencies. Patch, don't rebuild." | Claude (sonnet) |
refactor | "Solve by improving the existing code structure. Extract helpers, rename for clarity, improve testability. Moderate change size OK." | Codex (GPT) |
redesign | "Solve with the best possible abstraction. You may introduce new modules, patterns, or interfaces if justified. Optimize for long-term maintainability." | Gemini |
Custom strategies (--strategies "a,b,c"): injected verbatim as hint text. If fewer than 3 → pad with defaults. If more than 3 → use first 3.
Provider redistribution: if a provider is unavailable, redistribute its strategy to an available provider as a second run in a separate worktree. Example: no Gemini → Claude gets minimal + redesign in two worktrees.
Worktrees are detached (no branch created upfront — branch is only created for the selected solution at merge time):
declare -A WT
for LABEL in A B C; do
WT[$LABEL]="$RUN_DIR/$LABEL"
git worktree add --detach "${WT[$LABEL]}" "$BASE_COMMIT"
done
Cleanup (trap on exit):
cleanup_diverge() {
for LABEL in A B C; do
if [ -d "${WT[$LABEL]}" ]; then
git worktree remove --force "${WT[$LABEL]}" 2>/dev/null
fi
done
rm -rf "$RUN_DIR"
}
trap cleanup_diverge EXIT
On any error, remind user: git worktree list | grep arbiter-diverge for manual cleanup.
Write a prompt file to each worktree (avoids OS argument length limits and shell quoting issues):
for LABEL in A B C; do
STRATEGY="${STRATEGIES[$LABEL]}"
cat > "${WT[$LABEL]}/.dev/diverge-prompt.md" <<PROMPT
# Implementation Task
## Strategy
$STRATEGY
## Specification
$SPEC
## Context Files
$(for f in "${CONTEXT_FILES[@]}"; do
echo "### $f"
echo '```'
cat "$f"
echo '```'
done)
## Constraints
$CONSTRAINTS
## Rules
- Do NOT commit your changes. Leave all modifications as uncommitted files.
- Write a self-assessment card to .dev/diverge-card.json with this structure:
{"approach": "5-10 line summary", "assumptions": [...], "risks": [...], "new_dependencies": [...]}
- Follow existing project patterns and conventions.
PROMPT
done
All three run as background tasks. Agents DO NOT commit — all changes remain as uncommitted modifications in the worktree.
run_with_timeout function (python process group kill for reliable child cleanup):
run_with_timeout() {
local timeout_s="$1"; shift
python3 -c "
import os, signal, subprocess, sys
timeout = int(sys.argv[1])
cmd = sys.argv[2:]
p = subprocess.Popen(cmd, start_new_session=True)
try:
rc = p.wait(timeout=timeout)
sys.exit(rc)
except subprocess.TimeoutExpired:
try:
os.killpg(p.pid, signal.SIGTERM)
except ProcessLookupError:
pass
try:
p.wait(timeout=5)
except subprocess.TimeoutExpired:
try:
os.killpg(p.pid, signal.SIGKILL)
except ProcessLookupError:
pass
sys.exit(124)
" "$timeout_s" "$@"
}
Exit code 124 = timeout (matches GNU timeout convention). Default $TIMEOUT = 300 (configurable via --timeout).
Claude subagent (configurable model, default sonnet) — run_in_background: true:
Resolve Claude model before dispatch:
CLAUDE_IMPLEMENT_MODEL=$(_resolve_model "implement" "claude")
[ -n "$CLAUDE_IMPLEMENT_MODEL" ] || CLAUDE_IMPLEMENT_MODEL=$(_resolve_model "default" "claude")
[ -n "$CLAUDE_IMPLEMENT_MODEL" ] || CLAUDE_IMPLEMENT_MODEL="sonnet"
subagent_type="general-purpose", model="$CLAUDE_IMPLEMENT_MODEL", run_in_background=true
Prompt: contents of ${WT[A]}/.dev/diverge-prompt.md
Working directory: ${WT[A]}
Bash call (codex) — run_in_background: true:
MODEL=$(_resolve_model "implement" "codex")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
PROMPT_FILE="${WT[B]}/.dev/diverge-prompt.md"
ERR="$RUN_DIR/codex-stderr.log"
# Guard against ARG_MAX — if prompt exceeds 128KB, reference the file instead
PROMPT_SIZE=$(wc -c < "$PROMPT_FILE")
if [ "$PROMPT_SIZE" -gt 131072 ]; then
CODEX_PROMPT="Read and follow the implementation spec in .dev/diverge-prompt.md"
else
CODEX_PROMPT=$(cat "$PROMPT_FILE")
fi
run_with_timeout "$TIMEOUT" \
codex exec --full-auto -C "${WT[B]}" $MODEL_FLAG \
"$CODEX_PROMPT" \
>"$RUN_DIR/codex-stdout.log" 2>"$ERR"
CODEX_EXIT=$?
if [ $CODEX_EXIT -ne 0 ]; then
if grep -qi "auth\|login\|403\|401\|credentials\|token" "$ERR"; then
echo "Codex auth error. Run: codex auth"
CODEX_STATUS="auth_expired"
elif [ $CODEX_EXIT -eq 124 ]; then
CODEX_STATUS="timeout"
else
echo "Codex error (exit $CODEX_EXIT):"; head -5 "$ERR"
CODEX_STATUS="error"
fi
else
CODEX_STATUS="ok"
fi
Bash call (gemini) — run_in_background: true:
MODEL=$(_resolve_model "implement" "gemini")
MODEL_FLAG=""; [ -n "$MODEL" ] && MODEL_FLAG="--model $MODEL"
PROMPT_FILE="${WT[C]}/.dev/diverge-prompt.md"
ERR="$RUN_DIR/gemini-stderr.log"
# Pipe prompt via stdin to avoid ARG_MAX / env size limits
run_with_timeout "$TIMEOUT" \
sh -c "cd '${WT[C]}' && cat '$PROMPT_FILE' | gemini $MODEL_FLAG -y -o text" \
>"$RUN_DIR/gemini-stdout.log" 2>"$ERR"
GEMINI_EXIT=$?
if [ $GEMINI_EXIT -ne 0 ]; then
if grep -qi "auth\|login\|403\|401\|credentials" "$ERR"; then
echo "Gemini auth error. Run: gemini login"
GEMINI_STATUS="auth_expired"
elif [ $GEMINI_EXIT -eq 124 ]; then
GEMINI_STATUS="timeout"
else
echo "Gemini error (exit $GEMINI_EXIT):"; head -5 "$ERR"
GEMINI_STATUS="error"
fi
else
GEMINI_STATUS="ok"
fi
NOTE: Always load the Gemini prompt into a variable first (PROMPT=$(cat "$PROMPT_FILE")), then pass via "$PROMPT". Single-quote expansion ('$(cat ...)') blocks variable substitution and will NOT work.
Collect all results using TaskOutput tool (block: true) for each background task ID.
Show Run Summary after collecting all dispatch results (see Run Summary). On PARTIAL_SUCCESS → redistribute failed provider's strategy or evaluate with fewer solutions. On FAILURE → Error #13 applies.
Problem: 3 full diffs simultaneously can overflow the evaluator's context (e.g., 3 × 50KB = 150k+ tokens). Solution: Two-pass evaluation.
Pass 1: Stats + Cards
for LABEL in A B C; do
git -C "${WT[$LABEL]}" diff --name-status "$BASE_COMMIT" \
> "$RUN_DIR/$LABEL-name-status.txt"
git -C "${WT[$LABEL]}" diff --numstat "$BASE_COMMIT" \
> "$RUN_DIR/$LABEL-numstat.txt"
if [ -f "${WT[$LABEL]}/.dev/diverge-card.json" ]; then
cp "${WT[$LABEL]}/.dev/diverge-card.json" "$RUN_DIR/$LABEL-card.json"
else
echo '{"approach":"(agent did not write card)","assumptions":[],"risks":[],"new_dependencies":[]}' \
> "$RUN_DIR/$LABEL-card.json"
fi
done
Build a Per-Solution Card from stats (labels anonymized as Solution 1/2/3 — provider-solution mapping stored in $RUN_DIR/label-map.json but NOT passed to the evaluator):
{
"label": "1",
"strategy": "minimal",
"status": "ok|timeout|error",
"files_changed": 3,
"lines_added": 42,
"lines_removed": 12,
"files_list": ["src/api.py", "tests/test_api.py"],
"card": {
"approach": "agent's self-summary",
"assumptions": ["..."],
"risks": ["..."],
"new_dependencies": []
}
}
Pass 2: Full Diff on Demand
Only after the Decision Matrix is presented:
git -C "${WT[$SELECTED]}" diff "$BASE_COMMIT"
--run-tests)Before running tests, verify a test suite exists:
# Detect test runner and set TEST_COMMAND
HAS_TESTS=false
TEST_COMMAND=""
if [ -f "package.json" ] && command -v npm &>/dev/null; then
HAS_TESTS=true; TEST_COMMAND="npm test"
elif [ -f "pytest.ini" ] || [ -f "setup.cfg" ] || [ -f "pyproject.toml" ] && command -v pytest &>/dev/null; then
HAS_TESTS=true; TEST_COMMAND="pytest"
elif [ -f "go.mod" ] && command -v go &>/dev/null; then
HAS_TESTS=true; TEST_COMMAND="go test ./..."
elif ls tests/ &>/dev/null || ls test/ &>/dev/null; then
# Test files exist but no runner detected
if command -v pytest &>/dev/null; then HAS_TESTS=true; TEST_COMMAND="pytest"
elif command -v jest &>/dev/null; then HAS_TESTS=true; TEST_COMMAND="jest"
fi
fi
If HAS_TESTS=false or TEST_COMMAND is empty → mark Tests: not run in decision matrix (do not error).
If test suite found:
for LABEL in A B C; do
(cd "${WT[$LABEL]}" && $TEST_COMMAND 2>&1) > "$RUN_DIR/$LABEL-test-results.txt"
echo "exit:$?" >> "$RUN_DIR/$LABEL-test-results.txt"
done
A solution that fails existing tests gets Correctness score capped at 1. Test results are passed to the evaluator as additional input.
Guard: --lite --run-tests → error: "Cannot run tests in lite mode (no code)." (also caught at input parsing).
The evaluator is a separate Claude invocation — NOT the orchestrator (mitigates self-preference bias):
Resolve evaluator model before dispatch:
CLAUDE_EVAL_MODEL=$(_resolve_model "evaluate" "claude")
[ -n "$CLAUDE_EVAL_MODEL" ] || CLAUDE_EVAL_MODEL=$(_resolve_model "default" "claude")
[ -n "$CLAUDE_EVAL_MODEL" ] || CLAUDE_EVAL_MODEL="sonnet"
subagent_type="general-purpose", model="$CLAUDE_EVAL_MODEL"
run_in_background=false
Evaluator prompt receives:
--name-status)The evaluator produces raw scores per criterion. Weighted totals are computed programmatically (not by LLM) via bash arithmetic:
# Weights: correctness=5, change_size=3, maintainability=4, test_coverage=4,
# backward_compat=3, performance=2, security=5
WEIGHTS=(5 3 4 4 3 2 5)
for SOL in 0 1 2; do
TOTAL=0
for i in "${!WEIGHTS[@]}"; do
SCORE=$(jq -r ".solutions[$SOL].scores[$i]" "$RUN_DIR/evaluator-output.json")
TOTAL=$((TOTAL + WEIGHTS[i] * SCORE))
done
echo "Solution $((SOL + 1)): $TOTAL"
done
Evaluator overflow fallback: if evaluator context overflows → fall back to stats/cards only (no diff content).
## Diverge Report: <spec summary>
**Evaluator:** independent (anonymized labels, formula-computed totals)
**Run ID:** 20260301T100000Z-12345
### Summary
| | Solution 1 | Solution 2 | Solution 3 |
|---|---|---|---|
| Strategy | minimal | refactor | redesign |
| Files changed | 3 | 5 | 8 |
| Lines +/- | +42/-12 | +87/-34 | +156/-67 |
| New deps | 0 | 0 | 1 (p-retry) |
| Tests | PASS | PASS | 2 FAIL |
### Decision Matrix
| Criterion (weight) | Sol 1 | Sol 2 | Sol 3 | Notes |
|---|---|---|---|---|
| Correctness (5) | 4 | 4 | 5 | Sol 3 handles edge case X |
| Change size (3) | 5 | 3 | 1 | Sol 1 is surgical |
| Maintainability (4) | 3 | 5 | 4 | Sol 2 cleanest structure |
| Test coverage (4) | 3 | 4 | 5 | Sol 3 adds most tests |
| Backward compat (3) | 5 | 4 | 3 | Sol 1 preserves all interfaces |
| Performance (2) | 4 | 4 | 4 | No significant difference |
| Security (5) | 4 | 4 | 4 | No issues found |
| **Weighted total** | **99** | **106** | **104** | formula-computed |
### Recommendation
**Solution 2 (refactor)** scores highest. Solution 1 is the safe fallback.
### Options
1. Merge Solution 1 → current branch
2. Merge Solution 2 → current branch
3. Merge Solution 3 → current branch
4. Inspect full diff (specify solution number)
5. Hybrid: cherry-pick files manually (`git restore --source <branch> -- <path>`)
6. Discard all
After user selects, reveal the provider mapping:
> Solution 1 = Claude (sonnet) | Solution 2 = Codex (GPT) | Solution 3 = Gemini
SELECTED_LABEL="B" # mapped from user's solution choice
SELECTED_BRANCH="diverge/$RUN_ID-selected"
# HEAD drift check
CURRENT_HEAD="$(git rev-parse HEAD)"
if [ "$CURRENT_HEAD" != "$BASE_COMMIT" ]; then
echo "WARNING: HEAD moved during diverge ($BASE_COMMIT → $CURRENT_HEAD)."
echo "Merge may have conflicts. Proceed? (yes / abort)"
fi
# Show files in selected worktree
git -C "${WT[$SELECTED_LABEL]}" status --short
# Create branch only for selected solution
git -C "${WT[$SELECTED_LABEL]}" switch -c "$SELECTED_BRANCH"
# Stage changes, excluding .dev/ artifacts
git -C "${WT[$SELECTED_LABEL]}" add -A -- . ':!.dev/'
git -C "${WT[$SELECTED_LABEL]}" commit -m "diverge: <spec summary> (strategy: <strategy>)"
# Merge into working branch
git merge --no-ff "$SELECTED_BRANCH"
# Cleanup handled by trap
Save report to .dev/diverge-report.md for audit trail. Append selection metadata.
mkdir -p .dev
Append JSONL metrics artifact (append-only, per-run stats):
cat >> .dev/diverge-metrics.jsonl <<EOF
{"run_id":"$RUN_ID","selected":"Solution 2","provider":"codex","strategy":"refactor","scores":{...},"selected_at":"$(date -u +%Y-%m-%dT%H:%M:%SZ)"}
EOF
--lite)Design-level divergence without code generation. ~1.5× token cost, ~10× faster than full diverge.
Each model produces a design doc (architecture + files + risks) instead of code. No worktrees are created — outputs are text artifacts stored in $RUN_DIR.
Prompt modification:
Do NOT write code. Instead, produce a design document with:
- ## Approach (5-10 lines describing the strategy)
- ## Files (list of files to create/modify with brief notes)
- ## Risks (top 5 risks and mitigations)
- ## Dependencies (new packages needed, if any)
Output: side-by-side comparison of 3 designs → user picks one.
Lite → Implementation handoff (after user selects a design):
.dev/diverge-selected-design.md.dev/diverge-report.md> Selected design saved. How to proceed?
> 1. signum build — run signum Phase 3 using this design
> 2. arbiter implement — delegate to a single provider
> 3. manual — implement yourself
build_strategy=diverge-lite): overwrite .dev/design.md with the selected design, proceed to Phase 3 (standard build, not diverge again)All multi-provider modes (panel, quorum, verify, diverge) MUST show a Run Summary at the top of their output.
## Run Summary
| Provider | Status | Time |
|----------|--------|------|
| Claude | ok | 3.2s |
| Codex | ok | 8.1s |
| Gemini | timeout| 120s |
Overall: PARTIAL_SUCCESS (2/3 providers)
⚠ Gemini timed out after 120s. Results below are from Claude + Codex only.
ok, timeout, error, auth_expired, abstain
okok AND ≥1 non-okok providersSUCCESS → proceed normallyPARTIAL_SUCCESS → proceed with available results + show warning listing failed providers and reasonsFAILURE → STOP. Show errors. Suggest fallback:
/arbiter ask with a single provider"/arbiter implement instead"When invoked without arguments (/arbiter), show:
Arbiter — multi-AI orchestrator
Providers:
codex <status> (<version>)
gemini <status> (<version>)
Modes:
review Code review (default: codex)
ask Ask a question (default: codex)
implement Delegate implementation (default: codex)
panel Ask all providers, compare
quorum Formal go/no-go verdict (high-stakes only)
verify Cross-verification with claim decomposition
continue Resume last session (gemini only)
diverge 3 independent solutions, compare & select
doctor Health check all providers
Usage: /arbiter <mode> [--via codex|gemini] [--base <branch>] [args]
Examples:
/arbiter review
/arbiter review --base main
/arbiter ask "should I use async here?"
/arbiter ask --via gemini --with-diff "is this thread-safe?"
/arbiter panel "best testing strategy for this module?"
/arbiter implement --via gemini "add input validation"
/arbiter quorum "should I drop the users table and recreate?"
/arbiter quorum --with-diff "are these migration changes safe?"
/arbiter verify "Is SQLite safe for concurrent writes in production?"
/arbiter verify --with-diff "is this thread-safe?"
/arbiter diverge "implement caching layer"
/arbiter diverge --lite "redesign auth module"
/arbiter diverge --run-tests "add input validation"
/arbiter doctor
Config: ~/.claude/emporium-providers.local.md (configure models per task)
Provider status: run which codex && codex --version / which gemini && gemini --version. Show checkmark if found, cross if missing.
Before every provider call, show the effective command:
> codex review --base main
codex auth / gemini login"[TIMEOUT] marker.git diff --stat first, stop early.ERR=$(mktemp)), check for auth keywords, discard noise. Standard pattern:
ERR=$(mktemp)
RESP=$(... 2>"$ERR")
EXIT=$?
if [ $EXIT -ne 0 ]; then
if grep -qi "auth\|login\|403\|401\|credentials" "$ERR"; then
echo "Gemini auth error. Run: gemini login"
else
echo "Gemini error (exit $EXIT):"; cat "$ERR"
fi
rm -f "$ERR"; return 1
fi
rm -f "$ERR"
All Gemini calls in every mode MUST use this pattern. Adapt return 1 / exit 1 to context.git worktree list | grep arbiter/ for manual cleanup.VERDICT: line, mark as PARSE_ERROR. Do not count as a vote.arbiter implement instead.If triggered via /codex (old trigger), handle normally — arbiter processes the command. No redirect message needed, just run the requested mode with codex as the provider.
2>/dev/null blindly, always check for auth errors first.npx claudepluginhub heurema/arbiterQueries multiple external AI agents (Codex, Gemini, OpenCode, Claude Code) in parallel for independent second opinions on code review, architecture decisions, and ambiguous problems.
Orchestrates multi-agent code review with Codex CLI, Gemini CLI, and five Claude specialist subagents (security, performance, logic, regression, robustness) then synthesizes findings into verified fixes. Use for deep reviews, second opinions, or council reviews on PRs, commits, or branches.
Performs symmetric two-AI peer reviews using OpenAI Codex CLI: independent blind reviews followed by structured per-issue debate for plans, code reviews, architecture, and recommendations.