From claude-commands
Documents how to run the pairv2 LangGraph-based pair-programming executor, including direct invocation, benchmark script usage, and what "minimax" means as a CLI option (injects MiniMax API endpoint via environment variables).
How this skill is triggered — by the user, by Claude, or both
Slash command
/claude-commands:pairv2-usageThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
`pair_execute_v2.py` is a LangGraph-based pair-programming executor. It:
pair_execute_v2.py is a LangGraph-based pair-programming executor. It:
--max-cycles times if verification failsFiles:
.claude/pair/pair_execute_v2.py.claude/pair/benchmark_pair_executors.py.claude/contracts/left_contract.template.json, right_contract.template.json--coder-cli / --verifier-climinimax is NOT a CLI binary. There is no minimax command on PATH.
--coder-cli minimax means: run the claude CLI binary with the MiniMax API endpoint injected via environment variables.
The orchestration library (orchestration/task_dispatcher.py) handles this:
"minimax": {
"binary": "claude", # still the claude CLI
"env_set": {
"ANTHROPIC_BASE_URL": "https://api.minimax.io/anthropic",
"ANTHROPIC_MODEL": "MiniMax-M2.5",
"ANTHROPIC_API_KEY": "<from MINIMAX_API_KEY>",
},
"env_unset": ["ANTHROPIC_AUTH_TOKEN", "CLAUDECODE"],
}
Auth is resolved automatically from $MINIMAX_API_KEY → ~/.bashrc → ~/.automation_env.
Other CLI options work similarly:
claude — Claude Code CLI, direct Anthropic OAuth (no key needed)codex — OpenAI Codex CLI (codex exec --yolo)gemini — Gemini CLI (gemini -m <model> --yolo)cursor — Cursor CLI (cursor-agent -f)codexs means for pair (local alias)In this environment, ~/.bashrc defines:
alias codexs='codexd -m gpt-5.3-codex-spark --config model_reasoning_effort=high'
pair_execute_v2.py does not accept codexs as a --coder-cli/--verifier-cli value.
Use codex for both roles, then pass the alias settings through extra args:
./vpython .claude/pair/pair_execute_v2.py \
--coder-cli codex \
--verifier-cli codex \
--coder-extra-arg=-m \
--coder-extra-arg=gpt-5.3-codex-spark \
--coder-extra-arg=--config \
--coder-extra-arg=model_reasoning_effort=high \
--verifier-extra-arg=-m \
--verifier-extra-arg=gpt-5.3-codex-spark \
--verifier-extra-arg=--config \
--verifier-extra-arg=model_reasoning_effort=high \
--max-cycles 1 \
--no-worktree \
"Your task description here"
venv/bin/python3 .claude/pair/pair_execute_v2.py \
--coder-cli minimax \
--verifier-cli minimax \
--no-worktree \
"Your task description here"
With explicit contracts:
venv/bin/python3 .claude/pair/pair_execute_v2.py \
--coder-cli minimax \
--verifier-cli codex \
--left-contract .claude/contracts/left_contract.template.json \
--right-contract .claude/contracts/right_contract.template.json \
--no-worktree \
--max-cycles 3 \
"Your task description here"
Key flags:
| Flag | Default | Description |
|---|---|---|
--coder-cli | minimax | Agent that writes code |
--verifier-cli | codex | Agent that reviews code |
--max-cycles | 5 | Max retry loops |
--no-worktree | off | Skip git worktree isolation |
--live-timeout | 600 | Heartbeat timeout in seconds |
--fan-out | 1 | Parallel coder attempts |
--validate-contracts-only | off | Check contracts without running agents |
Output JSON is printed to stdout and saved to --artifact-dir (default: /tmp/.../pairv2_latest/).
The benchmark runs pairv2 (and optionally legacy pair executors) on a task and records timing/verdict.
Use --task-preset when you want a common workload without re-typing the full prompt.
hello_world: legacy default contract smoke task.amazon_clone: Flask-based Amazon clone with browser test + video/screenshot evidence (11 expected files).amazon_clone_full: backward-compatible alias for amazon_clone.message_tests: unit tests for message coordination functions (8+ test cases).session_validation: TDD input validation for session management (6+ test cases).monitor_edge_cases: PairMonitor edge case tests (6+ test cases).agent_name_generation: comprehensive agent name generation tests (10+ test cases).sdui_blog_tdd: 10k+ LOC SDUI blog e-commerce, 12-phase TDD (70+ expected files).Presets with structured metadata (test_command, expected_files) are stored in
TASK_PRESET_META in the benchmark file. The shared source of truth is now
.claude/pair/benchmarks/benchmark_tasks.json (used by benchmark_pair_executors.py
and testing_llm/pair/run_pair_benchmark.py).
Show presets:
./vpython .claude/pair/benchmark_pair_executors.py --list-task-presets
Current defaults (as of Feb 2026):
pairv2-onlyminimaxminimax# Run with all defaults (pairv2-only, minimax+minimax, hello world task)
venv/bin/python3 .claude/pair/benchmark_pair_executors.py
# Run the Amazon-style clone example
venv/bin/python3 .claude/pair/benchmark_pair_executors.py \
--task-preset amazon_clone \
--pairv2-max-cycles 2 \
--timeout-seconds 1200
# Custom task
venv/bin/python3 .claude/pair/benchmark_pair_executors.py \
--task "Implement a Python function that reverses a linked list"
# pairv2 vs legacy pair comparison
venv/bin/python3 .claude/pair/benchmark_pair_executors.py \
--executor-set legacy-vs-v2 \
--benchmark-iterations 2
# All three executors in parallel
venv/bin/python3 .claude/pair/benchmark_pair_executors.py \
--executor-set all \
--parallel
Key output fields (from result JSON):
{
"mode": "pairv2_only",
"pairv2_langgraph": { "avg_ms": 404900, "count": 1 },
"pairv2_status_counts": { "completed": 1 },
"pairv2_raw": [{ "verdict": "PASS", "cycle_count": 1, "duration_ms": 404900 }]
}
The benchmark imposes a process-level timeout cap of 1800s (30 min).
minimax.md — MiniMax in the context of PR automation (pr-monitor, fixpr)pair-benchmark-all-executors.md — Benchmark executor-set referenceagents.md — Agent orchestration patternsnpx claudepluginhub jleechanorg/claude-commands --plugin claude-commandsRuns a single benchmark script comparing three pair executors: pairv2 (LangGraph), pair via Claude Teams, and pair via direct Python. Useful for regression testing or performance comparisons across executor modes.
Manages AI pair programming sessions with driver/navigator roles, TDD, real-time code review, and quality verification. Invoke when collaborating on code with role-based workflows.
Delegates implementation and review tasks to external AI CLI tools (Codex, Gemini) with cross-model adversarial review for cost savings and improved accuracy.