From llm-gateway
Dispatches tasks to the optimal LLM (Codex, Gemini, Claude, etc.) based on task type and security needs, with configurable approval strategies and polling behavior.
Slash command
/llm-gateway:model-routing
Choose the right LLM for each task. Based on real usage across 11+ VerivusAI projects.
Apply these on every dispatch unless the caller has explicitly overridden a rule in the current turn:
- Omit model: let the gateway use its configured default per CLI. Nominating a model risks deprecated IDs (o3, o3-pro, gpt-4o, …) and capability mismatches. Call list_models only when the caller has asked for a specific variant.
- approvalStrategy:"mcp_managed" is the skill dispatch default (the gateway schema default is "legacy"). It gates the request before execution, then sets each provider to a safe accept-edits-level mode (file edits are auto-accepted; Bash and other dangerous tools stay gated): Claude and Grok use --permission-mode acceptEdits, Mistral uses --agent accept-edits, and Gemini stays at its prompted default (the agy CLI has no accept-edits rung, so Gemini cannot auto-approve mutating tools under mcp_managed). Codex still needs fullAuto:true for autonomous file/shell work (its sandboxed workspace-write mode is unchanged).
- Full unattended execution requires the operator opt-in LLM_GATEWAY_APPROVAL_ALLOW_BYPASS=1, which restores each provider's full auto-approve mode (Claude bypassPermissions, Grok --always-approve, Mistral auto-approve, Gemini --dangerously-skip-permissions).
- Poll every 60 s. idleTimeoutMs is a separate no-output safeguard.
- Loop on reviews: if the verdict is NOT APPROVED or conditional, fix and re-review, then repeat. Escalate after 3 rounds. This rule does not apply to pure implementation or non-review analysis dispatches.

All tool invocations below use the dispatch defaults above (omit model, approvalStrategy:"mcp_managed", fullAuto:true for Codex, poll every 60 s, loop on reviews).
| Task | Best LLM | Why | Tool |
|---|---|---|---|
| Code implementation | Codex | Strongest at writing correct code, handles large codebases | codex_request (fullAuto:true, approvalStrategy:"mcp_managed") |
| Code review (quality) | Codex | Thorough, finds real issues, gives actionable feedback | codex_request (fullAuto:true, approvalStrategy:"mcp_managed") |
| Code review (security) | Gemini | Strong security focus, OWASP awareness, edge case detection | gemini_request (approvalStrategy:"mcp_managed") |
| Architecture review | Claude | Best at high-level design, pattern recognition, trade-off analysis | claude_request (approvalStrategy:"mcp_managed") |
| Design doc review | Codex | Checks feasibility, completeness, finds gaps in plans | codex_request (fullAuto:true, approvalStrategy:"mcp_managed") |
| Bug investigation | Codex | Can read code, trace logic, identify root causes | codex_request (fullAuto:true, approvalStrategy:"mcp_managed") |
| Refactoring | Codex | Handles multi-file changes reliably | codex_request (fullAuto:true, approvalStrategy:"mcp_managed") |
| Documentation | Claude | Best prose quality, understands audience | claude_request (approvalStrategy:"mcp_managed") |
| Test generation | Codex | Understands test frameworks, generates comprehensive cases | codex_request (fullAuto:true, approvalStrategy:"mcp_managed") |
| Security audit | Gemini | Security-focused analysis, threat modeling | gemini_request (approvalStrategy:"mcp_managed") |
| Multi-file analysis | Codex | Handles large codebases with sqry integration | codex_request (fullAuto:true, approvalStrategy:"mcp_managed") |
| Diversity / tie-breaker review | Grok (xAI) | Independent fourth model from a different vendor family — useful when Claude/Codex/Gemini might share a blind spot | grok_request (approvalStrategy:"mcp_managed") |
| Maximum diversity | Mistral Vibe | Fifth independent vendor (EU / open-weights family); uncorrelated with OpenAI/Anthropic/Google/xAI | mistral_request (approvalStrategy:"mcp_managed") |
| Consensus / unanimous gate | All five in parallel | Catches issues any single model misses; use when correctness > cost | *_request_async for Claude/Codex/Gemini/Grok/Mistral |
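The routing table can be read as a simple lookup. The sketch below is one hypothetical way to encode it: `ROUTES` and `routeTask` are illustrative names, not part of the gateway, and the consensus row is left out because it fans out to several CLIs rather than one.

```javascript
// Hypothetical lookup mirroring the routing table; keys are lower-cased task names.
const ROUTES = {
  "code implementation": "codex",
  "code review (quality)": "codex",
  "code review (security)": "gemini",
  "architecture review": "claude",
  "design doc review": "codex",
  "bug investigation": "codex",
  "refactoring": "codex",
  "documentation": "claude",
  "test generation": "codex",
  "security audit": "gemini",
  "multi-file analysis": "codex",
  "diversity / tie-breaker review": "grok",
  "maximum diversity": "mistral",
};

// Build the tool name and arguments for a task, applying the dispatch defaults:
// no model, approvalStrategy "mcp_managed", and fullAuto only for Codex.
function routeTask(task, prompt) {
  const cli = ROUTES[task.trim().toLowerCase()];
  if (!cli) throw new Error(`no route for task: ${task}`);
  const args = { prompt, approvalStrategy: "mcp_managed" };
  if (cli === "codex") args.fullAuto = true;
  return { tool: `${cli}_request`, args };
}

console.log(routeTask("Security audit", "Security audit [path]."));
```

Keeping the defaults in one helper means a caller override only has to touch the returned `args` object.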
The gateway uses sensible configured defaults. Omitting model is almost always correct.
codex_request({prompt: "...", fullAuto: true, approvalStrategy: "mcp_managed"})
gemini_request({prompt: "...", approvalStrategy: "mcp_managed"})
claude_request({prompt: "...", approvalStrategy: "mcp_managed"})
grok_request({prompt: "...", approvalStrategy: "mcp_managed"})
Treat model IDs found in old memory or configs, such as o3, o3-pro, and gpt-4o, as legacy unless list_models currently reports them for the target CLI. When you find a stale ID, fall back to the configured default or call list_models to check.
The dispatch default is to omit model. Only include it if the user has explicitly named a model in the current turn.
// Only when the caller asked for this specific variant:
gemini_request({prompt: "...", model: "<explicit-user-request>", approvalStrategy: "mcp_managed"})
list_models() // All CLIs
list_models({cli: "codex"}) // Codex models only
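A minimal sketch of the model rule, assuming the caller's explicit choice and the IDs reported by list_models are already in hand. `modelArg` is a hypothetical helper, and treating the list_models result as a flat array of ID strings is an assumption about its shape.

```javascript
// Decide whether to pass `model` at all.
// explicitModel: what the caller named in the current turn, or undefined.
// reportedModels: IDs assumed to come from list_models for the target CLI.
function modelArg(explicitModel, reportedModels) {
  if (!explicitModel) return {}; // dispatch default: omit model
  if (!reportedModels.includes(explicitModel)) {
    return {}; // stale or unknown ID: fall back to the configured default
  }
  return { model: explicitModel };
}

const args = {
  prompt: "...",
  approvalStrategy: "mcp_managed",
  ...modelArg(undefined, []),
};
console.log(args);
```

Silently falling back on an unreported ID is a design choice for this sketch; surfacing a warning to the caller would be an equally reasonable alternative.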
Use promptParts to share stable prefix bytes across calls

When the same long system / tools / context block is sent to multiple CLIs (parallel reviews, consensus, multi-round loops), switch from prompt to the structured promptParts field:
codex_request({
  promptParts: {
    system: "<long stable system instruction>",
    tools: "<long stable tool description>",
    context: "<long stable file dump / spec>",
    task: "Implement X per the above."
  },
  fullAuto: true,
  approvalStrategy: "mcp_managed"
})
The gateway concatenates in canonical order system → tools → context → task, so the stable prefix bytes are byte-identical across the parallel dispatch and across rounds. That raises the implicit cache hit rate at each provider with no special-case API contortions. prompt and promptParts are mutually exclusive (the runtime returns "provide exactly one of `prompt` or `promptParts`" if both are supplied). The stable prefix hash is recorded in the flight recorder and queryable via cache-state://prefix/{hash}, so you can verify the prefix actually got shared.
For short one-off questions, plain prompt is fine.
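To make the prefix-sharing idea concrete, here is a rough sketch of canonical-order assembly. It is not the gateway's implementation: the `assemble` helper, the blank-line separator, and the use of SHA-256 for the prefix hash are all assumptions made for illustration.

```javascript
const { createHash } = require("node:crypto");

// Canonical order described above; the first three parts form the stable prefix.
const ORDER = ["system", "tools", "context", "task"];

function assemble(request) {
  const hasPrompt = typeof request.prompt === "string";
  const hasParts = request.promptParts != null;
  if (hasPrompt === hasParts) {
    // Mirrors the mutual-exclusion rule: neither or both is an error.
    throw new Error("provide exactly one of `prompt` or `promptParts`");
  }
  if (hasPrompt) return { text: request.prompt, prefixHash: null };

  const parts = request.promptParts;
  // Assumed separator: a blank line between parts.
  const prefix = ORDER.slice(0, 3).map((key) => parts[key] ?? "").join("\n\n");
  const text = `${prefix}\n\n${parts.task ?? ""}`;
  // Assumed hash: SHA-256 over the prefix bytes only, so the task never affects it.
  const prefixHash = createHash("sha256").update(prefix).digest("hex");
  return { text, prefixHash };
}

const shared = { system: "You are a reviewer.", tools: "read_file, grep", context: "<spec>" };
const first = assemble({ promptParts: { ...shared, task: "Review for correctness." } });
const second = assemble({ promptParts: { ...shared, task: "Review for security." } });
console.log(first.prefixHash === second.prefixHash); // true
```

Because only the task differs between the two requests, both share the same prefix bytes and therefore the same hash, which is the property the cache relies on.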
The most common pattern. Codex with fullAuto handles implementation + testing:
codex_request({
  prompt: "Implement [feature] in [path]. Requirements:\n- [req 1]\n- [req 2]\n\nInclude tests.",
  fullAuto: true,
  approvalStrategy: "mcp_managed"
})
Second most common. Codex reviews with full codebase access:
codex_request({
  prompt: "Review [path] for [criteria]. End with APPROVED or NOT APPROVED with findings.",
  fullAuto: true,
  approvalStrategy: "mcp_managed"
})
For security-sensitive changes:
gemini_request({
  prompt: "Security audit [path]. Check for injection, auth bypass, data leaks, OWASP Top 10. End with APPROVED or NOT APPROVED with findings.",
  approvalStrategy: "mcp_managed"
})
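Review prompts like the ones above end with a verdict, which is what drives the review loop (fix and re-review until APPROVED, escalate after 3 rounds). The sketch below shows one way that loop could look. `parseVerdict` and `reviewLoop` are hypothetical helpers, the `review` and `fix` callbacks stand in for real dispatches, and the calls are synchronous only to keep the example short.

```javascript
// Extract the final verdict from a review reply.
// "NOT APPROVED" is listed first so it is never misread as "APPROVED".
function parseVerdict(text) {
  const verdicts = text.match(/NOT APPROVED|APPROVED/g) || [];
  const last = verdicts[verdicts.length - 1];
  // Crude heuristic (an assumption): any mention of "conditional" counts as not approved.
  if (last === "APPROVED" && !/\bconditional/i.test(text)) return "APPROVED";
  return "NOT APPROVED";
}

// Run review, then fix and re-review, for at most maxRounds rounds.
function reviewLoop(review, fix, maxRounds = 3) {
  for (let round = 1; round <= maxRounds; round++) {
    const findings = review(round);
    if (parseVerdict(findings) === "APPROVED") {
      return { status: "approved", rounds: round };
    }
    if (round < maxRounds) fix(findings); // no point fixing after the last round
  }
  return { status: "escalate", rounds: maxRounds };
}

const replies = ["NOT APPROVED: missing null check in parser", "APPROVED"];
console.log(reviewLoop(() => replies.shift(), () => {}));
```

A reply with no verdict at all is treated as NOT APPROVED here, so a malformed review can never pass the gate by accident.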
For comprehensive coverage:
codex_request_async({prompt: "Review [path] for correctness... End with APPROVED or NOT APPROVED with findings.", fullAuto: true, approvalStrategy: "mcp_managed", correlationId: "review-codex"})
gemini_request_async({prompt: "Security audit [path]... End with APPROVED or NOT APPROVED with findings.", approvalStrategy: "mcp_managed", correlationId: "review-gemini"})
grok_request_async({prompt: "Independent review of [path]... End with APPROVED or NOT APPROVED with findings.", approvalStrategy: "mcp_managed", correlationId: "review-grok"})
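Once the parallel reviews have completed, a unanimous gate only has to combine their verdicts. The following is a sketch under assumptions: `unanimousGate` is a hypothetical helper, and each result is assumed to have been collected into a `{ correlationId, text }` object, which is not a documented gateway response shape.

```javascript
// True only when the last verdict token in the reply is a plain APPROVED.
function isApproved(text) {
  const verdicts = text.match(/NOT APPROVED|APPROVED/g) || [];
  return verdicts.length > 0 && verdicts[verdicts.length - 1] === "APPROVED";
}

// Approve only when every reviewer approved; report which ones blocked.
function unanimousGate(results) {
  const blocking = results
    .filter((result) => !isApproved(result.text))
    .map((result) => result.correlationId);
  return { approved: results.length > 0 && blocking.length === 0, blocking };
}

const collected = [
  { correlationId: "review-codex", text: "Looks correct. APPROVED" },
  { correlationId: "review-gemini", text: "Unescaped input in query. NOT APPROVED" },
  { correlationId: "review-grok", text: "No issues found. APPROVED" },
];
console.log(unanimousGate(collected));
```

Returning the blocking correlation IDs makes it easy to trace which reviewer's findings need to be addressed before the next round.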
Model routing affects session strategy:
| LLM | Session Continuity | Implication |
|---|---|---|
| Claude | Real (--continue / --session-id) | Can do multi-turn refinement |
| Codex | Real (codex exec resume <UUID> / --last) — sessionId must be a real Codex UUID from ~/.codex/sessions/; --full-auto dropped on resume | Good for iterative work; pass resumeLatest:true for the most recent cwd session |
| Gemini | Real (--resume) | Good for iterative analysis |
| Grok | Real (--resume <id> / --continue) | Good for iterative review/diversity rounds |
This means:
- Codex resume needs resumeLatest:true or a real Codex session UUID; gateway-generated gw-* IDs are rejected.
- fullAuto:true is silently dropped on resume.
- gw-* IDs are bookkeeping handles and are rejected if replayed.

- Codex with fullAuto is the most autonomous but most expensive per call.
- Omit model so configured fast defaults such as Haiku or Flash can apply.

- Use fullAuto: true and approvalStrategy: "mcp_managed" when the task needs autonomous code edits, tests, or shell commands.
- Omit model unless the caller asked for a specific variant.
- Set correlationId on every request for tracing.
- Watch for status:"deferred" in responses, then poll every 60 s. Results are durable for 30 days (LLM_GATEWAY_JOB_RETENTION_DAYS); re-issuing the same call within the dedup window (LLM_GATEWAY_DEDUP_WINDOW_MS, default 1h) reattaches to the live job. Pass forceRefresh:true only when inputs actually changed.
- Use cli_versions to inspect installed CLI versions. Use cli_upgrade with dryRun:true first; run real upgrades only when the caller wants the local CLI updated. Grok self-updates via grok update; the same cli_upgrade tool routes it for you.
- Prefer promptParts over prompt when a long prefix is shared across calls. The same routing rules apply per CLI, but the gateway hashes the stable prefix so you can verify cache effectiveness via cache-state://global or cache-state://prefix/{hash} (tokens / hashes only, no prompt text).

npx claudepluginhub verivus-oss/llm-cli-gateway --plugin llm-gateway
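The Codex resume rules in the session notes can be checked before dispatch. This is a minimal sketch, assuming a hypothetical `codexResumeArgs` helper; the UUID pattern is an assumption about what a real Codex session ID looks like, and the gateway remains the authority on what it accepts.

```javascript
// Assumed shape of a Codex session UUID (standard 8-4-4-4-12 hex layout).
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Build arguments for resuming a Codex session.
// fullAuto is deliberately not set, since it is dropped on resume anyway.
function codexResumeArgs(prompt, sessionId) {
  const args = { prompt, approvalStrategy: "mcp_managed" };
  if (sessionId === undefined) {
    return { ...args, resumeLatest: true }; // most recent session for the cwd
  }
  if (sessionId.startsWith("gw-") || !UUID_RE.test(sessionId)) {
    throw new Error(`not a Codex session UUID: ${sessionId}`);
  }
  return { ...args, sessionId };
}

console.log(codexResumeArgs("Continue the refactor."));
```

Failing fast on a gw-* handle avoids a round trip to the gateway for a request that would be rejected.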