Skill

model-routing

Dispatches tasks to the optimal LLM (Codex, Gemini, Claude, etc.) based on task type and security needs, with configurable approval strategies and polling behavior.

ai-ml

developer-tools

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/llm-gateway:model-routing

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Choose the right LLM for each task. Based on real usage across 11+ VerivusAI projects.

SKILL.md

180 lines · ~2.9k tokens

Stats

Language: TypeScript

Stars: 9

Maintenance: Excellent

Last Commit: Jun 18, 2026


Model Routing

Choose the right LLM for each task. Based on real usage across 11+ VerivusAI projects.

Dispatch Defaults

Apply these on every dispatch unless the caller has explicitly overridden a rule in the current turn:

Omit model — let the gateway use its configured default per CLI. Nominating a model risks deprecated IDs (o3, o3-pro, gpt-4o, …) and capability mismatches. Call list_models only when the caller has asked for a specific variant.
approvalStrategy:"mcp_managed" is the skill's dispatch default (the gateway schema default is "legacy"). It gates the request before execution. It then runs each provider in a safe accept-edits-level mode, which auto-accepts file edits while Bash and other dangerous tools stay gated:
- Claude and Grok use --permission-mode acceptEdits.
- Mistral uses --agent accept-edits.
- Gemini stays in its prompted default mode. The agy CLI has no accept-edits rung, so Gemini cannot auto-approve mutating tools under mcp_managed.

Codex still needs fullAuto:true for autonomous file and shell work; its sandboxed workspace-write mode is unchanged. Full unattended execution requires the operator opt-in LLM_GATEWAY_APPROVAL_ALLOW_BYPASS=1, which restores each provider's full auto-approve mode (Claude bypassPermissions, Grok --always-approve, Mistral auto-approve, Gemini --dangerously-skip-permissions).
No wall-clock timeout; poll every 60 s. idleTimeoutMs is a separate safeguard against jobs that produce no output.
Iterate until unconditional APPROVED (review dispatches only) — every review prompt must end with "End with APPROVED or NOT APPROVED with findings." Loop: dispatch → parse verdict → on NOT APPROVED or conditional, fix + re-review → repeat. Escalate after 3 rounds. This rule does not apply to pure implementation or non-review analysis dispatches.
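The approval rule above amounts to a per-provider lookup with one operator-controlled switch. Below is a minimal TypeScript sketch of that mapping. `Provider` and `mcpManagedMode` are hypothetical names. The mode strings only echo the flags named above and are not necessarily the exact argv the gateway builds. Reading LLM_GATEWAY_APPROVAL_ALLOW_BYPASS is left to the caller.

```typescript
// Hypothetical sketch of the mode each provider ends up in under
// approvalStrategy:"mcp_managed", after the gateway's pre-execution gate passes.
type Provider = "claude" | "grok" | "mistral" | "gemini" | "codex";

// Default accept-edits-level modes (labels copied from the rule, not exact argv).
const ACCEPT_EDITS: Record<Provider, string> = {
  claude: "--permission-mode acceptEdits",
  grok: "--permission-mode acceptEdits",
  mistral: "--agent accept-edits",
  gemini: "prompted default (no accept-edits rung)",
  codex: "sandboxed workspace-write (requires fullAuto:true)",
};

// Full auto-approve modes restored by the operator opt-in
// LLM_GATEWAY_APPROVAL_ALLOW_BYPASS=1. Codex is not listed, so it keeps its mode.
const BYPASS: Partial<Record<Provider, string>> = {
  claude: "bypassPermissions",
  grok: "--always-approve",
  mistral: "auto-approve",
  gemini: "--dangerously-skip-permissions",
};

function mcpManagedMode(provider: Provider, allowBypass: boolean): string {
  if (allowBypass) {
    // Fall back to the accept-edits mode for providers without a bypass entry.
    return BYPASS[provider] ?? ACCEPT_EDITS[provider];
  }
  return ACCEPT_EDITS[provider];
}
```

Keeping the bypass table separate from the default table makes the opt-in an explicit, reviewable branch rather than a flag scattered across providers.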

Decision Matrix

All tool invocations below use the dispatch defaults above (omit model, approvalStrategy:"mcp_managed", fullAuto:true for Codex, poll every 60 s, loop on reviews).

Task	Best LLM	Why	Tool
Code implementation	Codex	Strongest at writing correct code, handles large codebases	`codex_request` (`fullAuto:true`, `approvalStrategy:"mcp_managed"`)
Code review (quality)	Codex	Thorough, finds real issues, gives actionable feedback	`codex_request` (`fullAuto:true`, `approvalStrategy:"mcp_managed"`)
Code review (security)	Gemini	Strong security focus, OWASP awareness, edge case detection	`gemini_request` (`approvalStrategy:"mcp_managed"`)
Architecture review	Claude	Best at high-level design, pattern recognition, trade-off analysis	`claude_request` (`approvalStrategy:"mcp_managed"`)
Design doc review	Codex	Checks feasibility, completeness, finds gaps in plans	`codex_request` (`fullAuto:true`, `approvalStrategy:"mcp_managed"`)
Bug investigation	Codex	Can read code, trace logic, identify root causes	`codex_request` (`fullAuto:true`, `approvalStrategy:"mcp_managed"`)
Refactoring	Codex	Handles multi-file changes reliably	`codex_request` (`fullAuto:true`, `approvalStrategy:"mcp_managed"`)
Documentation	Claude	Best prose quality, understands audience	`claude_request` (`approvalStrategy:"mcp_managed"`)
Test generation	Codex	Understands test frameworks, generates comprehensive cases	`codex_request` (`fullAuto:true`, `approvalStrategy:"mcp_managed"`)
Security audit	Gemini	Security-focused analysis, threat modeling	`gemini_request` (`approvalStrategy:"mcp_managed"`)
Multi-file analysis	Codex	Handles large codebases with sqry integration	`codex_request` (`fullAuto:true`, `approvalStrategy:"mcp_managed"`)
Diversity / tie-breaker review	Grok (xAI)	Independent fourth model from a different vendor family — useful when Claude/Codex/Gemini might share a blind spot	`grok_request` (`approvalStrategy:"mcp_managed"`)
Maximum diversity	Mistral Vibe	Fifth independent vendor (EU / open-weights family); uncorrelated with OpenAI/Anthropic/Google/xAI	`mistral_request` (`approvalStrategy:"mcp_managed"`)
Consensus / unanimous gate	All five in parallel	Catches issues any single model misses; use when correctness > cost	`*_request_async` for Claude/Codex/Gemini/Grok/Mistral
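The single-model rows of the matrix can be encoded as a lookup table. The following TypeScript sketch does that. The `Task` keys, the `Route` shape and `route()` are hypothetical; only the tool names and options come from the table. The consensus row is left out because it fans out to all five `*_request_async` tools rather than picking one.

```typescript
// Hypothetical encoding of the decision matrix. model is intentionally absent,
// because the dispatch defaults say to omit it.
type Tool = "codex_request" | "gemini_request" | "claude_request" | "grok_request" | "mistral_request";

interface Route {
  tool: Tool;
  options: { approvalStrategy: "mcp_managed"; fullAuto?: true };
}

type Task =
  | "implementation" | "quality-review" | "security-review" | "architecture-review"
  | "design-doc-review" | "bug-investigation" | "refactoring" | "documentation"
  | "test-generation" | "security-audit" | "multi-file-analysis"
  | "diversity-review" | "max-diversity";

// Codex rows always carry fullAuto:true; other CLIs get only the approval strategy.
function codex(): Route {
  return { tool: "codex_request", options: { approvalStrategy: "mcp_managed", fullAuto: true } };
}
function other(tool: Exclude<Tool, "codex_request">): Route {
  return { tool, options: { approvalStrategy: "mcp_managed" } };
}

const MATRIX: Record<Task, () => Route> = {
  "implementation": codex,
  "quality-review": codex,
  "security-review": () => other("gemini_request"),
  "architecture-review": () => other("claude_request"),
  "design-doc-review": codex,
  "bug-investigation": codex,
  "refactoring": codex,
  "documentation": () => other("claude_request"),
  "test-generation": codex,
  "security-audit": () => other("gemini_request"),
  "multi-file-analysis": codex,
  "diversity-review": () => other("grok_request"),
  "max-diversity": () => other("mistral_request"),
};

function route(task: Task): Route {
  // Build a fresh object per call so callers can attach prompt/correlationId safely.
  return MATRIX[task]();
}
```

Storing factories rather than shared objects means two dispatches never mutate the same options record.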

Model Selection Rules

Rule 1: Omit the model parameter by default

The gateway uses sensible configured defaults. Omitting model is almost always correct.

codex_request({prompt: "...", fullAuto: true, approvalStrategy: "mcp_managed"})
gemini_request({prompt: "...", approvalStrategy: "mcp_managed"})
claude_request({prompt: "...", approvalStrategy: "mcp_managed"})
grok_request({prompt: "...", approvalStrategy: "mcp_managed"})

Rule 2: Avoid stale hardcoded model IDs

Treat old memory/config IDs such as o3, o3-pro, and gpt-4o as legacy unless list_models currently reports them for the target CLI.

If you see stale IDs in old configs or memory, prefer the configured default or call list_models.

Rule 3: Specify model only when the caller has asked for a specific variant

The dispatch default is to omit model. Only include it if the user has explicitly named a model in the current turn.

// Only when the caller asked for this specific variant:
gemini_request({prompt: "...", model: "<explicit-user-request>", approvalStrategy: "mcp_managed"})

Rule 4: Check available models when unsure

list_models()                    // All CLIs
list_models({cli: "codex"})      // Codex models only
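Rules 1 through 4 reduce to one decision: send `model` only when the caller named one this turn, and flag legacy IDs that `list_models` does not currently report. A minimal TypeScript sketch follows. `chooseModel` and `ModelChoice` are hypothetical names, and passing the `list_models({cli})` result as a plain string array is an assumption about its shape.

```typescript
// Hypothetical sketch of Rules 1-4. `listed` stands in for the IDs that
// list_models({cli}) reports for the target CLI (assumed to be a string array).
const LEGACY_IDS: readonly string[] = ["o3", "o3-pro", "gpt-4o"];

interface ModelChoice {
  model?: string;   // undefined => leave model out so the gateway default applies
  warning?: string; // set when an explicit request looks stale
}

function chooseModel(requestedThisTurn: string | undefined, listed: readonly string[]): ModelChoice {
  // Rules 1 and 3: without an explicit request in the current turn, omit model.
  // IDs remembered from old configs or memory are deliberately not an input.
  if (requestedThisTurn === undefined || requestedThisTurn === "") {
    return {};
  }
  // Rules 2 and 4: a legacy ID is suspect unless list_models currently reports it.
  const isLegacy = LEGACY_IDS.indexOf(requestedThisTurn) !== -1;
  const isListed = listed.indexOf(requestedThisTurn) !== -1;
  if (isLegacy && !isListed) {
    return {
      model: requestedThisTurn,
      warning: requestedThisTurn + " is a legacy ID that list_models does not report; confirm with the caller",
    };
  }
  return { model: requestedThisTurn };
}
```

With no explicit request, `chooseModel(undefined, [])` returns an empty object, so the request goes out without `model`. A stale explicit choice is kept but surfaced rather than silently replaced, since the caller did ask for it.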

Rule 5: Use `promptParts` to share stable prefix bytes across calls

When the same long system / tools / context block is sent to multiple CLIs (parallel reviews, consensus, multi-round loops), switch from prompt to the structured promptParts field:

codex_request({
  promptParts: {
    system:  "<long stable system instruction>",
    tools:   "<long stable tool description>",
    context: "<long stable file dump / spec>",
    task:    "Implement X per the above."
  },
  fullAuto: true,
  approvalStrategy: "mcp_managed"
})

The gateway concatenates the parts in canonical order system → tools → context → task, so the stable prefix bytes are identical across the parallel dispatch and across rounds. That raises the implicit cache-hit rate at each provider without special-case API contortions. prompt and promptParts are mutually exclusive: if both are supplied, the runtime returns the error "provide exactly one of `prompt` or `promptParts`". The stable prefix hash is recorded in the flight recorder and is queryable via `cache-state://prefix/{hash}`, so you can verify that the prefix actually got shared.
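The ordering rule and the mutual-exclusion check can be mirrored client-side. The TypeScript sketch below does that to show why a shared system/tools/context block yields the same prefix hash while only the task differs. Several parts are assumptions: the `"\n\n"` separator, treating system/tools/context as optional, and the FNV-1a hash (a stand-in, not the gateway's algorithm). `assemble` and `fnv1a32` are hypothetical names.

```typescript
// Sketch only: mirrors the canonical promptParts ordering described above.
interface PromptParts {
  system?: string;
  tools?: string;
  context?: string;
  task: string;
}

interface RequestInput {
  prompt?: string;
  promptParts?: PromptParts;
}

const SEPARATOR = "\n\n"; // assumed; the gateway's real separator is not documented here

// 32-bit FNV-1a over UTF-16 code units, as 8 hex digits (stand-in hash).
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

function assemble(input: RequestInput): { text: string; prefixHash?: string } {
  // Exactly one of prompt / promptParts must be present.
  if ((input.prompt === undefined) === (input.promptParts === undefined)) {
    throw new Error("provide exactly one of `prompt` or `promptParts`");
  }
  if (input.promptParts === undefined) {
    return { text: input.prompt as string };
  }
  const p = input.promptParts;
  // Canonical order: system -> tools -> context form the stable prefix; task goes last.
  const prefix = [p.system, p.tools, p.context]
    .filter((s): s is string => s !== undefined)
    .join(SEPARATOR);
  const text = prefix.length > 0 ? prefix + SEPARATOR + p.task : p.task;
  return { text, prefixHash: fnv1a32(prefix) };
}
```

Two dispatches that share `system`, `tools` and `context` but differ in `task` produce the same `prefixHash`, which is the property the cache-state resources let you check on the gateway side.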

For short one-off questions, plain prompt is fine.

Delegation Patterns

"Ask Codex to implement"

The most common pattern. Codex with fullAuto handles implementation + testing:

codex_request({
  prompt: "Implement [feature] in [path]. Requirements:\n- [req 1]\n- [req 2]\n\nInclude tests.",
  fullAuto: true,
  approvalStrategy: "mcp_managed"
})

"Ask Codex to review"

Second most common. Codex reviews with full codebase access:

codex_request({
  prompt: "Review [path] for [criteria]. End with APPROVED or NOT APPROVED with findings.",
  fullAuto: true,
  approvalStrategy: "mcp_managed"
})
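The "iterate until unconditional APPROVED" rule from the dispatch defaults can be sketched as a small loop around a review request. In this TypeScript sketch, `dispatchReview` and `applyFixes` are hypothetical callbacks (one would wrap `codex_request`). The verdict heuristics are assumptions: any "NOT APPROVED" blocks, and only a bare final "APPROVED" line counts as unconditional.

```typescript
// Hypothetical sketch of the review loop; callbacks and heuristics are assumptions.
type Verdict = "APPROVED" | "NOT_APPROVED" | "CONDITIONAL";

function parseVerdict(output: string): Verdict {
  if (/\bNOT APPROVED\b/.test(output)) return "NOT_APPROVED";
  const lines = output.split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
  const last = lines.length > 0 ? lines[lines.length - 1] : "";
  // Only a bare final "APPROVED" is unconditional; "APPROVED with nits" is not.
  return /^APPROVED\.?$/.test(last) ? "APPROVED" : "CONDITIONAL";
}

async function reviewUntilApproved(
  dispatchReview: () => Promise<string>,
  applyFixes: (findings: string) => Promise<void>,
  maxRounds: number = 3,
): Promise<{ approved: boolean; rounds: number }> {
  for (let round = 1; round <= maxRounds; round++) {
    const output = await dispatchReview();
    if (parseVerdict(output) === "APPROVED") {
      return { approved: true, rounds: round };
    }
    // NOT APPROVED or conditional: fix, then re-review (unless this was the last round).
    if (round < maxRounds) {
      await applyFixes(output);
    }
  }
  // Still not approved after maxRounds: the caller should escalate.
  return { approved: false, rounds: maxRounds };
}
```

Treating conditional approvals as failures keeps the loop strict, matching the rule that only an unconditional APPROVED ends it.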

"Ask Gemini for security perspective"

For security-sensitive changes:

gemini_request({
  prompt: "Security audit [path]. Check for injection, auth bypass, data leaks, OWASP Top 10. End with APPROVED or NOT APPROVED with findings.",
  approvalStrategy: "mcp_managed"
})

"Parallel review from multiple LLMs"

For comprehensive coverage:

codex_request_async({prompt: "Review [path] for correctness... End with APPROVED or NOT APPROVED with findings.", fullAuto: true, approvalStrategy: "mcp_managed", correlationId: "review-codex"})
gemini_request_async({prompt: "Security audit [path]... End with APPROVED or NOT APPROVED with findings.", approvalStrategy: "mcp_managed", correlationId: "review-gemini"})
grok_request_async({prompt: "Independent review of [path]... End with APPROVED or NOT APPROVED with findings.", approvalStrategy: "mcp_managed", correlationId: "review-grok"})
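The parallel pattern can be turned into a unanimous gate, which is the "Consensus / unanimous gate" row of the decision matrix. The TypeScript sketch below shows the idea. `AsyncDispatch` is a hypothetical wrapper around a `*_request_async` call plus polling; its signature is assumed, not the gateway's. The `review-<name>` correlation IDs follow the examples above.

```typescript
// Hypothetical sketch of a unanimous fan-out review gate.
type AsyncDispatch = (args: { prompt: string; correlationId: string }) => Promise<string>;

interface Reviewer {
  name: string; // e.g. "codex", "gemini", "grok"
  prompt: string;
  dispatch: AsyncDispatch;
}

const VERDICT_SUFFIX = "End with APPROVED or NOT APPROVED with findings.";

// Approved only if the output never says NOT APPROVED and ends with a bare APPROVED line.
function isApproved(output: string): boolean {
  if (/\bNOT APPROVED\b/.test(output)) return false;
  const lines = output.split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
  return lines.length > 0 && /^APPROVED\.?$/.test(lines[lines.length - 1]);
}

async function unanimousReview(reviewers: Reviewer[]): Promise<{ approved: boolean; blocking: string[] }> {
  // Dispatch all reviewers concurrently; a rejected dispatch fails the whole gate.
  const outputs = await Promise.all(
    reviewers.map((r) => {
      // Every review prompt must end with the verdict instruction.
      const prompt = r.prompt.endsWith(VERDICT_SUFFIX) ? r.prompt : r.prompt + " " + VERDICT_SUFFIX;
      return r.dispatch({ prompt, correlationId: "review-" + r.name });
    }),
  );
  const blocking = reviewers.filter((_, i) => !isApproved(outputs[i])).map((r) => r.name);
  return { approved: blocking.length === 0, blocking };
}
```

Using `Promise.all` makes the gate fail closed: if any reviewer errors out, no approval is reported.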

Session Continuity Implications

Model routing affects session strategy:

LLM	Session Continuity	Implication
Claude	Real (`--continue` / `--session-id`)	Can do multi-turn refinement
Codex	Real (`codex exec resume <UUID>` / `--last`) — sessionId must be a real Codex UUID from `~/.codex/sessions/`; `--full-auto` dropped on resume	Good for iterative work; pass `resumeLatest:true` for the most recent cwd session
Gemini	Real (`--resume`)	Good for iterative analysis
Grok	Real (`--resume <id>` / `--continue`)	Good for iterative review/diversity rounds

This means:

All four CLIs in the table support multi-turn workflows through the gateway
Codex resume requires either resumeLatest:true or a real Codex session UUID — gateway-generated gw-* IDs are rejected
Resumed Codex sessions inherit the original approval policy; fullAuto:true is silently dropped on resume
The gw-* IDs the gateway generates for Gemini requests are bookkeeping handles and are rejected if replayed
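The Codex resume constraints can be checked before dispatching. The TypeScript sketch below does that. The field names `sessionId`, `resumeLatest` and `fullAuto` come from this page. The `checkCodexResume` helper is hypothetical, and the UUID pattern used to recognise a "real Codex UUID" is an assumption about what files in `~/.codex/sessions/` are named by.

```typescript
// Hypothetical pre-flight checks for resuming a Codex session via the gateway.
interface CodexResumeArgs {
  prompt: string;
  sessionId?: string;
  resumeLatest?: boolean;
  fullAuto?: boolean;
}

// Assumed shape of a Codex session ID: a standard 8-4-4-4-12 hex UUID.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function checkCodexResume(args: CodexResumeArgs): string[] {
  const problems: string[] = [];
  if (!args.resumeLatest && args.sessionId === undefined) {
    problems.push("pass resumeLatest:true or a real Codex session UUID");
  }
  if (args.sessionId !== undefined) {
    if (args.sessionId.indexOf("gw-") === 0) {
      problems.push("gw-* IDs are gateway bookkeeping handles and are rejected");
    } else if (!UUID_RE.test(args.sessionId)) {
      problems.push("sessionId does not look like a Codex UUID from ~/.codex/sessions/");
    }
  }
  if (args.fullAuto) {
    // Resumed sessions inherit the original approval policy.
    problems.push("fullAuto is dropped on resume; the original approval policy applies");
  }
  return problems;
}
```

Returning a list of messages rather than throwing lets the caller decide whether a dropped `fullAuto` is acceptable or a blocker.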

Cost Considerations

Codex with fullAuto is the most autonomous option but also the most expensive per call
Gemini is generally cheaper for review tasks
Claude is middle ground
Grok depends on xAI billing/pricing — treat as an extra reviewer slot for high-stakes paths, not the default
For routine reviews: single LLM (Codex) is sufficient
For critical reviews: parallel multi-LLM (see multi-llm-consensus skill); add Grok when consensus across vendor families matters
For huge codebases: use async variants to avoid blocking

Tips

For routine read-only analysis, drafting, or review, prefer Claude or Gemini with omitted model so configured fast defaults such as Haiku or Flash can apply.
Use Codex with fullAuto: true and approvalStrategy: "mcp_managed" when the task needs autonomous code edits, tests, or shell commands.
For security-specific work, always include Gemini. Add Grok for an independent vendor-family perspective when stakes are high.
Don't overthink model selection — the default is almost always fine. Omit model unless the caller asked for a specific variant.
Use correlationId on every request for tracing.
If a task exceeds 45s, it auto-defers. Check for status:"deferred" in responses, then poll every 60s. Results are durable for 30 days (LLM_GATEWAY_JOB_RETENTION_DAYS) — re-issuing the same call within the dedup window (LLM_GATEWAY_DEDUP_WINDOW_MS, default 1h) reattaches to the live job. Pass forceRefresh:true only when inputs actually changed.
Use cli_versions to inspect installed CLI versions. Use cli_upgrade with dryRun:true first; run real upgrades only when the caller wants the local CLI updated. Grok self-updates via grok update; the same cli_upgrade tool routes it for you.
For prefix-stable workloads, prefer promptParts over prompt — same routing rules apply per CLI, but the gateway hashes the stable prefix so you can verify cache effectiveness via cache-state://global or cache-state://prefix/{hash} (tokens / hashes only, no prompt text).
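The deferral tip (auto-defer after 45 s, then poll every 60 s with no wall-clock deadline) can be sketched as a small polling loop. In this TypeScript sketch, `waitForJob` is hypothetical, and `getStatus` and `sleep` are injected callbacks. The `JobStatus` shape is an assumption beyond the documented `"deferred"` value, and the sketch treats every other status as final.

```typescript
// Hypothetical polling sketch for a deferred gateway job.
interface JobStatus {
  status: string; // "deferred" while the job is still running (documented value)
  result?: string;
}

async function waitForJob(
  getStatus: () => Promise<JobStatus>,
  sleep: (ms: number) => Promise<void>,
  intervalMs: number = 60000,
): Promise<JobStatus> {
  // No wall-clock deadline on purpose: stalled jobs are the gateway's idleTimeoutMs concern.
  let current = await getStatus();
  while (current.status === "deferred") {
    await sleep(intervalMs);
    current = await getStatus();
  }
  // Assumption: any status other than "deferred" is terminal.
  return current;
}
```

Injecting `sleep` keeps the loop testable without real 60-second waits. If the gateway reports other in-progress states, they would need to be added to the loop condition.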
