From llm-router
Routes tasks to the cheapest capable model via llm-router MCP tools (Ollama, Codex, paid APIs in free-first order). Maps task types to cost-optimized calls.
Slash command: /llm-router:routing
Route tasks to the cheapest capable model automatically using llm-router MCP tools.
Before answering a research, code, writing, or analysis task directly, call the appropriate llm-router tool instead. The router picks the cheapest model that can handle the task, trying free options first (Ollama → Codex → paid APIs).
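The free-first order described above can be illustrated with a small sketch. This is not llm-router's actual implementation: the `Backend` class, its fields, and `cheapest_capable` are hypothetical names introduced only to show the idea of picking the lowest-cost backend that is both available and able to handle the task.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    """Hypothetical description of one routing target (not an llm-router type)."""
    name: str
    cost_rank: int   # 0 = free/local; larger = more expensive (illustrative only)
    available: bool  # e.g. a local Ollama server is running, or an API key is set
    capable: bool    # assumed result of some capability check for the task


def cheapest_capable(backends: List[Backend]) -> Optional[Backend]:
    """Return the lowest-cost backend that is available and capable, else None."""
    candidates = [b for b in backends if b.available and b.capable]
    if not candidates:
        return None
    return min(candidates, key=lambda b: b.cost_rank)


# Free-first order from the text: Ollama, then Codex, then paid APIs.
backends = [
    Backend("ollama", cost_rank=0, available=False, capable=True),
    Backend("codex", cost_rank=1, available=True, capable=True),
    Backend("paid_api", cost_rank=2, available=True, capable=True),
]
choice = cheapest_capable(backends)
```

In this sketch Ollama is unavailable, so the choice falls through to the next cheapest option.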
| Task type | Tool to call | Why |
|---|---|---|
| Simple factual question | llm_query | Gemini Flash / Groq — 50× cheaper than o3 |
| Research / current events | llm_research | Perplexity (web-grounded, not stale) |
| Writing / summaries / brainstorm | llm_generate | Gemini Flash / Haiku |
| Deep analysis / debugging | llm_analyze | GPT-4o / Gemini Pro |
| Code generation / refactoring | llm_code | Ollama → Codex built-in → o3 |
| Don't know which type | llm_auto | Auto-classifies + routes, tracks savings |
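The table above amounts to a lookup from task type to tool name. A minimal sketch follows, assuming hypothetical task-type keys (`"factual"`, `"research"`, and so on) and a hypothetical helper `pick_tool`; only the tool names themselves come from the table.

```python
from typing import Optional

# Hypothetical keys; the tool names on the right are the ones listed in the table.
ROUTING_TABLE = {
    "factual": "llm_query",
    "research": "llm_research",
    "writing": "llm_generate",
    "analysis": "llm_analyze",
    "code": "llm_code",
}


def pick_tool(task_type: Optional[str]) -> str:
    """Return the tool name for a task type, falling back to llm_auto."""
    if task_type is None:
        return "llm_auto"
    return ROUTING_TABLE.get(task_type.strip().lower(), "llm_auto")
```

Falling back to llm_auto for unknown or missing task types mirrors the last row of the table.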
llm_query(prompt="What is the capital of France?")
llm_code(prompt="Refactor this function to use async/await", complexity="moderate")
llm_research(prompt="What changed in Python 3.13?")
llm_auto(prompt="<the full user request>") # safest default
Routing simple tasks to Gemini Flash instead of o3 cuts cost by roughly 50–100×.
llm_auto shows cumulative savings every 5 calls automatically.
Run llm_savings anytime to see your totals.
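The "report every 5 calls" behaviour can be pictured with a small counter. This is only a sketch under stated assumptions: `SavingsTracker`, its fields, and the dollar amounts are hypothetical and do not reflect how llm-router computes or stores savings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SavingsTracker:
    """Hypothetical tracker: accumulates savings and reports every N calls."""
    report_every: int = 5
    calls: int = 0
    saved_usd: float = 0.0

    def record(self, baseline_usd: float, actual_usd: float) -> Optional[str]:
        """Log one call; return a summary string on every Nth call, else None."""
        self.calls += 1
        self.saved_usd += baseline_usd - actual_usd
        if self.calls % self.report_every == 0:
            return f"Saved ${self.saved_usd:.2f} over {self.calls} calls"
        return None


tracker = SavingsTracker()
# Illustrative costs only: each call is assumed to cost 0.25 instead of 1.00.
reports = [tracker.record(baseline_usd=1.0, actual_usd=0.25) for _ in range(5)]
```

Only the fifth entry in `reports` carries a summary; the first four are `None`.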
npx claudepluginhub ypollak2/llm-router --plugin llm-router