By adibirzu
Route Claude Code to 16+ LLM backends through one gateway. Token tracking, cost dashboard, shared memory, model discovery, and slash commands — all local.
Ask a specific model a question (e.g., /llm-ask oca/gpt5 explain this code)
Query multiple LLMs in parallel for diverse perspectives
Discover available models from all LLM backends
Run MultiLLM production-readiness checks and show actionable setup issues
Search, store, or list shared LLM memories (cross-LLM local RAG)
Use this agent when the user needs a major architectural decision reviewed by multiple AI models simultaneously. It queries Claude Haiku, GPT-4o-mini, DeepSeek, and local Llama in parallel and synthesizes a consensus answer. Invoke for: "get multiple opinions", "council review", "compare LLMs on this", or any architectural decision that benefits from diverse perspectives.
Comprehensive reference agent for all cross-LLM collaboration patterns. Use this when you need to understand how to route work between models, share context across sessions, use shared memory, or leverage the full MultiLLM agent roster. This agent documents all available tools, agents, commands, and orchestration patterns.
Use this agent to compress large files, logs, or documents into a short summary using a FREE local Ollama model — preserving tokens in the main conversation. Invoke when: files are large (>200 lines), when exploring logs, or when the user says "summarize this file cheaply" or "save tokens".
Use this agent to do a security code review of any code, config, or infrastructure change. It asks GPT-4o (via OpenRouter) for a second opinion so you get a perspective from a different model family than Claude. Invoke when: code changes touch auth, crypto, network rules, IAM, secrets, or anything security-sensitive. Also invoke explicitly with "security-reviewer: review this".
Open the MultiLLM dashboard showing sessions, token usage, costs, and backend status. Use when the user asks about LLM usage, costs, dashboard, or wants to see model statistics.
Route work through the local MultiLLM gateway and decide when to ask other LLMs or helper agents for support. Use when Codex should leverage Claude, OCA, GPT, local models, or MultiLLM specialist agents for second opinions, architecture review, security review, context handoff, dashboard checks, or multi-device session consolidation.
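The parallel multi-model query described above can be sketched as a simple fan-out. This is an illustration only, not the plugin's own implementation: `fan_out` and the caller-supplied `ask(model, prompt)` function are hypothetical names, and the sketch assumes each backend call is an independent blocking request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fan_out(models: List[str], prompt: str, ask: Callable[[str, str], str]) -> Dict[str, str]:
    """Send one prompt to several models concurrently and collect the replies.

    `ask(model, prompt)` is a caller-supplied function (hypothetical here) that
    performs the actual gateway request and returns the reply text.
    """
    def safe_ask(model: str) -> str:
        try:
            return ask(model, prompt)
        except Exception as exc:  # one failing backend should not sink the rest
            return f"[error: {exc}]"

    # One worker per model; max(1, ...) keeps the pool valid for an empty list.
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        replies = list(pool.map(safe_ask, models))
    return dict(zip(models, replies))
```

Errors are captured per model rather than raised, so a synthesis step can still work with whatever replies did arrive.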
Open-source multi-tenant LLM gateway. Route one API to 16+ backends, ship docker compose up, own your data.
docker compose up brings up the whole gateway. No vendor account, no per-seat pricing, no telemetry that leaves your network.
The shortest path from git clone to a working /v1/messages request, using a local Ollama backend.
Prerequisites: Docker (with docker compose) and Ollama already running locally (ollama serve and ollama pull llama3.2).
Clone
git clone https://github.com/adibirzu/multillm.git
cd multillm
Configure
cp .env.example .env
Start the gateway
docker compose up -d
Open the setup wizard
open http://localhost:8080/setup
Walk through the wizard. Create the admin account. On the backends pane, paste http://host.docker.internal:11434 as OLLAMA_URL and skip the other backends. Finish.
Send your first request
curl -X POST http://localhost:8080/v1/messages \
-H 'Content-Type: application/json' \
-d '{"model":"ollama/llama3.2","messages":[{"role":"user","content":"Say hi"}]}'
You should get back an Anthropic-format response containing the model's reply.
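The same request can be sent from Python using only the standard library. This is a minimal sketch: the URL and payload mirror the curl command above, while the helper names (`build_request`, `extract_text`, `ask`) are hypothetical. It assumes the reply follows the Anthropic message shape, where `content` is a list of blocks and text blocks carry a `text` field.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8080/v1/messages"  # endpoint from the quick start


def build_request(model: str, prompt: str, url: str = GATEWAY_URL) -> urllib.request.Request:
    """Build the POST request with the same JSON body as the curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Join the text blocks of an Anthropic-format message response."""
    blocks = response.get("content", [])
    return "".join(b.get("text", "") for b in blocks if b.get("type") == "text")


def ask(model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to the gateway and return the reply text (needs a running gateway)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return extract_text(json.load(resp))
```

With the gateway and Ollama running, `ask("ollama/llama3.2", "Say hi")` would return the model's reply as a plain string.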
If you don't have Ollama installed, follow the same flow with any cloud backend by pasting its API key in the /setup wizard's backends pane.
Claude Code / OpenAI SDK / curl
         │
         ▼
┌────────────────────┐
│  MultiLLM :8080    │  FastAPI + httpx (HTTP/2 pooling)
│  ─ routing         │
│  ─ streaming (SSE) │
│  ─ tracking        │
│  ─ resilience      │
│  ─ shared memory   │
└────────┬───────────┘
         │
┌────────┴───────────┐
│  16 backends       │
│  Ollama / LM Studio│
│  OpenAI / Anthropic│
│  Gemini / Groq …   │
└────────────────────┘
Data lives in MULTILLM_HOME (defaults to ~/.multillm/ or the compose-mounted ./.multillm/): SQLite tracking, FTS5 shared memory, automatic pre-migration backups. For production deployment recipes (Docker Compose, systemd, Kubernetes) see docs/operations/deployment.md.
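A script that needs to find the data directory (for example, to back it up) could resolve it along these lines. This is a sketch under stated assumptions: the function name `multillm_home` is hypothetical, and the gateway's own resolution logic is not shown in this document, so the `~` expansion here is a guess at sensible behavior.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def multillm_home(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return MULTILLM_HOME if set, otherwise the documented default ~/.multillm."""
    env = os.environ if env is None else env
    override = env.get("MULTILLM_HOME")
    # Assumption: a leading "~" in the override should expand to the user's home.
    return Path(override).expanduser() if override else Path.home() / ".multillm"
```

Under Docker Compose the directory is the mounted ./.multillm/ on the host, so a host-side script would point at that path instead.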
| Backend | Type | Auth mode | Streaming |
|---|---|---|---|
| Ollama | Local | — | ✓ (SSE) |
| LM Studio | Local | — | ✓ (SSE) |
| Codex CLI | Local | Local CLI | ✓ |
| Gemini CLI | Local | Local CLI | ✓ |
| OpenAI | Cloud | API key | ✓ (SSE) |
| Anthropic | Cloud | API key | ✓ (SSE) |
| Gemini | Cloud | API key | ✓ (SSE) |
| OpenRouter | Cloud | API key | ✓ (SSE) |
| Groq | Cloud | API key | ✓ (SSE) |
| DeepSeek | Cloud | API key | ✓ (SSE) |
| Mistral | Cloud | API key | ✓ (SSE) |
| Together | Cloud | API key | ✓ (SSE) |
| xAI (Grok) | Cloud | API key | ✓ (SSE) |
| Fireworks | Cloud | API key | ✓ (SSE) |
| Azure OpenAI | Cloud | API key | ✓ (SSE) |
| AWS Bedrock | Cloud | Cloud IAM | ✓ (SSE) |
| OCA | Enterprise | OAuth (PKCE) | ✓ (SSE) |
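The examples in this document name models as `backend/model` (for instance `ollama/llama3.2` and `oca/gpt5`). A client that wants to validate or group such identifiers could split them as below. This is a sketch of the naming convention only; `split_model_id` is a hypothetical helper, and how the gateway itself parses identifiers is not shown here.

```python
from typing import Tuple


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split 'backend/model' at the first slash into (backend, model name)."""
    backend, sep, name = model_id.partition("/")
    if not sep or not backend or not name:
        raise ValueError(f"expected 'backend/model', got {model_id!r}")
    # Anything after the first slash stays in the model name, so nested
    # names such as 'vendor/model' under a routing backend are preserved.
    return backend, name
```

Splitting at the first slash only is a deliberate choice: some backends use model names that themselves contain slashes.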
npx claudepluginhub adibirzu/multillm