Universal LLM facade MCP server - dual-layer information architecture abstracting any LLM backend (local or cloud) with typed extensions
An MCP server that provides a universal abstraction layer for interacting with any LLM backend -- local or cloud -- through a single, stable interface.
One MCP server surface. Any LLM behind it. The consumer sends a generation request using a normalized vocabulary. The facade routes it to whichever backend is configured, translating parameters and response shapes as needed.
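As a rough illustration of what "translating parameters" could look like, here is a minimal sketch. The `GenerationRequest` shape and both translator functions are hypothetical and are not taken from this project's source. The only provider facts relied on are that OpenAI-compatible chat endpoints accept `max_tokens` with system messages inline, and that the Anthropic Messages API takes the system prompt as a top-level `system` field and requires `max_tokens`.

```typescript
// Hypothetical normalized vocabulary; field names are assumptions for illustration.
type Role = "system" | "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

interface GenerationRequest {
  model: string;
  messages: Message[];
  maxTokens?: number;
  temperature?: number;
}

// Body for an OpenAI-compatible chat completions endpoint.
interface OpenAICompatBody {
  model: string;
  messages: Message[];
  max_tokens?: number;
  temperature?: number;
}

// Body for the Anthropic Messages API: system prompt is top-level, max_tokens is required.
interface AnthropicBody {
  model: string;
  system?: string;
  messages: { role: "user" | "assistant"; content: string }[];
  max_tokens: number;
  temperature?: number;
}

function toOpenAICompat(req: GenerationRequest): OpenAICompatBody {
  const body: OpenAICompatBody = { model: req.model, messages: req.messages };
  if (req.maxTokens !== undefined) body.max_tokens = req.maxTokens;
  if (req.temperature !== undefined) body.temperature = req.temperature;
  return body;
}

// defaultMaxTokens is an arbitrary fallback chosen for this sketch.
function toAnthropic(req: GenerationRequest, defaultMaxTokens = 1024): AnthropicBody {
  // Pull system messages out of the conversation and join them into one prompt.
  const system = req.messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n");
  const messages = req.messages
    .filter((m): m is Message & { role: "user" | "assistant" } => m.role !== "system")
    .map((m) => ({ role: m.role, content: m.content }));
  const body: AnthropicBody = {
    model: req.model,
    messages,
    max_tokens: req.maxTokens ?? defaultMaxTokens,
  };
  if (system) body.system = system;
  if (req.temperature !== undefined) body.temperature = req.temperature;
  return body;
}
```

The consumer only ever builds a `GenerationRequest`; which translator runs is a routing decision made behind the facade.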
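The architecture has two layers: Layer 1 (Universal) normalizes many provider shapes into one, and Layer 2 (Extensions) organizes provider-specific features into typed, discoverable extensions.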
npm install
npm run build
The server communicates via stdio. Add it to your MCP client config:
{
  "mcpServers": {
    "server": {
      "command": "node",
      "args": ["/path/to/llm-api-facade/dist/index.js"]
    }
  }
}
Or install as a Claude Code plugin from the RedJay marketplace.
Providers auto-register when their env vars are set. Ollama is always on.
| Provider | Env Var | Adapter |
|---|---|---|
| Ollama (local) | Always on | OpenAI-compat |
| OpenAI | OPENAI_API_KEY | OpenAI-compat |
| Anthropic | ANTHROPIC_API_KEY | Dedicated |
| Google Gemini | GEMINI_API_KEY | Dedicated |
| Cohere | COHERE_API_KEY | Dedicated |
| Mistral | MISTRAL_API_KEY | OpenAI-compat |
| xAI (Grok) | XAI_API_KEY | OpenAI-compat |
| vLLM | VLLM_BASE_URL | OpenAI-compat |
| LM Studio | LMSTUDIO_BASE_URL | OpenAI-compat |
| llama.cpp | LLAMACPP_BASE_URL | OpenAI-compat |
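The auto-registration rule in the table above can be sketched roughly as follows. This is an assumption about how such a registry might work, not the project's actual code: `ProviderSpec`, `PROVIDERS`, and `registeredProviders` are hypothetical names, and only the env var names and adapter kinds come from the table.

```typescript
type AdapterKind = "openai-compat" | "dedicated";

interface ProviderSpec {
  name: string;
  envVar: string | null; // null means the provider is always on
  adapter: AdapterKind;
}

// Mirrors the provider table; the lowercase names are illustrative.
const PROVIDERS: ProviderSpec[] = [
  { name: "ollama", envVar: null, adapter: "openai-compat" },
  { name: "openai", envVar: "OPENAI_API_KEY", adapter: "openai-compat" },
  { name: "anthropic", envVar: "ANTHROPIC_API_KEY", adapter: "dedicated" },
  { name: "gemini", envVar: "GEMINI_API_KEY", adapter: "dedicated" },
  { name: "cohere", envVar: "COHERE_API_KEY", adapter: "dedicated" },
  { name: "mistral", envVar: "MISTRAL_API_KEY", adapter: "openai-compat" },
  { name: "xai", envVar: "XAI_API_KEY", adapter: "openai-compat" },
  { name: "vllm", envVar: "VLLM_BASE_URL", adapter: "openai-compat" },
  { name: "lmstudio", envVar: "LMSTUDIO_BASE_URL", adapter: "openai-compat" },
  { name: "llamacpp", envVar: "LLAMACPP_BASE_URL", adapter: "openai-compat" },
];

// Returns the providers that would register for a given environment.
// An unset or empty variable is treated as "not configured" (an assumption).
function registeredProviders(env: Record<string, string | undefined>): ProviderSpec[] {
  return PROVIDERS.filter((p) => p.envVar === null || Boolean(env[p.envVar]));
}
```

In a Node process this would typically be called as `registeredProviders(process.env)`; taking the environment as a parameter keeps the function easy to test.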
| Tool | Description |
|---|---|
| complete | Send messages to any LLM, receive a completion. Supports tools, structured output, all sampling parameters. |
| stream_complete | Streaming variant. Returns accumulated chunks with usage. |
| list_models | List configured providers. |
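For orientation, this is roughly what an MCP `tools/call` request for the `complete` tool could look like on the wire. The JSON-RPC envelope follows the MCP specification; the argument names (`provider`, `model`, `messages`) and the model name are assumptions, since the real input schema lives in McpServerSpec.md. `buildToolCall` is a hypothetical helper.

```typescript
type ToolName = "complete" | "stream_complete" | "list_models";

// JSON-RPC 2.0 envelope for an MCP tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ToolName; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: ToolName,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "complete", {
  // Hypothetical argument names and values; check the server's published schema.
  provider: "ollama",
  model: "llama3",
  messages: [{ role: "user", content: "Say hello." }],
});

// Over stdio, each message is sent as a single line of JSON.
const wireLine = JSON.stringify(request) + "\n";
```

A real client would first complete the MCP `initialize` handshake, and in practice an MCP client library handles this framing for you.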
The architecture enforces a clean boundary -- the seam -- between two zones:
| Consumer Side | THE SEAM | Provider Side |
|---|---|---|
| Layer 1: Universal | Normalizes | Provider-specific SDKs |
| Layer 2: Extensions | Organizes | Native API formats |
| Typed errors | | Raw error responses |
| Capability discovery | | Feature negotiation |
Layer 1 normalizes (many shapes into one). Layer 2 organizes (provider-specific features into typed, discoverable extensions). Infrastructure concerns (auth, retry, transport) never cross the seam.
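One way to picture the "typed errors versus raw error responses" row is a small normalizer that sits on the provider side of the seam. This is a sketch under stated assumptions: the error codes, the `FacadeError` class, and the status-code mapping are all hypothetical, and the project's real error codes are defined in McpServerSpec.md.

```typescript
// Hypothetical error vocabulary exposed to consumers.
type ErrorCode = "auth" | "rate_limit" | "invalid_request" | "provider_unavailable" | "unknown";

class FacadeError extends Error {
  readonly code: ErrorCode;
  readonly provider: string;
  readonly retryable: boolean;

  constructor(code: ErrorCode, provider: string, retryable: boolean, message: string) {
    super(message);
    this.name = "FacadeError";
    this.code = code;
    this.provider = provider;
    this.retryable = retryable;
  }
}

// Maps a raw HTTP failure from any provider onto the typed vocabulary.
// The raw body is truncated so provider-specific payloads do not leak across the seam.
function normalizeHttpError(provider: string, status: number, rawBody: string): FacadeError {
  const detail = rawBody.slice(0, 200);
  if (status === 401 || status === 403) return new FacadeError("auth", provider, false, detail);
  if (status === 429) return new FacadeError("rate_limit", provider, true, detail);
  if (status === 400 || status === 404 || status === 422) {
    return new FacadeError("invalid_request", provider, false, detail);
  }
  if (status >= 500) return new FacadeError("provider_unavailable", provider, true, detail);
  return new FacadeError("unknown", provider, false, detail);
}
```

The consumer can then branch on `code` and `retryable` without knowing which provider produced the failure, which is the point of keeping raw responses on the provider side.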
Implemented and tested (50 scenarios across Ollama + OpenAI):
Adapters (all 11 providers covered):
Not yet implemented:
Documentation/
  Architecture/
    Principles.md               # 8 governing principles (dual-layer)
    DomainModel.md              # Universal concepts, behavioral contracts, the seam
    McpServerSpec.md            # MCP tools, resources, schemas, error codes (v0.3.0)
    OntologicalTaxonomy.md      # Categorical framework, cross-validated
    TypeSpecification.md        # Formal types, 48+ invariants, state machine
    SoftSpots.md                # 13 resolved weak points with positions taken
    ToolCallingChoreography.md  # Multi-turn tool flows, 7-dimension provider divergence
    PositionPaper-*.md          # Facade as information architecture
    ExtensionCatalog.md         # 5 extensions with schemas and adapter tables
  Decisions/
    ADR-001 through ADR-007     # Architecture decision records
  Vendors/
    OpenAI, Anthropic, Gemini, Mistral/Cohere/xAI, Local Runtimes
MIT
npx claudepluginhub joshuaramirez/claude-code-plugins --plugin llm-api-facade