By Emasoft
MCP server that offloads bounded LLM tasks from Claude Code to cheaper local (LM Studio, Ollama, vLLM, llama.cpp) or remote (OpenRouter) models. Profile-based configuration with ensemble mode.
Assess ONE OpenRouter model against EVERY LLM tool's per-tool REQUIREMENTS (TRDD-f45eeaa0) — free: makes NO LLM call (no token cost), only a public model-catalog fetch (no API key). Reports, per tool, whether the model meets that tool's hard requirements (cost / context / output / params) and which of the qualifying tools ALSO have a benchmark gate to run before assignment. Does NOT run any benchmark. Use to vet a candidate model across the whole tool surface at a glance. Trigger with "assess model X", "which tools can model X serve", "does model X meet the requirements".
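The per-tool check described above can be pictured as a pure comparison between one catalog entry and a list of requirement records. The sketch below is illustrative only: the interfaces, field names, and the `assessModel` function are hypothetical, since this listing does not show the real catalog fields or requirement keys.

```typescript
// Hypothetical shapes: the real catalog fields and requirement keys are not shown in this listing.
interface CatalogModel {
  id: string;
  promptCostPerMTok: number; // USD per million input tokens (assumed unit)
  contextTokens: number;
  maxOutputTokens: number;
  supportedParams: string[];
}

interface ToolRequirement {
  tool: string;
  maxPromptCostPerMTok: number;
  minContextTokens: number;
  minOutputTokens: number;
  requiredParams: string[];
  hasBenchmarkGate: boolean;
}

interface ToolVerdict {
  tool: string;
  meets: boolean;
  failures: string[]; // which of cost / context / output / params failed
  benchmarkGatePending: boolean; // true only for qualifying tools that also have a gate
}

// Pure function: no LLM call, no network access. It only compares numbers and parameter lists.
function assessModel(model: CatalogModel, reqs: ToolRequirement[]): ToolVerdict[] {
  return reqs.map((req) => {
    const failures: string[] = [];
    if (model.promptCostPerMTok > req.maxPromptCostPerMTok) failures.push("cost");
    if (model.contextTokens < req.minContextTokens) failures.push("context");
    if (model.maxOutputTokens < req.minOutputTokens) failures.push("output");
    const missing = req.requiredParams.filter((p) => !model.supportedParams.includes(p));
    if (missing.length > 0) failures.push("params");
    const meets = failures.length === 0;
    return { tool: req.tool, meets, failures, benchmarkGatePending: meets && req.hasBenchmarkGate };
  });
}
```

Keeping the check free of side effects is what would make such an assessment cost nothing beyond the catalog fetch.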
Check, for every LLM tool that has a per-tool benchmark (security_scan, search_existing_implementations), whether its configured model has DEGRADED (per the durable model-health ledger) and, when so, run that tool's ADVISORY benchmark to surface the best SAME-OR-CHEAPER replacement. READ-ONLY by default — it writes a report and recommends, it never rewrites your settings. A pricier model is NEVER recommended. Trigger with "check tool replacements", "is any tool model degraded", "audit my per-tool models", or "auto-replace my LLM tool models". To actually adopt a recommendation, run the CLI `--apply` (below).
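The "same-or-cheaper" rule can be sketched as a filter followed by a best-score pick. This is a minimal illustration, not the plugin's implementation: the `Candidate` shape, the single `costPerMTok` price field, and the tie-break toward the cheaper model are all assumptions.

```typescript
// Hypothetical candidate record: a benchmark score plus one price figure.
interface Candidate {
  id: string;
  costPerMTok: number; // assumed unit: USD per million tokens
  score: number; // higher is better
}

// Returns the best-scoring candidate that costs no more than the current model,
// or null when every alternative is pricier (a pricier model is never recommended).
function pickReplacement(current: Candidate, candidates: Candidate[]): Candidate | null {
  const eligible = candidates.filter(
    (c) => c.id !== current.id && c.costPerMTok <= current.costPerMTok
  );
  if (eligible.length === 0) return null;
  return eligible.reduce((best, c) =>
    c.score > best.score || (c.score === best.score && c.costPerMTok < best.costPerMTok)
      ? c
      : best
  );
}
```

Because the function only returns a recommendation, it matches the read-only default: adopting the result remains a separate, explicit step.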
Benchmark every model in the active profile's free pool (or the bundled FREE_POOL_SEED if none is pinned) with one invocation. Same scoring as /llm-externalizer-benchmark but auto-fills the candidate set from the free-model list and refuses to run if any entry is not a ':free' model. Use this after switching free_only on, or to evaluate which free model best replaces a paid model.
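The refusal rule can be illustrated with a small guard that runs before any benchmark work starts. The function names and the `pinned`/`seed` parameters below are hypothetical; only the `:free` suffix check and the fallback to a bundled seed list come from the description above.

```typescript
// Throws if any model ID lacks the ':free' suffix, listing the offenders.
function assertAllFree(pool: string[]): void {
  const paid = pool.filter((id) => !id.endsWith(":free"));
  if (paid.length > 0) {
    throw new Error(`refusing to benchmark non-free models: ${paid.join(", ")}`);
  }
}

// Uses the profile's pinned pool when it has entries, otherwise the bundled seed list.
function resolveFreePool(pinned: string[] | undefined, seed: string[]): string[] {
  const pool = pinned !== undefined && pinned.length > 0 ? pinned : seed;
  assertAllFree(pool);
  return pool;
}
```

Validating the whole pool up front, rather than skipping paid entries, means a misconfigured list fails loudly instead of silently spending money.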
Benchmark OpenRouter programming-category models against a TypeScript classification task. Filters by cost + capability, scores each candidate against 71 fixture functions + 3 literal keywords, writes a markdown comparison report. Use this to pick the cheapest model that still passes the real workload.
Redirect to user-only manual model/ensemble configuration. There is NO change_model MCP tool — the server is read-only by design. This command explains how to edit ~/.llm-externalizer/settings.yaml by hand and reload with reset, and points you at the optional benchmark / assess-model helpers for picking which models to use.
Opus-model variant. Verify and fix ONE LLM Externalizer per-file bug report. Input is a single absolute path to a report `.md`. Validates findings, applies minimal fixes only to REAL bugs, runs linters, writes a `.fixer.`-tagged summary, returns the summary path. Dispatched in parallel by `llm-externalizer-scan-and-fix` when the user picks "opus" on the model-menu prompt.
Sonnet-model variant. Verify and fix ONE LLM Externalizer per-file bug report. Input is a single absolute path to a report `.md`. Validates findings, applies minimal fixes only to REAL bugs, runs linters, writes a `.fixer.`-tagged summary, returns the summary path. Dispatched in parallel by `llm-externalizer-scan-and-fix` when the user picks "sonnet" on the model-menu prompt.
Use for a fast code review from the LLM Externalizer ensemble without loading scan output into the main context. Accepts a file/folder/glob and returns only report paths. Trigger with "review this file", "llm-ext review", "audit these files", "scan for bugs".
Opus-model variant. Fix exactly ONE bug from a markdown bug list produced by llm-externalizer-fix-found-bugs. Reads the bug-file absolute path, picks the highest-severity unfixed entry, applies a minimal surgical fix, updates the bug file with a ` — FIXED` marker plus a short post-mortem, returns a single-line summary. Dispatched per-bug when the user picks "opus" on the model-menu prompt.
Sonnet-model variant. Fix exactly ONE bug from a markdown bug list produced by llm-externalizer-fix-found-bugs. Reads the bug-file absolute path, picks the highest-severity unfixed entry, applies a minimal surgical fix, updates the bug file with a ` — FIXED` marker plus a short post-mortem, returns a single-line summary. Dispatched per-bug when the user picks "sonnet" on the model-menu prompt.
Internal maintainer dogfood harness for the llm-externalizer plugin — exercises every surface (CLI verbs, benchmark, MCP-tool wiring, all slash commands, all skills) with a zero-spend offline sweep plus an opt-in free-pool live smoke. NOT user-invocable; run via tests/dogfood/dogfood_test.py. Use when validating a release or after touching any tool, command, or skill.
Hugging Face Hub CLI (`hf`) reference for downloading, uploading, and managing models, datasets, and repos. Covers custom --local-dir placement, --include/--exclude file filters, --revision pinning, cache management, and `hf auth login` for gated repos. Use when the setup wizard's pre-built download_command needs extension. Loaded by llm-externalizer-setup-agent.
Find the highest-scoring models for a coding task by querying the official Hugging Face benchmark leaderboards, with memory-budget filtering and per-device fit. Use when the recommender script returns no compatible row and the user has explicitly widened the search. Loaded by llm-externalizer-setup-agent.
Run rigorous evaluations against Hugging Face Hub models using inspect-ai or lighteval on local hardware. Covers backend selection (vLLM / Transformers / accelerate), local GPU evals, smoke tests, and task selection. Use when the user wants a deeper benchmark than the wizard's 5-test compatibility check. Loaded by llm-externalizer-setup-agent.
Select GGUF artifacts and quantizations for llama.cpp on CPU, Mac Metal, CUDA, or ROCm runtimes. Covers Q4_K_M vs Q5_K_M vs Q6_K trade-offs, llama-server launch flags, --hf-repo/--hf-file fallback for non-standard naming, and conversion from Transformers weights when no GGUF exists. Use when the user picks llama.cpp / LM Studio / Ollama on non-Apple-Silicon platforms. Loaded by llm-externalizer-setup-agent.
Admin access level
Server config contains admin-level keywords
Uses power tools
This plugin requires configuration values that are prompted when the plugin is enabled. Sensitive values are stored in your system keychain.
openrouter_api_key: OpenRouter API key for remote and ensemble modes (https://openrouter.ai/keys). Stored in system keychain. If blank, falls back to $OPENROUTER_API_KEY from your shell environment; leave blank to keep an existing shell-based setup.
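The blank-value fallback can be sketched as a two-step lookup. This is an assumption-labeled illustration: `resolveOpenRouterKey` is a hypothetical helper, and the environment is passed in as a plain record so the snippet does not depend on any runtime's globals.

```typescript
// Returns the configured key when it is non-blank, otherwise the value of
// OPENROUTER_API_KEY from the supplied environment, otherwise undefined.
function resolveOpenRouterKey(
  configured: string | undefined,
  env: Record<string, string | undefined>
): string | undefined {
  const trimmed = (configured ?? "").trim();
  if (trimmed !== "") return trimmed;
  const fromEnv = (env["OPENROUTER_API_KEY"] ?? "").trim();
  return fromEnv !== "" ? fromEnv : undefined;
}
```

In a Node.js process the second argument would typically be `process.env`. An `undefined` result would mean that neither source supplied a key, so only local backends are usable.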
${user_config.openrouter_api_key}
Uses Bash, Write, or Edit tools
Offload expensive code-scan work to cheap LLMs. Keep the fix loop local in Claude Code.
This plugin helps you review a codebase with a cheap model, and then fix the findings with your normal Claude Code session.
The work splits in two halves:
Keeping the fix half local means the expensive model only touches code when it actually needs to. The scan half does all the slow reading work on the cheap side.
Scan target: ~38 KLOC TypeScript repo. Per-run cost on OpenRouter — measured 2026-05.
/llm-externalizer:* (what you type in Claude Code)
┌─────────────────────────────────────────────────────────────────────────┐
│ YOUR CLAUDE CODE SESSION (local — Sonnet / Opus / Haiku) │
│ │
│ /llm-externalizer:llm-externalizer-scan-and-fix │
│ │ │
│ │ 1. auto-discover codebase via git ls-files │
│ │ 2. call MCP tool "scan_folder" or "code_task" ───────┐ │
│ │ │ │
│ │ ▼ │
│ │ ┌─────────────────────────────────────────────────┐ │
│ │ │ MCP SERVER (bundled with plugin) │ │
│ │ │ │ │
│ │ │ FFD-batches files into ~400 KB payloads │ │
│ │ │ Streams each batch to the configured backend: │ │
│ │ │ • OpenRouter ensemble (3 models in parallel) │ │
│ │ │ • OpenRouter single model │ │
│ │ │ • LM Studio / Ollama / vLLM / llama.cpp │ │
│ │ │ • Nemotron free tier │ │
│ │ │ │ │
│ │ │ Writes per-file / per-group / merged reports │ │
│ │ │ to ./reports/llm-externalizer/*.md │ │
│ │ └─────────────────────────────────────────────────┘ │
│ │ │ │
│ │ 3. receive report paths (only paths — never bodies) │ │
│ │ 4. dispatch FIXER SUBAGENTS (local Claude Sonnet/Opus) │
│ │ • parallel: up to 15 concurrent, one per report │
│ │ • serial: one bug at a time from an aggregated list │
│ │ │
│ │ EACH FIXER subagent: │
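The diagram's batching step ("FFD-batches files into ~400 KB payloads") most likely refers to First-Fit Decreasing bin packing. The sketch below shows that general technique under stated assumptions: the `SourceFile` shape and `ffdBatches` name are hypothetical, sizes are measured in bytes, and an oversized file is given a batch of its own. The server's actual rules are not shown in this listing.

```typescript
// Hypothetical input record: a path plus its size in bytes.
interface SourceFile {
  path: string;
  bytes: number;
}

// First-Fit Decreasing: sort files largest first, then place each one in the
// first batch that still has room; open a new batch when none fits.
function ffdBatches(files: SourceFile[], capacityBytes: number): SourceFile[][] {
  const sorted = [...files].sort((a, b) => b.bytes - a.bytes);
  const batches: { used: number; files: SourceFile[] }[] = [];
  for (const file of sorted) {
    const target = batches.find((b) => b.used + file.bytes <= capacityBytes);
    if (target) {
      target.used += file.bytes;
      target.files.push(file);
    } else {
      // Also covers a file larger than the capacity: it gets its own batch.
      batches.push({ used: file.bytes, files: [file] });
    }
  }
  return batches.map((b) => b.files);
}
```

Sorting largest-first tends to leave small files to fill the gaps, which keeps the number of payloads, and therefore the number of backend requests, low.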
npx claudepluginhub emasoft/emasoft-plugins --plugin llm-externalizer
Task distribution, agent coordination, progress monitoring - executes plans via subagents. Requires AI Maestro for inter-agent messaging.
Comprehensive validation, management, and standardization suite for Claude Code plugins and marketplaces. Includes 190+ validation rules, plugin lifecycle management, marketplace operations, health checks, security auditing, GitHub repo validation, plugin/marketplace repo scaffolding, and standardization tooling. Features severity hierarchy, --strict mode, language-aware token estimation, and universal plugin/marketplace templates.
GHE (GitHub-Elements) - Automated project management for Claude Code using GitHub Issues as persistent memory with orchestrated DEV/TEST/REVIEW workflow.
Portable utility tools for Claude Code plugin marketplaces. Includes release automation and markdown TOC generation.
Exports current session segment (since last compaction) with system-reminder stripping -- main conversation, subagent transcripts, sidechains, and debug logs in structured markdown
When calling LLM APIs from Python code. When connecting to llamafile or local LLM servers. When switching between OpenAI/Anthropic/local providers. When implementing retry/fallback logic for LLM calls. When code imports litellm or uses completion() patterns.
Intelligent delegation framework for routing tasks to external LLM services while retaining strategic oversight
Delegate heavy code generation to a local LLM (Ollama / LM Studio). Save tokens, keep oversight.
Run AI models locally with Ollama - free alternative to OpenAI, Anthropic, and other paid LLM APIs. Zero-cost, privacy-first AI infrastructure.
Spawn any third-party LLM provider with an Anthropic-compatible API (e.g. DeepSeek, GLM, Kimi, Qwen, MiniMax) as real Claude Code agent-team teammates or one-shot subagents — driven exactly like native teammates. Your main session's own auth is untouched (OAuth subscription or API key, either works); provider workers bill the provider API key via apiKeyHelper (the key never enters env/argv/history). Requires the `cc-fleet` binary on PATH, installed separately.
AI-to-AI collaboration — review code, brainstorm ideas, and debate plans across Gemini, Codex, and Ollama