From nvidia-models
Use whenever the user wants a second opinion across LLMs, an ensemble comparison ("ask three models", "compare answers", "what do other LLMs think"), a cheap or fast alternative to Claude for a self-contained subtask, deep chain-of-thought reasoning from a reasoning-tuned model, or any explicit mention of NVIDIA models, NIM endpoints, Llama 3.x or Llama 4 Maverick, Kimi K2.6, DeepSeek V4 (Pro or Flash), Nemotron Super/Mini, Mixtral, Qwen3-Next, or "multi-agent". Routes via the nvidia-models MCP server (tools: list_models, nvidia_chat, nvidia_compare). Trigger phrases: "second opinion", "another model", "compare answers", "fan out", "ensemble", "ask three models", "delegate to llama", "use deepseek", "kimi", "qwen", "nemotron", "nvidia compare", "what would model X say".
How this skill is triggered — by the user, by Claude, or both
Slash command
/nvidia-models:nvidia-delegateThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Three MCP tools wired by this plugin:
Three MCP tools wired by this plugin:
mcp__nvidia-models__list_models — short keys → full NIM paths.mcp__nvidia-models__nvidia_chat(model, prompt, system?, max_tokens?, temperature?) — single-model call.mcp__nvidia-models__nvidia_compare(prompt, models?, system?, max_tokens?) — fan the same prompt across N models in one call. Default trio: llama-4-maverick, deepseek-v4-pro, qwen3-next-80b (three different vendors / architectures).| User intent | Tool | Suggested model |
|---|---|---|
| "Second opinion" / "what would another LLM say" | nvidia_compare | default 3 |
| "Compare answers from X, Y, Z" | nvidia_compare | user-specified list |
| "Quick draft" / "cheap inference" | nvidia_chat | llama-3.1-70b or mixtral-8x7b |
| Tiny / very fast | nvidia_chat | nemotron-mini-4b |
| Reasoning-heavy / chain-of-thought | nvidia_chat | deepseek-v4-pro or deepseek-v4-flash |
| Long-context generalist | nvidia_chat | llama-4-maverick |
| NVIDIA-tuned generalist | nvidia_chat | nemotron-super-49b |
| Qwen-family preference | nvidia_chat | qwen3-next-80b |
| "Ensemble" / "voting" / "multi-agent" | nvidia_compare | 3–5 generalists |
| Explicit "use " | nvidia_chat | the one named |
llama-4-maverick (or llama-3.3-70b for cheaper)deepseek-v4-pro (premium) or deepseek-v4-flash (fast)nemotron-mini-4bnvidia_compare (default trio: llama-4-maverick, deepseek-v4-pro, qwen3-next-80b)nvidia_compare: summarize agreements + disagreements, then surface the most useful answer.nvidia_chat: integrate the response into Claude's own answer; cite which model said what when it matters.NVIDIA_API_KEY not set → tell the user: setx NVIDIA_API_KEY "nvapi-..." on Windows, or export NVIDIA_API_KEY=... on macOS/Linux, then restart Claude Code.mcp__nvidia-models__list_models to find the right short key.llama-3.1-70b instead of llama-3.1-405b).The MCP server rejects requests that exceed: prompt 32,000 chars, system 8,000 chars, max_tokens 1–8192, temperature 0–2, models list 1–12 entries. Don't try to bypass — if the user asks for a longer prompt, chunk it first.
Provides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.
npx claudepluginhub najemwehbe/nvidia-models-plugin --plugin nvidia-models