From dgx-spark
Configure Claude Code to use the DGX Spark as a model backend — full local, hybrid (Opus primary + Spark for subagents), or failover mode. Use when switching between local and cloud inference, pointing Claude Code at Spark, or setting up hybrid workflows. Triggers on: "use local model", "switch to Spark", "switch to Anthropic", "hybrid mode", "point Claude Code at Spark", "use Spark for subagents".
Slash command
/dgx-spark:spark-hybrid
Configure Claude Code sessions to use the Spark as a model backend, either fully or in hybrid mode.
Full local mode: all inference runs on the Spark. Best for proprietary code, offline work, or cost savings.
export ANTHROPIC_BASE_URL=http://your-spark.local:8000/v1
export ANTHROPIC_API_KEY=sk-dummy-key
export ANTHROPIC_MODEL=Qwen/Qwen3-Coder-Next
Prerequisites:
- A model served on the Spark (/spark-models serve)
- Add "hasCompletedOnboarding": true to ~/.claude.json if Claude Code still asks to sign in

Hybrid mode: the primary session uses the Anthropic API (Opus); subagents and drafting use Spark-hosted models.
# Keep ANTHROPIC_BASE_URL pointing at Anthropic (default)
# Override the subagent model to point at Spark
export ANTHROPIC_DEFAULT_SONNET_MODEL=Qwen/Qwen3-Coder-Next
export ANTHROPIC_SONNET_BASE_URL=http://your-spark.local:8000/v1
Note: The exact environment variables for subagent model overrides must be verified against current Claude Code documentation. The vLLM integration docs are the primary reference: https://docs.vllm.ai/en/stable/serving/integrations/claude_code/
Failover mode: if the Spark is unreachable (detected by the session-start hook), sessions automatically fall back to the Anthropic API. No special configuration is needed; this is the default behavior when ANTHROPIC_BASE_URL is not set.
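The fallback decision can also be approximated by hand in the shell that launches Claude Code. The sketch below is not the plugin's session-start hook; `SPARK_URL`, `spark_reachable`, and `select_backend` are illustrative names, and the probe assumes the Spark exposes the same `/v1/models` route used in the verification commands.

```shell
# Sketch only: SPARK_URL, spark_reachable, and select_backend are illustrative
# names, not part of the plugin or of Claude Code.
SPARK_URL="${SPARK_URL:-http://your-spark.local:8000/v1}"

# Succeeds only if the endpoint answers <url>/models within 3 seconds.
spark_reachable() {
  curl --silent --fail --max-time 3 "$1/models" >/dev/null 2>&1
}

# Point Claude Code at the Spark when it answers; otherwise clear the
# override so the default Anthropic endpoint is used.
select_backend() {
  if spark_reachable "$1"; then
    export ANTHROPIC_BASE_URL="$1"
    echo "backend: spark"
  else
    unset ANTHROPIC_BASE_URL
    echo "backend: anthropic"
  fi
}

# Usage (in the shell that will launch Claude Code):
#   select_backend "$SPARK_URL"
```

The function deliberately leaves ANTHROPIC_API_KEY alone, so a real Anthropic key already present in the environment is still available when the fallback path is taken.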
To switch backends, check the Spark with /spark-status, then set or unset ANTHROPIC_BASE_URL.
After switching, verify the backend is working:
# Check what model Claude Code is using
# The model name appears in the Claude Code status bar
# For vLLM endpoint, verify it responds
curl http://your-spark.local:8000/v1/models
# For Ollama endpoint
curl http://your-spark.local:11434/api/tags
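A reachable endpoint may still not be serving the model that Claude Code is configured to request. As a rough check, the output of the vLLM `/v1/models` call can be searched for the expected model id. This sketch assumes the OpenAI-style list response (`{"data": [{"id": ...}]}`); `model_listed` is a hypothetical helper, and it uses plain text matching rather than a JSON parser.

```shell
# Sketch only: model_listed is an illustrative helper, not part of the plugin.
# Reads a /v1/models JSON response on stdin and succeeds if it contains an
# entry whose "id" equals the model name given as $1.
model_listed() {
  # Strip spaces and newlines so both `"id": "x"` and `"id":"x"` match.
  tr -d ' \n' | grep -qF "\"id\":\"$1\""
}

# Usage:
#   curl --silent http://your-spark.local:8000/v1/models \
#     | model_listed "Qwen/Qwen3-Coder-Next" && echo "model is served"
```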
Not all models work in the Claude Code harness. Requirements:
See the full model compatibility matrix in the spark-models skill, which includes token/s benchmarks and quantization details.
Models with / in HuggingFace names may need aliasing — Claude Code has issues with slashes in model names.
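One way to sidestep the slash, assuming a slash-free alias is enough, is to serve the model under a short name with vLLM's `--served-model-name` flag and give Claude Code that same name. The helper `alias_for` and the alias it derives are illustrative choices, not something the plugin defines.

```shell
# Sketch only: alias_for is an illustrative helper, not part of the plugin.
# It keeps the part after the last "/" and lowercases it.
alias_for() {
  printf '%s\n' "${1##*/}" | tr '[:upper:]' '[:lower:]'
}

HF_MODEL="Qwen/Qwen3-Coder-Next"
MODEL_ALIAS="$(alias_for "$HF_MODEL")"

# Print the commands instead of running them, so they can be reviewed first.
echo "vllm serve $HF_MODEL --served-model-name $MODEL_ALIAS"
echo "export ANTHROPIC_MODEL=$MODEL_ALIAS"
```

For the model above, the derived alias is `qwen3-coder-next`. Any other flags the deployment needs (for example the tool-calling flags) would be added to the printed `vllm serve` line.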
| Issue | Fix |
|---|---|
| Claude Code asks to sign in | Add "hasCompletedOnboarding": true and "primaryApiKey": "sk-dummy-key" to ~/.claude.json |
| Tool calls not working | Verify --enable-auto-tool-choice and correct --tool-call-parser flag on vLLM |
| Model name with / rejected | Use a model alias or check vLLM --served-model-name flag |
| Slow responses | Check GPU memory pressure with /spark-status, consider smaller model |
| OOM errors | Use NVIDIA's vLLM container, not vllm/vllm-openai |
npx claudepluginhub jeremyeder/dgx-agentskills --plugin dgx-spark