By jeremyeder
NVIDIA DGX Spark integration for Claude Code — local model serving, GPU monitoring, VM management, and hybrid AI workflows
Configure Claude Code to use the DGX Spark as a model backend — full local, hybrid (Opus primary + Spark for subagents), or failover mode. Use when switching between local and cloud inference, pointing Claude Code at Spark, or setting up hybrid workflows. Triggers on: "use local model", "switch to Spark", "switch to Anthropic", "hybrid mode", "point Claude Code at Spark", "use Spark for subagents".
Manage AI models on the DGX Spark — list, pull, serve, stop, and recommend models across Ollama and vLLM backends. Use when deploying models, checking what's running, pulling new models, or getting recommendations for a use case. Triggers on: model names (Qwen, Llama, DeepSeek, Gemma), "serve model", "pull model", "what models are running", "deploy model on Spark".
Set up and provision an NVIDIA DGX Spark from scratch or after factory reset. Use when configuring a new Spark, recovering from reset, or verifying system state. Triggers on: "set up DGX Spark", "configure Spark", "provision Spark", "factory reset".
Manage KVM/QEMU virtual machines on the DGX Spark. Create, start, stop, and snapshot VMs on the ARM64 hypervisor. Use when running VMs, creating virtual machines, or managing virtualization on the Spark. Triggers on: "create VM on Spark", "virtual machine", "KVM", "run Windows on Spark".
Set up and manage Tailscale VPN on the DGX Spark for remote access. Use when configuring remote access, setting up Tailscale, or troubleshooting VPN connectivity. Triggers on: "Tailscale", "VPN", "remote access to Spark", "access Spark from outside".
External network access
Connects to servers outside your machine
Uses power tools
Uses Bash, Write, or Edit tools
A Claude Code plugin for integrating the NVIDIA DGX Spark into AI development workflows. Provides local model serving, GPU monitoring, VM management, and hybrid local+cloud inference — all accessible through skills, commands, and MCP tools within Claude Code.
cp .env.example .env
# Edit .env with your Spark's hostname and SSH user
./deploy/install.sh
This rsyncs the project to your Spark, builds the MCP server Docker container, and starts it.
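Before running the installer, it can help to confirm that `.env` defines the values it relies on. The helper below is a minimal sketch, not part of the repository: `check_env` is a hypothetical name, and it assumes the installer needs `SPARK_HOST` and `SPARK_USER` (the hostname and SSH user mentioned above) written as plain `KEY=value` lines.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not shipped with the plugin).
# Assumption: the installer reads SPARK_HOST and SPARK_USER from .env.
check_env() {
  local env_file="${1:-.env}" missing=0 key
  if [ ! -f "$env_file" ]; then
    echo "missing file: $env_file" >&2
    return 1
  fi
  for key in SPARK_HOST SPARK_USER; do
    # Require an uncommented KEY=value line with a non-empty value.
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      echo "missing or empty: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A possible use is `check_env .env && ./deploy/install.sh`, so the install only starts when both values are present.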
Update .mcp.json with your Spark's hostname:
{
"mcpServers": {
"dgx-spark": {
"type": "http",
"url": "http://YOUR-SPARK-HOSTNAME.local:3100/mcp"
}
}
}
Replace YOUR-SPARK-HOSTNAME with your Spark's actual hostname (e.g., jeder-spark). If using Tailscale for remote access, use the Tailscale hostname instead (e.g., http://jeder-spark:3100/mcp).
Claude Code reads this file to discover the MCP server. Without it, skills like /spark-status and all spark_* MCP tools will be unavailable.
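To avoid hand-editing the file, the same JSON can be generated from a hostname. This is a small sketch under stated assumptions: `mcp_json` is a hypothetical helper name, and the port `3100` and `/mcp` path are taken from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an .mcp.json body for a given Spark hostname.
# Port 3100 and the /mcp path mirror the example in this README.
mcp_json() {
  local host="$1" port="${2:-3100}"
  cat <<EOF
{
  "mcpServers": {
    "dgx-spark": {
      "type": "http",
      "url": "http://${host}:${port}/mcp"
    }
  }
}
EOF
}
```

For a LAN setup this might be run as `mcp_json jeder-spark.local > .mcp.json`, and for Tailscale as `mcp_json jeder-spark > .mcp.json`.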
# Add the marketplace (one-time)
claude plugin marketplace add jeremyeder/dgx-agentskills
# Install the plugin
claude plugin install dgx-spark@dgx-agentskills --scope user
Or from within a Claude Code session:
/plugin marketplace add jeremyeder/dgx-agentskills
/plugin install dgx-spark@dgx-agentskills
/spark-status
/spark-models pull qwen3.5:32b
/spark-models serve Qwen/Qwen3-Coder-Next --vllm
/spark-switch local
| Skill | Description |
|---|---|
| spark-setup | Reproducible provisioning from scratch or after factory reset |
| spark-models | Model lifecycle management across Ollama and vLLM |
| spark-hybrid | Configure Claude Code to use Spark as model backend |
| spark-vpn | Tailscale VPN setup for remote access |
| spark-vms | KVM/QEMU virtual machine management |
| Command | Description |
|---|---|
| /spark-status | Quick health check: system, GPU, models, VPN |
| /spark-models [action] [model] | List, pull, serve, stop, or recommend models |
| /spark-switch [mode] | Toggle between local, cloud, and hybrid backends |
| Tool | Description |
|---|---|
| spark_get_status | System overview: uptime, CPU, memory, disk |
| spark_gpu_utilization | GPU memory, compute %, temperature, power |
| spark_list_models | All models across Ollama and vLLM |
| spark_pull_model | Pull a model via Ollama |
| spark_start_model | Start a vLLM container with tool-calling support |
| spark_stop_model | Stop a model container |
| spark_list_containers | All Docker containers on Spark |
| spark_container_logs | Tail container logs |
| spark_vpn_status | Tailscale connection state and peers |
| spark_health_check | MCP server health with latency |
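Claude Code invokes these tools for you, but it can be useful to see roughly what a call looks like on the wire. The sketch below builds a JSON-RPC `tools/call` request of the kind defined by the Model Context Protocol. `mcp_tool_call` is a hypothetical helper, and the empty `arguments` object is an assumption that only holds for tools that take no parameters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a JSON-RPC 2.0 "tools/call" request body.
# Assumption: the named tool accepts an empty arguments object.
mcp_tool_call() {
  local tool="$1" id="${2:-1}"
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"%s","arguments":{}}}\n' \
    "$id" "$tool"
}
```

For example, `mcp_tool_call spark_get_status` prints the request body for the status tool. An MCP client normally completes an `initialize` handshake before sending tool calls, so posting this payload on its own to the `/mcp` endpoint may be rejected.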
.env (repo root)

| Variable | Description | Default |
|---|---|---|
| SPARK_MCP_URL | MCP server URL | http://your-spark.local:3100 |
| SPARK_MCP_URL_TAILSCALE | MCP URL via Tailscale | http://your-spark:3100 |
| SPARK_HOST | Spark hostname for SSH | your-spark.local |
| SPARK_USER | SSH username | jeder |
| SPARK_VLLM_ENDPOINT | vLLM API endpoint | http://your-spark.local:8000 |
| SPARK_OLLAMA_ENDPOINT | Ollama API endpoint | http://your-spark.local:11434 |
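One way to sanity-check the two inference endpoints is to request each backend's model listing. The sketch below only builds the URLs. `probe_urls` is a hypothetical helper, and it assumes vLLM is serving its OpenAI-compatible API (`/v1/models`) and Ollama its standard REST API (`/api/tags`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one probe URL per backend, using the
# endpoint variables from .env or the defaults listed in this README.
probe_urls() {
  local vllm="${SPARK_VLLM_ENDPOINT:-http://your-spark.local:8000}"
  local ollama="${SPARK_OLLAMA_ENDPOINT:-http://your-spark.local:11434}"
  echo "${vllm}/v1/models"   # vLLM OpenAI-compatible model list
  echo "${ollama}/api/tags"  # Ollama list of local models
}
```

The URLs could then be fed to `curl`, for instance `probe_urls | while read -r url; do curl -fsS --max-time 5 "$url" >/dev/null && echo "ok $url" || echo "FAIL $url"; done`.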
.env (deployed to ~/dgx-agentskills/.env)

| Variable | Description | Default |
|---|---|---|
| MCP_PORT | MCP server port | 3100 |
| OLLAMA_HOST | Ollama API address | localhost:11434 |
| VLLM_IMAGE | vLLM container image | nvcr.io/nvidia/vllm:latest |
| VLLM_PORT | vLLM serving port | 8000 |
| VLLM_GPU_MEMORY_UTILIZATION | GPU memory fraction for vLLM | 0.7 |
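To illustrate how the vLLM settings might fit together, the sketch below prints (but does not run) a `docker run` command built from them. It is not necessarily what `spark_start_model` executes: `vllm_run_cmd` is a hypothetical helper, and it assumes the image accepts `vllm serve` as its command.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print a docker command assembled from the
# VLLM_* variables, falling back to the defaults listed in this README.
# Assumption: the container image accepts "vllm serve <model>" as its command.
vllm_run_cmd() {
  local model="$1"
  local image="${VLLM_IMAGE:-nvcr.io/nvidia/vllm:latest}"
  local port="${VLLM_PORT:-8000}"
  local mem="${VLLM_GPU_MEMORY_UTILIZATION:-0.7}"
  echo "docker run -d --gpus all -p ${port}:${port} ${image}" \
       "vllm serve ${model} --port ${port} --gpu-memory-utilization ${mem}"
}
```

Running `vllm_run_cmd Qwen/Qwen3-Coder-Next` shows the command for review before anything is started. Tool calling in vLLM typically also needs `--enable-auto-tool-choice` and a model-specific `--tool-call-parser` value, which this sketch leaves out.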
Mac (Claude Code)
│
├── Plugin (skills, commands, hooks)
│   └── .mcp.json → HTTP → DGX Spark MCP Server (:3100)
│
└── Claude Code session
    └── ANTHROPIC_BASE_URL → DGX Spark vLLM (:8000)

DGX Spark (your-spark.local)
├── MCP Server (Docker container, port 3100)
│   ├── nvidia-smi (GPU metrics)
│   ├── docker CLI (container management)
│   ├── ollama CLI (model management)
│   └── tailscale CLI (VPN status)
├── Ollama (host, port 11434)
├── vLLM (Docker container, port 8000)
└── Tailscale (mesh VPN)
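The diagram shows the Claude Code session reaching vLLM through `ANTHROPIC_BASE_URL`. A rough sketch of switching that variable by hand follows. `spark_env` is a hypothetical helper, not the implementation of `/spark-switch`, and hybrid mode is left out because its mechanism is not described here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the shell line that points a Claude Code
# session at the Spark (local) or back at the default API (cloud).
# Port 8000 is the vLLM port shown in the diagram above.
spark_env() {
  local mode="$1" host="${2:-your-spark.local}"
  case "$mode" in
    local)
      echo "export ANTHROPIC_BASE_URL=http://${host}:8000"
      ;;
    cloud)
      echo "unset ANTHROPIC_BASE_URL"
      ;;
    *)
      echo "usage: spark_env local|cloud [host]" >&2
      return 2
      ;;
  esac
}
```

One possible use is `eval "$(spark_env local jeder-spark.local)"` before launching `claude`, and `eval "$(spark_env cloud)"` to revert.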
# Bootstrap dev environment
./scripts/setup-dev.sh
# Run tests
cd mcp-server && npm test
# Run linting
./scripts/lint.sh
npx claudepluginhub jeremyeder/dgx-agentskills --plugin dgx-spark