Set up and provision an NVIDIA DGX Spark from scratch or after factory reset. Use when configuring a new Spark, recovering from reset, or verifying system state. Triggers on: "set up DGX Spark", "configure Spark", "provision Spark", "factory reset".
Slash command: /dgx-spark:spark-setup
Reproducible provisioning for the NVIDIA DGX Spark. Each phase is idempotent — safe to re-run.
Prerequisite: a .env file with SPARK_HOST and SPARK_USER set.

Execute the phases in order, skipping any that are already complete.
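Here is a minimal sketch for loading and validating the .env before any phase runs. It assumes the file contains plain KEY=value lines. The function name load_spark_env is hypothetical. Sourcing runs the file as shell code, so only use it on a .env you trust.

```shell
# Hypothetical helper: load KEY=value pairs from a .env file and fail early
# if SPARK_HOST or SPARK_USER is missing or empty.
load_spark_env() {
  env_file=${1:-.env}
  if [ ! -f "$env_file" ]; then
    echo "missing $env_file" >&2
    return 1
  fi
  # POSIX "." searches PATH for names without a slash, so force a path
  case $env_file in
    */*) ;;
    *) env_file=./$env_file ;;
  esac
  # set -a exports every variable assigned while the file is sourced
  set -a
  . "$env_file"
  set +a
  for var in SPARK_HOST SPARK_USER; do
    # Read the variable whose name is stored in $var
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "$var is not set in $env_file" >&2
      return 1
    fi
  done
}
```

Usage: `load_spark_env .env || exit 1` at the top of a provisioning script.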
# Verify SSH access
ssh -o ConnectTimeout=5 ${SPARK_USER}@${SPARK_HOST} "echo 'SSH OK' && uname -a"
If SSH fails, guide the user through NVIDIA Sync setup or manual SSH key configuration.
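Every phase repeats the same ssh invocation, so a small wrapper can keep the options consistent. This is only a sketch. The names spark_ssh and SPARK_SSH_CMD (an override used for dry runs) are hypothetical. With BatchMode=yes, ssh fails instead of prompting for a password, so a successful run confirms that key-based login works. The -t flag allocates a terminal, which lets a remote sudo prompt for its password.

```shell
# Hypothetical wrapper around the ssh call used throughout the phases.
# SPARK_SSH_CMD defaults to ssh; set it to "echo" to print the command instead.
spark_ssh() {
  ${SPARK_SSH_CMD:-ssh} -t \
    -o BatchMode=yes \
    -o ConnectTimeout=5 \
    "${SPARK_USER}@${SPARK_HOST}" "$@"
}

# Example: spark_ssh "sudo apt update && sudo apt upgrade -y"
```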
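# Update system packages (-t gives sudo a terminal for its password prompt)
ssh -t ${SPARK_USER}@${SPARK_HOST} "sudo apt update && sudo apt upgrade -y"
# Verify the GPU driver and CUDA toolkit
ssh ${SPARK_USER}@${SPARK_HOST} "nvidia-smi && nvcc --version"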
Record CUDA version, driver version, and DGX OS version.
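The sketch below pulls the recorded versions out of the command output. It assumes the usual nvidia-smi header line ("Driver Version: X ... CUDA Version: Y") and the nvcc line "Cuda compilation tools, release X, ...". These two sources answer different questions. nvidia-smi reports the highest CUDA version the driver supports, and nvcc reports the installed toolkit. If nvcc is missing from the non-interactive PATH, /usr/local/cuda/bin/nvcc is the conventional location. DGX OS version parsing is not covered here.

```shell
# Each function reads command output on stdin and prints one version string.

# Driver version from the nvidia-smi header
driver_version() { sed -n 's/.*Driver Version: *\([0-9.]*\).*/\1/p' | head -n 1; }

# Highest CUDA version the driver supports (also from the nvidia-smi header)
driver_cuda_version() { sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p' | head -n 1; }

# Installed CUDA toolkit version from nvcc --version
toolkit_cuda_version() { sed -n 's/.*release \([0-9.]*\),.*/\1/p' | head -n 1; }

# Example:
# ssh "${SPARK_USER}@${SPARK_HOST}" nvidia-smi | driver_version
# ssh "${SPARK_USER}@${SPARK_HOST}" "nvcc --version" | toolkit_cuda_version
```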
Ollama comes pre-installed on DGX Spark via snap.
# Verify Ollama is running
ssh ${SPARK_USER}@${SPARK_HOST} "ollama --version && systemctl status snap.ollama.daemon"
# Configure for remote access (bind to all interfaces)
ssh -t ${SPARK_USER}@${SPARK_HOST} "sudo snap set ollama bind=0.0.0.0:11434"
# Pull a starter model
ssh ${SPARK_USER}@${SPARK_HOST} "ollama pull llama3.1:8b"
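Each phase is meant to be safe to re-run, so the model pull can be skipped when the model is already present. This sketch assumes `ollama list` prints one header row and then one model per line, with the name in the first column. The function name has_ollama_model is hypothetical.

```shell
# Succeeds if `ollama list` output (on stdin) has a row whose first column
# equals the given model name; the header row is skipped.
has_ollama_model() {
  awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Example (pull only when the model is missing):
# ssh "${SPARK_USER}@${SPARK_HOST}" "ollama list" | has_ollama_model llama3.1:8b ||
#   ssh "${SPARK_USER}@${SPARK_HOST}" "ollama pull llama3.1:8b"
```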
Pull NVIDIA's custom vLLM container optimized for DGX Spark (Blackwell architecture, sm_121a).
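# Log in to NVIDIA container registry (username: $oauthtoken, password: an NGC API key)
# -t is required because docker login refuses interactive login without a TTY
ssh -t ${SPARK_USER}@${SPARK_HOST} "docker login nvcr.io"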
# Pull the vLLM image
ssh ${SPARK_USER}@${SPARK_HOST} "docker pull nvcr.io/nvidia/vllm:latest"
Do NOT use the standard vllm/vllm-openai image; it produces spurious out-of-memory (OOM) errors on DGX Spark.
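Pulling the image does not start anything on port 8000; for that, a container has to serve a model. The sketch below prints one possible docker run command for this. Everything in it is an assumption to adapt, not NVIDIA's documented invocation: the flags (--gpus all, --ipc=host), the container name vllm, the `vllm serve` entrypoint, and the function name vllm_run_cmd.

```shell
# Hypothetical: print a docker run command that serves one model from the
# NGC vLLM image and publishes it on the given host port (default 8000).
vllm_run_cmd() {
  model=$1
  port=${2:-8000}
  image=${VLLM_IMAGE:-nvcr.io/nvidia/vllm:latest}
  # --gpus all exposes the GPU via the NVIDIA Container Runtime;
  # --ipc=host gives PyTorch worker processes enough shared memory.
  # vllm serve listens on 8000 inside the container by default.
  echo "docker run -d --name vllm --gpus all --ipc=host -p ${port}:8000 ${image} vllm serve ${model}"
}

# Example (model name is a placeholder):
# ssh "${SPARK_USER}@${SPARK_HOST}" "$(vllm_run_cmd my-org/my-model)"
```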
# Install Tailscale, then join the tailnet
ssh -t ${SPARK_USER}@${SPARK_HOST} "curl -fsSL https://tailscale.com/install.sh | sh"
ssh -t ${SPARK_USER}@${SPARK_HOST} "sudo tailscale up"
# Open the login URL printed by tailscale up and complete authentication in a browser
ssh -t ${SPARK_USER}@${SPARK_HOST} "sudo tailscale set --ssh"
After auth, update Mac-side .env with SPARK_MCP_URL_TAILSCALE.
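Updating the Mac-side .env can also be made idempotent. The sketch below replaces an existing KEY= line or appends a new one, which moves the key to the end of the file. The helper name set_env_var is hypothetical. So is the URL form http://<tailscale-ip>:3100, because the exact URL the MCP client expects is not given here. `tailscale ip -4` prints the machine's Tailscale IPv4 address.

```shell
# Hypothetical helper: set KEY=VALUE in a .env file, replacing any existing
# KEY= line so repeated runs leave exactly one entry.
set_env_var() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  if [ -f "$file" ]; then
    # Copy every line except an old entry for this key
    grep -v "^${key}=" "$file" > "$tmp" || true
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Example (URL format is an assumption):
# ts_ip=$(ssh "${SPARK_USER}@${SPARK_HOST}" "tailscale ip -4")
# set_env_var .env SPARK_MCP_URL_TAILSCALE "http://${ts_ip}:3100"
```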
Docker and NVIDIA Container Runtime are pre-installed on DGX Spark.
# Verify
ssh ${SPARK_USER}@${SPARK_HOST} "docker info | grep -i runtime"
# Ensure the user is in the docker group so docker runs without sudo (takes effect on the next login)
ssh -t ${SPARK_USER}@${SPARK_HOST} "groups | grep -qw docker || sudo usermod -aG docker \$USER"
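The two checks in this phase can be made exact rather than substring matches. This is a sketch with hypothetical function names. It assumes `id -nG` prints space-separated group names and that `docker info --format '{{json .Runtimes}}'` prints a JSON object keyed by runtime name. Run both checks in a new SSH session, because a group added with usermod only applies to new logins.

```shell
# Succeeds if a space-separated group list on stdin (e.g. from `id -nG`)
# contains the exact name "docker".
in_docker_group() { tr ' ' '\n' | grep -qx docker; }

# Succeeds if Docker's runtime list (JSON on stdin) has a runtime named "nvidia".
has_nvidia_runtime() { grep -q '"nvidia"'; }

# Example:
# ssh "${SPARK_USER}@${SPARK_HOST}" "id -nG" | in_docker_group
# ssh "${SPARK_USER}@${SPARK_HOST}" "docker info --format '{{json .Runtimes}}'" | has_nvidia_runtime
```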
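# Disable password auth (confirm key-based login works first).
# On Ubuntu-based systems, files in /etc/ssh/sshd_config.d/ are included first and take precedence over this edit.
# sshd -t validates the configuration so a broken file never triggers the restart.
ssh -t ${SPARK_USER}@${SPARK_HOST} "sudo sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config && sudo sshd -t && sudo systemctl restart ssh"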
# Enable UFW, allowing SSH, MCP (3100), Ollama (11434), and vLLM (8000)
ssh -t ${SPARK_USER}@${SPARK_HOST} "sudo ufw allow ssh && sudo ufw allow 3100/tcp && sudo ufw allow 11434/tcp && sudo ufw allow 8000/tcp && sudo ufw --force enable"
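A drop-in file can silently override the sed edit. A more reliable check is therefore the effective value reported by `sshd -T`, which prints the resolved configuration as one lowercase "keyword value" pair per line. The sketch below extracts a single option from that output. It also strips the carriage returns that `ssh -t` adds to each line. The function name sshd_effective is hypothetical.

```shell
# Print the effective value of an sshd option from `sshd -T` output on stdin.
# Prints nothing if the option does not appear.
sshd_effective() {
  key=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  # Drop a trailing \r (from ssh -t) before comparing the first field
  awk -v k="$key" '{ sub(/\r$/, "") } $1 == k { print $2; exit }'
}

# Example (expect "no" after this phase):
# ssh -t "${SPARK_USER}@${SPARK_HOST}" "sudo sshd -T" | sshd_effective PasswordAuthentication
```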
Run the deploy script from the Mac:
./deploy/install.sh
This rsyncs the project to the Spark, builds the Docker container, and starts the MCP server.
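The three steps the deploy script performs can be outlined as follows. This is a hypothetical sketch, not the contents of ./deploy/install.sh. The remote directory, image tag, container name, and the run/DRY_RUN helper are all placeholders. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical outline: sync the project, build the image, (re)start the server.
deploy_sketch() {
  remote="${SPARK_USER}@${SPARK_HOST}"
  dir=${SPARK_REMOTE_DIR:-spark-mcp}        # placeholder path under the remote home
  image=${SPARK_MCP_IMAGE:-spark-mcp:local} # placeholder image tag
  run rsync -az --delete --exclude .git ./ "${remote}:${dir}/"
  run ssh "$remote" "cd ${dir} && docker build -t ${image} ."
  # Replace any previous container, then publish the MCP port
  run ssh "$remote" "docker rm -f spark-mcp 2>/dev/null; docker run -d --name spark-mcp --restart unless-stopped -p 3100:3100 ${image}"
}
```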
# Health check
curl http://${SPARK_HOST}:3100/health
# GPU status via MCP: use the /spark-status command
# Test Ollama
curl http://${SPARK_HOST}:11434/api/tags
# Test vLLM (if a model is running)
curl http://${SPARK_HOST}:8000/v1/models
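Services can take a moment to come up after a deploy or restart, so it helps to retry these checks instead of running them once. Below is a sketch of a polling helper; the name wait_for_url and the retry defaults are hypothetical. With -f, curl treats HTTP status 400 and above as failure. It does not follow redirects, so a 3xx response counts as success.

```shell
# Poll a URL until curl succeeds or the attempts run out.
# Prints PASS on stdout or FAIL on stderr, and returns 0 or 1 accordingly.
wait_for_url() {
  url=$1 tries=${2:-10} delay=${3:-3}
  i=1
  while [ "$i" -le "$tries" ]; do
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "PASS $url"
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt remains
    if [ "$i" -le "$tries" ]; then sleep "$delay"; fi
  done
  echo "FAIL $url" >&2
  return 1
}

# Example:
# for url in "http://${SPARK_HOST}:3100/health" "http://${SPARK_HOST}:11434/api/tags"; do
#   wait_for_url "$url" 5 2
# done
```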
After all phases, generate spark-setup-report.md with: