From content-specialized
Manage remote CUDA development machine (cuda-dev) with RTX 5090 GPU. Use when the user needs to: (1) Connect to or check status of the cuda-dev machine, (2) Start/stop/manage vLLM inference servers, (3) Check GPU status or CUDA environment, (4) Compile or run CUDA programs, (5) Manage LiteLLM proxy for AI coding assistants, (6) Shutdown/reboot the remote machine, or (7) Configure VS Code Remote SSH for CUDA development.
Slash command: /content-specialized:cuda-remote-manager
Manage the remote CUDA development machine (cuda-dev) with RTX 5090 GPU for deep learning inference and CUDA programming.
Connection: ssh cuda-dev (alias for [email protected])
GPU Check: bash scripts/check_cuda.sh
Start vLLM: bash scripts/start_vllm.sh [MODEL] [MAX_LEN] [GPU_UTIL] [PORT]
Power Management: bash scripts/manage_power.sh {shutdown|reboot|status}
See references/machine_specs.md for complete hardware specs, SSH config, and project directories.
Key specs:
Use the check script to see GPU status, CUDA toolkit, vLLM process, and disk space:
bash scripts/check_cuda.sh
Or manually:
ssh cuda-dev 'nvidia-smi'
ssh cuda-dev 'ps aux | grep "[p]ython -m vllm"'
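The `[p]ython` bracket in the pattern above keeps grep from matching its own command line. A small sketch of why, using made-up process lines (the sample text is illustrative only, not real output from cuda-dev):

```shell
# Sample `ps aux`-style lines (hypothetical, for illustration only).
sample_ps() {
  printf '%s\n' \
    'm  101  python -m vllm.entrypoints.openai.api_server --port 8000' \
    'm  202  grep [p]ython -m vllm'
}

# The regex [p]ython matches the text "python". The grep process's own
# command line contains the literal text "[p]ython", which that regex
# does not match, so only the real vLLM process is counted.
count_vllm() {
  sample_ps | grep -c '[p]ython -m vllm'
}
```

Calling `count_vllm` counts one matching line here: the vLLM process, not the grep line.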
Default (Qwen2.5-Coder-7B with 128K context):
bash scripts/start_vllm.sh
With custom parameters:
bash scripts/start_vllm.sh MODEL_NAME MAX_CONTEXT GPU_UTILIZATION PORT
Example - DeepSeek R1 with 32K context:
bash scripts/start_vllm.sh deepseek-ai/DeepSeek-R1-Distill-Qwen-32B 32768 0.85 8000
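The contents of scripts/start_vllm.sh are not shown here, so the following is only a sketch of how the four positional arguments could map onto vLLM's OpenAI-compatible server flags. The function name `build_vllm_cmd`, the default model ID, and the 0.90 GPU utilization default are assumptions. The real script may differ and, per the note below, also configures Flash Attention.

```shell
# Hypothetical reconstruction of the argument handling; the real
# scripts/start_vllm.sh may differ. Defaults marked below are assumptions.
build_vllm_cmd() {
  local model="${1:-Qwen/Qwen2.5-Coder-7B-Instruct}"  # assumed default model ID
  local max_len="${2:-131072}"                        # 128K context
  local gpu_util="${3:-0.90}"                         # assumed default
  local port="${4:-8000}"
  # Print the command instead of running it, so it can be inspected first.
  printf 'python -m vllm.entrypoints.openai.api_server --model %s --max-model-len %s --gpu-memory-utilization %s --port %s\n' \
    "$model" "$max_len" "$gpu_util" "$port"
}
```

Running `build_vllm_cmd deepseek-ai/DeepSeek-R1-Distill-Qwen-32B 32768 0.85 8000` prints the command line that the DeepSeek example would correspond to under these assumptions.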
Important: Always use Flash Attention 2 for RTX 5090. The script handles this automatically.
Monitor startup:
ssh cuda-dev 'tail -f ~/vllm-models/vllm-*.log'
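Watching the log works for interactive use. For scripted use, one option is to poll until the server answers. The helper below is a generic sketch (the name `wait_until` is hypothetical): it re-runs any command until it succeeds or a timeout elapses.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or roughly TIMEOUT_SECONDS have passed.
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local waited=0
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1   # gave up: command never succeeded
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}

# Usage (assumes curl is installed on cuda-dev and vLLM listens on port 8000):
#   wait_until 600 5 ssh cuda-dev 'curl -sf http://localhost:8000/v1/models' \
#     && echo "vLLM is ready"
```

The 600-second timeout is a guess. Large models can take longer on first launch because they must be downloaded first.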
Stop the server:
ssh cuda-dev 'pkill -f "python -m vllm"'
Navigate to CUDA project on remote:
ssh cuda-dev 'cd ~/cuda-learning/01-hello-cuda && make clean && make'
Run compiled program:
ssh cuda-dev 'cd ~/cuda-learning/01-hello-cuda && ./hello_v2'
Architecture flag: Always use -arch=sm_120 for RTX 5090 in Makefiles.
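The sm_120 value corresponds to compute capability 12.0. As a sketch, the flag can be derived from the capability string rather than hard-coded. The helper name `arch_flag_from_cc` is hypothetical, and the `compute_cap` query field is assumed to be available in the installed driver's nvidia-smi.

```shell
# Convert a compute capability such as "12.0" into an nvcc flag
# such as "-arch=sm_120" by dropping the dot.
arch_flag_from_cc() {
  local digits
  digits="$(printf '%s' "$1" | tr -d '.')"
  printf '%s\n' "-arch=sm_${digits}"
}

# Usage (assumes nvidia-smi on cuda-dev supports the compute_cap field):
#   cc="$(ssh cuda-dev 'nvidia-smi --query-gpu=compute_cap --format=csv,noheader' | head -n 1)"
#   arch_flag_from_cc "$cc"
```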
Check if machine is online:
bash scripts/manage_power.sh status
Shutdown for the night:
bash scripts/manage_power.sh shutdown
Reboot:
bash scripts/manage_power.sh reboot
LiteLLM runs locally and proxies to the remote vLLM server.
cd /Users/thesolutionarchitect/Documents/source/litellm
source venv/bin/activate
litellm --config cuda_vllm_config.yaml --port 4000
Or run in background:
nohup litellm --config cuda_vllm_config.yaml --port 4000 > litellm.log 2>&1 &
Important: Rename the .env file to .env.backup before starting LiteLLM to avoid database errors.
Test the proxy:
curl http://localhost:4000/v1/models -H "Authorization: Bearer sk-litellm-cuda"
After changing vLLM model, update /Users/thesolutionarchitect/Documents/source/litellm/cuda_vllm_config.yaml:
Set the model field to match the vLLM model.
Set max_input_tokens and max_output_tokens to match the context window.
Then restart LiteLLM:
pkill -f litellm && litellm --config cuda_vllm_config.yaml --port 4000
Install extensions:
Grant Local Network permission (macOS):
Connect to cuda-dev:
cuda-dev
Open workspace:
/home/m/cuda-learning
The workspace at /home/m/cuda-learning/cuda-remote.code-workspace is pre-configured with:
/usr/local/cuda-13.1/bin/nvcc
/usr/local/cuda-13.1/include
.cu and .cuh
See references/vllm_models.md for detailed model information, known issues, and troubleshooting.
Recommended Model: Qwen/Qwen2.5-Coder-7B-Instruct (128K context)
Known Issues:
┌─────────────┐         ┌──────────────┐         ┌─────────────┐         ┌──────────┐
│ Kilo Code / │  HTTP   │ LiteLLM      │  HTTP   │ vLLM        │  CUDA   │ RTX 5090 │
│ VS Code     │ ──────> │ (localhost:  │ ──────> │ (cuda-dev:  │ ──────> │ 32GB     │
│ (Local)     │         │ 4000)        │         │ 8000)       │         │ VRAM     │
└─────────────┘         └──────────────┘         └─────────────┘         └──────────┘
                               │                        │
                               │                        │
                            API Key:              Model serving
                        sk-litellm-cuda        (OpenAI-compatible)
Check if machine is powered on:
ping 192.168.250.179
Verify SSH config at ~/.ssh/config has the cuda-dev host entry.
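One way to script this check is a small grep over the config file. This is a sketch: the helper name `has_host_entry` is hypothetical, and it only inspects `Host` lines in a single file, so it does not follow `Include` directives or wildcard patterns.

```shell
# has_host_entry CONFIG_FILE HOST
# Succeeds if CONFIG_FILE contains a "Host" line that lists HOST.
# "HostName ..." lines are not matched, because "Host" must be followed
# by whitespace.
has_host_entry() {
  local config="$1" host="$2"
  grep -Eiq "^[[:space:]]*Host[[:space:]]+(.*[[:space:]])?${host}([[:space:]]|\$)" "$config"
}

# Usage:
#   has_host_entry ~/.ssh/config cuda-dev && echo "cuda-dev entry found"
```

`ssh -G cuda-dev` is an alternative that prints the settings OpenSSH would actually apply for the alias.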
ssh cuda-dev 'nvidia-smi'
ssh cuda-dev 'tail -100 ~/vllm-models/vllm-*.log'
bash scripts/start_vllm.sh MODEL 32768 0.75 8000
Rename .env file:
cd /Users/thesolutionarchitect/Documents/source/litellm
mv .env .env.backup
Grant Local Network permission:
Models download from HuggingFace on first use. Check internet connection on cuda-dev and disk space:
ssh cuda-dev 'df -h ~/vllm-models'
ssh cuda-dev 'du -sh ~/.cache/huggingface/'
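To turn the disk check into a pass/fail test before a large download, a sketch like the following could be run on cuda-dev. The helper names and any threshold passed to them are hypothetical; the source does not state how much space a given model needs.

```shell
# free_kb PATH: print available space (in KiB) on the filesystem holding PATH.
# -P forces the one-line-per-filesystem POSIX format; -k reports 1024-byte blocks.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_free_gb PATH MIN_GB: succeed if at least MIN_GB GiB are available.
has_free_gb() {
  local avail
  avail="$(free_kb "$1")"
  [ "$avail" -ge $(( $2 * 1024 * 1024 )) ]
}

# Usage on cuda-dev (the 50 GiB threshold is an arbitrary example):
#   has_free_gb ~/.cache/huggingface 50 || echo "low disk space for model downloads"
```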
npx claudepluginhub amdmax/claude_marketplace --plugin content-specialized