From vastai-pack
Executes Vast.ai production checklist for GPU workloads, verifying account setup, instance config, data safety, monitoring, and cost controls for training pipelines.
How this skill is triggered — by the user, by Claude, or both
Slash command
/vastai-pack:vastai-prod-checklistThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Complete checklist for running production GPU workloads on Vast.ai, covering account setup, instance selection, data safety, monitoring, and cost controls.
Complete checklist for running production GPU workloads on Vast.ai, covering account setup, instance selection, data safety, monitoring, and cost controls.
>= 0.98 for production jobsinet_down >= 200 for data transferdph_total set in search queries#!/bin/bash
set -euo pipefail
echo "Vast.ai Production Readiness Check"
# 1. Auth
vastai show user --raw | python3 -c "
import sys, json; u=json.load(sys.stdin)
balance = u.get('balance', 0)
print(f' Auth: OK | Balance: \${balance:.2f}')
assert balance >= 10, f'Balance too low: \${balance:.2f}'
" && echo " Balance: PASS" || echo " Balance: FAIL"
# 2. Offer availability
COUNT=$(vastai search offers 'reliability>0.98 num_gpus=1 rentable=true' --raw --limit 1 | python3 -c "import sys,json; print(len(json.load(sys.stdin)))")
echo " Offers available: $COUNT+ | PASS"
# 3. Docker image pullable
docker pull pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime > /dev/null 2>&1 && echo " Docker image: PASS" || echo " Docker image: FAIL"
echo "Pre-flight checks complete."
| Error | Cause | Solution |
|---|---|---|
| Insufficient balance | Credits depleted mid-job | Set up auto-top-up or balance alerts |
| Instance preempted during final epoch | Spot instance reclaimed | Use on-demand for final training stage |
| Checkpoint corrupted | Interrupted mid-save | Implement atomic checkpoint writes (save to temp, rename) |
| GPU utilization drops to 0% | Data pipeline bottleneck | Profile data loading; increase disk I/O |
For version upgrades, see vastai-upgrade-migration.
Pre-launch audit: Run the verification script, check all boxes, confirm Docker image pulls successfully, and verify at least 3 matching offers are available before starting a production training run.
Budget-safe launch: Set max_dph=2.00, auto-destroy timeout of 12 hours, and daily spend alert at $50 to prevent cost overruns.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin vastai-packDeploys ML training jobs and inference services to Vast.ai GPU cloud using optimized Docker images, CLI scripting, and automation for GPU instance provisioning.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.