From vastai
Manage Vast.ai autoscaling endpoints and worker groups for production deployments. Use when setting up auto-scaling GPU inference, managing worker pools, or deploying services.
Slash command: `/vastai:autoscale`
Manage production deployments with auto-scaling worker pools.
$ARGUMENTS
```shell
vastai create endpoint \
  --endpoint_name '<NAME>' \
  --target_util 0.9 \
  --max_workers 20 \
  --cold_workers 5 \
  --cold_mult 2.5 \
  --min_load 0.0
```

| Option | Description | Default |
|---|---|---|
| `--endpoint_name` | Name for the endpoint | (required) |
| `--target_util` | Target utilization, from 0 to 1 | 0.9 |
| `--max_workers` | Maximum number of workers | 20 |
| `--cold_workers` | Minimum number of cold (standby) workers | 5 |
| `--cold_mult` | Cold capacity multiplier | 2.5 |
| `--min_load` | Minimum floor load (perf units/s) | 0.0 |
| `--min_cold_load` | Minimum cold load | 0.0 |
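
As a rough illustration of how the options above might be checked before use, the sketch below prints a `create endpoint` command instead of running it. The function `build_endpoint_cmd` and the name `my-endpoint` are hypothetical and not part of the vastai CLI. The only checks are the two constraints stated in the table: a required name and a target utilization between 0 and 1.

```shell
# Hypothetical helper (not part of the vastai CLI): validates two constraints
# from the options table, then prints the command for review.
# Assumes the endpoint name contains no single quotes.
build_endpoint_cmd() {
  ep_name="$1"
  ep_util="${2:-0.9}"   # defaults mirror the table above
  ep_max="${3:-20}"
  ep_cold="${4:-5}"

  # --endpoint_name is required.
  if [ -z "$ep_name" ]; then
    echo "error: endpoint name is required" >&2
    return 1
  fi

  # --target_util must be a plain number between 0 and 1.
  if ! awk -v u="$ep_util" 'BEGIN { exit !(u ~ /^[0-9]*\.?[0-9]+$/ && u >= 0 && u <= 1) }'; then
    echo "error: target_util must be between 0 and 1, got '$ep_util'" >&2
    return 1
  fi

  printf "vastai create endpoint --endpoint_name '%s' --target_util %s --max_workers %s --cold_workers %s\n" \
    "$ep_name" "$ep_util" "$ep_max" "$ep_cold"
}

build_endpoint_cmd my-endpoint 0.85 50
# prints: vastai create endpoint --endpoint_name 'my-endpoint' --target_util 0.85 --max_workers 50 --cold_workers 5
```

Printing rather than executing keeps the sketch safe to run without an account. Once the printed command looks right, it can be pasted into a shell.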

```shell
vastai show endpoints
vastai update endpoint <ID> [--target_util 0.85 --max_workers 50 ...]
vastai delete endpoint <ID>
vastai get endpt-logs <ID> [--level 0-3 --tail N]
```
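
The management commands above can be chained into a small routine. The following is a minimal sketch, assuming a dry-run wrapper: `run`, `retune_endpoint`, and the `DRY_RUN` variable are hypothetical names introduced here, and `1234` is a placeholder ID. With `DRY_RUN=1` (the default in this sketch) nothing is sent to Vast.ai and the commands are only printed.

```shell
# Hypothetical wrapper: prints the command when DRY_RUN=1, otherwise runs it.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: change the utilization target, then tail the logs.
# $1 = endpoint ID, $2 = new target utilization, $3 = number of log lines
retune_endpoint() {
  run vastai update endpoint "$1" --target_util "$2" &&
    run vastai get endpt-logs "$1" --tail "$3"
}

retune_endpoint 1234 0.85 50
# prints:
# + vastai update endpoint 1234 --target_util 0.85
# + vastai get endpt-logs 1234 --tail 50
```

Setting `DRY_RUN=0` would execute the same commands for real, so it is worth reviewing the printed output first.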

```shell
vastai create workergroup \
  --template_hash '<HASH>' \
  --endpoint_name '<NAME>' \
  --test_workers 3 \
  --cold_workers 2 \
  --target_util 0.9 \
  --search_params 'gpu_name=RTX_4090 reliability>0.9'
```

| Option | Description |
|---|---|
| `--template_hash` | Hash of the template used for worker instances |
| `--template_id` | Template ID (alternative to `--template_hash`) |
| `--endpoint_name` / `--endpoint_id` | Target endpoint |
| `--test_workers` | Number of workers used for performance estimation |
| `--cold_workers` | Minimum number of cold workers |
| `--target_util` | Target utilization |
| `--cold_mult` | Cold capacity multiplier |
| `--search_params` | Search query for selecting machines |
| `--gpu_ram` | Estimated GPU RAM requirement |
| `--launch_args` | Extra arguments for instance creation |
| `-n` | Disable default search params |
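
The `--search_params` value in the example above is a single string of space-separated filters. As a sketch of one way to assemble it, the hypothetical helper `build_workergroup_cmd` below (not part of the vastai CLI) joins individual filters and prints the resulting `create workergroup` command. The `<HASH>` and `<NAME>` arguments are left as placeholders.

```shell
# Hypothetical helper: prints a `create workergroup` command for review.
# $1 = template hash, $2 = endpoint name, remaining args = search filters.
# Assumes no argument contains a single quote.
build_workergroup_cmd() {
  if [ "$#" -lt 2 ]; then
    echo "usage: build_workergroup_cmd HASH ENDPOINT_NAME [FILTER...]" >&2
    return 1
  fi
  wg_hash="$1"
  wg_endpoint="$2"
  shift 2

  # "$*" joins the remaining filters with spaces into one value.
  wg_filters="$*"

  printf "vastai create workergroup --template_hash '%s' --endpoint_name '%s'" \
    "$wg_hash" "$wg_endpoint"
  if [ -n "$wg_filters" ]; then
    printf " --search_params '%s'" "$wg_filters"
  fi
  printf '\n'
}

build_workergroup_cmd '<HASH>' '<NAME>' 'gpu_name=RTX_4090' 'reliability>0.9'
# prints: vastai create workergroup --template_hash '<HASH>' --endpoint_name '<NAME>' --search_params 'gpu_name=RTX_4090 reliability>0.9'
```

Keeping each filter as a separate argument makes it easier to add or drop one without re-quoting the whole search string.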

```shell
vastai show workergroups
vastai update workergroup <ID> [--target_util --cold_workers ...]
vastai delete workergroup <ID>
vastai get wrkgrp-logs <ID> [--level 0-3 --tail N]
```

Use `show endpoints` and `show workergroups` to list current endpoints and worker groups. Use `get endpt-logs` and `get wrkgrp-logs` to retrieve their logs.