Skill

autoscale

Manage Vast.ai autoscaling endpoints and worker groups for production deployments. Use when setting up auto-scaling GPU inference, managing worker pools, or deploying services.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/vastai:autoscale

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Bash

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Manage production deployments with auto-scaling worker pools.

SKILL.md

93 lines · ~681 tokens

Stats

Stars: 3

Maintenance: Good

Last commit: Feb 7, 2026

Vast.ai Autoscaling & Endpoints

Manage production deployments with auto-scaling worker pools.

User Request

$ARGUMENTS

Concepts

Endpoint: A deployment target that manages load and scaling policy
Worker Group: A pool of instances (workers) tied to an endpoint, auto-scaled based on load

Endpoints

Create

vastai create endpoint \
  --endpoint_name '<NAME>' \
  --target_util 0.9 \
  --max_workers 20 \
  --cold_workers 5 \
  --cold_mult 2.5 \
  --min_load 0.0

Option	Description	Default
`--endpoint_name`	Name for the endpoint	(required)
`--target_util`	Target utilization (fraction, 0–1)	0.9
`--max_workers`	Maximum number of workers	20
`--cold_workers`	Minimum number of cold (standby) workers	5
`--cold_mult`	Cold capacity target as a multiple of the hot capacity target	2.5
`--min_load`	Minimum floor load (perf units/s)	0.0
`--min_cold_load`	Minimum cold load	0.0
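These options interact: `--target_util` sets how hard the active ("hot") workers are driven, and `--cold_mult` sizes the standby pool relative to them. A rough way to see the effect is to compute capacity targets for a given load. The sketch below assumes a simplified model that this page does not state: hot capacity = max(load, `--min_load`) / `--target_util`, and cold capacity = `--cold_mult` × hot capacity. `capacity_targets` is a hypothetical helper, and the real autoscaler also applies limits such as `--cold_workers` and `--max_workers`, which are ignored here.

```shell
#!/usr/bin/env bash
# Illustrative capacity arithmetic for an endpoint's scaling policy.
# ASSUMED model (not an official Vast.ai formula):
#   hot_target  = max(load, min_load) / target_util
#   cold_target = cold_mult * hot_target
# All quantities are in the same perf units/s as --min_load.
capacity_targets() {
  local load="$1" target_util="${2:-0.9}" cold_mult="${3:-2.5}" min_load="${4:-0.0}"
  awk -v load="$load" -v tu="$target_util" -v cm="$cold_mult" -v ml="$min_load" 'BEGIN {
    if (tu <= 0 || tu > 1) { print "target_util must be in (0, 1]" > "/dev/stderr"; exit 1 }
    eff = (load > ml) ? load : ml   # apply the --min_load floor
    hot = eff / tu                  # capacity needed to run at target_util
    cold = cm * hot                 # standby capacity as a multiple of hot
    printf "hot=%.1f cold=%.1f\n", hot, cold
  }'
}
```

For example, `capacity_targets 900 0.9 2.5` prints `hot=1000.0 cold=2500.0`: under this model, 900 units/s of load at 90% target utilization calls for 1000 units/s of active capacity and 2500 units/s of standby capacity.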

Manage

vastai show endpoints
vastai update endpoint <ID> [--target_util 0.85 --max_workers 50 ...]
vastai delete endpoint <ID>
vastai get endpt-logs <ID> [--level 0-3 --tail N]
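Because `--target_util` is a fraction, a small wrapper can reject bad values before they reach the CLI. The sketch below is hypothetical: `set_endpoint_util` and the `VASTAI` override variable are not part of the vastai CLI, and the wrapper only uses the `update endpoint <ID> --target_util` form listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: update an endpoint's target utilization after checking
# that the value is numeric and in (0, 1]. VASTAI defaults to the real CLI;
# set VASTAI=echo to print the arguments instead of running them.
set_endpoint_util() {
  local id="$1" util="$2"
  # awk exits 0 only for valid values; "u + 0 == u" rejects non-numeric input.
  if ! awk -v u="$util" 'BEGIN { exit !(u + 0 == u && u > 0 && u <= 1) }'; then
    echo "target_util must be in (0, 1], got: $util" >&2
    return 1
  fi
  "${VASTAI:-vastai}" update endpoint "$id" --target_util "$util"
}
```

Running `VASTAI=echo set_endpoint_util 42 0.85` prints `update endpoint 42 --target_util 0.85` without touching the account, while `set_endpoint_util 42 1.5` fails with an error message.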

Worker Groups

Create

vastai create workergroup \
  --template_hash '<HASH>' \
  --endpoint_name '<NAME>' \
  --test_workers 3 \
  --cold_workers 2 \
  --target_util 0.9 \
  --search_params 'gpu_name=RTX_4090 reliability>0.9'

Option	Description
`--template_hash`	Template for worker instances
`--template_id`	Template ID (alternative)
`--endpoint_name` / `--endpoint_id`	Target endpoint
`--test_workers`	Number of workers used for performance estimation
`--cold_workers`	Minimum number of cold workers
`--target_util`	Target utilization
`--cold_mult`	Cold capacity target as a multiple of the hot capacity target
`--search_params`	Search query for selecting machines
`--gpu_ram`	Estimated GPU RAM requirement
`--launch_args`	Extra args for instance creation
`-n`	Disable default search params
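`--search_params` takes the whole query as a single argument, so it must stay quoted: unquoted, the space would split `gpu_name=RTX_4090 reliability>0.9` into two words and the shell would treat `>` as an output redirection to a file named `0.9`. The sketch below builds the query from variables and keeps it as one array element. `build_search_params` is a hypothetical helper that covers only the two query fields used on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a --search_params query from a GPU model and a
# minimum reliability, using the field names from the example on this page.
build_search_params() {
  local gpu_name="$1" min_reliability="${2:-0.9}"
  printf 'gpu_name=%s reliability>%s\n' "$gpu_name" "$min_reliability"
}

query="$(build_search_params RTX_4090 0.95)"

# Keep the query as ONE array element so the space and '>' reach vastai intact.
wg_args=(--search_params "$query")

# Usage (not executed here):
#   vastai create workergroup --template_hash '<HASH>' --endpoint_name '<NAME>' "${wg_args[@]}"
```

Expanding `"${wg_args[@]}"` passes exactly two arguments, `--search_params` and the full query string.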

Manage

vastai show workergroups
vastai update workergroup <ID> [--target_util 0.85 --cold_workers 3 ...]
vastai delete workergroup <ID>
vastai get wrkgrp-logs <ID> [--level 0-3 --tail N]
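When diagnosing a deployment it helps to read both log streams together. The hypothetical helper below calls the endpoint and worker group log commands with the same `--tail` count, assuming `--tail N` limits output to the most recent N entries. `--level` is left at its default because this page does not say which of 0-3 is the most verbose.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print recent logs for an endpoint and one worker group.
# VASTAI defaults to the real CLI; set VASTAI=echo to preview the calls instead.
show_recent_logs() {
  local endpoint_id="$1" workergroup_id="$2" tail_n="${3:-100}"
  local cli="${VASTAI:-vastai}"
  echo "== endpoint $endpoint_id =="
  "$cli" get endpt-logs "$endpoint_id" --tail "$tail_n"
  echo "== worker group $workergroup_id =="
  "$cli" get wrkgrp-logs "$workergroup_id" --tail "$tail_n"
}
```

For example, `show_recent_logs <ENDPOINT_ID> <WORKERGROUP_ID> 50` prints a header line before each log stream.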

Typical Setup Flow

1. Create a template with your Docker image and config
2. Create an endpoint with a scaling policy (`vastai create endpoint`)
3. Create a worker group linking the template to the endpoint (`vastai create workergroup`)
4. Monitor with `vastai show endpoints` and `vastai show workergroups`
5. Check logs with `vastai get endpt-logs` and `vastai get wrkgrp-logs`
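The endpoint and worker group creation steps, plus a status check, can be scripted with only the flags documented on this page. The sketch assumes the template from the first step already exists and that its hash is passed in; template creation is not shown. `deploy` and the `VASTAI` override variable are hypothetical, and the early `return` assumes the CLI exits with a non-zero status when a create call fails.

```shell
#!/usr/bin/env bash
# Sketch of the setup flow: endpoint, then worker group, then a status check.
# VASTAI defaults to the real CLI; run with VASTAI=echo first to preview.
deploy() {
  local name="$1" template_hash="$2"
  local search="${3:-gpu_name=RTX_4090 reliability>0.9}"
  local cli="${VASTAI:-vastai}"

  # Endpoint with the scaling policy.
  "$cli" create endpoint --endpoint_name "$name" \
    --target_util 0.9 --max_workers 20 --cold_workers 5 || return 1

  # Worker group linked to the endpoint by name; skipped if the endpoint failed.
  "$cli" create workergroup --template_hash "$template_hash" \
    --endpoint_name "$name" --test_workers 3 --search_params "$search" || return 1

  # Quick status check.
  "$cli" show endpoints && "$cli" show workergroups
}
```

Previewing with `VASTAI=echo deploy my-llm '<HASH>'` prints the four calls in order; dropping the `VASTAI=echo` prefix runs them for real, which launches billable instances.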
