From vastai-pack
Implements a Python HTTP client for the Vast.ai API with rate-limit handling: exponential backoff, retries on HTTP 429, a minimum delay between requests, and adaptive polling for instances.
Slash command: `/vastai-pack:vastai-rate-limits`
Handle Vast.ai REST API rate limits gracefully. The API at `cloud.vast.ai/api/v0` returns HTTP 429 when request limits are exceeded. Most operations (search, show) are read-heavy and rarely hit limits, but automated scripts doing rapid provisioning or polling can trigger throttling.
```python
import time

import requests


class RateLimitedVastClient:
    BASE_URL = "https://cloud.vast.ai/api/v0"

    def __init__(self, api_key, min_delay=0.5, max_retries=5):
        self.session = requests.Session()
        self.session.headers["Authorization"] = f"Bearer {api_key}"
        self.min_delay = min_delay
        self.max_retries = max_retries
        self.last_request = 0.0

    def request(self, method, endpoint, **kwargs):
        # Enforce minimum delay between requests
        elapsed = time.time() - self.last_request
        if elapsed < self.min_delay:
            time.sleep(self.min_delay - elapsed)

        for attempt in range(self.max_retries):
            self.last_request = time.time()
            resp = self.session.request(method, f"{self.BASE_URL}{endpoint}", **kwargs)

            if resp.status_code == 429:
                # Honor Retry-After when it is a number of seconds; if the header
                # is missing or not numeric, fall back to exponential backoff
                try:
                    retry_after = float(resp.headers["Retry-After"])
                except (KeyError, ValueError):
                    retry_after = 2 ** attempt
                print(f"Rate limited. Waiting {retry_after}s (attempt {attempt + 1})")
                time.sleep(retry_after)
                continue

            resp.raise_for_status()
            return resp.json()

        raise RuntimeError("Max retries exceeded due to rate limiting")
```
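The fallback wait in the client (`2 ** attempt`) is deterministic, so several scripts throttled at the same moment would all retry at the same moment too. One common remedy is capped exponential backoff with "full jitter", sketched below. The helper name `backoff_delay`, the 1-second base, and the 60-second cap are illustrative assumptions, not values documented by Vast.ai.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Return a randomized wait in seconds for a 0-based retry attempt.

    Full jitter: pick uniformly between 0 and min(cap, base * 2**attempt).
    base and cap are assumed defaults, not Vast.ai-documented limits.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


# Upper bound of the wait for the first eight attempts
ceilings = [min(60.0, 1.0 * 2 ** n) for n in range(8)]
print(ceilings)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

A value from `backoff_delay(attempt)` could stand in for `2 ** attempt` whenever the server does not supply a usable `Retry-After` header.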
```python
def poll_instance_status(client, instance_id, target="running", timeout=300):
    """Poll instance status with increasing intervals."""
    start = time.time()
    interval = 5  # Start at 5s, increase to max 30s
    while time.time() - start < timeout:
        info = client.request("GET", f"/instances/{instance_id}/")
        status = info.get("actual_status", "unknown")
        if status == target:
            return info
        if status in ("error", "offline"):
            raise RuntimeError(f"Instance {instance_id} failed: {status}")
        time.sleep(interval)
        interval = min(interval * 1.5, 30)
    raise TimeoutError(f"Instance did not reach '{target}' within {timeout}s")
```
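To see how much the growing interval saves, the schedule produced by `interval = min(interval * 1.5, 30)` can be computed on its own. This standalone sketch only reproduces the arithmetic and ignores request latency; `poll_schedule` is an illustrative helper, not part of the skill.

```python
def poll_schedule(start=5.0, factor=1.5, cap=30.0, timeout=300.0):
    """List the sleep intervals used until their running total reaches timeout."""
    intervals, elapsed, interval = [], 0.0, start
    while elapsed < timeout:
        intervals.append(interval)
        elapsed += interval
        interval = min(interval * factor, cap)
    return intervals


schedule = poll_schedule()
print(schedule[:6])   # [5.0, 7.5, 11.25, 16.875, 25.3125, 30.0]
print(len(schedule))  # 13
```

Under these assumptions a 300-second wait costs about 13 requests, compared with 60 at a fixed 5-second interval.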
```python
def batch_search(client, gpu_configs):
    """Search for multiple GPU types with rate-limit-safe delays."""
    results = {}
    for config in gpu_configs:
        # GPUQuery is not defined in this section; it is assumed to be
        # imported from elsewhere in vastai-pack
        query = GPUQuery(**config).to_filter()
        offers = client.request("GET", "/bundles/", params={"q": str(query)})
        results[config.get("gpu_name", "any")] = offers.get("offers", [])
        time.sleep(1)  # Be polite between searches
    return results


# Usage
client = RateLimitedVastClient(api_key="YOUR_API_KEY")  # placeholder key
configs = [
    {"gpu_name": "RTX_4090", "max_dph": 0.30},
    {"gpu_name": "A100", "max_dph": 2.00},
    {"gpu_name": "H100_SXM", "max_dph": 4.00},
]
all_offers = batch_search(client, configs)
```
Strategies to reduce API calls:
- `--limit`: Restrict search results to what you need
- Use `show instances` (lists all) instead of individual `show instance ID` calls

| Scenario | Response |
|---|---|
| First 429 | Wait for the `Retry-After` header value, then retry |
| Repeated 429s | Double the wait time between retries |
| 429 during provisioning | A 429 means the create request was rejected, so retrying is generally safe; if in doubt, check `show instances` before retrying |
| 429 during search | Cache previous results and use them temporarily |
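
The last row, serving cached search results while throttled, could be sketched as a small time-based cache wrapped around any search callable. `CachedSearch`, its `ttl` parameter, and the injected `fetch` function are hypothetical names for illustration. The sketch falls back to stale data on any exception from `fetch`; a real client would likely narrow that to 429 responses.

```python
import time


class CachedSearch:
    """Serve the last good result when a fresh fetch fails (e.g. on a 429)."""

    def __init__(self, fetch, ttl=60.0, clock=time.monotonic):
        self.fetch = fetch    # hypothetical callable: key -> result, may raise
        self.ttl = ttl        # seconds a result counts as fresh (assumed value)
        self.clock = clock    # injectable so the cache can be tested without waiting
        self._store = {}      # key -> (timestamp, result)

    def get(self, key):
        now = self.clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]          # still fresh: skip the API call entirely
        try:
            result = self.fetch(key)
        except Exception:
            if cached is not None:
                return cached[1]      # stale, but usable while throttled
            raise                     # nothing cached yet: surface the error
        self._store[key] = (now, result)
        return result
```

Besides covering throttled periods, the fresh-hit path also removes repeat searches for the same GPU type within the `ttl` window.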
For security best practices, see vastai-security-basics.
Safe multi-instance provisioning: Create 10 instances with 2-second delays between each create instance call to avoid triggering rate limits during cluster setup.
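
The provisioning example could be sketched as a loop that spaces out the create calls. `create_instance` is a hypothetical callable standing in for whatever issues the create request (for instance a thin wrapper around `client.request`); the endpoint and payload are deliberately left out because this section does not give them.

```python
import time


def provision_paced(offer_ids, create_instance, delay=2.0, sleep=time.sleep):
    """Call create_instance(offer_id) for each offer, pausing between calls.

    Returns (created, failed) so that one failure does not abort the batch.
    """
    created, failed = [], []
    for i, offer_id in enumerate(offer_ids):
        if i > 0:
            sleep(delay)  # pause between calls, not before the first one
        try:
            created.append(create_instance(offer_id))
        except Exception as exc:
            failed.append((offer_id, exc))
    return created, failed
```

With ten offers and the default `delay`, the loop sleeps nine times, so the batch takes at least 18 seconds plus request time.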
Efficient monitoring: Poll all instances with a single show instances call every 30 seconds instead of individual calls per instance.
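
The monitoring example, one list call per cycle instead of one call per instance, might look like the following. `list_instances` is a hypothetical callable returning a list of dicts. The `actual_status` field name comes from the polling code earlier, while the `id` key and the shape of the list response are assumptions.

```python
import time


def wait_for_all(instance_ids, list_instances, target="running",
                 interval=30.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll once per interval until every instance reports the target status."""
    pending = set(instance_ids)
    deadline = clock() + timeout
    while pending:
        # A single list call covers every instance still being waited on
        statuses = {inst["id"]: inst.get("actual_status")
                    for inst in list_instances()}
        pending = {i for i in pending if statuses.get(i) != target}
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError(f"Still waiting on {sorted(pending)}")
        sleep(interval)
    return True
```

The request count here depends only on elapsed time, not on the number of instances, which is what keeps a large cluster under the rate limit.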
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin vastai-pack