From coreweave-pack
Optimizes CoreWeave GPU inference latency and throughput using workload-specific GPU picks, vLLM batching, and Kubernetes HPA autoscaling.
Slash command: /coreweave-pack:coreweave-performance-tuning
| Workload | Recommended GPU | Why |
|---|---|---|
| LLM inference (7-13B) | A100 80GB | Good balance of memory and cost |
| LLM inference (70B+) | 8xH100 | NVLink for tensor parallelism |
| Image generation | L40 | Good for diffusion models |
| Training (large models) | 8xH100 SXM5 | Fastest interconnect |
| Batch processing | A100 40GB | Cost-effective |
# Continuous batching with vLLM
containers:
  - name: vllm
    args:
      - "--model=meta-llama/Llama-3.1-8B-Instruct"
      - "--max-num-batched-tokens=8192"
      - "--max-num-seqs=256"
      - "--gpu-memory-utilization=0.90"
      - "--enable-prefix-caching"
      - "--dtype=float16"
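To see how `--gpu-memory-utilization` and `--max-num-seqs` interact, it can help to estimate how many tokens of KV cache fit on one GPU. The following is a rough sketch, not vLLM's own accounting: it assumes Llama-3.1-8B's published shape (32 layers, 8 KV heads, head dimension 128), about 15 GiB of float16 weights, and an 80 GiB card, and it ignores activation memory, CUDA graphs, and fragmentation. The function names are hypothetical.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, dtype_bytes: int) -> int:
    """Bytes of KV cache needed for one token across all layers."""
    # Keys and values are both cached, hence the factor of 2.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def kv_cache_token_budget(gpu_mem_gib: float, gpu_memory_utilization: float,
                          weight_gib: float, bytes_per_token: int) -> int:
    """Approximate number of tokens whose KV cache fits after loading weights."""
    usable_gib = gpu_mem_gib * gpu_memory_utilization - weight_gib
    if usable_gib <= 0:
        return 0
    return int(usable_gib * 1024**3 // bytes_per_token)


# Assumed values (not from the original text): Llama-3.1-8B shape, float16,
# ~15 GiB of weights, 80 GiB of GPU memory.
per_token = kv_cache_bytes_per_token(num_layers=32, num_kv_heads=8,
                                     head_dim=128, dtype_bytes=2)
budget = kv_cache_token_budget(gpu_mem_gib=80, gpu_memory_utilization=0.90,
                               weight_gib=15, bytes_per_token=per_token)
print(f"{per_token} bytes/token, ~{budget} cached tokens")
```

Under these assumptions the budget is roughly 467k tokens, so 256 concurrent sequences would average a bit over 1,800 cached tokens each before vLLM has to preempt. Real capacity will be lower; treat this as a starting point for tuning, not a guarantee.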
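# HPA based on GPU utilization
# Pods-type metrics are served through the custom metrics API, so
# DCGM_FI_DEV_GPU_UTIL must be exposed by a metrics adapter
# (for example, Prometheus Adapter scraping the DCGM exporter).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: DCGM_FI_DEV_GPU_UTIL
        target:
          type: AverageValue
          averageValue: "70"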
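The scaling decision this HPA makes can be approximated with the formula from the Kubernetes documentation: `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, skipped when the ratio is within a tolerance (0.1 by default) and clamped to the min/max bounds. The sketch below is a simplified model for reasoning about the 70% target; it ignores stabilization windows, unready pods, and missing metrics, and the function name is hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_avg: float,
                     target_avg: float = 70.0, min_replicas: int = 2,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified model of the HPA replica calculation for an AverageValue target."""
    ratio = current_avg / target_avg
    # Inside the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the minReplicas / maxReplicas bounds from the spec.
    return max(min_replicas, min(max_replicas, desired))


# Example: 4 replicas averaging 90% GPU utilization against a 70% target.
print(desired_replicas(4, 90.0))
```

With the values from the manifest above, 4 replicas at 90% average utilization scale out, while 4 replicas at 72% stay put because the ratio falls inside the tolerance band.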
| Metric | A100-80GB | H100-80GB |
|---|---|---|
| Llama-8B tokens/sec | ~2,000 | ~4,500 |
| Llama-70B tokens/sec | ~200 (4 GPUs) | ~500 (4 GPUs) |
| Cold start (vLLM) | 30-60s | 20-40s |
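The approximate throughput figures in the table can be turned into a first-pass replica count. This is a back-of-the-envelope sketch: the `headroom` factor (here 0.7, chosen to echo the 70% utilization target) and the function name are assumptions for illustration, and actual throughput depends heavily on prompt length, batch mix, and model configuration.

```python
import math


def replicas_for_throughput(target_tokens_per_sec: float,
                            per_replica_tokens_per_sec: float,
                            headroom: float = 0.7) -> int:
    """Replicas needed to serve a target load while running each at `headroom` of peak."""
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    effective = per_replica_tokens_per_sec * headroom
    return math.ceil(target_tokens_per_sec / effective)


# Hypothetical target load of 10,000 tokens/sec for Llama-8B,
# using the approximate per-GPU figures from the table.
for gpu, tps in {"A100-80GB": 2000, "H100-80GB": 4500}.items():
    print(gpu, replicas_for_throughput(10_000, tps))
```

Comparing the two counts against per-GPU pricing gives a quick sense of which GPU class is cheaper for a given load, before cold-start time is taken into account.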
For cost optimization, see coreweave-cost-tuning.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin coreweave-pack