From pi-skill
Deploys and manages vLLM LLM GPU pods using pi CLI for DataCrunch/RunPod, including setup, model commands, GPU multi-assignment, and tensor parallelism. Useful for GPU instances and vLLM configuration.
How this skill is triggered — by the user, by Claude, or both
Slash command
/pi-skill:pi-podsThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
1. `pi-mono/packages/pods/README.md` — installation, pod management, model commands, GPU multi-assignment, and pre-defined models.
pi-mono/packages/pods/README.md — installation, pod management, model commands, GPU multi-assignment, and pre-defined models.pi-mono/packages/pods/src/ — CLI commands implementation if needing deeper args validation.pi automatically assigns them to different GPUs.--vllm, the default CLI shortcuts for --memory, --context, and --gpus are ignored.pi pods setup <name> "<ssh>" along with --mount for shared NFS storage (DataCrunch) or network volumes (RunPod).pi start <model> --name <name> for known agentic models (Qwen, GLM, GPT-OSS). The tool calling parsers are automatically configured.--vllm --tensor-parallel-size <N>.pi configures hermes or glm4_moe automatically./mnt/hf-models).npx claudepluginhub romiluz13/pi-agent-skillsDeploys vLLM OpenAI-compatible server to Kubernetes with GPU support, health probes, and services via YAML templates. Checks HF token secret and existing deployments before applying.
Launches GPU/TPU clusters, training jobs, and inference servers across 25+ clouds, Kubernetes, Slurm using SkyPilot; debugs YAML, optimizes costs.
Deploys GPU workloads on CoreWeave Kubernetes with kubectl: vLLM inference server or batch job. For first GPU deploys, inference testing, cluster access checks.