Manages GKE TPU clusters using xpk. Creates, deletes, and lists TPU Nodepool resources on Google Kubernetes Engine. Multi-user safe - always queries GKE for real-time cluster state.
Slash command: `/exec-remote:apply-resource`
This skill manages TPU clusters on Google Kubernetes Engine (GKE) using [xpk](https://github.com/AI-Hypercomputer/xpk).
Multi-User Safe: This skill does NOT use local caching. All cluster information is queried directly from GKE in real-time, ensuring accuracy in multi-user scenarios where clusters may be created, modified, or deleted by other users.
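The real-time lookup described above could be sketched as follows. This is a minimal illustration, assuming the `gcloud` CLI is installed and authenticated; the helper name `cluster_exists` is hypothetical and not part of the skill. It uses only the standard `--project`, `--filter`, and `--format` flags of `gcloud container clusters list`.

```shell
# Hypothetical helper: return 0 if GKE currently reports a cluster with
# the given name in the given project. Every call asks GKE directly;
# nothing is cached locally.
#   $1 = cluster name, $2 = project ID
cluster_exists() {
  local name="$1"
  local project="$2"
  local found
  # value(name) prints just the cluster name, one per line.
  found="$(gcloud container clusters list \
    --project="$project" \
    --filter="name=$name" \
    --format="value(name)")" || return 2
  [ "$found" = "$name" ]
}
```

A caller might run `cluster_exists "$CLUSTER_NAME" "$PROJECT_ID"` immediately before a create or delete, so the decision reflects what other users have done in the meantime.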
Before using this skill, ensure the following tools are installed:
- gcloud: run `gcloud auth login` to authenticate; verify with `gcloud auth list`
- `gke-gcloud-auth-plugin`
- kubectl: verify with `kubectl version --client`
- xpk: verify with `xpk --help`

The following defaults apply unless the user explicitly overrides them:
| Parameter | Default |
|---|---|
| PROJECT_ID | tpu-service-473302 |
| CLUSTER_NAME | sglang-jax-agent-tests |
| ZONE | asia-northeast1-b |
| NUM_SLICES | 1 |
Use these values directly — do NOT ask the user to confirm or re-enter them unless they specify otherwise.
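One way to apply these defaults in a shell session is with `${VAR:-default}` parameter expansion, which keeps any value the caller already set and fills in the rest. The function name `resolve_defaults` is hypothetical; the values are copied from the table above.

```shell
# Hypothetical helper: keep any value already set by the caller,
# otherwise fall back to the documented default.
resolve_defaults() {
  PROJECT_ID="${PROJECT_ID:-tpu-service-473302}"
  CLUSTER_NAME="${CLUSTER_NAME:-sglang-jax-agent-tests}"
  ZONE="${ZONE:-asia-northeast1-b}"
  NUM_SLICES="${NUM_SLICES:-1}"
}
```

Because an explicit override is never replaced, a user who sets only `ZONE` still gets the default project, cluster name, and slice count.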
When creating a cluster, the following parameters are required:
- `PROJECT_ID` (default: `tpu-service-473302`)
- `CLUSTER_NAME` (default: `sglang-jax-agent-tests`)
- `TPU_TYPE` (e.g., `v6e-16`, `v6e-4`, `v4-8`); must be specified
- `NUM_SLICES` (default: `1`)
- `ZONE` (default: `asia-northeast1-b`)

If these parameters are already known from an upstream caller (e.g., `exec-remote` or `deploy-cluster`), use them directly; do NOT re-ask the user. Only prompt interactively when this skill is invoked standalone and the user wants to override defaults.
Creates a new GKE cluster with a TPU node pool using Pathways.
Command:
xpk cluster create-pathways \
--cluster $CLUSTER_NAME \
--num-slices=$NUM_SLICES \
--tpu-type=$TPU_TYPE \
--zone=$ZONE \
--spot \
--project=$PROJECT_ID
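Since creating a cluster is slow and billable, it can help to look at the fully expanded command before running it. The sketch below only prints the command shown above with the current variable values substituted; `print_create_command` is a hypothetical name, and the sketch assumes none of the values contain spaces.

```shell
# Hypothetical dry-run helper: print the create command instead of
# executing it. Uses only the flags from the command above and expects
# CLUSTER_NAME, NUM_SLICES, TPU_TYPE, ZONE, and PROJECT_ID to be set.
print_create_command() {
  printf 'xpk cluster create-pathways --cluster %s --num-slices=%s --tpu-type=%s --zone=%s --spot --project=%s\n' \
    "$CLUSTER_NAME" "$NUM_SLICES" "$TPU_TYPE" "$ZONE" "$PROJECT_ID"
}
```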
Interactive Flow:
Deletes an existing GKE cluster.
Command:
xpk cluster delete \
--cluster $CLUSTER_NAME \
--zone=$ZONE \
--project=$PROJECT_ID
Interactive Flow:
Lists all managed clusters.
Command:
xpk cluster list
Real-time Query:
gcloud container clusters list
Shows detailed information about a specific cluster.
Command:
xpk cluster describe \
--cluster $CLUSTER_NAME \
--zone=$ZONE
When a specified ZONE doesn't support the requested TPU_TYPE:
This skill is designed for multi-user environments:
This skill can be invoked in two ways:
- Standalone (e.g., `/apply-resource create`): Prompt the user for all required parameters interactively.
- From an upstream caller (e.g., `deploy-cluster` or `exec-remote`): Parameters (`CLUSTER_NAME`, `TPU_TYPE`, `NUM_SLICES`, `ZONE`) are already known; use them directly without prompting.
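The choice between the two modes can be reduced to one question: which parameters are still unknown? The sketch below is one possible way to answer it in shell. `missing_params` is a hypothetical helper, not part of the skill; it prints the names that still need a value, and prints an empty line when everything was supplied by the caller.

```shell
# Hypothetical helper: list the parameters that are still unset or empty.
# An empty result means an upstream caller supplied everything, so no
# prompting is needed.
missing_params() {
  local missing=""
  [ -n "${CLUSTER_NAME:-}" ] || missing="$missing CLUSTER_NAME"
  [ -n "${TPU_TYPE:-}" ]     || missing="$missing TPU_TYPE"
  [ -n "${NUM_SLICES:-}" ]   || missing="$missing NUM_SLICES"
  [ -n "${ZONE:-}" ]         || missing="$missing ZONE"
  # Strip the single leading space left by the concatenation above.
  printf '%s\n' "${missing# }"
}
```

In standalone use the returned names are the ones to ask the user about; when the list is empty the skill can proceed straight to the xpk command.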