Use when the user asks to start, stop, or check Google Compute Engine GPU instances for physics workflows, including retry-start in constrained zones, instance status checks, and zone mapping management.
Slash command: /core:gce-ops-mcp
Operate GCP Compute Engine instances for GPU-backed research workflows.
Workflow position: this skill mainly supports component (2) Science by managing compute infrastructure for simulation/training runs.
Use these MCP tools first:
- local-gcp-gpu.health
- local-gcp-gpu.instance_status
- local-gcp-gpu.start_instance
- local-gcp-gpu.stop_instance
- local-gcp-gpu.set_zone
- local-gcp-gpu.list_zones
Typical sequence:
1. health to confirm local gcloud readiness.
2. instance_status to see current state.
3. start_instance (or stop_instance) for lifecycle action.
4. instance_status again to confirm final state.
Before using lifecycle tools, confirm Google Cloud access for the target project.
Run:
gcloud auth login
gcloud auth application-default login
gcloud config set project <project_id>
Optional default for this MCP server:
export GCP_PROJECT_ID="<project_id>"
If GCP_PROJECT_ID is not set, pass project_id explicitly in tool calls.
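A wrapper script around these tools could resolve the project the same way. The following is a minimal sketch, assuming an explicit argument should take precedence over GCP_PROJECT_ID; `resolve_project_id` is a hypothetical helper and not part of the MCP server.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the MCP server).
# Prefers an explicit argument, then GCP_PROJECT_ID, otherwise fails.
resolve_project_id() {
  local explicit="${1:-}"
  if [ -n "$explicit" ]; then
    printf '%s\n' "$explicit"
  elif [ -n "${GCP_PROJECT_ID:-}" ]; then
    printf '%s\n' "$GCP_PROJECT_ID"
  else
    echo "error: pass project_id or set GCP_PROJECT_ID" >&2
    return 1
  fi
}
```

Calling `resolve_project_id "$1"` at the top of a script keeps the fallback in one place instead of repeating it before every call.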
Required IAM capability in the target project:
- compute.instances.get
- compute.instances.list
- compute.instances.start
- compute.instances.stop
Quick verification:
gcloud auth list --filter=status:ACTIVE --format='value(account)'
gcloud compute instances list --project <project_id> --limit 5
Use start_instance with an explicit project_id and zone on the first call for a VM. If it fails because of a transient capacity shortage in the zone, re-run with a larger max_attempts.
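The same retry idea can be sketched outside the MCP server with plain shell. This is an illustration only: `retry_start`, its fixed delay, and its argument order are hypothetical, and the server's actual retry and backoff logic is not shown in this document. Unlike a capacity-aware retry, this sketch retries on any non-zero exit status.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run a command up to max_attempts times.
# Usage: retry_start <max_attempts> <delay_seconds> <command...>
retry_start() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      echo "started on attempt $attempt"
      return 0
    fi
    echo "attempt $attempt/$max_attempts failed" >&2
    # Wait between attempts, but not after the last one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example (fill in the placeholders before running):
# retry_start 10 30 gcloud compute instances start <instance_name> \
#   --project <project_id> --zone <zone>
```

Passing the command as trailing arguments keeps the loop independent of gcloud, so it can be exercised with a stub before pointing it at a real instance.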
Use stop_instance with discard_local_ssd=true unless you explicitly need to preserve Local SSD data and accept associated implications.
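For reference, the gcloud CLI exposes the same choice through the `--discard-local-ssd` flag of `gcloud compute instances stop`. The wrapper below is a hedged sketch: `stop_instance_cli` and the `DRY_RUN` switch are hypothetical names, and whether the MCP tool maps its discard_local_ssd parameter onto this flag is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not the MCP tool itself).
# Usage: stop_instance_cli <instance> <zone> <project> [discard=true|false]
# With DRY_RUN=1 the command is printed instead of executed.
stop_instance_cli() {
  local instance="$1" zone="$2" project="$3" discard="${4:-true}"
  # Rebuild the positional parameters as the gcloud command line.
  set -- gcloud compute instances stop "$instance" \
    --zone "$zone" --project "$project" \
    "--discard-local-ssd=$discard"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Defaulting the fourth argument to `true` mirrors the recommendation above; a caller must pass `false` deliberately to keep Local SSD data.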
When an instance is stable in one zone, store it with set_zone(instance_name, zone) to reduce repeated zone discovery and avoid ambiguity.
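To illustrate what such a mapping buys, here is a minimal file-backed sketch. Everything in it is an assumption made for illustration: the `zone_map_set` and `zone_map_get` helpers, the tab-separated format, and the `ZONE_MAP_FILE` location are hypothetical and say nothing about how the MCP server actually stores zones.

```shell
#!/usr/bin/env bash
# Hypothetical local store: one "instance<TAB>zone" pair per line.
ZONE_MAP_FILE="${ZONE_MAP_FILE:-$HOME/.gce-zone-map.tsv}"

# Record (or overwrite) the zone for an instance.
zone_map_set() {
  local instance="$1" zone="$2" tmp
  tmp="$(mktemp)"
  # Keep every entry except an existing one for this instance.
  if [ -f "$ZONE_MAP_FILE" ]; then
    awk -F '\t' -v name="$instance" '$1 != name' "$ZONE_MAP_FILE" > "$tmp"
  fi
  printf '%s\t%s\n' "$instance" "$zone" >> "$tmp"
  mv "$tmp" "$ZONE_MAP_FILE"
}

# Print the stored zone; return non-zero if the instance is unknown.
zone_map_get() {
  local instance="$1"
  [ -f "$ZONE_MAP_FILE" ] || return 1
  awk -F '\t' -v name="$instance" \
    '$1 == name { print $2; found = 1 } END { exit !found }' "$ZONE_MAP_FILE"
}
```

Keeping one entry per instance means a lookup never returns two candidate zones, which is the ambiguity the stored mapping is meant to avoid.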
This skill is aligned to the current repository implementation:
- servers/gce-ops-mcp/src/gce_ops_mcp/server.py
- scripts/run-gce-ops-mcp.sh
If this skill is copied into fundamental-physics/marketplace, keep the SKILL content but adjust local file paths and server names to the target repository conventions.
npx claudepluginhub jacobtutt/mcp-tools --plugin core