From core
Use when the user wants to monitor remote SLURM jobs over SSH, check queue state, wait for a job to leave pending, or wait for a job to finish.
Slash command
/core:slurm-watch-mcp
Monitor remote SLURM jobs over SSH for workflow orchestration.
Workflow position: this skill supports long-running HPC jobs where the assistant needs to pause active work until a submitted batch job is ready or complete.
Use these MCP tools first:
- slurm-watch-mcp.health
- slurm-watch-mcp.job_status
- slurm-watch-mcp.wait_until_started
- slurm-watch-mcp.wait_until_finished

Typical sequence:
1. health to confirm SSH and remote SLURM readiness.
2. job_status to inspect the current state.
3. wait_until_started to block until the job leaves PENDING.
4. wait_until_finished to block until the terminal state is known.

Before using wait tools, confirm SSH access to the remote login host.
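The typical sequence above can be sketched as a small driver. This is only an illustration: `call_tool` is a hypothetical callable standing in for however the MCP client invokes a tool, and the argument names `job_id` and `host` are assumptions that the skill text does not confirm.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical signature: (tool_name, arguments) -> result dict.
ToolCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_watch_sequence(
    call_tool: ToolCall, job_id: str, host: Optional[str] = None
) -> Dict[str, Any]:
    """Call the four tools in the documented order; return the final result."""
    # "job_id" and "host" are assumed argument names, not taken from the server.
    base: Dict[str, Any] = {} if host is None else {"host": host}
    job_args = {**base, "job_id": job_id}

    call_tool("health", base)                  # 1. SSH and remote SLURM readiness
    call_tool("job_status", job_args)          # 2. inspect the current state
    call_tool("wait_until_started", job_args)  # 3. block until the job leaves PENDING
    return call_tool("wait_until_finished", job_args)  # 4. block until terminal
```

In practice the driver would inspect each result before moving on, for example stopping early if the health check reports a problem.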
Optional default host:
export SLURM_WATCH_HOST="login.hpc.example.edu"
Optional SSH flags for jump hosts or identity files:
export SLURM_WATCH_SSH_OPTIONS="-J bastion.example.edu -i ~/.ssh/id_ed25519"
If SLURM_WATCH_HOST is not set, pass host explicitly in tool calls.
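As a rough sketch of how these two variables could combine into an SSH invocation, assuming the options string is split shell-style and an explicit host overrides the default: the helper names `resolve_host` and `build_ssh_argv` are hypothetical and are not taken from server.py.

```python
import os
import shlex
from typing import List, Mapping, Optional


def resolve_host(
    explicit_host: Optional[str] = None, env: Optional[Mapping[str, str]] = None
) -> str:
    """Prefer an explicitly passed host, else fall back to SLURM_WATCH_HOST."""
    env = os.environ if env is None else env
    host = explicit_host or env.get("SLURM_WATCH_HOST")
    if not host:
        raise ValueError("No host: set SLURM_WATCH_HOST or pass host explicitly")
    return host


def build_ssh_argv(
    remote_command: str,
    explicit_host: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Assemble an argv list: ssh, optional flags, host, remote command."""
    env = os.environ if env is None else env
    # Split the flags the way a POSIX shell would, so "-J bastion" becomes two items.
    options = shlex.split(env.get("SLURM_WATCH_SSH_OPTIONS", ""))
    return ["ssh", *options, resolve_host(explicit_host, env), remote_command]
```

Passing the result as a list to `subprocess.run` avoids a local shell and the quoting problems that come with it.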
Use job_status first to see whether the job is still pending, already active, or already terminal.
Use wait_until_started when downstream work can begin as soon as the scheduler has started the job, that is, once it has left PENDING.
Use wait_until_finished when follow-on analysis depends on the final sacct state and exit code.
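The choice between the tools can be sketched as a small decision helper. The state names are standard SLURM job states, but the grouping, the `classify` and `next_tool` names, and the rule of treating unknown states as active are assumptions for illustration, not the server's actual logic.

```python
from typing import Optional

# Standard SLURM states after which a job will not run any further.
TERMINAL_STATES = {
    "COMPLETED", "FAILED", "CANCELLED", "TIMEOUT", "NODE_FAIL",
    "OUT_OF_MEMORY", "PREEMPTED", "BOOT_FAIL", "DEADLINE",
}


def classify(state: str) -> str:
    """Map a raw state string to 'pending', 'active', or 'terminal'."""
    # sacct may print "CANCELLED by <uid>" or a truncated "CANCELLED+".
    parts = state.strip().upper().split()
    name = parts[0].rstrip("+") if parts else ""
    if name == "PENDING":
        return "pending"
    if name in TERMINAL_STATES:
        return "terminal"
    return "active"  # assumption: anything else (RUNNING, COMPLETING, ...) is active


def next_tool(state: str, need_final_result: bool) -> Optional[str]:
    """Pick the wait tool to call next, or None if no waiting is needed."""
    kind = classify(state)
    if kind == "terminal":
        return None
    if need_final_result:
        return "wait_until_finished"
    return "wait_until_started" if kind == "pending" else None
```

For example, a job reported as PENDING whose output feeds a later analysis step maps to wait_until_finished, while a job that only needs to be running maps to wait_until_started.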
This skill is aligned to the current repository implementation:
- servers/slurm-watch-mcp/src/slurm_watch_mcp/server.py
- scripts/run-slurm-watch-mcp.sh

Keep this skill generic. Do not hardcode personal usernames, cluster account names, or hostnames into the skill text.
npx claudepluginhub jacobtutt/mcp-tools --plugin core