By maedmatt
Run a long training job on a remote shared-GPU box so it survives disconnects, and keep your agent in the loop while it runs.
An agent skill for running long jobs on a remote, shared GPU machine over SSH: the job survives your connection dropping, and the agent can follow it the whole time.
Works with any harness that follows the Agent Skills standard (Claude Code, Codex, and others). It's plain bash plus a short guide that tells an agent how to set it up for your machine.
You want an agent to train a model (or run any long job) on a remote GPU machine.
ssh host "python train.py" has three traps:
Start the job in a background tmux session, save its output to a log on the remote, and
watch it with a loop that checks whether the session is alive and reads the log for
progress. A run that takes hours becomes a few short status lines (RUNNING, the
dashboard link, RUN ENDED) that the agent can react to. The job is over when the session
disappears, not when an error shows up in the log: a crash, an out-of-memory kill, or a
bad flag can end it without printing anything to search for, and watching the session
catches those too.
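The watch loop described above could be sketched roughly as follows. This is not the actual train.sh: the function names, the 30-second interval, and the URL pattern used to spot a dashboard link are assumptions for illustration. It relies on `tmux has-session` to test for the session, and on ssh returning 255 for its own connection errors so that a dropped link is not mistaken for a finished job.

```shell
#!/usr/bin/env bash
# Sketch only: names, interval, and URL pattern are assumptions, not train.sh itself.

# Turn "is the session alive?" plus recent log text into one short status line.
status_line() {
  local alive="$1" log_text="$2" url
  if [ "$alive" != "yes" ]; then
    echo "RUN ENDED"
    return 0
  fi
  # Take the last URL in the log as the dashboard link (assumed convention).
  url=$(printf '%s\n' "$log_text" | grep -Eo 'https?://[^[:space:]]+' | tail -n 1) || true
  if [ -n "$url" ]; then
    echo "RUNNING $url"
  else
    echo "RUNNING"
  fi
}

# Poll the remote until the tmux session named <tag> is gone.
watch_run() {
  local host="$1" tag="$2" log="$3" rc text
  while :; do
    rc=0
    ssh -n "$host" tmux has-session -t "$tag" 2>/dev/null || rc=$?
    if [ "$rc" -eq 255 ]; then
      # ssh itself failed: a connection problem, not a dead job.
      echo "SSH UNREACHABLE (retrying)"
    elif [ "$rc" -ne 0 ]; then
      status_line no ""
      return 0
    else
      text=$(ssh -n "$host" tail -n 50 "$log" 2>/dev/null) || true
      status_line yes "$text"
    fi
    sleep 30
  done
}
```

Called as `watch_run gpu-box exp1 logs/exp1.log` (all three values hypothetical), it prints one line per poll and returns once the session has disappeared, whatever the log says.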
The few things that must stay true. Everything else is meant to be changed to fit your setup; a change that breaks one of these has broken the skill.
remote-gpu-train/
├── README.md                          # this file: the idea and the rules
├── LICENSE
├── .claude-plugin/marketplace.json    # lets Claude Code install it as a plugin
└── skills/remote-gpu-train/
    ├── SKILL.md                       # how to use it day to day
    ├── train.sh                       # the script (gpus / run / watch / cp / ssh)
    └── ADAPTING.md                    # how an agent sets it up for your machine
With npx skills (vercel-labs/skills), which
can install into several harnesses:
npx skills add maedmatt/remote-gpu-train -a claude-code # or -a codex, etc.
npx skills add maedmatt/remote-gpu-train -g -a claude-code # -g installs it everywhere (globally)
As a Claude Code plugin:
/plugin marketplace add maedmatt/remote-gpu-train
/plugin install remote-gpu-train@remote-gpu-train
By hand: copy skills/remote-gpu-train/ into ~/.claude/skills/ (or your agent's
skills folder).
On first use in a project the skill sets itself up: an agent reads ADAPTING.md, asks
what it needs (ssh name, repo path, environment setup, launch command, the log line that
means "running"), fills in train.sh, wires watch to your harness, and verifies it.
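As a rough picture of what "fills in train.sh" might amount to, the five answers could end up as a handful of variables near the top of the script. The variable names and values below are hypothetical (the real train.sh may organise this differently), and `missing_settings` is an illustrative helper, not part of the skill.

```shell
#!/usr/bin/env bash
# Hypothetical settings block; names and values are assumptions for illustration.
REMOTE_HOST="gpu-box"                       # ssh name (an entry in ~/.ssh/config)
REMOTE_REPO="/home/me/projects/my-model"    # repo path on the remote
ENV_SETUP="source .venv/bin/activate"       # environment setup
LAUNCH_CMD="python train.py"                # launch command
RUNNING_MARKER="Starting epoch"             # log line that means "running"

# Print the name of every setting that is still empty, one per line.
missing_settings() {
  local name value
  for name in REMOTE_HOST REMOTE_REPO ENV_SETUP LAUNCH_CMD RUNNING_MARKER; do
    # The names are a fixed list, so eval is safe here and also works outside bash.
    eval "value=\${$name-}"
    [ -n "$value" ] || echo "$name"
  done
}
```

A check like this gives the verification step something concrete to run: if `missing_settings` prints anything, setup is not finished.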
Then:
train.sh gpus                          show GPUs, mark which are FREE
train.sh run <gpu> <tag> <args...>     start the job on <gpu> in a background session
train.sh watch <tag>                   follow it until the run ends
train.sh cp <local> <remote>           copy a file into the repo
train.sh ssh [cmd...]                  git pull/status, tail a log, list/kill sessions, anything else
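On a shared machine, "FREE" needs a working definition. One possible sketch, assuming a GPU counts as free when its memory use and utilization are both under small thresholds: the function name and the default thresholds (500 MiB, 5%) are assumptions, not what train.sh necessarily does. It expects input in the shape produced by `nvidia-smi --query-gpu=index,memory.used,utilization.gpu --format=csv,noheader,nounits`.

```shell
#!/usr/bin/env bash
# Sketch only: thresholds and output layout are assumptions.

# Read "index, memory_used_MiB, utilization_percent" lines on stdin
# and label each GPU FREE or BUSY.
mark_free() {
  awk -F', *' -v max_mem="${1:-500}" -v max_util="${2:-5}" '{
    state = ($2 + 0 <= max_mem && $3 + 0 <= max_util) ? "FREE" : "BUSY"
    printf "GPU %s  %s MiB  %s%%  %s\n", $1, $2, $3, state
  }'
}

# Possible use against a remote (host name is hypothetical):
#   ssh gpu-box nvidia-smi --query-gpu=index,memory.used,utilization.gpu \
#       --format=csv,noheader,nounits | mark_free
```

A memory threshold alone can mislead when someone else's job has allocated little memory so far, which is why this sketch checks utilization as well.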
The core (a background job that keeps running, plus a watcher that reads the log and
notices when the session dies) fits any long remote job: a data pipeline, a slow build, a
simulation, a batch of evals. Drop gpus and point the launch command at whatever you run.
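For a job with no GPU involved, the launch step reduces to "start this command in a detached tmux session and tee its output to a log". A minimal sketch, assuming a `logs/` directory and a hypothetical helper name; it only builds the command string, so it can be inspected before anything is sent to the remote.

```shell
#!/usr/bin/env bash
# Sketch only: helper name and log layout are assumptions.

# Print the remote command that starts <cmd...> in a detached tmux session
# named <tag>, with stdout and stderr copied to logs/<tag>.log.
remote_launch_cmd() {
  local tag="$1"
  shift
  printf 'tmux new-session -d -s %s "mkdir -p logs; %s 2>&1 | tee logs/%s.log"\n' \
    "$tag" "$*" "$tag"
}

# Possible use (host and repo path are hypothetical):
#   ssh gpu-box "cd /home/me/projects/my-model && $(remote_launch_cmd evals-01 make evals)"
```

Because the session's only command is the job itself, the session ends when the job does, which is the signal the watcher relies on. Arguments containing double quotes would need extra escaping that this sketch does not attempt.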
MIT. See LICENSE.