Search everything...

Stats

Actions

Available In

skymcp

Name: skymcp
Author: slapglif

By slapglif

End-to-end ML training ecosystem powered by SkyPilot. Launch, monitor, fix, iterate, ablate, and serve models on any cloud GPU with production-grade frameworks.

npx claudepluginhub slapglif/skymcp --plugin skymcp

Popularity

Stars

Med: 0·Avg: 285

Installs

Med: 0·Avg: 1

What's Inside

Agents5

cloud-optimizer

/cloud-optimizer

Analyzes SkyPilot cloud spending and optimizes GPU costs. Monitors active clusters, identifies waste (idle clusters, over-provisioned resources, missing autostop), and suggests concrete savings strategies with dollar amounts. Use proactively when expensive jobs are running or when reviewing cloud costs, or reactively when users ask about spending.

config-validator

/config-validator

Validates SkyPilot YAML configurations before launch. Checks for known gotchas, estimates costs, verifies resource feasibility, and suggests optimizations. Use proactively before any sky launch or sky jobs launch command, or reactively when a user asks to review a config. Triggers when YAML files are being written or when launch commands are about to execute.

experiment-scientist

/experiment-scientist

Designs and runs controlled ML experiments -- hyperparameter sweeps, architecture ablations, scaling law studies, and A/B comparisons. Use when comparing configurations, searching for optimal settings, or when the user discusses experimentation, ablation, or comparison. Triggers proactively after a training run completes to suggest next experiments.

training-doctor

/training-doctor

Diagnoses and fixes ML training failures -- OOM crashes, NaN loss, loss plateaus, gradient explosion, slow throughput, data corruption, and preemption recovery. Use when training encounters problems or produces unexpected results. Triggers proactively when error patterns are detected in logs, or reactively when users report training issues.

training-orchestrator

/training-orchestrator

Autonomous ML training lifecycle manager. Use when the user wants to train a model end-to-end -- framework selection, YAML generation, launch, monitoring, failure recovery, and iteration. Triggers proactively when training tasks are detected, such as dataset preparation discussions, model fine-tuning requests, or when training scripts exist but have not been launched.

Skills17

checkpoint-management

/checkpoint-management

Use when saving or resuming training checkpoints, merging LoRA adapters, converting between model formats (safetensors, GGUF, PyTorch), quantizing models, using mergekit for model merging, or managing checkpoint lifecycle on cloud storage

cost-optimization

/cost-optimization

Use when reducing cloud GPU costs, choosing spot vs on-demand, estimating training expenses, managing budgets, configuring auto-stop, or optimizing spend across clouds - covers SkyPilot optimizer, spot failover, pricing tiers, and budget management

data-pipeline-design

/data-pipeline-design

Use when designing a data pipeline, curating pretraining data, choosing between NeMo Curator and other tools, configuring tokenization, running deduplication (exact or fuzzy), applying data filtering or quality classifiers, working with FineWeb or Dolma or RedPajama, or assessing data quality for ML training - the production data pipeline design reference

distributed-training

/distributed-training

Use when setting up multi-GPU or multi-node training, configuring torchrun or deepspeed launchers, choosing parallelism strategy (FSDP, DeepSpeed ZeRO, tensor parallel, pipeline parallel), tuning NCCL, or enabling high-performance networking with SkyPilot

ml-training-frameworks

/ml-training-frameworks

Use when selecting a training framework, comparing NeMo vs Axolotl vs torchtune vs TRL vs DeepSpeed vs Megatron, choosing between FSDP and DeepSpeed, deciding how to fine-tune or pretrain a model, or configuring LoRA/QLoRA/full fine-tuning - the definitive framework selection guide for ML training at any scale

Hooks1

Event Hooks

Bash

3 hooks across 3 events

MCP Servers1

skypilot

admin

Stats

Version0.1.0

Stars0

MaintenanceGood

LicenseMIT

Last CommitMar 25, 2026

AddedMar 27, 2026

Actions

View on GitHub View README Plugin Marketplace JSON

Own this plugin?

Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).

Available In

skymcp-marketplace

Safety Signals

Critical

Admin access level

Server config contains admin-level keywords

Caution

Executes bash commands

Hook triggers when Bash tool is used

skymcp

Popularity

What's Inside

Confidence

Similar Plugins

ml-training

skypilot

superml

coreweave-pack

ml-model-training

training-hub

More by slapglif

maestro-mcp

burrow

agent-bus

theory2-physics

Popularity

Health & Quality

More by slapglif

maestro-mcp

burrow

agent-bus

theory2-physics

Similar Plugins

ml-training

skypilot

superml

coreweave-pack

ml-model-training

training-hub