Available in: training-monitor

Name: training-monitor
Author: t2ance

Prediction-first autonomous monitoring for ML/DL training jobs. General-purpose framework with domain-specific skills for GRPO/RL, distributed training, and Kubernetes.

npx claudepluginhub t2ance/training-monitor-plugin

Popularity

Stars: Med: 0 · Avg: 285

Installs: Med: 0 · Avg: 1

What's Inside

Agents (1)

quality-reviewer

/quality-reviewer

Team role template for reviewing monitoring reports. Checks the reasoning PROCESS and its logical coherence, and spot-checks the load-bearing claim against ground truth. Catches lazy, shallow, or logically inconsistent analysis.

Skills (6)

distributed-monitor

/distributed-monitor

Heuristics for monitoring multi-GPU and multi-process distributed training. Common patterns, NCCL diagnostics, known failure modes. Reference knowledge, not rules.

grpo-monitor

/grpo-monitor

Heuristics for monitoring GRPO, PPO, and other RL training. Common patterns, typical indicators, known failure modes. Reference knowledge, not rules.

k8s-monitor

/k8s-monitor

Heuristics for monitoring training jobs on Kubernetes. Common patterns, pod anomalies, scheduling failures, escalation ladder. Reference knowledge, not rules.

monitor-doctor

/monitor-doctor

Interactive setup wizard for the training-monitor plugin. Goal-driven: the agent works out which dependencies are needed by checking the project context, and asks the user only when that context is ambiguous. It installs missing dependencies and reports the available capabilities.

training-monitor

/training-monitor

Prediction-first monitoring for ML/DL training jobs. Single-agent execution with reviewer sub-agent. Derives judgment criteria from training artifacts, not hardcoded rules.

Stats

Version: 0.1.0

Stars: 0

Maintenance: Excellent

Last Commit: Apr 4, 2026

Added: Apr 4, 2026

Safety Signals

Caution: uses power tools (Bash, Write, or Edit)

README