ML engineering plugin: build, train, debug, and deploy machine learning systems with verified API knowledge. 5 domain-expert agents (model-architect, training-engineer, ml-debugger, graph-engineer, systems-optimizer), 6 reference files, 5 runnable templates, and 4 workflow commands.
End-of-session learning. Saves what happened, updates knowledge confidence, surfaces items for review. Run this when your work session is complete.
Build an ML system from a handoff document or problem description
Diagnose and fix ML training failures using the 5-step protocol
Export, optimize, and deploy a trained model
Generate a complete training pipeline for a model
Specializes in graph ML: GNN architecture, PyG data handling, message passing, knowledge graph embeddings, link prediction, node classification. Route here for: any GNN task, "graph neural network," "knowledge graph," "link prediction," "node classification," "message passing," "PyG," "PyKEEN," or any graph-structured ML problem. <example> Context: User wants to build a KG completion model user: "Build an R-GCN model for link prediction on my knowledge graph" assistant: "I'll use graph-engineer to design the R-GCN encoder with DistMult decoder and the full PyG training pipeline." </example>
Diagnoses and fixes ML training failures. Follows a systematic protocol: overfit-one-batch, loss curve analysis, gradient inspection, data pipeline verification, simplification. Route here for: "training isn't working," "loss is stuck," "loss is NaN," "model isn't learning," "overfitting," or any training failure. <example> Context: User's GNN training loss is flat user: "My GNN loss hasn't moved in 20 epochs" assistant: "I'll use ml-debugger to run the systematic diagnostic protocol." </example>
Designs ML model architectures. Selects layers, dimensions, activations, normalization, and skip connections. Produces complete nn.Module code with parameter counts and shape annotations. Route here for: "build a model," "design the architecture," "what layers should I use," or any request to create a new model from a spec or handoff document. <example> Context: User has a handoff doc specifying a GNN for link prediction user: "Implement the model architecture from this handoff" assistant: "I'll use model-architect to build the R-GCN encoder with DistMult decoder specified in the handoff." </example> <example> Context: User wants a custom transformer for sequence classification user: "Build me a 4-layer transformer classifier for 512-token sequences" assistant: "I'll use model-architect to design the encoder with the specified depth and produce the nn.Module." </example>
Optimizes ML systems for production. Mixed precision, torch.compile, distributed training, quantization, profiling, memory optimization, inference serving. Route here for: "make this faster," "reduce memory," "deploy this model," "optimize inference," "distributed training," "quantize," "profile," or any performance/deployment task. <example> Context: User's training is running out of GPU memory user: "My 7B model fine-tuning OOMs on a 48GB A6000" assistant: "I'll use systems-optimizer to apply the memory optimization ladder: QLoRA + gradient checkpointing + bf16." </example> <example> Context: User wants to serve a model in production user: "How do I deploy this classifier as an API?" assistant: "I'll use systems-optimizer to set up ONNX export, quantization, and a FastAPI serving layer." </example>
Builds complete training pipelines. Data loading, loss functions, optimizers, schedulers, training loops, validation, checkpointing, and experiment tracking. Route here for: "write the training loop," "set up training," "train this model," or any request to create or modify training infrastructure. <example> Context: User has a model and needs a training pipeline user: "Write the training loop for this GNN classifier" assistant: "I'll use training-engineer to build the complete pipeline with PyG DataLoader, cross-entropy loss, AdamW, and wandb tracking." </example>
Uses power tools
Uses Bash, Write, or Edit tools
Most Claude Code plugins give you a set of slash commands and some domain knowledge. These plugins do something different: they learn.
Each plugin in this repo is a domain-specialized engineering intelligence that accumulates knowledge across sessions, grounds itself in real library source code (not training data), and coordinates with a companion chat skill on Claude.ai. The plugin implements. The chat skill plans. Over time, the plugin gets better at its job because it tracks what works, what doesn't, and what it's still uncertain about.
This is the two-surface architecture: one surface for thinking, one for building.
A typical plugin contains four layers:
Specialist agents and slash commands. Each plugin ships with 3 to 7 agents that handle specific subtasks. UI-Design-Pro has a design critic, a component builder, an accessibility auditor, an animation engineer, and a visual architect. Django-Engine-Pro has agents for model design, ORM optimization, migration planning, and MCP server exposure. Agents compose in defined sequences: for example, you always run the stack detector before the component builder, and always run the design critic after it.
Source-code references. Plugins include install.sh scripts that shallow-clone real library repos into a local refs/ directory. When UI-Design-Pro needs to know how Radix handles focus restoration, it greps the actual Radix source, not its training data. When D3-Pro needs to verify a scale constructor's API, it reads the Observable source directly. This matters because training data goes stale. Source code doesn't.
Skills and decision frameworks. Static knowledge: inheritance decision tables, ORM anti-pattern catalogs, polymorphic rendering rules, animation physics constants. These encode the expert judgment that doesn't change between sessions.
An epistemic knowledge layer. This is the part that learns. Each plugin maintains a knowledge/ directory containing typed claims in JSONL, confidence scores, session logs, and (for some plugins) SBERT embeddings. Claims start as drafts. After review, they become active. Active claims carry Bayesian confidence that updates based on session outcomes: when a suggestion informed by a claim gets accepted, confidence rises; when it gets rejected, confidence drops. Over time, each plugin develops its own body of verified, weighted knowledge about its domain.
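The exact update rule isn't specified here. One common way to get a "Bayesian confidence" from accept/reject outcomes is a Beta-Bernoulli model, and the sketch below assumes that, with a uniform Beta(1, 1) prior and confidence taken as the posterior mean. `ClaimConfidence`, `OUTCOME_CREDIT`, and the half credit for a modified suggestion are hypothetical illustrations, not the plugin's actual code.

```python
from dataclasses import dataclass

# Hypothetical credit per session outcome: (added to alpha, added to beta).
OUTCOME_CREDIT = {
    "accepted": (1.0, 0.0),
    "modified": (0.5, 0.5),  # assumption: a modified suggestion counts as half a success
    "rejected": (0.0, 1.0),
}


@dataclass
class ClaimConfidence:
    """Beta(alpha, beta) belief that suggestions informed by a claim get accepted."""

    alpha: float = 1.0  # prior pseudo-count of acceptances; Beta(1, 1) is a uniform prior
    beta: float = 1.0   # prior pseudo-count of rejections

    @property
    def confidence(self) -> float:
        # Posterior mean of Beta(alpha, beta)
        return self.alpha / (self.alpha + self.beta)

    def update(self, outcome: str) -> float:
        """Fold one session outcome into the posterior and return the new confidence."""
        d_alpha, d_beta = OUTCOME_CREDIT[outcome]
        self.alpha += d_alpha
        self.beta += d_beta
        return self.confidence
```

Starting from the uniform prior, one accepted suggestion moves confidence from 0.5 to 2/3, and a following rejection brings it back to 0.5. Because the counts accumulate, a claim with a long track record moves less per session than a fresh one, which is the property you want from a confidence score that updates every session.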
Each plugin here has a counterpart: a chat skill that runs on Claude.ai (or Claude Desktop). The division of labor is deliberate.
The chat skill handles planning, reasoning, and decision-making. When you're deciding between DRF and Ninja for an API, or choosing an inheritance strategy for a model hierarchy, or evaluating whether a component needs polymorphic rendering, the chat skill walks you through the tradeoffs and produces a structured handoff document.
The Claude Code plugin handles implementation and learning. It takes the handoff document, builds the thing, greps real source code when it needs to verify an API, logs what it tried, and updates its knowledge base with what it learned.
The chat skill never sees knowledge/claims.jsonl. The plugin never produces planning documents. Each surface does what it's good at.
| Chat Skill (Claude.ai) | Claude Code Plugin |
|---|---|
| Decision frameworks | Slash commands and agents |
| Tradeoff analysis | Source-code grepping |
| Structured handoff docs | Implementation and testing |
| Domain reasoning | Session logging and learning |
| Static (expert knowledge) | Dynamic (knowledge that evolves) |
Every plugin with a knowledge/ directory runs the same protocol:
Session start: Read manifest.json for current state. Load active claims sorted by confidence. Check tensions.jsonl for unresolved conflicts in the task's domain. Surface tensions before making decisions, not after.
During work: Track which claims informed each suggestion. Note when the user accepts, modifies, or rejects a recommendation.
Session end: Write observations to session_log/. Flag contradictions as tension signals. Note recurring patterns the knowledge base doesn't yet cover.
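The start and end steps of this protocol can be sketched as plain file I/O over the knowledge/ directory. This is a minimal sketch under stated assumptions: it assumes manifest.json, claims.jsonl, and tensions.jsonl all live directly in knowledge/, and the field names (`status`, `confidence`, `domain`, `resolved`) and the per-session file name under session_log/ are hypothetical, since the actual schema isn't documented here.

```python
import json
from pathlib import Path


def load_jsonl(path: Path) -> list[dict]:
    """Read one JSON object per non-blank line; a missing file yields an empty list."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def session_start(knowledge_dir: str, domain: str) -> tuple[dict, list[dict], list[dict]]:
    """Load state, active claims (highest confidence first), and open tensions for `domain`."""
    root = Path(knowledge_dir)
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    # Assumed claim fields: "status" ("draft" or "active") and "confidence" (0..1).
    claims = [c for c in load_jsonl(root / "claims.jsonl") if c.get("status") == "active"]
    claims.sort(key=lambda c: c.get("confidence", 0.0), reverse=True)
    # Assumed tension fields: "domain" and "resolved"; only unresolved ones are surfaced.
    tensions = [
        t for t in load_jsonl(root / "tensions.jsonl")
        if t.get("domain") == domain and not t.get("resolved", False)
    ]
    return manifest, claims, tensions


def session_end(knowledge_dir: str, session_id: str, observations: list[dict]) -> Path:
    """Write this session's observations to session_log/<session_id>.jsonl (hypothetical layout)."""
    log_dir = Path(knowledge_dir) / "session_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    out = log_dir / f"{session_id}.jsonl"
    with out.open("w", encoding="utf-8") as f:
        for obs in observations:
            f.write(json.dumps(obs) + "\n")
    return out
```

In this sketch, each observation might record which claim informed a suggestion and how the user responded, for example `{"claim_id": "c", "outcome": "accepted"}`, so a later step can feed those outcomes back into claim confidence.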
The knowledge types are borrowed from Theseus (a separate epistemic engine project):
Current knowledge stats across the fleet:
| Plugin | Total Claims | Active | Avg Confidence |
|---|---|---|---|
| UI-Design-Pro | 140 | 135 | 0.667 |
| Django-Engine-Pro | 111 | 29 | 0.75 |
`npx claudepluginhub travis-gilbert/claude-marketplace --plugin ml-pro`
Mobile app development specialist: PWA retrofitting, React Native architecture, offline-first sync, mobile API design, touch optimization, and mobile visualization adaptation.
Makes Claude Code extraordinarily good at transforming websites into applications: converting page-based Next.js sites into app-like experiences with persistent layouts, command palettes, and background sync; wrapping them in Tauri desktop shells with native OS integration; and producing architecture handoffs for native Swift/AppKit macOS apps.
Makes Claude Code genuinely good at designing knowledge-graph answer experiences with cosmos.gl on top of DuckDB-WASM, Mosaic, and vgplot. Owns the SceneDirective adapter, the three-picker ControlDock (Position, Weight, Edges), and the recipe library that turns novel ideas into usable images.
Git and deployment automation with verification at every step. Staged file review, conventional commits, pre-commit checks, push with CI/CD detection, and post-deploy health verification.
Makes Claude Code extraordinarily good at building D3 visualizations that are mathematically accurate, physically believable, and aesthetically grounded in the Mike Bostock / Observable canon.
Comprehensive feature development workflow with specialized agents for codebase exploration, architecture design, and quality review
Harness-native ECC operator layer - 67 agents, 271 skills, 92 legacy command shims, reusable hooks, rules, selective install profiles, and production-ready workflows for Claude Code, Codex, OpenCode, Cursor, and related agent harnesses
Comprehensive PR review agents specializing in comments, tests, error handling, type design, code quality, and code simplification
Upstash Context7 MCP server for up-to-date documentation lookup. Pull version-specific documentation and code examples directly from source repositories into your LLM context.
Comprehensive C4 architecture documentation workflow with bottom-up code analysis, component synthesis, container mapping, and context diagram generation
Comprehensive skill pack with 66 specialized skills for full-stack developers: 12 language experts (Python, TypeScript, Go, Rust, C++, Swift, Kotlin, C#, PHP, Java, SQL, JavaScript), 10 backend frameworks, 6 frontend/mobile, plus infrastructure, DevOps, security, and testing. Features progressive disclosure architecture for 50% faster loading.