By QianWangX
A team of specialized deep learning research agents: Planner, Engineer, Operator, Auditor, Tracer, and Cleaner. Run /init to set up your project before using agents.
Use to critically review experiment results before drawing conclusions, sharing findings, or writing them up. Optional — invoke when the results matter and you want a second opinion on whether they actually support your claim. Not needed after every run. Examples: "audit the results from last night's run before I write this up", "check whether my ablation actually proves what I think it does", "is this result reproducible based on what was logged?".
Use to audit and clean up the codebase after a period of fast iteration. Invoke sparingly — when things feel messy, not after every session. Produces a report and a staging list for your approval. Never deletes or refactors anything without your explicit sign-off. Examples: "the codebase has gotten messy after two weeks of experiments, clean it up", "find all the dead experiment scripts", "check the core modules for duplicate logic".
Use to implement or modify code based on a plan or a direct request. Invoke when you know what needs to be built and want it written correctly. Can work from a Planner task list or directly from your own description. Examples: "implement the loss function from the task list", "refactor the data loader into a separate module", "add dropout to the model".
Use to run experiments, training scripts, or multi-step commands where log capture and environment verification matter. Best for consolidation runs, GPU jobs, or anything where you want a structured record of what ran and what the result was. For simple one-off commands, just run them yourself. Examples: "run the training script with the new config", "execute the preprocessing pipeline and collect logs", "run the ablation sweep".
Use to break down a research or implementation task into concrete steps before writing any code. Invoke when starting something new, when a task involves multiple files, or when you want a clear map before diving in. Examples: "plan how to add a new loss function", "how should I restructure the data loader?", "what needs to change to support multi-GPU training?"
Uses power tools
Uses Bash, Write, or Edit tools
A set of specialized Claude Code agents for deep learning researchers who want to move faster without losing control of their work.
You already have a research idea. You want to get some help with everything around the idea — figuring out which files to touch, writing boilerplate, remembering to capture logs, wondering whether your result actually supports your claim.
These agents handle that surrounding work. You stay in charge of the research.
This is not a pipeline where you describe an idea and get results automatically. You decide which agents to call, when to call them, and what to do with what they return. You can invoke them in any order, skip steps entirely, or take over and do things yourself.
This is also not production-grade software engineering tooling. The code quality bar is research-appropriate: readable, modular where it helps, well-commented on non-obvious logic — not full test coverage, not formal code review gates.
| Agent | When to call |
|---|---|
| @planner | Before writing code: map out what needs to change |
| @engineer | When you're ready to implement: writes and modifies code |
| @operator | For experiments worth capturing: runs commands and logs results |
| @auditor | Before sharing results: checks whether your conclusion holds |
| @tracer | When something looks wrong: investigates model behaviour |
| @cleaner | Occasionally: cleans up after heavy iteration |
Install:
/plugin marketplace add QianWangX/research-agents
/plugin install research-agents@QianWangX/research-agents
Initialize your project (run once per project):
/research-init
This scans your project, asks a few questions about your model, dataset, and goal, and writes a PROJECT_CONTEXT.md to your project root. All agents read this file before doing any work. Keep it updated as your project evolves.
There is no required order. Call whatever agent is useful at the moment.
The typical flow for implementing something new:
@planner → @engineer → run it yourself or @operator
Repeat that loop as many times as needed. When results start to matter:
@auditor (before writing up or sharing)
@tracer (when something looks wrong)
@cleaner (when the codebase feels messy)
Some examples:
- @planner to map the changes, @engineer to implement, run training yourself because it's one command.
- @tracer to investigate gradient flow and data pipeline, then decide whether to call @engineer.
- @auditor in deep mode to check whether the conclusion actually holds before you share.
- @cleaner to produce a staging list, then delete what's safe.

Agents never call each other automatically. Each one does its job, produces a structured output, and stops. What happens next is always your decision.
@planner: break down a task before writing anything

Reads your project, traces the relevant files with Glob and Grep, and produces a short numbered task list: specific files, what changes, and why. Before outputting the list, it runs a DL-specific risk check: does this invalidate checkpoints, touch the data pipeline, affect randomness, or require a config update? Only the risks that apply are flagged.
Distinguishes between exploration (quick, loose, expect to iterate) and consolidation (careful, meant to keep). Nothing is written until you decide to proceed.
Output: Task List
@engineer: implement or modify code

Works from a planner task list or a direct instruction. Reads existing files first and matches your style. For exploration, writes quick code and marks shortcuts with `# TODO: clean up`. For non-obvious logic (attention operations, custom losses, data manipulation), adds shape comments: `# (B, T, D) → (B, H, T, d_k)`.
At the end, tells you whether the next command is simple enough to run yourself or worth routing through @operator. If the changes are meaningful, proposes a git commit command for you to run, modify, or skip.
Output: Implementation Note
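To illustrate the shape-comment convention described above, here is a minimal sketch of a head-splitting step annotated in that style. It assumes NumPy as a stand-in for whatever framework your project uses; `split_heads` and `attention_scores` are hypothetical names chosen for illustration, not code the agent is known to produce.

```python
import numpy as np


def split_heads(x, num_heads):
    """Reshape token embeddings into per-head slices (illustrative only)."""
    B, T, D = x.shape
    assert D % num_heads == 0, "embedding dim must divide evenly across heads"
    d_k = D // num_heads
    x = x.reshape(B, T, num_heads, d_k)  # (B, T, D) → (B, T, H, d_k)
    return x.transpose(0, 2, 1, 3)       # (B, T, H, d_k) → (B, H, T, d_k)


def attention_scores(q, k):
    """Scaled dot-product scores for inputs that are already split by head."""
    d_k = q.shape[-1]
    # (B, H, T, d_k) @ (B, H, d_k, T) → (B, H, T, T)
    return q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_k)
```

The comments record the shape before and after each operation, so a reader can check a reshape or transpose without running the code.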
@operator: run experiments and capture results

Best for GPU jobs, multi-step pipelines, or anything where you want a clean record. Verifies the environment before running, captures stdout and stderr to a timestamped log, and checks sanity after: loss trend, output shapes, NaN/Inf, GPU utilisation, and whether the initial loss is in the expected range. For quick exploratory runs it produces a one-line note; for consolidation runs, a full report. Updating PROJECT_CONTEXT.md is optional; use your judgement.
Output: Run Note or Execution Report
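If you run a command yourself and still want a record, the log capture and loss checks described above can be approximated by hand. The following is a rough sketch using only the Python standard library. The helper names `run_and_log` and `check_losses`, the `logs` directory, and the log file naming are all assumptions made for illustration; this is not how @operator itself is implemented.

```python
import math
import subprocess
import sys
from datetime import datetime
from pathlib import Path


def run_and_log(cmd, log_dir="logs"):
    """Run cmd and write its stdout and stderr to a timestamped log file."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_path = Path(log_dir) / f"run_{stamp}.log"
    with open(log_path, "w") as log_file:
        # stderr is merged into stdout so both streams land in one file.
        result = subprocess.run(cmd, stdout=log_file, stderr=subprocess.STDOUT)
    return result.returncode, log_path


def check_losses(losses):
    """Return a list of warnings for a sequence of logged loss values."""
    if not losses:
        return ["no loss values were logged"]
    warnings = []
    if any(math.isnan(x) or math.isinf(x) for x in losses):
        warnings.append("NaN or Inf found in losses")
    elif losses[-1] >= losses[0]:
        warnings.append("loss did not decrease between first and last step")
    return warnings
```

A call such as `run_and_log([sys.executable, "train.py"])` (with `train.py` standing in for your own script) returns the exit code and the path of the log it wrote. The first-versus-last comparison in `check_losses` is a deliberately crude trend check; a real run would likely want smoothing over a window.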
@auditor: check whether your result supports your claim