Autonomous ML research harness — the autoresearch loop as a formal protocol. 74 commands, 2 specialized agents, skills/turing source layout, operational intelligence (postmortem + doctor + plan), model lifecycle (update + registry), what-if analysis (whatif + counterfactual + simulate), collaboration (onboard + share + review), research communication (cite + present + changelog), experiment archaeology (trend + flashback + archive + annotate + search + template + replay), model surgery (prune + quantize + merge + surgery), feature & training intelligence, model debugging, pre-training intelligence, meta-intelligence, scaling & efficiency, model composition, deep analysis, experiment orchestration, literature + paper, model export, profiling, checkpoints, experiment intelligence, statistical rigor, tree-search, cost-performance, model cards, hypothesis database, novelty guard, anti-cheating, taste-leverage loop. Inspired by Karpathy's autoresearch and the scientific method itself.
Run systematic ablation study — remove components one at a time, measure impact, produce publication-ready table with dead-weight flagging.
Retrospective experiment annotations — add human notes, tags, and context that automated metrics can't capture.
Experiment lifecycle cleanup — compress old artifacts, prune checkpoints, create queryable summary index. Reclaim disk space.
Pre-submission methodology audit — catch data leakage, missing baselines, cherry-picked seeds, and incomplete ablations before a reviewer does.
Automatic baseline generation — random, majority/mean, linear, k-NN baselines in 60 seconds. Every experiment needs a "is this better than dumb?" reference.
Read-only ML evaluation agent. Analyzes experiment results, compares runs, detects convergence patterns, and provides statistical insights. Cannot modify code — this is a safety constraint, not a limitation. The evaluator sees what the researcher cannot see precisely because it cannot change what it observes.
Autonomous ML research agent that implements the autoresearch experiment loop. Modifies train.py, runs experiments, evaluates results, keeps improvements, discards regressions. Operates under strict safety constraints — immutable evaluation infrastructure, git-disciplined rollback, and structured experiment logging.
Uses power tools
Uses Bash, Write, or Edit tools
The research assistant that can't fool itself.
A Claude Code plugin that runs autonomous ML experiment loops, named after the man who first asked whether machines could think. Two agents enforce a strict separation: one writes code, one scores it, and neither can see the other's work. Immutable evaluation, anti-cheating guardrails, and structured hypothesis tracking make sure the results stay honest. When code is free, research is all that matters. You bring the research taste; Turing handles the rest.
The researcher agent edits train.py; it cannot see or touch evaluate.py. Send ideas with /turing:try, read results with /turing:brief.

> [!NOTE]
> Turing is in active development. Some features are rough around the edges. Issues and feedback welcome.
npm install -g claude-turing && claude-turing install --global && claude-turing verify
You have taste: the accumulated judgment about which problems are tractable, which metrics matter, and which directions are dead ends. Turing has leverage: the discipline to run experiments without fatigue, track every result without amnesia, and measure without contamination.
The interface is two verbs:
/turing:try switch to LightGBM    # Your taste → the agent
/turing:brief --deep              # The agent's results → you
Everything in between (experiment logging, convergence detection, hypothesis tracking, statistical validation, anti-cheating guardrails) is infrastructure connecting those two endpoints. You think about what to try. Turing handles how to try it.
/turing:init                              # Scaffold a new ML project
/turing:train                             # Agent runs 5-10 experiments autonomously
/turing:brief                             # Campaign summary: what improved, what's exhausted
/turing:try "add polynomial features"     # Inject your next idea
/turing:train                             # Agent follows your lead
For fully hands-off operation:
/loop 5m /turing:train
The agent trains, evaluates, keeps improvements, discards regressions, detects convergence, and stops. You come back to a briefing.
The experiment loop. Every iteration: observe metrics, hypothesize (human ideas first), edit train.py, commit to a git branch, train, measure (agent can't see how), keep or revert, log, check convergence.
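The keep-or-revert step of that loop can be sketched as follows. This is a minimal illustration, not Turing's actual code: `LoopState`, `run_iteration`, and the three callbacks (`apply_change`, `measure`, `revert`) are hypothetical names, and the git commit and rollback are assumed to live inside the callbacks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LoopState:
    """Hypothetical container for the best score so far and the experiment log."""
    best_score: float
    higher_is_better: bool = True
    log: List[Dict] = field(default_factory=list)


def is_improvement(state: LoopState, score: float) -> bool:
    """Compare a new score against the best one, respecting metric direction."""
    if state.higher_is_better:
        return score > state.best_score
    return score < state.best_score


def run_iteration(
    state: LoopState,
    hypothesis: str,
    apply_change: Callable[[str], None],  # assumed: edits train.py and commits
    measure: Callable[[], float],         # assumed: opaque scorer, returns one number
    revert: Callable[[], None],           # assumed: rolls the commit back
) -> bool:
    """Run one iteration: apply, measure, keep or revert, then log."""
    apply_change(hypothesis)
    score = measure()
    kept = is_improvement(state, score)
    if kept:
        state.best_score = score
    else:
        revert()
    state.log.append({"hypothesis": hypothesis, "score": score, "kept": kept})
    return kept
```

Every iteration is logged whether or not it is kept, which is what lets a later briefing report dead ends as well as wins.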
Hypothesis tracking. Every idea flows through hypotheses.yaml with a novelty guard that blocks duplicates. Detail files record architecture, hyperparameters, expected outcome, actual result, and lineage. Nothing is forgotten between sessions.
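One simple way a novelty guard could block duplicates is to compare normalized hypothesis text. The sketch below assumes exactly that; the real guard may use a different similarity test. The registry is an in-memory list standing in for hypotheses.yaml, and `normalize`, `add_hypothesis`, and the entry fields are hypothetical.

```python
import re
from typing import Dict, List


def normalize(text: str) -> str:
    """Lowercase the text and keep only alphanumeric words, single-spaced."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def add_hypothesis(registry: List[Dict], text: str, source: str = "human") -> bool:
    """Append a hypothesis unless an equivalent one is already registered.

    Returns True if the hypothesis was added, False if it was blocked.
    """
    key = normalize(text)
    if any(entry["key"] == key for entry in registry):
        return False  # duplicate: the guard blocks it
    registry.append(
        {
            "id": len(registry) + 1,
            "key": key,
            "text": text,
            "source": source,     # assumed field: "human" or "agent"
            "status": "pending",  # assumed field: updated once the result is known
        }
    )
    return True
```

Exact-match normalization only catches trivially rephrased duplicates; catching paraphrases would need a fuzzier comparison.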
Anti-cheating stack. Six structural layers, not prompt-based rules. The agent cannot see evaluate.py, cannot discover scoring formulas, cannot reverse-engineer fixed seeds. It knows the metric name, the direction, and the result. That's it. Research on autonomous ML agents shows that every prompt-based rule got worked around; every code-based rule held.
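The "metric name, direction, result" contract can be illustrated with a narrow return type. This sketch only shows the shape of what crosses the boundary. Real isolation would need file permissions or process separation, which it does not attempt. `AgentVisibleResult` and `make_opaque_evaluator` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class AgentVisibleResult:
    """The only three facts the agent receives about an evaluation."""
    metric: str
    direction: str  # e.g. "maximize" or "minimize"
    value: float


def make_opaque_evaluator(
    score_fn: Callable[[str], Dict],  # assumed: private scorer returning a dict with "value"
    metric: str,
    direction: str,
) -> Callable[[str], AgentVisibleResult]:
    """Wrap a private scorer so callers get back only the three public fields."""

    def evaluate(model_path: str) -> AgentVisibleResult:
        details = score_fn(model_path)  # may hold seeds, formulas, per-sample data
        return AgentVisibleResult(metric, direction, float(details["value"]))

    return evaluate
```

Anything else the scorer computes is dropped before the result reaches the caller.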
Two agents, strict boundary. @ml-researcher (Read/Write/Edit/Bash) modifies code and runs experiments. @ml-evaluator (Read/Bash only) analyzes results. An analyst who cannot act on their observations makes more trustworthy observations.
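The tool boundary amounts to a per-agent allow-list. The table below restates the two tool sets listed above. The lookup function `can_use` is a hypothetical illustration, not the plugin's enforcement mechanism.

```python
from typing import Dict, Set

# Tool sets as stated for each agent; the dictionary itself is illustrative.
ALLOWED_TOOLS: Dict[str, Set[str]] = {
    "ml-researcher": {"Read", "Write", "Edit", "Bash"},
    "ml-evaluator": {"Read", "Bash"},
}


def can_use(agent: str, tool: str) -> bool:
    """Return True only if the tool is on the agent's allow-list."""
    return tool in ALLOWED_TOOLS.get(agent, set())
```

An unknown agent gets an empty allow-list, so the check fails closed.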
Convergence detection. After N consecutive non-improvements (default 3, configurable), the agent stops. For noisy metrics, /turing:validate auto-configures multi-run evaluation so the agent can't be rewarded for lucky single runs.
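The stopping rule and the multi-run idea can be sketched in a few lines. This is a minimal illustration under stated assumptions: `has_converged` and `mean_score` are hypothetical names, `patience` plays the role of N, and averaging over seeds is one plausible reading of "multi-run evaluation".

```python
from statistics import mean
from typing import Callable, Iterable


def has_converged(
    scores: Iterable[float],
    patience: int = 3,
    higher_is_better: bool = True,
) -> bool:
    """Return True once `patience` consecutive scores fail to beat the best so far.

    The first score sets the baseline and never counts as a non-improvement.
    """
    best = None
    stale = 0
    for score in scores:
        if best is None:
            improved = True
        elif higher_is_better:
            improved = score > best
        else:
            improved = score < best
        if improved:
            best = score
            stale = 0
        else:
            stale += 1
        if stale >= patience:
            return True
    return False


def mean_score(run_once: Callable[[int], float], seeds: Iterable[int]) -> float:
    """Average one metric over several seeded runs to damp single-run luck."""
    return mean(run_once(seed) for seed in seeds)
```

A tie with the current best counts as a non-improvement here, so a plateau ends the campaign rather than prolonging it.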
| Command | What it does |
|---|---|
| `/turing:init [--plan]` | Scaffold a new ML project. `--plan` for literature-grounded research plan. |
| `/turing:train [path] [N]` | Run the experiment loop. Auto-detects project from cwd. |
| `/turing:status` | Quick status: best model, convergence state |
| `/turing:compare <a> <b>` | Side-by-side experiment comparison |
| `/turing:sweep` | Systematic hyperparameter sweep |
npx claudepluginhub thepyprogrammer/turing --plugin turing