turing

Name: turing
Author: thepyprogrammer


Autonomous ML research harness: the autoresearch loop as a formal protocol. 74 commands, 2 specialized agents, skills/turing source layout. Inspired by Karpathy's autoresearch and the scientific method itself.

Operational intelligence: postmortem, doctor, plan
Model lifecycle: update, registry
What-if analysis: whatif, counterfactual, simulate
Collaboration: onboard, share, review
Research communication: cite, present, changelog
Experiment archaeology: trend, flashback, archive, annotate, search, template, replay
Model surgery: prune, quantize, merge, surgery
Also covered: feature & training intelligence, model debugging, pre-training intelligence, meta-intelligence, scaling & efficiency, model composition, deep analysis, experiment orchestration, literature + paper, model export, profiling, checkpoints, experiment intelligence, statistical rigor, tree-search, cost-performance, model cards, hypothesis database, novelty guard, anti-cheating, taste-leverage loop

npx claudepluginhub thepyprogrammer/turing --plugin turing

What's Inside

Slash Commands (76)

/ablate
Run a systematic ablation study: remove components one at a time, measure the impact, and produce a publication-ready table with dead-weight flagging.

/annotate
Retrospective experiment annotations: add human notes, tags, and context that automated metrics can't capture.

README

turing

The research assistant that can't fool itself.

A Claude Code plugin that runs autonomous ML experiment loops, named after the man who first asked whether machines could think. Two agents enforce a strict separation: one writes code, one scores it, and neither can see the other's work. Immutable evaluation, anti-cheating guardrails, and structured hypothesis tracking make sure the results stay honest. When code is free, research is all that matters. You bring the research taste; Turing handles the rest.

Separation: the agent modifies train.py; it cannot see or touch evaluate.py
Memory: every hypothesis registered, every experiment logged, every variant preserved
Convergence: automatic detection of diminishing returns; the agent stops when it should
Taste: you inject ideas with /turing:try, read results with /turing:brief

Note: Turing is in active development. Some features are rough around the edges. Issues and feedback welcome.

Install

npm install -g claude-turing && claude-turing install --global && claude-turing verify

The Taste-Leverage Loop

You have taste: the accumulated judgment about which problems are tractable, which metrics matter, and which directions are dead ends. Turing has leverage: the discipline to run experiments without fatigue, track every result without amnesia, and measure without contamination.

The interface is two verbs:

/turing:try switch to LightGBM        Your taste → the agent
/turing:brief --deep                   The agent's results → you

Everything in between (experiment logging, convergence detection, hypothesis tracking, statistical validation, anti-cheating guardrails) is infrastructure connecting those two endpoints. You think about what to try. Turing handles how to try it.

What a Session Looks Like

/turing:init                          Scaffold a new ML project
/turing:train                         Agent runs 5-10 experiments autonomously
/turing:brief                         Campaign summary: what improved, what's exhausted
/turing:try "add polynomial features" Inject your next idea
/turing:train                         Agent follows your lead

For fully hands-off operation:

/loop 5m /turing:train

The agent trains, evaluates, keeps improvements, discards regressions, detects convergence, and stops. You come back to a briefing.

How It Works

The experiment loop. Every iteration: observe metrics, hypothesize (human ideas first), edit train.py, commit to a git branch, train, measure (agent can't see how), keep or revert, log, check convergence.
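The keep-or-revert step of that loop can be pictured with a minimal Python sketch. This is not Turing's actual implementation: `LoopState`, `run_iteration`, and the stubbed scorer are hypothetical names, and the scores are made up. The one property it tries to mirror is that the loop only ever sees a number returned by the measurement step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopState:
    """Hypothetical bookkeeping for one campaign (illustrative only)."""
    best_score: Optional[float] = None
    log: List[dict] = field(default_factory=list)


def run_iteration(
    state: LoopState,
    hypothesis: str,
    train_and_measure: Callable[[str], float],
    higher_is_better: bool = True,
) -> bool:
    """Run one experiment and keep it only if it beats the current best.

    `train_and_measure` stands in for edit + commit + train + opaque scoring;
    the loop receives nothing but the resulting number.
    """
    score = train_and_measure(hypothesis)
    if state.best_score is None:
        improved = True  # the first run always becomes the baseline
    elif higher_is_better:
        improved = score > state.best_score
    else:
        improved = score < state.best_score

    if improved:
        state.best_score = score  # keep: this variant is the new baseline
    # On a regression a real harness would revert the branch; here we only log.
    state.log.append({"hypothesis": hypothesis, "score": score, "kept": improved})
    return improved


# Usage with a stubbed scorer (scores are invented for illustration).
fake_scores = {"baseline": 0.80, "switch to LightGBM": 0.84, "drop feature x": 0.79}
state = LoopState()
for idea in fake_scores:
    run_iteration(state, idea, fake_scores.__getitem__)
```

Every run is logged whether or not it is kept, which matches the "every experiment logged" promise above.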

Hypothesis tracking. Every idea flows through hypotheses.yaml with a novelty guard that blocks duplicates. Detail files record architecture, hyperparameters, expected outcome, actual result, and lineage. Nothing is forgotten between sessions.
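As a rough illustration of a novelty guard, the sketch below rejects an idea whose normalized text has already been registered. It keeps everything in memory rather than reading hypotheses.yaml. The `HypothesisRegistry` class, its fields, and the normalization rule are assumptions made for this example, not the plugin's real schema or matching logic.

```python
import re
from typing import Dict, List


def normalize(idea: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", idea.lower())
    return " ".join(cleaned.split())


class HypothesisRegistry:
    """Hypothetical in-memory stand-in for a hypothesis database."""

    def __init__(self) -> None:
        self._by_key: Dict[str, dict] = {}

    def register(self, idea: str, expected: str = "") -> bool:
        """Return True if the idea is new, False if it duplicates an earlier one."""
        key = normalize(idea)
        if key in self._by_key:
            return False  # novelty guard: block the duplicate
        self._by_key[key] = {"idea": idea, "expected": expected, "result": None}
        return True

    def ideas(self) -> List[str]:
        """List registered ideas in the order they were accepted."""
        return [entry["idea"] for entry in self._by_key.values()]


registry = HypothesisRegistry()
first = registry.register("Add polynomial features")
duplicate = registry.register("add  polynomial features!")  # same idea, different spelling
second = registry.register("Switch to LightGBM")
```

Exact matching on normalized text only catches trivial rewordings. A real guard would likely need a fuzzier similarity measure.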

Anti-cheating stack. Six structural layers, not prompt-based rules. The agent cannot see evaluate.py, cannot discover scoring formulas, cannot reverse-engineer fixed seeds. It knows the metric name, the direction, and the result. That's it. Research on autonomous ML agents shows that every prompt-based rule got worked around; every code-based rule held.
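The narrow interface described here (metric name, direction, result, nothing else) can be sketched in a few lines of Python. This only illustrates the shape of what the agent receives. In-process Python cannot actually hide a scorer, so real isolation would have to come from the structural layers, whose mechanics are not described above. `MetricReport`, `make_evaluator`, and the accuracy scorer are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class MetricReport:
    """The only information handed back to the experimenting agent."""
    name: str
    direction: str  # "maximize" or "minimize"
    value: float


def make_evaluator(
    name: str,
    direction: str,
    scorer: Callable[[Sequence[int], Sequence[int]], float],
) -> Callable[[Sequence[int], Sequence[int]], MetricReport]:
    """Wrap a scorer so callers get a MetricReport instead of the formula."""

    def evaluate(predictions: Sequence[int], targets: Sequence[int]) -> MetricReport:
        return MetricReport(name, direction, scorer(predictions, targets))

    return evaluate


def _hidden_accuracy(predictions: Sequence[int], targets: Sequence[int]) -> float:
    # Stand-in scoring formula; assumed for illustration only.
    correct = sum(1 for p, t in zip(predictions, targets) if p == t)
    return correct / len(targets)


evaluate = make_evaluator("accuracy", "maximize", _hidden_accuracy)
report = evaluate([1, 0, 1, 1], [1, 0, 0, 1])  # 3 of 4 predictions match
```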

Two agents, strict boundary. @ml-researcher (Read/Write/Edit/Bash) modifies code and runs experiments. @ml-evaluator (Read/Bash only) analyzes results. An analyst who cannot act on their observations makes more trustworthy observations.

Convergence detection. After N consecutive non-improvements (default 3, configurable), the agent stops. For noisy metrics, /turing:validate auto-configures multi-run evaluation so the agent can't be rewarded for lucky single runs.
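A patience-style stopping rule like the one described can be sketched as follows. The function names, and the choice to average over seeded runs, are assumptions for illustration. How /turing:validate actually configures multi-run evaluation is not specified here.

```python
from statistics import mean
from typing import Callable, Iterable


def has_converged(
    scores: Iterable[float],
    patience: int = 3,
    higher_is_better: bool = True,
) -> bool:
    """Return True once `patience` consecutive runs fail to beat the best score."""
    best = None
    stale = 0
    for score in scores:
        is_better = best is None or (
            score > best if higher_is_better else score < best
        )
        if is_better:
            best = score
            stale = 0  # any improvement resets the counter
        else:
            stale += 1
            if stale >= patience:
                return True  # the agent would stop here; later scores never run
    return False


def averaged_score(run_once: Callable[[int], float], n_runs: int = 5) -> float:
    """Average a metric over several seeded runs to dampen lucky single runs."""
    return mean(run_once(seed) for seed in range(n_runs))


# Usage: three non-improvements in a row after the 0.82 peak trigger a stop.
stopped = has_converged([0.80, 0.82, 0.81, 0.82, 0.80])
still_going = has_converged([0.80, 0.82, 0.81, 0.83])
```

A tie counts as a non-improvement in this sketch. Whether Turing treats ties the same way is not stated.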

Command Reference

Core Loop

| Command | What it does |
| --- | --- |
| `/turing:init [--plan]` | Scaffold a new ML project. `--plan` for literature-grounded research plan. |
| `/turing:train [path] [N]` | Run the experiment loop. Auto-detects project from cwd. |
| `/turing:status` | Quick status: best model, convergence state |
| `/turing:compare <a> <b>` | Side-by-side experiment comparison |
| `/turing:sweep` | Systematic hyperparameter sweep |

Taste-Leverage Interface

