From mlx
Autonomous time-budget experiment loop. Modify a training script, train for a fixed wall-clock budget, evaluate, record, repeat. Inspired by karpathy/autoresearch. Use for overnight architecture search, systematic hyperparameter sweeps, or any iterative model improvement workflow.
How this skill is triggered — by the user, by Claude, or both
Slash command
/mlx:autoexperimentopusThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Run autonomous time-budget experiment loops. Each iteration modifies `train.py`,
Run autonomous time-budget experiment loops. Each iteration modifies train.py,
trains for a fixed wall-clock budget, evaluates, records in results.tsv, and repeats.
results.tsv exists with a baseline (exp000) before iteratingEXPERIMENT.md with your goal, baseline, hypothesis, and constraints/mlx:autoexperiment path/to/train.pyEXPERIMENT.md for the current hypothesisresults.tsv for experiment historytrain.py with the single changetimeout $BUDGET uv run train.pyresults.tsv: KEEP / DISCARD / CRASHEXPERIMENT.md "Next to try" sectionSee references/EXPERIMENT.md.template for the hypothesis file format.
See scripts/time_budget_train.py for a complete training script template with all patterns.
total_nats / (math.log(2) * total_bytes) — vocab-independent metricif math.isnan(loss) or loss > 100: sys.exit(1)See references/autoexperiment-guide.md for full documentation.
npx claudepluginhub damionrashford/mlx --plugin mlxGenerates program.md for autonomous AI research experiments (Karpathy's autoresearch). Interviews user on codebase, metrics, constraints; explores code; tailors agent instructions from template.
Sets up and runs autonomous experiment loops to optimize any target metric using git branches, autoresearch.md configs, bash benchmark scripts, and JSONL state logging. Activates on 'run autoresearch' or optimization loop requests.
Guides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.