Skill

autoexperiment

From mlx

Autonomous time-budget experiment loop. Modify a training script, train for a fixed wall-clock budget, evaluate, record, repeat. Inspired by karpathy/autoresearch. Use for overnight architecture search, systematic hyperparameter sweeps, or any iterative model improvement workflow.


Invocation


Slash command

/mlx:autoexperiment

User invocable

Model invocable

Inline context

Default effort

Configuration

Model: opus

Tool Access

This skill is limited to the following tools:

Bash(uv run * scripts/time_budget_train.py *), Bash, Read, Write, Edit, Glob, Grep

Context Preview


Run autonomous time-budget experiment loops. Each iteration modifies `train.py`,

Supporting Files

evals/evals.json
references/EXPERIMENT.md.template
references/autoexperiment-guide.md
scripts/time_budget_train.py

SKILL.md

67 lines · ~597 tokens

Stats

Language: Jupyter Notebook

Stars: 2

Maintenance: Excellent

Last commit: Apr 9, 2026


Autoexperiment Skill

Run autonomous time-budget experiment loops. Each iteration modifies train.py, trains for a fixed wall-clock budget, evaluates, records the outcome in results.tsv, and repeats.

Setup

Ensure results.tsv exists with a baseline (exp000) before iterating
Create EXPERIMENT.md with your goal, baseline, hypothesis, and constraints
Run: /mlx:autoexperiment path/to/train.py
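A minimal sketch of the baseline check, assuming a four-column tab-separated results.tsv (exp_id, val_bpb, status, description). The skill does not document the real column layout, so these column names and the helpers init_results and has_baseline are hypothetical. The baseline value would come from one run of the unmodified train.py.

```python
import csv
from pathlib import Path

# Hypothetical column layout; the skill does not specify the results.tsv schema.
COLUMNS = ["exp_id", "val_bpb", "status", "description"]


def init_results(path: Path, baseline_bpb: float) -> None:
    """Create results.tsv with a header and an exp000 baseline row if it is missing."""
    if path.exists():
        return  # never overwrite existing experiment history
    with path.open("w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(COLUMNS)
        writer.writerow(["exp000", f"{baseline_bpb:.4f}", "KEEP", "baseline"])


def has_baseline(path: Path) -> bool:
    """Return True if results.tsv exists and already contains an exp000 row."""
    if not path.exists():
        return False
    with path.open(newline="") as f:
        return any(row["exp_id"] == "exp000" for row in csv.DictReader(f, delimiter="\t"))
```

Refusing to overwrite an existing file keeps a restarted loop from silently discarding earlier history.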

Protocol

Before each iteration

Read EXPERIMENT.md for the current hypothesis
Read results.tsv for experiment history
Identify one change to make (ONE variable only)
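Reading the history can be sketched as below, again assuming hypothetical exp_id, val_bpb, and status columns. Lower val_bpb is better, so the current target to beat is the lowest value among KEEP rows; DISCARD and CRASH rows stay in the file as a record of what was already tried. The helper names are illustrative only.

```python
import csv
from pathlib import Path
from typing import Optional


def load_history(path: Path) -> list[dict[str, str]]:
    """Read every row of results.tsv as a dict keyed by (assumed) column name."""
    with path.open(newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def best_kept(rows: list[dict[str, str]]) -> Optional[dict[str, str]]:
    """Return the KEEP row with the lowest val_bpb (lower is better), or None."""
    kept = [r for r in rows if r["status"] == "KEEP"]
    return min(kept, key=lambda r: float(r["val_bpb"]), default=None)


def next_exp_id(rows: list[dict[str, str]]) -> str:
    """Derive the next zero-padded id from the highest existing one, e.g. exp003 -> exp004."""
    highest = max((int(r["exp_id"].removeprefix("exp")) for r in rows), default=-1)
    return f"exp{highest + 1:03d}"
```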

Iteration loop

Edit train.py with the single change
Run with TIME_BUDGET: timeout $BUDGET uv run train.py
Capture exit code and metrics
Record in results.tsv: KEEP / DISCARD / CRASH
If CRASH 3× in a row on the same error → stop, report diagnosis
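One iteration of the loop might be driven from Python roughly as follows. This is a sketch under several assumptions: that train.py prints a line of the form "val_bpb: 1.2345", that the last non-empty stderr line is a good enough crash fingerprint, and that GNU timeout and uv are on PATH. GNU timeout exits with status 124 when it kills the command, which this sketch treats as a CRASH. The outer timeout should probably exceed the in-script training budget so evaluation has time to finish. All function names are hypothetical.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import Optional

# Hypothetical output convention: train.py prints e.g. "val_bpb: 1.2345".
METRIC_RE = re.compile(r"^val_bpb:[ \t]*(\d+(?:\.\d+)?)[ \t]*$", re.MULTILINE)


@dataclass
class RunResult:
    exit_code: int
    stdout: str
    stderr: str


def run_experiment(train_path: str, timeout_s: int) -> RunResult:
    """Equivalent of `timeout $BUDGET uv run train.py`, capturing output and exit code."""
    proc = subprocess.run(
        ["timeout", str(timeout_s), "uv", "run", train_path],
        capture_output=True,
        text=True,
    )
    return RunResult(proc.returncode, proc.stdout, proc.stderr)


def parse_val_bpb(stdout: str) -> Optional[float]:
    """Return the last reported val_bpb, or None if the run never printed one."""
    matches = METRIC_RE.findall(stdout)
    return float(matches[-1]) if matches else None


def classify(result: RunResult, best_bpb: float) -> str:
    """KEEP if val_bpb beats the best kept run, DISCARD if not, CRASH on any failure."""
    bpb = parse_val_bpb(result.stdout)
    if result.exit_code != 0 or bpb is None:
        return "CRASH"  # includes exit code 124 from timeout and exit 1 from fast fail
    return "KEEP" if bpb < best_bpb else "DISCARD"


def error_signature(result: RunResult) -> str:
    """Hypothetical crash fingerprint: the last non-empty stderr line."""
    lines = [ln for ln in result.stderr.splitlines() if ln.strip()]
    return lines[-1] if lines else f"exit code {result.exit_code}"


def should_stop(recent: list[tuple[str, str]], limit: int = 3) -> bool:
    """Circuit breaker over (status, signature) pairs: stop after `limit`
    consecutive CRASHes that share the same signature."""
    tail = recent[-limit:]
    return (
        len(tail) == limit
        and all(status == "CRASH" for status, _ in tail)
        and len({sig for _, sig in tail}) == 1
    )
```

Keying the breaker on the signature, not just on the CRASH status, matches the "same error" condition: three different crashes suggest three bad ideas, while three identical ones suggest a broken script that needs diagnosis.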

After each iteration

Update EXPERIMENT.md "Next to try" section
Summarize: what changed, what happened, what's next
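Updating the "Next to try" section can be done as a plain text rewrite. This sketch assumes EXPERIMENT.md uses level-2 markdown headings ("## Next to try") and that a section runs until the next "## " heading; check references/EXPERIMENT.md.template for the real layout. replace_section is a hypothetical helper, and it appends the section if it is missing rather than dropping the update.

```python
import re


def replace_section(markdown: str, heading: str, new_body: str) -> str:
    """Replace the body under `## <heading>` up to the next `## ` heading or end of text."""
    pattern = re.compile(
        rf"(^## {re.escape(heading)}[ \t]*\n)(.*?)(?=^## |\Z)",
        re.MULTILINE | re.DOTALL,
    )
    body = new_body.strip("\n")
    if not pattern.search(markdown):
        # Section missing: append it at the end of the file.
        return markdown.rstrip("\n") + f"\n\n## {heading}\n{body}\n"

    def _sub(m: re.Match) -> str:
        # Keep a blank line before the following heading; single newline at end of file.
        tail = "\n" if m.end() == len(markdown) else "\n\n"
        return m.group(1) + body + tail

    return pattern.sub(_sub, markdown, count=1)
```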

Templates

See references/EXPERIMENT.md.template for the hypothesis file format and scripts/time_budget_train.py for a complete training-script template that implements the key patterns below.

Key patterns

TIME_BUDGET: wall-clock seconds, not epochs. ~12 experiments/hour at 300s each
val_bpb (validation bits per byte): total_nats / (math.log(2) * total_bytes), a vocab-independent metric
GC freeze: freezing Python's garbage collector after step 0 eliminates ~500ms collection stalls
Fast fail: if math.isnan(loss) or loss > 100: sys.exit(1)
Circuit breaker: 3 consecutive CRASHes on same error → escalate to user
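The metric, fast-fail check, and throughput arithmetic from the list above can be written out as small helpers. This is a sketch, not the template's own code: val_bpb converts summed cross-entropy from nats to bits (divide by ln 2) and normalizes by bytes of validation text rather than tokens, which is why models with different vocabularies land on the same scale. The overhead_s parameter is a hypothetical knob for startup and evaluation time; with zero overhead, 3600 s / 300 s gives the ~12 experiments per hour quoted above.

```python
import math
import sys


def val_bpb(total_nats: float, total_bytes: int) -> float:
    """Bits per byte: summed cross-entropy (nats) converted to bits, divided by byte count."""
    return total_nats / (math.log(2) * total_bytes)


def fast_fail(loss: float) -> None:
    """Abort the run with exit code 1 as soon as the loss is NaN or diverging."""
    if math.isnan(loss) or loss > 100:
        sys.exit(1)


def experiments_per_hour(budget_s: float, overhead_s: float = 0.0) -> float:
    """Loop throughput for a given training budget plus per-run overhead (hypothetical)."""
    return 3600 / (budget_s + overhead_s)
```

For example, a validation set whose summed loss equals ln 2 nats per byte scores exactly 1.0 bits per byte. Because `loss > 100` is also true for infinity, the fast-fail check catches both NaN and overflowing losses.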

See references/autoexperiment-guide.md for full documentation.
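A framework-agnostic sketch of how the time budget, fast fail, and GC freeze fit together in a training loop. train_step is a hypothetical callable that performs one optimizer step and returns the loss as a Python float; in an MLX script it would wrap the forward, backward, and update. The GC handling shown (one gc.collect() followed by gc.freeze()) is one way to realize the "GC freeze after step 0" pattern and is an assumption about the template, not a copy of it.

```python
import gc
import math
import sys
import time
from typing import Callable

TIME_BUDGET = 300  # wall-clock seconds of training, not epochs


def train_for_budget(train_step: Callable[[int], float], budget_s: float = TIME_BUDGET) -> int:
    """Call train_step(step) until budget_s seconds of wall-clock time pass; return steps run."""
    start = time.perf_counter()
    step = 0
    while time.perf_counter() - start < budget_s:
        loss = train_step(step)
        if math.isnan(loss) or loss > 100:  # fast fail: diverged, stop immediately
            sys.exit(1)
        if step == 0:
            # By the end of step 0 most long-lived objects (model, optimizer state)
            # typically exist. Collect once, then move all tracked objects into the
            # permanent generation so later collections skip rescanning them.
            gc.collect()
            gc.freeze()
        step += 1
    return step
```

Checking the clock once per step means the last step can overrun the budget slightly; evaluation then runs after train_for_budget returns, which is why the outer timeout needs some headroom.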
