autotune
Autonomous optimization loops for Claude Code. Inspired by karpathy/autoresearch.
Try an idea, measure it, keep what works, repair what breaks, and pause with a reason when the loop gets stuck.
Works for any optimization target: test speed, bundle size, LLM training loss, build times, Lighthouse scores, inference latency — anything with a number you want to move.
How It Works
┌─────────────────────────────────────────────────┐
│ Autotune Loop │
│ │
│ Edit code ─► Benchmark ─► Keep / Revert / Heal │
│ ▲ │ │
│ └──── Explain state ◄───────┘ │
│ until budget or pause │
└─────────────────────────────────────────────────┘
The Claude Code agent autonomously:
- Analyzes the codebase and picks an optimization to try
- Makes a focused code change
- Runs the benchmark (
autotune.sh)
- Compares against baseline and guardrails: improved? keep (auto-commit). Worse? discard (auto-revert)
- Tracks loop health: improving, plateaued, crashing, healing, paused
- Applies recovery playbooks with bounded retries, then pauses with an explicit reason if recovery is exhausted
- Logs experiments, recovery actions, and decisions to
autotune.jsonl
What's Included
Infrastructure (domain-agnostic):
init-experiment.sh — Initialize a session with metric name, unit, direction
run-experiment.sh — Run benchmark, parse METRIC name=value lines, run correctness checks
log-experiment.sh — Log results, auto-commit/revert, compute confidence scores, classify health
dashboard.sh — Terminal dashboard with live monitoring and health state
lib/health.py — Recovery controller and explainable health transitions
- Hooks for auto-resume and benchmark enforcement
Agent (the brain):
agents/autotune.md — Full autonomous loop with resume protocol, loop rules, and tool protocol
Skill (setup wizard):
skills/autotune/SKILL.md — Gathers goal, writes session files, starts the loop
Quick Start
Install as Claude Code Plugin (recommended)
# Step 1: Add the marketplace
/plugin marketplace add hwuiwon/autotune
# Step 2: Install the plugin
/plugin install autotune@autotune
For local development/testing:
claude --plugin-dir /path/to/autotune
Run
cd /path/to/your/project
# Interactive — agent asks what to optimize
claude --agent autotune
# Quick start with a goal
claude --agent autotune -p "Optimize test suite speed"
Monitor
In a separate terminal:
bash "${CLAUDE_PLUGIN_ROOT:-/path/to/autotune}/bin/dashboard.sh" --watch
Stop / Resume
# Stop the loop (preserves all data)
autotune stop
# Explain the current state and next action
autotune explain
# Force the loop into repair mode
autotune repair
# Resume later
claude --agent autotune
# Clear everything and start fresh
autotune clear
Session Persistence
Two files keep the session alive across restarts and context resets:
autotune.jsonl — Append-only log of every experiment (metric, status, commit, description, ASI)
autotune.md — Living document: objective, what's been tried, dead ends, key wins
.autotune.state — Current operating mode, health state, streaks, and recovery budget
A fresh agent with no memory can read these two files and continue exactly where the previous session left off.
Confidence Scoring
After 3+ experiments, computes a confidence score using Median Absolute Deviation (MAD) as a robust noise estimator:
confidence = |best_improvement| / MAD
| Score | Label | Meaning |
|---|
| >= 2.0 | High (green) | Improvement is likely real |
| 1.0-2.0 | Marginal (yellow) | Could be noise — consider re-running |
| < 1.0 | Within noise (red) | Improvement may not be real |
Health And Recovery
Autotune is not a blind infinite loop anymore. It tracks loop health and switches strategies when the run stops being productive.
Health states:
running / improving — normal optimization mode
plateaued — repeated non-improving runs
crashing — benchmark or checks failures are stacking up
healing — a recovery playbook is active
paused — recovery budget is exhausted and the loop stopped safely
Recovery playbooks are configured in priority order and can include:
rebaseline
shrink_scope
diagnose
pause
Use autotune explain to see the current health state, last decision reason, and suggested next action.
Backpressure Checks
Optional autotune.checks.sh runs correctness checks after every passing benchmark:
#!/bin/bash
set -e
pnpm typecheck
pnpm test
pnpm lint
If checks fail, the result is logged as checks_failed and the changes are reverted.