autotune

Autonomous optimization loops for Claude Code. Inspired by karpathy/autoresearch.

Try an idea, measure it, keep what works, repair what breaks, and pause with a reason when the loop gets stuck.

Works for any optimization target: test speed, bundle size, LLM training loss, build times, Lighthouse scores, inference latency — anything with a number you want to move.

How It Works

┌─────────────────────────────────────────────────┐
│                 Autotune Loop                │
│                                                  │
│ Edit code ─► Benchmark ─► Keep / Revert / Heal  │
│      ▲                           │               │
│      └──── Explain state ◄───────┘               │
│         until budget or pause                    │
└─────────────────────────────────────────────────┘

The Claude Code agent autonomously:

Analyzes the codebase and picks an optimization to try
Makes a focused code change
Runs the benchmark (autotune.sh)
Compares against baseline and guardrails: improved? keep (auto-commit). Worse? discard (auto-revert)
Tracks loop health: improving, plateaued, crashing, healing, paused
Applies recovery playbooks with bounded retries, then pauses with an explicit reason if recovery is exhausted
Logs experiments, recovery actions, and decisions to autotune.jsonl

What's Included

Infrastructure (domain-agnostic):

init-experiment.sh — Initialize a session with metric name, unit, direction
run-experiment.sh — Run benchmark, parse METRIC name=value lines, run correctness checks
log-experiment.sh — Log results, auto-commit/revert, compute confidence scores, classify health
dashboard.sh — Terminal dashboard with live monitoring and health state
lib/health.py — Recovery controller and explainable health transitions
Hooks for auto-resume and benchmark enforcement

Agent (the brain):

agents/autotune.md — Full autonomous loop with resume protocol, loop rules, and tool protocol

Skill (setup wizard):

skills/autotune/SKILL.md — Gathers goal, writes session files, starts the loop

Quick Start

Install as Claude Code Plugin (recommended)

# Step 1: Add the marketplace
/plugin marketplace add hwuiwon/autotune

# Step 2: Install the plugin
/plugin install autotune@autotune

For local development/testing:

claude --plugin-dir /path/to/autotune

Run

cd /path/to/your/project

# Interactive — agent asks what to optimize
claude --agent autotune

# Quick start with a goal
claude --agent autotune -p "Optimize test suite speed"

Monitor

In a separate terminal:

bash "${CLAUDE_PLUGIN_ROOT:-/path/to/autotune}/bin/dashboard.sh" --watch

Stop / Resume

# Stop the loop (preserves all data)
autotune stop

# Explain the current state and next action
autotune explain

# Force the loop into repair mode
autotune repair

# Resume later
claude --agent autotune

# Clear everything and start fresh
autotune clear

Session Persistence

Two files keep the session alive across restarts and context resets:

autotune.jsonl — Append-only log of every experiment (metric, status, commit, description, ASI)
autotune.md — Living document: objective, what's been tried, dead ends, key wins
.autotune.state — Current operating mode, health state, streaks, and recovery budget

A fresh agent with no memory can read these two files and continue exactly where the previous session left off.

Confidence Scoring

After 3+ experiments, computes a confidence score using Median Absolute Deviation (MAD) as a robust noise estimator:

confidence = |best_improvement| / MAD

Score	Label	Meaning
>= 2.0	High (green)	Improvement is likely real
1.0-2.0	Marginal (yellow)	Could be noise — consider re-running
< 1.0	Within noise (red)	Improvement may not be real

Health And Recovery

Autotune is not a blind infinite loop anymore. It tracks loop health and switches strategies when the run stops being productive.

Health states:

running / improving — normal optimization mode
plateaued — repeated non-improving runs
crashing — benchmark or checks failures are stacking up
healing — a recovery playbook is active
paused — recovery budget is exhausted and the loop stopped safely

Recovery playbooks are configured in priority order and can include:

rebaseline
shrink_scope
diagnose
pause

Use autotune explain to see the current health state, last decision reason, and suggested next action.

Backpressure Checks

Optional autotune.checks.sh runs correctness checks after every passing benchmark:

#!/bin/bash
set -e
pnpm typecheck
pnpm test
pnpm lint

If checks fail, the result is logged as checks_failed and the changes are reverted.

autotune

Popularity

What's Inside

README

autotune

How It Works

What's Included

Quick Start

Install as Claude Code Plugin (recommended)

Run

Monitor

Stop / Resume

Session Persistence

Confidence Scoring

Health And Recovery

Backpressure Checks

Confidence

Similar Plugins

autoresearch

autoresearch-agent

autoresearch-builder

gepa-research

skills

caveman

Popularity

Health & Quality

Similar Plugins

autoresearch

autoresearch-agent

autoresearch-builder

gepa-research

skills

caveman