auto-optimize

AutoResearch for Performance Engineering

Measure first. Reason deep. Reflect. Repeat.

Andrej Karpathy introduced the idea of autoresearch — closing the loop between hypothesis, experiment, and reflection so that an AI agent can drive an entire research cycle autonomously. auto-optimize applies that idea to performance engineering.

You define a numeric goal and a success threshold. The plugin builds regression and benchmark infrastructure, locks a baseline, then runs an autonomous loop: profile → reason → plan → apply → test → measure → reflect → repeat. Every iteration is a git commit. Every decision is reasoned, recorded, and fed back into the next cycle.
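The loop above can be pictured as the Java sketch below. It is a hypothetical simplification, not the plugin's implementation. Profiling, reasoning, and planning are done by the agent, so here they are reduced to a pre-built list of experiments. The names OptimizeLoop, Experiment, and Result are illustrative. The sketch assumes a lower-is-better metric. It keeps a change only if regression tests pass and the score beats the best so far, and it reverts everything else.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.DoubleSupplier;

/** Hypothetical sketch of the apply -> test -> measure -> keep/revert cycle. */
final class OptimizeLoop {
    /** One planned change plus how to undo it, test it, and measure it. */
    record Experiment(String name, Runnable apply, Runnable revert,
                      BooleanSupplier testsPass, DoubleSupplier measure) {}

    /** One iteration's outcome; score is NaN when regression tests failed. */
    record Result(String name, double score, boolean kept) {}

    /** Lower score is better (e.g. ns/op). Stops early once best <= target. */
    static List<Result> run(double baseline, double target, List<Experiment> plan) {
        List<Result> log = new ArrayList<>();
        double best = baseline;
        for (Experiment e : plan) {
            if (best <= target) break;                    // success threshold reached
            e.apply().run();                              // apply the change
            boolean ok = e.testsPass().getAsBoolean();    // regression tests
            double score = ok ? e.measure().getAsDouble() : Double.NaN; // benchmark
            boolean keep = ok && score < best;            // reflect: keep only real wins
            if (keep) {
                best = score;
            } else {
                e.revert().run();                         // drop the change
            }
            log.add(new Result(e.name(), score, keep));   // kept and dropped both recorded
        }
        return log;
    }
}
```

Logging dropped experiments next to kept ones mirrors the idea that every iteration, reverts included, stays in the history.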

Installation

claude plugin marketplace add bluuewhale/auto-optimize
claude plugin install auto-optimize@auto-optimize

Quick Start

/auto-optimize The API is slow. I want to make it faster.

auto-optimize will ask a few clarifying questions — metric, scope, success target, and test commands — then take it from there. If benchmarks or regression tests are missing, it writes them before starting the loop.

Real-World Result: 27% Faster Hash Table

Full write-up: HashSmith Part 3 — I Automated My Way to a 27% Faster Hash Table

HashSmith is an open-source high-performance hash table for the JVM — a SwissTable-style map built around SWAR probing and 8-byte control word groups. After two rounds of manual optimization, the author handed the profiler to auto-optimize.
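Here is a minimal Java sketch that makes "SWAR probing" concrete. It matches a one-byte tag against all 8 control bytes of a group using plain 64-bit arithmetic. The encoding is EMPTY = 0x80, with full slots holding a 7-bit tag. That follows Abseil's SwissTable convention and is an assumption about HashSmith, not a description of its code. SwarGroup and its methods are hypothetical names.

```java
/**
 * Hypothetical sketch of SWAR matching over one 8-byte control group.
 * Assumed encoding: EMPTY = 0x80, full slots hold a 7-bit tag (0x00..0x7F).
 */
final class SwarGroup {
    static final long LSB = 0x0101010101010101L;   // 0x01 in every byte
    static final long LOW7 = 0x7F7F7F7F7F7F7F7FL;  // low 7 bits of every byte
    static final byte EMPTY = (byte) 0x80;

    /** Packs ctrl[offset..offset+7] into a long, slot 0 in the lowest byte. */
    static long pack(byte[] ctrl, int offset) {
        long w = 0;
        for (int i = 7; i >= 0; i--) {
            w = (w << 8) | (ctrl[offset + i] & 0xFFL);
        }
        return w;
    }

    /** Sets 0x80 in every byte of word that equals b, and nothing else. */
    static long matchByte(long word, byte b) {
        long x = word ^ (LSB * (b & 0xFF));  // matching bytes become 0x00
        long t = (x & LOW7) + LOW7;          // high bit set iff low 7 bits nonzero; cannot carry
        return ~(t | x | LOW7);              // 0x80 exactly where x's byte was 0x00
    }

    /** Slot index (0..7) of the lowest set match; clear it with mask &= mask - 1. */
    static int firstIndex(long mask) {
        return Long.numberOfTrailingZeros(mask) >>> 3;
    }
}
```

This uses an exact zero-byte test. The cheaper classic form, (x - 0x0101010101010101L) & ~x & 0x8080808080808080L, can flag a false positive in the byte just above a real match. That is tolerable in a hash table only because the full key comparison filters it out.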

One prompt. ~3 hours. No manual intervention.

/auto-optimize I want to optimize the get/put performance of the SwissMap implementation.

Metric	Value
Experiments run	5
Optimizations landed	3
Dropped	2
Improvement vs baseline	13–32% across all 8 benchmark scenarios

What the agent found

The agent ran 5 experiments autonomously. Three compounding wins, in order:

Tombstone guard — the probe loop was carrying tombstone-handling logic on a path where tombstones essentially never exist in production. Splitting into two specialized loop bodies eliminated the dead weight. Put path: -19% to -45%.
ILP hoisting on the read path — emptyMask was being computed after the key-equality loop, creating a serial dependency. Moving it adjacent to eqMask let the CPU's out-of-order engine pipeline both SWAR operations in the same clock cycle. Get path: -11% to -36%.
A third, smaller improvement compounded on top of both.
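The two named changes can be illustrated with a deliberately small Java sketch. TinySwissMap is a hypothetical int-to-int map, not HashSmith's code. It has fixed capacity and uses linear probing over 8-slot groups. It assumes the Abseil-style control bytes EMPTY = 0x80 and DELETED = 0xFE for tombstones. find() computes the tag mask and the empty mask back to back from one control word, before the key-comparison loop. put() chooses between a tombstone-free probe loop and a general one once, before probing, instead of carrying tombstone handling through every probe.

```java
import java.util.Arrays;
/**
 * Hypothetical, simplified SwissTable-style int -> int map (not HashSmith's code).
 * Fixed capacity, linear probing over 8-slot groups, assumed control bytes:
 * EMPTY = 0x80, DELETED (tombstone) = 0xFE, full slot = 7-bit tag.
 */
final class TinySwissMap {
    private static final long LSB = 0x0101010101010101L, LOW7 = 0x7F7F7F7F7F7F7F7FL;
    private static final byte EMPTY = (byte) 0x80, DELETED = (byte) 0xFE;
    private final byte[] ctrl;
    private final int[] keys, vals;
    private final int groupMask;
    private int tombstones;

    TinySwissMap(int groups) { // groups must be a power of two
        ctrl = new byte[groups * 8];
        Arrays.fill(ctrl, EMPTY);
        keys = new int[groups * 8];
        vals = new int[groups * 8];
        groupMask = groups - 1;
    }

    // 0x80 in every byte of w that equals b (exact byte match).
    private static long match(long w, byte b) {
        long x = w ^ (LSB * (b & 0xFF));
        return ~(((x & LOW7) + LOW7) | x | LOW7);
    }

    private long word(int g) { // slot g*8 lands in the lowest byte
        long w = 0;
        for (int i = 7; i >= 0; i--) w = (w << 8) | (ctrl[g * 8 + i] & 0xFFL);
        return w;
    }
    private static int slot(int g, long mask) { return g * 8 + (Long.numberOfTrailingZeros(mask) >>> 3); }
    private static int hash(int k) { int h = k * 0x9E3779B9; return h ^ (h >>> 16); }

    // Read path: both masks are computed back to back from the same control word,
    // before the key-comparison loop (the reordering described as ILP hoisting).
    private int find(int key) {
        int h = hash(key);
        byte tag = (byte) (h & 0x7F);
        for (int g = (h >>> 7) & groupMask, n = 0; n <= groupMask; g = (g + 1) & groupMask, n++) {
            long w = word(g);
            long eq = match(w, tag);
            long empty = match(w, EMPTY);
            for (; eq != 0; eq &= eq - 1) if (keys[slot(g, eq)] == key) return slot(g, eq);
            if (empty != 0) return -1; // an EMPTY slot ends the probe sequence
        }
        return -1;
    }
    Integer get(int key) { int s = find(key); return s < 0 ? null : vals[s]; }

    boolean remove(int key) {
        int s = find(key);
        if (s < 0) return false;
        ctrl[s] = DELETED;
        tombstones++;
        return true;
    }

    void put(int key, int val) {
        int h = hash(key);
        byte tag = (byte) (h & 0x7F);
        int g = (h >>> 7) & groupMask;
        // Tombstone guard: pick the loop body once, outside the probe loop.
        int s = tombstones == 0 ? probeFast(key, tag, g) : probeGeneral(key, tag, g);
        if (ctrl[s] == DELETED) tombstones--;
        ctrl[s] = tag; keys[s] = key; vals[s] = val;
    }

    // No tombstones exist, so the first EMPTY slot is the insertion point.
    private int probeFast(int key, byte tag, int g) {
        for (int n = 0; n <= groupMask; g = (g + 1) & groupMask, n++) {
            long w = word(g);
            for (long eq = match(w, tag); eq != 0; eq &= eq - 1) if (keys[slot(g, eq)] == key) return slot(g, eq);
            long empty = match(w, EMPTY);
            if (empty != 0) return slot(g, empty);
        }
        throw new IllegalStateException("table full");
    }

    // General case: also remember the first tombstone so it can be reused.
    private int probeGeneral(int key, byte tag, int g) {
        int firstDeleted = -1;
        for (int n = 0; n <= groupMask; g = (g + 1) & groupMask, n++) {
            long w = word(g);
            for (long eq = match(w, tag); eq != 0; eq &= eq - 1) if (keys[slot(g, eq)] == key) return slot(g, eq);
            long del = match(w, DELETED);
            if (firstDeleted < 0 && del != 0) firstDeleted = slot(g, del);
            long empty = match(w, EMPTY);
            if (empty != 0) return firstDeleted >= 0 ? firstDeleted : slot(g, empty);
        }
        if (firstDeleted >= 0) return firstDeleted;
        throw new IllegalStateException("table full");
    }
}
```

Whether the split pays off depends on how often tombstones are present. The source reports that in production they essentially never are, which makes the fast loop the common case.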

None of these required a single line of code written by the author. The structured reasoning pipeline (Step-Back → CoT → Self-Consistency → Pre-mortem) found the tombstone fast path by asking "What is this loop doing that it doesn't need to do?", a question that wasn't visible from the disassembly alone.

The Problem

Most optimization attempts fail silently:

You change code, feel like it's faster, and ship it without ever measuring before or after
You write a quick benchmark once, optimize for it, then lose the script
You try five approaches, forget what you tried, and repeat the same dead-ends

auto-optimize enforces the discipline you know you should have but don't.

How It Works

Phase	What Happens	Output
0. Gather	Collects goal, scope, metric direction, and numeric success criteria	`experiment-plan.md`
1. Infra	Builds Regression Test and Benchmark Test scripts if missing (parallel sub-agents)	`tests/` + `bench/`
1.5 Baseline	Locks noise-floor-validated baseline measurement and environment snapshot	`baseline/`
2. Loop	Profile → Disassemble → Reason (Opus) → Apply → Test → Benchmark → Reflect	`iterations/` + `leaderboard.md`
3. Report	Summarizes all iterations, best config, and recommended next steps	`final-report.md`
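Phases 0 and 1.5 rest on two small calculations. The first is the improvement over baseline in the chosen metric direction. The second is whether that improvement exceeds run-to-run noise. The Java sketch below is one possible reading, not the plugin's documented method. It assumes the noise floor is the largest relative deviation of any run from its median. NoiseFloor, Direction, and their methods are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical helpers for metric direction, success targets, and a noise floor. */
final class NoiseFloor {
    enum Direction { LOWER_IS_BETTER, HIGHER_IS_BETTER }

    static double median(double[] runs) {
        double[] s = runs.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    /** Run-to-run noise: largest relative deviation of any run from the median. */
    static double noise(double[] runs) {
        double m = median(runs);
        double worst = 0;
        for (double r : runs) worst = Math.max(worst, Math.abs(r - m) / m);
        return worst;
    }

    /** Relative improvement, positive when the candidate is better in direction d. */
    static double improvement(Direction d, double baseline, double candidate) {
        double delta = (candidate - baseline) / baseline;
        return d == Direction.LOWER_IS_BETTER ? -delta : delta;
    }

    /** A change counts only if its improvement exceeds the noise of both run sets. */
    static boolean significant(Direction d, double[] baselineRuns, double[] candidateRuns) {
        double floor = Math.max(noise(baselineRuns), noise(candidateRuns));
        return improvement(d, median(baselineRuns), median(candidateRuns)) > floor;
    }

    /** Numeric success criterion, e.g. a hypothetical "latency <= 85 ns/op" target. */
    static boolean meetsTarget(Direction d, double value, double target) {
        return d == Direction.LOWER_IS_BETTER ? value <= target : value >= target;
    }
}
```

Using the median rather than the mean keeps a single outlier run, such as one hit by a GC pause, from shifting the baseline.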

Every iteration is a git commit — including reverts. The full experiment history is always recoverable.

The Intelligence Layer

Most AI coding tools apply changes and hope for the best. auto-optimize's inner loop is built differently — each iteration runs a structured reasoning pipeline powered by Claude Opus before a single line of code is touched.

Multi-technique Reasoning per Iteration

Every iteration delegates planning to a dedicated Opus sub-agent that applies four reasoning techniques in sequence: Step-Back → CoT → Self-Consistency → Pre-mortem.
