From autoresearch
Set up and run an autonomous experiment loop for any optimization target. Gathers what to optimize, then starts the loop immediately. Use when asked to "run autoresearch", "optimize X in a loop", "set up autoresearch for X", or "start experiments".
How this skill is triggered — by the user, by Claude, or both
Slash command
/autoresearch:autoresearchThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Autonomous experiment loop: try ideas, keep what works, discard what doesn't, never stop.
Autonomous experiment loop: try ideas, keep what works, discard what doesn't, never stop.
All session files live in .research/<dir>/ where <dir> matches the branch suffix. For branch autoresearch/<goal>-<date>, the directory is .research/<goal>-<date>/.
You use your built-in tools (Bash, Read, Write, git) to run experiments, track state, and manage code changes. There are no special tools — you do everything yourself.
Bash to execute .research/<dir>/autoresearch.sh, capture output, parse METRIC lines.Write/Read to append results to .research/<dir>/autoresearch.jsonl.Bash to git add -A && git commit.Bash to revert code changes while preserving .research/ directory.Write/Edit to maintain .research/<dir>/autoresearch.md.maxIterations to null (loop forever).git checkout -b autoresearch/<goal>-<date> then derive: DIR=.research/<goal>-<date>; mkdir -p $DIR$DIR/autoresearch.md and $DIR/autoresearch.sh (see below). Commit both.autoresearch.mdThis is the heart of the session. A fresh agent with no context should be able to read this file and run the loop effectively. Invest time making it excellent.
# Autoresearch: <goal>
## Objective
<Specific description of what we're optimizing and the workload.>
## Metrics
- **Primary**: <name> (<unit>, lower/higher is better)
- **Secondary**: <name>, <name>, ...
## How to Run
`.research/<dir>/autoresearch.sh` — outputs `METRIC name=number` lines.
## Files in Scope
<Every file the agent may modify, with a brief note on what it does.>
## Off Limits
<What must NOT be touched.>
## Constraints
<Hard rules: tests must pass, no new deps, etc.>
## What's Been Tried
<Update this section as experiments accumulate. Note key wins, dead ends,
and architectural insights so the agent doesn't repeat failed approaches.>
Update .research/<dir>/autoresearch.md periodically — especially the "What's Been Tried" section — so resuming agents have full context.
autoresearch.shLives in .research/<dir>/autoresearch.sh. Bash script (set -euo pipefail) that: pre-checks fast (syntax errors in <1s), runs the benchmark, outputs METRIC name=value lines to stdout. Keep the script fast — every second is multiplied by hundreds of runs. Update it during the loop as needed.
For fast, noisy benchmarks (< 5s), run the workload multiple times inside the script and report the median. This produces stable data points and makes the confidence score reliable from the start. Slow workloads (ML training, large builds) don't need this — single runs are fine.
autoresearch.checks.sh (optional)Lives in .research/<dir>/autoresearch.checks.sh. Bash script (set -euo pipefail) for backpressure/correctness checks: tests, types, lint, etc. Only create this file when the user's constraints require correctness validation (e.g., "tests must pass", "types must check").
When this file exists:
checks_failed and revert.keep a result when checks have failed.Keep output minimal. Suppress verbose progress/success output and let only errors through.
#!/bin/bash
set -euo pipefail
# Example: run tests and typecheck — suppress success output, only show errors
pnpm test --run --reporter=dot 2>&1 | tail -50
pnpm typecheck 2>&1 | grep -i error || true
After running .research/<dir>/autoresearch.sh, parse stdout for lines matching:
METRIC <name>=<number>
Regex: ^METRIC\s+(\S+)=([0-9]*\.?[0-9]+)\s*$
The primary metric matches the metric name from setup. All other METRIC lines are secondary metrics. If no METRIC lines are found, manually extract values from the output.
After 3+ experiments, compute a confidence score to distinguish real improvements from benchmark noise.
Algorithm (MAD-based):
|value - median| for each value.confidence = |best_kept - baseline| / MADInterpretation:
= 2.0x — improvement is likely real
The score is advisory — it never auto-discards.
.research/<dir>/autoresearch.jsonlAppend-only file. Each line is a JSON object. Two types:
Config header (written at session start):
{"type":"config","name":"<session name>","metric_name":"<name>","metric_unit":"<unit>","direction":"lower|higher","maxIterations":20,"segment":0,"timestamp":1234567890}
The maxIterations field is a number (positive integer) or null. When null or omitted, the session loops forever.
Result (written after each experiment):
{"type":"result","metric":15200,"metrics":{"compile_us":4200,"render_us":9800},"status":"keep|discard|crash|checks_failed","description":"what was tried","commit":"abc1234","confidence":2.3,"segment":0,"timestamp":1234567890}
The segment field increments on each config header. Use the current (highest) segment for confidence calculations.
keep (primary metric improved):git add -A && git commit -m "autoresearch: <description> (<metric_name>=<value>)"
discard, crash, or checks_failed:Revert all code changes but preserve the .research/ directory:
# Save research state
git stash push --include-untracked -- .research/ 2>/dev/null || true
git checkout -- .
git clean -fd
git stash pop 2>/dev/null || true
Always append the result to .research/<dir>/autoresearch.jsonl BEFORE reverting, so the record is preserved.
LOOP FOREVER — unless maxIterations is set. Never ask "should I continue?" — the user expects autonomous work.
keep. Worse/equal → discard. Secondary metrics rarely affect this..research/<dir>/autoresearch.md exists, read it + git log, continue looping..research/<dir>/autoresearch.checks.sh exists and fails, the result MUST be logged as checks_failed and reverted.type: "result" lines in the current segment of autoresearch.jsonl. If maxIterations is set (non-null) and the count >= maxIterations, stop looping and proceed to the Final Summary section below.NEVER STOP — unless maxIterations is reached. The user may be away for hours. Keep going until interrupted or the iteration limit is hit.
When maxIterations is reached, stop looping and do the following:
.research/<dir>/autoresearch.md — write a final "What's Been Tried" summary covering all experiments, key wins, and dead ends.git add -A && git commit -m "autoresearch: max iterations (N) reached — final summary"
Replace N with the actual maxIterations value.When you discover complex but promising optimizations that you won't pursue right now, append them as bullets to .research/<dir>/autoresearch.ideas.md. Don't let good ideas get lost.
On resume (context limit, crash), check .research/<dir>/autoresearch.ideas.md — prune stale/tried entries, experiment with the rest. When all paths are exhausted, delete the file and write a final summary.
If the user sends a message while an experiment is running, finish the current run + log cycle first, then incorporate their feedback in the next iteration. Don't abandon a running experiment.
Detect an active session by checking .research/ for a subdirectory matching the current branch suffix. For branch autoresearch/foo-2026-03-20, look for .research/foo-2026-03-20/.
If .research/<dir>/autoresearch.md exists when you start:
.research/<dir>/autoresearch.md to understand the objective and what's been tried..research/<dir>/autoresearch.jsonl to reconstruct state (latest config header, all results in current segment).maxIterations from the latest config header. Count all type: "result" lines in the current segment.maxIterations is set (non-null) and result count >= maxIterations: stop immediately — do NOT resume the loop. Instead, proceed directly to the Final Summary section (update autoresearch.md, commit, print summary, stop).git log for recent commits..research/<dir>/autoresearch.ideas.md for pending ideas.Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub oskarhane/oskars.ai --plugin autoresearch