From autoresearch
Set up and run an autonomous experiment loop for any optimization target. Use when asked to "run autoresearch", "optimize X in a loop", "set up autoresearch", "start experiments", or "benchmark and optimize".
Slash command
/autoresearch:autoresearch <goal or "resume" or "off" or "clear" or "status">
Autonomous experiment loop: try ideas, keep what works, discard what doesn't, never stop.
- /autoresearch <goal>: set up a new session and start looping
- /autoresearch resume: resume from existing autoresearch.md
- /autoresearch status: show experiment results summary
- /autoresearch off: deactivate the loop (stop hook stops blocking)
- /autoresearch clear: delete autoresearch.jsonl and reset all state
Autoresearch MUST run in a git worktree. This is a strict requirement, not a suggestion. The worktree isolates all experiments from the user's main checkout, so no files outside the worktree are ever at risk.
Before doing anything else, verify you are in a worktree:
# .git is a FILE in worktrees, a DIRECTORY in main checkouts
if [ -d .git ]; then
  echo "ERROR: autoresearch must run in a git worktree."
  echo "Start with: claude -w autoresearch-<name>"
  exit 1  # fail the check so nothing else runs outside a worktree
fi
If not in a worktree, tell the user to restart with claude -w <name> and stop.
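The same check can be sketched in Python for tooling that wraps the session. This is a minimal sketch, not part of the plugin: `is_linked_worktree` is a hypothetical helper name, and it relies only on the rule stated above (`.git` is a file in worktrees and a directory in main checkouts). Submodules also use a `.git` file, so treat the result as a heuristic.

```python
from pathlib import Path


def is_linked_worktree(repo_root: str = ".") -> bool:
    """Return True when `.git` under repo_root is a regular file.

    Hypothetical helper: linked worktrees keep a `.git` file that points at
    the main repository's metadata, while a main checkout has a `.git`
    directory. A missing `.git` entry also returns False.
    """
    return (Path(repo_root) / ".git").is_file()
```

Calling `is_linked_worktree()` from the project root returns `False` in a main checkout, in which case the session should stop as described above.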
When starting a new session:
Verify worktree — run the check above. Do not proceed if not in a worktree.
Gather info — ask or infer from context:
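- Benchmark command (e.g. bun run bench:fifo)
- Primary metric: name, unit, and direction (e.g. recalc_us, microseconds, lower is better)
- Checks command, if any (e.g. bun run test)
- Guards, if any (e.g. memory_mb < 512)
- Cost ceiling in USD, if any (e.g. 5.00)
Read source files deeply before writing anything. Understand the workload.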
Write session files and commit them:
autoresearch.md
The heart of the session. A fresh agent with zero context should be able to read this file and run the loop effectively. Invest time making it excellent.
# Autoresearch: <goal>
## Objective
Specific description of what we're optimizing and why.
## Metrics
- **Primary**: <name> (<unit>, <lower|higher> is better)
- **Secondary** (optional): additional metrics to track
## How to Run
./autoresearch.sh
## Files in Scope
List every file the agent may modify, with brief notes on what each does.
## Off Limits
What must NOT be touched and why.
## Constraints
Hard rules: tests must pass, no new dependencies, etc.
## Guards (must not regress)
Metrics that must stay within bounds regardless of the primary optimization.
Format: `metric_name operator threshold` (e.g., `memory_mb < 512`, `test_count >= 47`)
The benchmark script outputs these as additional METRIC lines.
If any guard is violated, treat the experiment as checks_failed.
## Baseline
- Primary metric: <value>
- Date: <date>
- Commit: <hash>
## What's Been Tried
Updated as experiments accumulate. Format:
- Run N: <description> → <kept|discarded|crashed> (<metric value>, <delta%>)
autoresearch.sh
Bash benchmark script. Must:
- Use set -euo pipefail
- Output METRIC <name>=<number> lines on stdout
Example:
#!/bin/bash
set -euo pipefail
# Pre-check: fast syntax/compile verification
bun check
# Run benchmark
RESULT=$(bun run bench:fifo 2>&1)
echo "$RESULT"
# Extract and output metric
TIME=$(echo "$RESULT" | grep -oP 'recalc_us: \K[0-9.]+')
echo "METRIC recalc_us=$TIME"
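The METRIC lines the script prints can be collected with a small parser. The sketch below is illustrative only: `parse_metric_lines` is a hypothetical name, and the accepted grammar (an identifier, `=`, then a plain or scientific-notation number) is an assumption based on the `METRIC <name>=<number>` format described above.

```python
import re

# Assumed grammar for lines such as "METRIC recalc_us=812.5".
METRIC_LINE = re.compile(
    r"^METRIC\s+([A-Za-z_][A-Za-z0-9_]*)=(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)


def parse_metric_lines(stdout: str) -> dict[str, float]:
    """Collect every well-formed METRIC line; other output is ignored."""
    metrics: dict[str, float] = {}
    for line in stdout.splitlines():
        match = METRIC_LINE.match(line.strip())
        if match:
            # A later line with the same name overwrites an earlier one.
            metrics[match.group(1)] = float(match.group(2))
    return metrics
```

Because guards are emitted as additional METRIC lines, the same dictionary holds both the primary metric and any guard values.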
autoresearch.checks.sh (optional)
Only create when quality gates are needed. Runs after every passing benchmark.
- Use set -euo pipefail
Example:
#!/bin/bash
set -euo pipefail
bun run test --run 2>&1 | tail -20
Make scripts executable: chmod +x autoresearch.sh autoresearch.checks.sh
Activate the loop: write the state file that tells the stop hook to keep looping:
mkdir -p .claude  # make sure the state directory exists before writing
cat > .claude/autoresearch-loop.local.md << 'EOF'
---
stop_count: 0
max_iterations: 50
max_consecutive_discards: 8
max_cost_usd: 0
active: true
---
Read autoresearch.md for full context. Continue the experiment loop.
EOF
Set max_cost_usd to the user's cost ceiling if they specified one (0 = unlimited).
Set max_consecutive_discards to 8 by default (loop stops after 8 consecutive non-keep results).
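How the limits in the state file interact can be sketched as follows. This is not the stop hook's actual code: `parse_loop_state` and `stop_reason` are hypothetical names, and treating `stop_count` as the number of completed iterations is an assumption. Only the `0 = unlimited` rule for `max_cost_usd` and the consecutive-discard limit come from the text above.

```python
from typing import Optional


def parse_loop_state(text: str) -> dict[str, str]:
    """Read the `key: value` pairs between the first two `---` lines."""
    state: dict[str, str] = {}
    inside = False
    for line in text.splitlines():
        if line.strip() == "---":
            if inside:
                break  # closing delimiter: front matter is complete
            inside = True
            continue
        if inside and ":" in line:
            key, _, value = line.partition(":")
            state[key.strip()] = value.strip()
    return state


def stop_reason(state: dict[str, str], consecutive_discards: int, cost_usd: float) -> Optional[str]:
    """Return why the loop should stop, or None to keep looping."""
    if state.get("active") != "true":
        return "loop is not active"
    # Assumption: stop_count tracks how many iterations have completed.
    if int(state["stop_count"]) >= int(state["max_iterations"]):
        return "max_iterations reached"
    if consecutive_discards >= int(state["max_consecutive_discards"]):
        return "max_consecutive_discards reached"
    max_cost = float(state["max_cost_usd"])
    if max_cost > 0 and cost_usd >= max_cost:  # 0 means unlimited
        return "max_cost_usd exceeded"
    return None
```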
Initialize the experiment log:
${CLAUDE_PLUGIN_ROOT}/hooks/scripts/log-experiment.sh . init "<goal>" "<metric_name>" "<metric_unit>" "<lower|higher>"
Run ./autoresearch.sh, parse the metric, and log the baseline:
${CLAUDE_PLUGIN_ROOT}/hooks/scripts/log-experiment.sh . result 1 <baseline_metric> keep "baseline"
Each iteration:
- Read autoresearch.md (especially "What's Been Tried"), check git log for recent experiments, and check autoresearch.ideas.md if it exists.
- Run the benchmark: ./autoresearch.sh
- Parse the METRIC name=value line from the output.
- Run checks (if autoresearch.checks.sh exists): ./autoresearch.checks.sh
- On keep, commit the change:
git add -A
git commit -m "<description>
Autoresearch: {\"status\":\"keep\",\"metric\":<value>,\"delta\":\"<delta%>\"}"
Log the result using the helper (ensures consistent JSONL format):
${CLAUDE_PLUGIN_ROOT}/hooks/scripts/log-experiment.sh . result <run_number> <metric_value> keep "<description>"
On discard, crash, or checks_failed, revert code changes (autoresearch files are automatically preserved):
${CLAUDE_PLUGIN_ROOT}/hooks/scripts/revert-experiment.sh .
Log the result:
${CLAUDE_PLUGIN_ROOT}/hooks/scripts/log-experiment.sh . result <run_number> <metric_value> discard|crash|checks_failed "<description>"
- Update autoresearch.md: append to "What's Been Tried"
LOOP FOREVER. Never ask "should I continue?" Never stop to summarize. Never wait for permission.
- Primary metric improved → keep. Worse or equal → discard.
- … keep.
- On a crash, log it as crash and move on.
- If autoresearch.md has a ## Guards section, parse all guard metrics from the benchmark output and verify each one. If any guard is violated, treat the experiment as checks_failed regardless of primary metric improvement.
- When all ideas are exhausted, include <promise>AUTORESEARCH_COMPLETE</promise> in your message. The stop hook will detect this and cleanly terminate the loop.
NEVER STOP. The user may be away for hours.
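The keep/discard rule and the guard check above can be combined into one decision function. This is a hedged sketch: `violated_guards` and `classify_run` are hypothetical names, the operator set beyond `<` and `>=` is an assumption, and counting a guard metric that is missing from the output as a violation is also an assumption.

```python
import operator
import re

# Guard format from autoresearch.md: `metric_name operator threshold`.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq}
GUARD = re.compile(r"^\s*([A-Za-z_]\w*)\s*(<=|>=|==|<|>)\s*(-?\d+(?:\.\d+)?)\s*$")


def violated_guards(guards: list[str], metrics: dict[str, float]) -> list[str]:
    """Return the guard expressions that the measured metrics do not satisfy."""
    failed = []
    for guard in guards:
        match = GUARD.match(guard)
        if match is None:
            raise ValueError(f"unparseable guard: {guard!r}")
        name, op, threshold = match.group(1), match.group(2), float(match.group(3))
        # Assumption: a guard metric absent from the output is a violation.
        if name not in metrics or not OPS[op](metrics[name], threshold):
            failed.append(guard)
    return failed


def classify_run(value: float, best: float, direction: str,
                 guards: list[str], metrics: dict[str, float]) -> str:
    """Apply the loop rules: guards first, then strict improvement."""
    if violated_guards(guards, metrics):
        return "checks_failed"
    improved = value < best if direction == "lower" else value > best
    return "keep" if improved else "discard"  # worse or equal is a discard
```

Checking guards before comparing the primary metric mirrors the rule that a guard violation overrides any improvement.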
The stop hook computes an adaptive search strategy based on experiment history and includes it in the system message each turn. Check the system message at the start of each iteration for the current strategy and reason, and adapt your approach accordingly.
For performance optimization targets, create autoresearch.profile.sh during setup. Run it once after the baseline to capture profiling data, then add a ## Profiling Hotspots section to autoresearch.md.
Example autoresearch.profile.sh for Python:
#!/bin/bash
set -euo pipefail
python3 -c "
import cProfile, pstats, io
# Import and run your target
from sort import sort_numbers
import random
random.seed(42)
data = random.sample(range(100000), 5000)
pr = cProfile.Profile()
pr.enable()
for _ in range(3): sort_numbers(data)
pr.disable()
s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats('cumulative').print_stats(15)
print(s.getvalue())
"
Add the output to autoresearch.md as a hotspot table. Re-profile after significant kept experiments to check if the bottleneck has moved.
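One way to turn profiler output into a hotspot table is sketched below. The column layout is an assumption, since the skill does not prescribe one, and `hotspot_table` is a hypothetical helper. It reads the per-function totals that `pstats.Stats` exposes through its `stats` dictionary.

```python
import cProfile
import pstats


def hotspot_table(profile: cProfile.Profile, limit: int = 10) -> str:
    """Render the top functions by cumulative time as a Markdown table."""
    # stats maps (file, line, function) -> (primitive calls, total calls,
    # total time, cumulative time, callers).
    stats = pstats.Stats(profile).stats
    rows = sorted(stats.items(), key=lambda item: item[1][3], reverse=True)[:limit]
    lines = [
        "| Function | Calls | Total time (s) | Cumulative time (s) |",
        "|---|---|---|---|",
    ]
    for (filename, lineno, func), (_cc, ncalls, tottime, cumtime, _callers) in rows:
        lines.append(f"| {func} ({filename}:{lineno}) | {ncalls} | {tottime:.4f} | {cumtime:.4f} |")
    return "\n".join(lines)
```

Pass it the `cProfile.Profile` object after `disable()` has been called, then paste the result under ## Profiling Hotspots.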
- Do not modify autoresearch.sh, autoresearch.checks.sh, or the test suite to make metrics look better.
- Do not add if running_benchmark: ... shortcuts. The optimized code must be production-quality.
When you discover promising but complex optimizations you won't pursue right now, append them as bullets to autoresearch.ideas.md. On resume, check and prune stale entries, then experiment with the rest.
When /autoresearch resume is called or after context compaction:
- Read autoresearch.md: this is the complete session state
- Read autoresearch.jsonl: parse to find run count, best metric, recent results
- Run git log --oneline -10: see recent commits
- Read autoresearch.ideas.md if it exists: promising paths to explore
If the user sends a message while you're mid-experiment, finish the current run + log cycle first, then incorporate their feedback.
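Recovering run count, best metric, and recent results from autoresearch.jsonl might look like the sketch below. The record schema is not documented here, so the `metric` and `status` field names are assumptions, as is the helper name `summarize_runs`. Check the real output of log-experiment.sh before relying on it.

```python
import json


def summarize_runs(jsonl_text: str, direction: str = "lower") -> dict:
    """Summarize result records; lines without metric/status are skipped."""
    results = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumed schema: result records carry "metric" and "status" keys.
        if "metric" in record and "status" in record:
            results.append(record)
    kept = [r["metric"] for r in results if r["status"] == "keep"]
    pick = min if direction == "lower" else max
    return {
        "run_count": len(results),
        "best_metric": pick(kept) if kept else None,
        "recent": results[-5:],
    }
```

Only kept runs count toward the best metric, because discarded and crashed experiments are reverted.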
When /autoresearch status is called, use the /autoresearch-status command to display results.
The loop stops automatically when any of these occur:
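- max_iterations is reached
- max_consecutive_discards consecutive non-keep results occur
- Cost exceeds max_cost_usd
- <promise>AUTORESEARCH_COMPLETE</promise> is output when all ideas are exhausted
When /autoresearch off is called: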
- Delete .claude/autoresearch-loop.local.md (stops the stop hook)
- Do not delete autoresearch.jsonl or autoresearch.md
When /autoresearch clear is called:
- Delete .claude/autoresearch-loop.local.md
- Delete autoresearch.jsonl
- Keep autoresearch.md (it's still useful as documentation)
npx claudepluginhub liljamesjohn-archive/claude-autoresearch --plugin autoresearch