From autoresearch
Experiment loop discipline for autoresearch sessions — decision rules, git workflow, JSONL logging, benchmark metrics, anti-patterns
How this skill is triggered — by the user, by Claude, or both
Slash command
/autoresearch:autoresearchThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are in an autoresearch session. This skill governs how you run the experiment loop.
You are in an autoresearch session. This skill governs how you run the experiment loop.
State lives in files (survives context resets):
.autoresearch/current — active session id.autoresearch/sessions/<session-id>/state.md — config, rules, scope.autoresearch/sessions/<session-id>/benchmark.sh — benchmark wrapper.autoresearch/sessions/<session-id>/run.jsonl — run log (append-only)research/learnings/<session-id>.md — extracted learning that should be committedautoresearch/* — all work happens hereRuntime state under .autoresearch/** is in-progress state. Never commit it.
On context reset: read .autoresearch/current, then that session's state.md and only the last 20 lines of run.jsonl. The last JSONL entry tells you the run number and current state. Do not scan older sessions by default. To extend prior work, create a new session id and set Extends: in state.md; then read only the parent session's summary/last 20 runs unless the user asks for deeper history.
Default policy:
.autoresearch/currentExtends:On start/resume/stop, prune cold runtime sessions that are not active. Keep research/learnings/*.md as the durable record.
bash "$SESSION_DIR/benchmark.sh"$SESSION_DIR/state.md| Outcome | Action |
|---|---|
| Metric improves | git add <intended-paths> and commit with Result: trailer. JSONL: "status":"keep" |
| Metric regresses | Revert only intended experiment pathspecs. JSONL: "status":"discard" |
| Metric unchanged | Discard unless change is a prerequisite. JSONL: "status":"discard" |
| Benchmark crashes | Revert only intended experiment pathspecs. JSONL: "status":"crash". Diagnose before next run |
| Benchmark timeout | Treat as crash |
Never use git add . or whole-tree checkout/reset in an autoresearch session. They mix runtime trash, unrelated user edits, and experiment edits.
experiment: <short description>
<detailed rationale — what and why>
Result: <metric>=<value>, <metric>=<value>
Each line is a JSON object:
{"run":<n>,"commit":"<short-hash>","metrics":{<parsed>},"status":"keep|discard|crash","description":"<what changed>","timestamp":<unix>}
run — sequential, starts at 1 (baseline)commit — short hash of HEAD at time of run (before revert if discarded)metrics — full parsed output from parse-metrics.shstatus — decision outcomedescription — human-readable summary of the changetimestamp — Unix epoch secondsCommit learning only when it is reusable outside the active run. Use one file per session:
research/learnings/<session-id>.md
This avoids conflicts between resumed or parallel research sessions. Keep these commits separate from experiment commits when practical.
$SESSION_DIR/benchmark.sh mid-session unless the benchmark itself is broken. Log this as a special entry..autoresearch/**; it is local progress state.If .autoresearch/current exists but you have no conversation context:
.autoresearch/current to get the session id.autoresearch/sessions/<session-id>/state.md for config.autoresearch/sessions/<session-id>/run.jsonlgit log --oneline -5 for recent experiment commitsEvery 5 runs (or on user request), show a summary:
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub lagz0ne/1percent --plugin autoresearch