From autoresearch
Parses METRIC output lines from autoresearch.sh, infers units from suffixes, tracks primary vs secondary metrics across runs, and logs to JSONL for experiment analysis.
Slash command: /autoresearch:metric-extraction
Parses structured output from `autoresearch.sh` to extract primary and secondary metrics.
Each metric is a single line matching:
METRIC <name>=<value>
- Names consist of word characters (a-z, A-Z, 0-9, _), dots (.), or µ. Examples: total_µs, compile_ms, cache.hits
- NaN, Infinity, and non-numeric values are silently ignored.
- Lines that do not start with METRIC are ignored (but may contain useful diagnostics).
- Primary metric: determines keep vs discard.
- Secondary metrics: all other METRIC lines. Tracked for tradeoff monitoring but don't affect keep/discard decisions.

If the primary metric is missing from output, treat the run as a crash: the benchmark didn't produce the expected data.
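The parsing rules above could be sketched in Python roughly as follows. This is an illustration only, not the plugin's actual implementation: the function names `parse_metrics` and `split_metrics`, the exact regular expression, and the choice to accept both Unicode forms of µ are assumptions.

```python
import math
import re

# Assumed grammar: "METRIC <name>=<value>". The name allows ASCII word
# characters, dots, and either Unicode form of the micro sign (U+00B5, U+03BC).
METRIC_RE = re.compile(r"^METRIC\s+([A-Za-z0-9_.\u00b5\u03bc]+)=(\S+)\s*$")


def parse_metrics(output: str) -> dict[str, float]:
    """Return name -> value for every well-formed METRIC line in the output."""
    metrics = {}
    for line in output.splitlines():
        match = METRIC_RE.match(line)
        if match is None:
            continue  # not a METRIC line; it may still hold diagnostics
        name, raw = match.groups()
        try:
            value = float(raw)
        except ValueError:
            continue  # non-numeric value: silently ignored
        if not math.isfinite(value):
            continue  # NaN or Infinity: silently ignored
        metrics[name] = value
    return metrics


def split_metrics(metrics, primary_name):
    """Return (primary_value, secondary); primary_value is None if missing."""
    secondary = {k: v for k, v in metrics.items() if k != primary_name}
    return metrics.get(primary_name), secondary
```

A caller would treat a `None` primary value as a crashed run and skip the keep/discard comparison entirely.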
Infer units from metric name suffixes for display and context:
| Suffix | Unit |
|---|---|
| µs | µs (microseconds) |
| _ms | ms (milliseconds) |
| _s or _sec | s (seconds) |
| _kb | kb (kilobytes) |
| _mb | mb (megabytes) |
| (none matched) | (unitless) |
Units are informational — they don't affect computation.
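As a minimal sketch, the suffix table can be expressed as an ordered list of rules. The function name `infer_unit`, the case-sensitive matching, and the empty string for "unitless" are assumptions made for illustration.

```python
def infer_unit(name: str) -> str:
    """Map a metric name to a display unit using the suffix table."""
    # Ordered (suffixes, unit) pairs; the first matching rule wins.
    # Both Unicode forms of the micro sign are accepted (an assumption).
    rules = [
        (("\u00b5s", "\u03bcs"), "\u00b5s"),
        (("_ms",), "ms"),
        (("_s", "_sec"), "s"),
        (("_kb",), "kb"),
        (("_mb",), "mb"),
    ]
    for suffixes, unit in rules:
        if name.endswith(suffixes):  # str.endswith accepts a tuple
            return unit
    return ""  # no suffix matched: unitless
```

Because the result is only used for display, a wrong guess here would mislabel a column but never change a keep/discard decision.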
Maintain a list of known secondary metrics discovered across the session. When a new metric name appears in output that hasn't been seen before, register it with its inferred unit. This allows consistent reporting even when scripts evolve during the loop.
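One way to keep that session-wide list is a small registry that records each name the first time it appears. The class name `MetricRegistry`, the `observe` method, and the `unit_for` callback are hypothetical; any unit-inference function can be passed in.

```python
class MetricRegistry:
    """Tracks secondary metric names seen so far, with their inferred units."""

    def __init__(self):
        # name -> unit; dict insertion order doubles as discovery order
        self.units = {}

    def observe(self, names, unit_for=lambda name: ""):
        """Register unseen names and return the list of newly added ones."""
        new = []
        for name in names:
            if name not in self.units:
                self.units[name] = unit_for(name)
                new.append(name)
        return new
```

Reporting can then iterate over `registry.units` so that every run shows the same columns, with a blank where a metric was not emitted.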
When logging an experiment, record metrics as:
{
  "metric": 14600,
  "metrics": {
    "compile_µs": 4100,
    "render_µs": 9500,
    "cache.hits": 42
  }
}
- metric: the primary metric's numeric value (top-level for easy querying)
- metrics: object of all secondary metric name→value pairs

The autoresearch.sh script should output whatever helps the agent make better decisions.
The script can be updated during the loop as you learn what signal matters. Add instrumentation when you need more data to decide where to focus next.
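For the experiment record described earlier (a top-level metric plus a metrics object), a minimal logging sketch might look like this. The helper names `build_record`, `to_jsonl_line`, and `append_jsonl`, and the log path passed by the caller, are assumptions; the real log may carry additional fields.

```python
import json


def build_record(primary_value, secondary):
    """Assemble the record: primary value at top level, the rest under 'metrics'."""
    return {"metric": primary_value, "metrics": dict(secondary)}


def to_jsonl_line(record) -> str:
    """Serialize one record as a single JSONL line (one JSON object per line)."""
    # ensure_ascii=False keeps names such as compile_µs readable in the log.
    return json.dumps(record, ensure_ascii=False) + "\n"


def append_jsonl(path, record):
    """Append one record to the log file at the given (caller-chosen) path."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(to_jsonl_line(record))
```

Appending one line per run keeps the log easy to tail and to filter on the top-level metric value.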
npx claudepluginhub pbdeuchler/llm-plugins --plugin autoresearch