From qe-framework
Runs an autonomous code-modify-evaluate loop to optimize code against a single metric. Useful for ML training, algorithm benchmarks, build optimization, and performance tuning.
Slash command: /qe-framework:Qautoresearch
Karpathy's autoresearch pattern generalized. Repeatedly modify target files → run → evaluate metric → keep/discard, looping until manually stopped.
Collect the following (skip items already provided):
| Item | Description | Required |
|---|---|---|
| tag | Session name (used as branch name) | Yes |
| target_files | Files the agent may modify (one or more). The singular target_file is also accepted | Yes |
| fixed_files | Files that must never be modified | Yes |
| run_command | Experiment execution command | Yes |
| metric_grep | Grep pattern to extract metric | Yes |
| metric_direction | lower_is_better or higher_is_better | Yes |
| time_budget | Seconds per experiment (default: 300) | No |
| timeout_multiplier | Timeout multiplier (default: 2) | No |
| context_files | Reference files for agent to read | No |
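The parameters in the table could be held in a small validated structure. The following is a minimal sketch in Python, not part of the skill itself: the `SessionConfig` class and its overlap check between target and fixed files are assumptions made for illustration, while field names and defaults follow the table.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SessionConfig:
    """Hypothetical container for the session parameters listed in the table."""

    tag: str
    target_files: list[str]
    fixed_files: list[str]
    run_command: str
    metric_grep: str
    metric_direction: str
    time_budget: int = 300           # seconds per experiment
    timeout_multiplier: float = 2    # hard limit = time_budget * timeout_multiplier
    context_files: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.tag:
            raise ValueError("tag is required")
        if not self.target_files:
            raise ValueError("at least one target file is required")
        if self.metric_direction not in ("lower_is_better", "higher_is_better"):
            raise ValueError(f"unknown metric_direction: {self.metric_direction}")
        # Assumption: a file cannot be both modifiable and protected.
        overlap = set(self.target_files) & set(self.fixed_files)
        if overlap:
            raise ValueError(f"files listed as both target and fixed: {sorted(overlap)}")

    @property
    def branch(self) -> str:
        # The tag doubles as the branch name suffix.
        return f"autoresearch/{self.tag}"

    @property
    def hard_timeout(self) -> float:
        return self.time_budget * self.timeout_multiplier


config = SessionConfig(
    tag="mar15",
    target_files=["train.py"],
    fixed_files=["prepare.py"],
    run_command="uv run train.py",
    metric_grep='grep "^val_bpb:" run.log',
    metric_direction="lower_is_better",
)
print(config.branch, config.hard_timeout)
```

Validating up front means a bad parameter is caught before the loop starts rather than several experiments in.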
git checkout -b autoresearch/<tag>
results.tsv header: commit\tmetric\tmemory_gb\tstatus\tdescription
Ecompact-executor in background (context window pressure monitoring)
After setup, this loop NEVER stops until the user manually interrupts.
[Background: Ecompact-executor — context pressure monitoring]
LOOP FOREVER:
1. Assess current state
2. Form hypothesis
3. Modify code
3.5. [Ecode-reviewer] Code review — fix Critical issues
4. git commit
5. Run experiment
6. Extract results
7. Handle crash (if any) — [Ecode-debugger]
8. Record in results.tsv
8.5. [Every 5th run] Trend analysis
9. Verdict: keep or discard
10. Next idea → back to 1
Priority order: (1) variations on near-miss experiments, (2) optimization opportunities in code, (3) hyperparameter exploration, (4) architecture changes, (5) radical alternatives
Report: Experiment #{N}: {hypothesis summary}
Delegate to Ecode-reviewer (foreground, 10s timeout).
git add {modified target files}
git commit -m "experiment #{N}: {hypothesis}"
Do NOT use /Qcommit (no skill nesting in autonomous loop).
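If the commit step were driven from a script rather than typed by the agent, it might look like the sketch below. The helper names `commit_commands` and `commit_experiment` are hypothetical; only the `git add` and `git commit -m` invocations and the message format come from the step above.

```python
import subprocess


def commit_commands(n, hypothesis, target_files):
    """Build the two git invocations for one experiment as argv lists."""
    if not target_files:
        raise ValueError("nothing to commit: no target files given")
    message = f"experiment #{n}: {hypothesis}"
    return [
        # "--" keeps file names from being read as options.
        ["git", "add", "--", *target_files],
        ["git", "commit", "-m", message],
    ]


def commit_experiment(n, hypothesis, target_files, cwd="."):
    """Run the commands in order; raises CalledProcessError if git fails."""
    for argv in commit_commands(n, hypothesis, target_files):
        subprocess.run(argv, cwd=cwd, check=True)


print(commit_commands(3, "lower learning rate", ["train.py"]))
```

Passing argv lists instead of a shell string avoids quoting problems when the hypothesis text contains quotes or other shell metacharacters.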
timeout {time_budget * timeout_multiplier} bash -c '{run_command} > run.log 2>&1'
Always redirect to run.log (prevent context window pollution).
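The same run step can be sketched in Python. This version swaps the `timeout` utility for the `timeout` argument of `subprocess.run` and uses the default shell rather than `bash -c`; the function name `run_experiment` and its return values are assumptions for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def run_experiment(run_command, time_budget=300, timeout_multiplier=2,
                   log_path="run.log"):
    """Run the command with stdout and stderr sent to the log file.

    Returns "ok", "failed" (non-zero exit), or "timeout".
    """
    limit = time_budget * timeout_multiplier
    with open(log_path, "w") as log:
        try:
            proc = subprocess.run(
                run_command,
                shell=True,                 # run_command is a shell string
                stdout=log,
                stderr=subprocess.STDOUT,   # merge stderr into the same log
                timeout=limit,
            )
        except subprocess.TimeoutExpired:
            return "timeout"
    return "ok" if proc.returncode == 0 else "failed"


# Demo in a scratch directory so no real run.log is touched.
with tempfile.TemporaryDirectory() as scratch:
    log_file = Path(scratch) / "run.log"
    demo_status = run_experiment("echo 'val_bpb: 1.0432'", time_budget=5,
                                 log_path=str(log_file))
    demo_log = log_file.read_text()

print(demo_status, demo_log.strip())
```

Because the output goes to a file and never to the caller, only the short status string reaches the agent's context.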
Run {metric_grep}. Parse number from output. No number → crash → Step 7.
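Parsing the number out of the grep output might be done as follows. This is a sketch under two assumptions the step does not state: when several lines match, the last one wins, and the value is the last number after the final colon on that line. The name `parse_metric` is hypothetical.

```python
import re
from typing import Optional

# Plain or scientific-notation decimal, with optional sign.
_NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_metric(grep_output: str) -> Optional[float]:
    """Return the metric value, or None when no number is found (treated as a crash)."""
    lines = [ln for ln in grep_output.splitlines() if ln.strip()]
    if not lines:
        return None
    # Skip the label before the last colon so digits in a metric name
    # (for example "top1_acc") are not mistaken for the value.
    _, _, tail = lines[-1].rpartition(":")
    matches = _NUMBER.findall(tail)
    return float(matches[-1]) if matches else None


print(parse_metric("val_bpb: 1.0432\n"))
print(parse_metric(""))
```

A `None` result covers both an empty grep output and a non-numeric value such as `nan`, so either case routes to crash handling.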
tail -n 50 run.log for error content
Ecode-debugger: pass error trace + target_files code → get root cause + fix suggestion
git reset --hard to restore state
{commit_hash_7char}\t{metric_value}\t{memory_gb}\t{status}\t{description}
Status: keep, discard, or crash. results.tsv stays untracked (never git commit).
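Appending a row in that format could be sketched as below. The function name `record_result`, the choice to leave metric and memory cells empty for crashes, and the whitespace cleanup of the description are assumptions; the column order and status values come from the text.

```python
import tempfile
from pathlib import Path

HEADER = ("commit", "metric", "memory_gb", "status", "description")
STATUSES = ("keep", "discard", "crash")


def record_result(path, commit_hash, metric, memory_gb, status, description):
    """Append one tab-separated row, writing the header first if the file is new."""
    if status not in STATUSES:
        raise ValueError(f"status must be one of {STATUSES}, got {status!r}")

    def cell(value):
        # Assumption: missing values (for example after a crash) become empty cells.
        return "" if value is None else str(value)

    # Collapse tabs and newlines so the description cannot break the row.
    clean = " ".join(description.split())
    row = [commit_hash[:7], cell(metric), cell(memory_gb), status, clean]

    target = Path(path)
    needs_header = not target.exists() or target.stat().st_size == 0
    with target.open("a", encoding="utf-8") as f:
        if needs_header:
            f.write("\t".join(HEADER) + "\n")
        f.write("\t".join(row) + "\n")


# Demo in a scratch directory with made-up values.
with tempfile.TemporaryDirectory() as scratch:
    tsv = Path(scratch) / "results.tsv"
    record_result(tsv, "a1b2c3d4e5f6", 1.0432, 4.1, "keep", "baseline")
    record_result(tsv, "b2c3d4e5f6a7", None, None, "crash", "double\tbatch size")
    demo_lines = tsv.read_text(encoding="utf-8").splitlines()

print(demo_lines)
```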
Analyze results.tsv, then use the analysis to dynamically re-weight hypothesis priorities in Step 2.
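What the analysis computes is not spelled out here, so the following is only one possible sketch: it counts outcomes over the most recent runs and finds the best kept metric. The function names, the five-run window, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def load_rows(tsv_text):
    """Parse results.tsv content into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def summarize_recent(rows, window=5, lower_is_better=True):
    """Count statuses in the last `window` runs and find the best kept metric overall."""
    recent = rows[-window:]
    counts = Counter(r["status"] for r in recent)
    kept = [float(r["metric"]) for r in rows
            if r["status"] == "keep" and r["metric"]]
    best = None
    if kept:
        best = min(kept) if lower_is_better else max(kept)
    return {
        "window": len(recent),
        "keep": counts["keep"],
        "discard": counts["discard"],
        "crash": counts["crash"],
        "keep_rate": counts["keep"] / len(recent) if recent else 0.0,
        "best_metric": best,
    }


# Made-up sample data in the results.tsv column layout.
SAMPLE = "\n".join("\t".join(r) for r in [
    ["commit", "metric", "memory_gb", "status", "description"],
    ["a1b2c3d", "1.050", "4.1", "keep", "baseline"],
    ["b2c3d4e", "1.048", "4.1", "keep", "lower learning rate"],
    ["c3d4e5f", "", "", "crash", "double batch size"],
    ["d4e5f6a", "1.052", "4.3", "discard", "wider MLP"],
    ["e5f6a7b", "1.047", "4.1", "keep", "warmup 200 steps"],
    ["f6a7b8c", "1.049", "4.2", "discard", "remove dropout"],
])

summary = summarize_recent(load_rows(SAMPLE))
print(summary)
```

A falling keep rate would be one signal for shifting weight from small variations toward the later, more radical priority tiers.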
| Condition | Verdict |
|---|---|
| Metric improved (vs best) | keep |
| Metric same or worse | discard (git reset --hard HEAD~1) |
| Marginal improvement (<0.1%) + code complexity increase (20+ lines) | discard |
| Same metric but simpler code (fewer lines) | keep |
| Code deleted, metric same or better | keep |
Report: Experiment #{N}: KEEP/DISCARD/CRASH — metric {old} → {new} ({delta})
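The verdict table can be read as a small decision function. The sketch below is one interpretation, not the skill's own logic: it assumes the more specific rows (marginal gain with added complexity, equal metric with simpler code) take precedence over the general ones, and it measures complexity by net lines added. The name `verdict` and its parameters are hypothetical.

```python
def verdict(new, best, lines_delta, lower_is_better=True,
            marginal=0.001, complexity_lines=20):
    """Return "keep" or "discard" for a completed (non-crashed) experiment.

    new, best    metric of this run and the best metric so far
    lines_delta  net lines added to target files (negative means code removed)
    """
    # Positive gain always means "better", whichever direction is preferred.
    gain = (best - new) if lower_is_better else (new - best)

    if gain < 0:
        return "discard"                      # metric worse
    if gain == 0:
        # Same metric: keep only if the code got simpler.
        return "keep" if lines_delta < 0 else "discard"

    relative = gain / abs(best) if best != 0 else float("inf")
    if relative < marginal and lines_delta >= complexity_lines:
        return "discard"                      # tiny gain, not worth the added code
    return "keep"                             # real improvement


print(verdict(0.99, 1.00, lines_delta=5))
print(verdict(0.9995, 1.00, lines_delta=25))
print(verdict(1.00, 1.00, lines_delta=-3))
```

The default `marginal=0.001` corresponds to the 0.1% threshold and `complexity_lines=20` to the 20-line threshold in the table.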
Autoresearch Session: {tag}
Total: {N} | Keep: {k} | Discard: {d} | Crash: {c}
Best metric: {value} (experiment #{best_n})
Branch: autoresearch/{tag}
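Filling in that report template from the recorded outcomes might look like this. It is a sketch assuming experiments are numbered from 1 in run order and that only kept experiments count toward the best metric; the function name `session_summary` and the sample values are made up.

```python
def session_summary(tag, results, lower_is_better=True):
    """Build the report text from (status, metric) pairs in run order."""
    counts = {"keep": 0, "discard": 0, "crash": 0}
    best_n, best_value = None, None

    for n, (status, metric) in enumerate(results, start=1):
        counts[status] += 1
        if status != "keep" or metric is None:
            continue
        if best_value is None:
            better = True
        elif lower_is_better:
            better = metric < best_value
        else:
            better = metric > best_value
        if better:
            best_n, best_value = n, metric

    if best_n is None:
        best_line = "Best metric: n/a"
    else:
        best_line = f"Best metric: {best_value} (experiment #{best_n})"

    return "\n".join([
        f"Autoresearch Session: {tag}",
        f"Total: {len(results)} | Keep: {counts['keep']} | "
        f"Discard: {counts['discard']} | Crash: {counts['crash']}",
        best_line,
        f"Branch: autoresearch/{tag}",
    ])


report = session_summary(
    "mar15",
    [("keep", 1.05), ("discard", 1.07), ("crash", None), ("keep", 1.03)],
)
print(report)
```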
Analyze autoresearch/<tag> branch git history: effective change types, repeated failure directions, discovered optimization patterns. Save to autoresearch-lessons-{tag}.md.
To merge: git checkout main && git merge autoresearch/{tag}
| Dependency | Type | Connection | Required |
|---|---|---|---|
| Ecompact-executor | Agent | Phase 1 → Phase 2 (background) | Recommended |
| Ecode-reviewer | Agent | Step 3.5 (10s timeout) | Recommended |
| Ecode-debugger | Agent | Step 7 (crash only) | Recommended |
| Qlesson-learned | Skill pattern | Phase 3 | Optional |
On dependency failure: skip the step and continue the loop. Dependencies improve quality but must never stop the loop.
run.log
/Qcommit skill
/Qautoresearch
tag: mar15
target_files: train.py
fixed_files: prepare.py
run_command: uv run train.py
metric_grep: grep "^val_bpb:" run.log
metric_direction: lower_is_better
npx claudepluginhub inho-team/qe-framework --plugin qe-framework