Orchestrates an automated loop that generates, scores, and closes PRs for three research techniques (SelfRefine, ET, PRM) until each reaches n=15 samples in a Thompson bandit.
Slash command: /claude-commands:autor-n15-loop
**Loop interval**: 30m | **Max duration**: 12h (24 iterations)
Drive SelfRefine/ET/PRM autor PR generation until each technique reaches n=15 samples in the Thompson bandit.
Precondition: chore/auto-research-phase3 is pushed and not detached.
python technique_bandit/technique_selector.py --rank
If all three techniques have n≥15 → STOP (goal reached).
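The stop check can be sketched in Python, the language of the repo's scripts. This is a hypothetical helper, not part of technique_selector.py. The `--rank` output format is not documented here, so `goal_reached` and `remaining` take the per-technique counts as a plain dict.

```python
# Hypothetical helper: the real `--rank` output format is not documented,
# so per-technique sample counts are passed in as a dict.
TECHNIQUES = ("SelfRefine", "ET", "PRM")
TARGET_N = 15


def goal_reached(counts: dict[str, int], target: int = TARGET_N) -> bool:
    """True once every tracked technique has at least `target` samples."""
    # A technique missing from `counts` is treated as having 0 samples.
    return all(counts.get(t, 0) >= target for t in TECHNIQUES)


def remaining(counts: dict[str, int], target: int = TARGET_N) -> dict[str, int]:
    """Samples still needed per technique (0 once the target is met)."""
    return {t: max(0, target - counts.get(t, 0)) for t in TECHNIQUES}


if __name__ == "__main__":
    counts = {"SelfRefine": 15, "ET": 12, "PRM": 15}
    print(goal_reached(counts))  # False: ET still needs samples
    print(remaining(counts))     # {'SelfRefine': 0, 'ET': 3, 'PRM': 0}
```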
python technique_bandit/technique_selector.py --suggest <PR#>
Use the suggested technique for the next run.
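For intuition about what a Thompson bandit's suggestion involves, here is a generic Thompson-sampling sketch. It assumes each technique keeps a Beta(alpha, beta) posterior over a score normalised to [0, 1]; the real selector may model rewards differently, and `suggest` is an illustrative name, not its API.

```python
import random

# Assumption: one Beta(alpha, beta) posterior per technique over a reward in
# [0, 1]. Generic Thompson sampling, not the logic of technique_selector.py.


def suggest(posteriors: dict[str, tuple[float, float]], rng: random.Random) -> str:
    """Draw one sample per technique from its Beta posterior; return the argmax."""
    draws = {tech: rng.betavariate(a, b) for tech, (a, b) in posteriors.items()}
    return max(draws, key=draws.get)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the run is reproducible
    posteriors = {"SelfRefine": (8.0, 4.0), "ET": (3.0, 3.0), "PRM": (5.0, 6.0)}
    picks = [suggest(posteriors, rng) for _ in range(1000)]
    # The highest-mean technique wins most often; the others are still explored.
    print({t: picks.count(t) for t in posteriors})
```

Drawing from each posterior instead of always taking the highest mean keeps some exploration of the weaker arms.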
cd to ~/llm-wiki-autor-phase3 (NOT the main workspace).
For the suggested technique, pick a paper from the autor benchmark that has the fewest SelfRefine/ET/PRM samples. Run the autor pipeline:
# Example for SelfRefine
cd ~/llm-wiki-autor-phase3
autor run --technique SelfRefine --paper <paper_id> --pr-number <next_pr>
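Paper selection can be sketched as below. The data shape is hypothetical: `samples[paper_id][technique]` holds the number of recorded runs, and where that data lives is not specified above. By default the sketch counts combined SelfRefine/ET/PRM samples, as the step reads; pass a one-technique tuple to count only the suggested technique instead.

```python
# Hypothetical data shape: samples[paper_id][technique] -> recorded autor runs.
TECHNIQUES = ("SelfRefine", "ET", "PRM")


def pick_paper(
    samples: dict[str, dict[str, int]],
    techniques: tuple[str, ...] = TECHNIQUES,
) -> str:
    """Return the paper id with the fewest samples for `techniques`.

    Ties go to the lexicographically smallest paper id so the choice is
    deterministic across runs.
    """
    if not samples:
        raise ValueError("no papers in the benchmark")

    def total(paper_id: str) -> int:
        return sum(samples[paper_id].get(t, 0) for t in techniques)

    # min() keeps the first minimum it sees, so sorting first breaks ties by id.
    return min(sorted(samples), key=total)


if __name__ == "__main__":
    samples = {
        "p2": {"SelfRefine": 1, "ET": 1},
        "p1": {"PRM": 2},
        "p3": {"SelfRefine": 0, "ET": 3},
    }
    print(pick_paper(samples))                   # p1 (tied with p2 on 2 samples)
    print(pick_paper(samples, ("SelfRefine",)))  # p1 (0 samples, tied with p3)
```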
If the autor CLI is not available, use the manual workflow:
Use the 6-dim rubric on the new PR diff:
python layer/score_pr.py <pr_number>
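How the six rubric dimensions combine into one score is not specified here. A minimal sketch, assuming six dimensions on a shared 0-10 scale averaged without weights; the names `d1` to `d6` are placeholders, and score_pr.py may weight or scale dimensions differently.

```python
from statistics import fmean

# Assumptions, not taken from score_pr.py: exactly six dimensions, each on a
# 0-10 scale, combined by an unweighted mean. Names d1..d6 are placeholders.
N_DIMS = 6
MAX_DIM_SCORE = 10.0


def aggregate(dim_scores: dict[str, float]) -> float:
    """Validate six per-dimension scores and return their mean."""
    if len(dim_scores) != N_DIMS:
        raise ValueError(f"expected {N_DIMS} dimension scores, got {len(dim_scores)}")
    for name, value in dim_scores.items():
        if not 0.0 <= value <= MAX_DIM_SCORE:
            raise ValueError(f"{name}={value} is outside 0..{MAX_DIM_SCORE}")
    return fmean(dim_scores.values())


if __name__ == "__main__":
    scores = dict(zip(["d1", "d2", "d3", "d4", "d5", "d6"], [6, 7, 8, 9, 10, 8]))
    print(aggregate(scores))  # 8.0
```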
python technique_bandit/technique_selector.py --update --PR <pr> --score <score> --technique <tech>
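A sketch of the state an update step could maintain, assuming a Beta-posterior Thompson bandit and a 0-10 rubric score. The `Arm` class and `update` function are hypothetical. The fractional update adds the normalised score to alpha and its complement to beta, so each scored PR raises that technique's n by exactly one.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """Hypothetical per-technique bandit state: Beta posterior plus sample count."""
    alpha: float = 1.0  # Beta(1, 1) is a uniform prior
    beta: float = 1.0
    n: int = 0


def update(arm: Arm, score: float, max_score: float = 10.0) -> Arm:
    """Fold one rubric score into the arm as a fractional Bernoulli outcome."""
    s = min(max(score / max_score, 0.0), 1.0)  # normalise, then clamp to [0, 1]
    arm.alpha += s
    arm.beta += 1.0 - s
    arm.n += 1
    return arm


if __name__ == "__main__":
    arm = update(Arm(), 7.5)
    print(arm)  # Arm(alpha=1.75, beta=1.25, n=1)
```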
gh pr close <pr> --repo $GITHUB_REPOSITORY \
--comment "autor eval: $tech score=$score. Closing — evaluation artifact, not a merge candidate."
Autor PRs are evaluation artifacts, not merge candidates. Always open as draft, always close after scoring. Do not leave them open.
git add -A && git commit -m "autor: <tech> n=$((n+1)) score=<score>" && git push
If elapsed > 12h since first iteration → STOP.
After each iteration, print:
[iter N] SelfRefine n=X | ET n=Y | PRM n=Z | elapsed=Thh:mm
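The timing limits and the progress line can be sketched as follows. The 30m interval and 12h cap come from the loop settings at the top; `should_stop` and `progress_line` are illustrative names, and elapsed time is rendered as zero-padded hh:mm.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=30)
MAX_DURATION = timedelta(hours=12)
MAX_ITERATIONS = MAX_DURATION // INTERVAL  # 12h / 30m = 24 iterations


def should_stop(first_iter_start: datetime, now: datetime) -> bool:
    """Stop once more than 12h have elapsed since the first iteration."""
    return now - first_iter_start > MAX_DURATION


def progress_line(iteration: int, counts: dict[str, int], elapsed: timedelta) -> str:
    """Format the per-iteration status line with elapsed time as hh:mm."""
    hh, mm = divmod(int(elapsed.total_seconds()) // 60, 60)
    return (
        f"[iter {iteration}] SelfRefine n={counts.get('SelfRefine', 0)} | "
        f"ET n={counts.get('ET', 0)} | PRM n={counts.get('PRM', 0)} | "
        f"elapsed={hh:02d}:{mm:02d}"
    )


if __name__ == "__main__":
    print(progress_line(3, {"SelfRefine": 5, "ET": 4, "PRM": 6}, timedelta(hours=1, minutes=5)))
    # [iter 3] SelfRefine n=5 | ET n=4 | PRM n=6 | elapsed=01:05
```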
npx claudepluginhub jleechanorg/claude-commands --plugin claude-commands