By lendtrain
Autonomous skill optimizer using Karpathy's autoresearch methodology. Includes a PostToolUse hook for auto-screenshotting HTML outputs and a Stop hook for dashboard verification.
A Claude Code plugin that autonomously optimizes skill prompts using Andrej Karpathy's autoresearch methodology. Where the original autoresearch loop optimizes ML training code, this plugin optimizes Claude Code skill prompts.
Point it at any skill, define what "good" looks like, and let it run. It executes the skill repeatedly with real inputs, scores every output on a 0-100 scale, mutates the prompt, keeps what improves the score, and discards the rest.
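At its core, the loop described above is greedy hill climbing over the prompt text. The following is a minimal Python sketch, not the plugin's implementation. `run_skill`, `score`, and `mutate` are hypothetical callables standing in for executing the skill, grading one output, and proposing a prompt edit. The acceptance test is simplified to "strictly higher score wins" rather than the plugin's own keep rule.

```python
from statistics import mean
from typing import Callable, List, Tuple


def optimize(
    prompt: str,
    inputs: List[str],
    run_skill: Callable[[str, str], str],  # hypothetical: runs the skill prompt on one input
    score: Callable[[str], float],         # hypothetical: grades one output on a 0-100 scale
    mutate: Callable[[str], str],          # hypothetical: proposes an edited prompt
    iterations: int = 10,
) -> Tuple[str, float]:
    """Greedy hill climbing: keep a mutation only if it scores higher than the best so far."""

    def evaluate(p: str) -> float:
        # Run the skill on every input and average the per-output scores.
        return mean(score(run_skill(p, x)) for x in inputs)

    best_prompt, best_score = prompt, evaluate(prompt)  # experiment 0 is the baseline
    for _ in range(iterations):
        candidate = mutate(best_prompt)
        candidate_score = evaluate(candidate)
        if candidate_score > best_score:
            # Keep: the improved prompt becomes the new starting point.
            best_prompt, best_score = candidate, candidate_score
        # Otherwise discard: the next mutation starts from best_prompt again.
    return best_prompt, best_score


if __name__ == "__main__":
    # Toy stand-ins: each "!" in an output is worth 20 points, capped at 100.
    best, s = optimize(
        "hi",
        ["a", "b"],
        run_skill=lambda p, x: p + x,
        score=lambda out: min(100, 20 * out.count("!")),
        mutate=lambda p: p + "!",
    )
    print(best, s)  # hi!!!!! 100
```

Mutating from the best prompt rather than from the latest candidate means a discarded mutation costs only the runs spent evaluating it; it never drags the prompt backward.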
Install from a local clone of the repository:
claude plugins marketplace add /path/to/autoresearch-for-skills
claude plugins install autoresearch@autoresearch-for-skills
Or add the GitHub repo as a marketplace:
# Add the marketplace
claude plugins marketplace add --source github --repo lendtrain/autoresearch-for-skills
# Install the plugin
claude plugins install autoresearch@autoresearch-for-skills
Run it against a skill:
/autoresearch skill=path/to/SKILL.md iterations=10
The skill will ask you for:
Then it runs autonomously until stopped.
Each eval criterion is scored from 0.00 to 100.00 on every run. The experiment score is the weighted average across all evals and runs. A mutation is kept if it raises the score by at least 1.00 point. A mutation that holds the same score while reducing complexity is also kept, as a simplification win.
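The scoring and keep rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the plugin's code. Each criterion is assumed to carry a numeric weight, and `complexity` is assumed to be any integer measure such as a line count. "Same score" is assumed to mean "within the 1.00-point threshold". That last reading is what would let experiment 2 in the design-review results (91.56 to 90.88, fix gate condensed from 26 to 3 lines) count as a simplification win.

```python
from typing import Dict, List


def experiment_score(run_scores: List[Dict[str, float]], weights: Dict[str, float]) -> float:
    """Weighted average of per-criterion scores (0-100) across every run.

    run_scores holds one dict per run, mapping criterion name -> score.
    weights maps criterion name -> weight (assumption: one weight per criterion).
    """
    total = 0.0
    weight_sum = 0.0
    for run in run_scores:
        for criterion, s in run.items():
            w = weights[criterion]
            total += w * s
            weight_sum += w
    if weight_sum == 0:
        raise ValueError("no weighted scores to average")
    return total / weight_sum


def should_keep(old: float, new: float, old_complexity: int, new_complexity: int,
                min_gain: float = 1.0) -> bool:
    """Keep a mutation that gains >= min_gain points, or ties while being simpler."""
    if new - old >= min_gain:
        return True
    # Assumption: scores within the threshold of each other count as "the same score".
    same_score = abs(new - old) < min_gain
    return same_score and new_complexity < old_complexity
```

Using the fix gate's line count as a stand-in complexity measure, `should_keep(84.50, 91.56, 0, 26)` returns `True` on the 7.06-point gain. `should_keep(91.56, 90.88, 26, 3)` also returns `True`, because the 0.68-point drop falls inside the threshold and the prompt got shorter.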
On a git branch (autoresearch/<skill-name>-<date>):
git diff main shows the total improvement.
In a working directory (autoresearch-<skill-name>/):
autoresearch-<skill-name>/
├── runs/ # every output + screenshot from every run
├── dashboard.html # self-contained HTML dashboard with charts
├── results.json # structured experiment data
├── results.tsv # tab-separated score log
└── changelog.md # detailed mutation log
/autoresearch: the core autoresearch loop (gather context, build evals, establish a baseline, run experiments, build the dashboard).
runs/ directories during active runs
eval-guide.md: how to write eval criteria that actually work
From a real run optimizing a frontend-design skill:
| Experiment | Score (0-100) | Status | Description |
|---|---|---|---|
| 0 | 62.50 | baseline | Original skill |
| 1 | 100.00 | kept | Added explicit banned font list + approved alternatives |
| 2 | 100.00 | kept | Removed redundant paragraph (simplification win) |
From optimizing a design-review skill against a live Next.js app:
| Experiment | Score (0-100) | Status | Description |
|---|---|---|---|
| 0 | 84.50 | baseline | Original skill; reported issues but didn't fix them |
| 1 | 91.56 | kept | Added mandatory fix gate; agent now produces actual code fixes |
| 2 | 90.88 | kept | Condensed fix gate from 26 to 3 lines (simplification win) |
MIT
Modifies files
Hook triggers on file write and edit operations
npx claudepluginhub lendtrain/autoresearch-for-skills
Mortgage refinance quoting, analysis, and application submission powered by Lendtrain
Self-evolving skill engine for Claude Code. Creates, scores, repairs, and hardens skills autonomously through recursive improvement cycles.
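Create new skills, modify and improve existing skills, and measure skill performance. Use it to build skills from scratch, edit or optimize existing skills, run evals, benchmark performance, or optimize skill descriptions to improve triggering accuracy.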
Autonomous experiment loop that optimizes any file by a measurable metric. 5 slash commands, 8 evaluators, configurable loop intervals (10min to monthly).
Agent Skills for improving SKILL.md files: mine repeated workflows from history, personalize and audit existing skills, or generalize personal skills for publication.
Create and manage Claude Code skills, plugins, subagents, and hooks. Use when building new skills, validating existing skills, testing skills empirically, creating plugins, converting projects to plugins, creating hooks, or managing plugin automation. Includes /skills-toolkit:skill-composer, /skills-toolkit:skill-refiner, /skills-toolkit:skill-tester, /skills-toolkit:plugin-creator, /skills-toolkit:subagent-creator, /skills-toolkit:hook-creator, and /skills-toolkit:ask-user-question skills.
Professional skill creation with TDD workflow. Features dual-mode (fast/full), behavioral validation, and automated quality gates for 9.0/10+ scores.