Autonomous codebase improvement loop inspired by Karpathy's autoresearch. USE WHEN user wants to iteratively improve a codebase, run autonomous code improvement, or apply the autoresearch pattern. Individual commands use /autoresearch directly.
Slash command: /autoresearch:autoresearch
Autonomous codebase improvement loop that converges on measurable improvements through iterative improve-evaluate-iterate cycles.
/autoresearch # Interactive discovery mode
/autoresearch src/ --profile quality # Quality-focused on src/
/autoresearch --profile coverage # Maximize test coverage
/autoresearch --profile performance # Optimize performance
/autoresearch --resume # Resume a previous run
/autoresearch --dry-run # Preview what would be evaluated
Autoresearch runs a tight loop inspired by Karpathy's autoresearch pattern:
┌─ DISCOVER ──────────────────────────────┐
│  Analyze codebase → propose constraints │
│  → interview user via AskUserQuestion   │
│  → lock evaluation commands             │
└─────────────────────────────────────────┘
                     ↓
┌─ BASELINE ──────────────────────────────┐
│  Create git branch → run all evaluators │
│  → capture baseline scores              │
└─────────────────────────────────────────┘
                     ↓
┌─ LOOP (until convergence) ──────────────┐
│  Improve → Evaluate → Decide → Track    │
│                                         │
│  Keep if score improves (git commit)    │
│  Revert if score regresses (git reset)  │
│  Stop on diminishing returns            │
└─────────────────────────────────────────┘
                     ↓
┌─ REPORT ────────────────────────────────┐
│  Full LLM evaluation → learning report  │
│  → improvement table → convergence data │
└─────────────────────────────────────────┘
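The keep/revert/stop decisions in the LOOP stage can be sketched as a small pure function. This is only an illustration under stated assumptions: `runLoop`, `minGain`, and `patience` are hypothetical names, the candidate scores are passed in rather than produced by real evaluators, and a score that merely ties the best so far is treated as a revert. The actual logic in src/loop.ts may differ.

```typescript
// Hypothetical sketch of the loop's decision step; names are not taken from src/loop.ts.
type Decision = "keep" | "revert";

interface LoopOptions {
  maxIterations: number; // mirrors --max-iterations
  minGain: number;       // assumed threshold below which an iteration counts as low-gain
  patience: number;      // assumed number of consecutive low-gain iterations before stopping
}

interface LoopResult {
  best: number;
  history: Decision[];
}

function runLoop(
  baseline: number,
  candidateScores: number[],
  opts: LoopOptions,
): LoopResult {
  let best = baseline;
  let lowGainStreak = 0;
  const history: Decision[] = [];

  for (const score of candidateScores.slice(0, opts.maxIterations)) {
    const gain = score - best;

    if (gain > 0) {
      best = score;            // in the real loop this is where `git commit` would happen
      history.push("keep");
    } else {
      history.push("revert");  // in the real loop this is where `git reset` would happen
    }

    // Diminishing returns: stop once several iterations in a row fail to gain enough.
    lowGainStreak = gain >= opts.minGain ? 0 : lowGainStreak + 1;
    if (lowGainStreak >= opts.patience) break;
  }

  return { best, history };
}

const demo = runLoop(60, [65, 62, 70, 70, 70.5, 90], {
  maxIterations: 20,
  minGain: 1,
  patience: 2,
});
console.log(demo.history.join(", "), "best:", demo.best);
```

In this run the loop stops after the fifth candidate, because two consecutive iterations gained less than `minGain`, so the final candidate is never evaluated.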
| Argument | Description | Default |
|---|---|---|
| `[scope]` | File or directory path(s) to improve | auto-discover |
| `--profile <name>` | Preset: quality, performance, coverage | interactive |
| `--max-iterations <n>` | Override max iterations | 20 |
| `--time-box <seconds>` | Override per-iteration time box | 120 |
| `--resume` | Resume from .autoresearch/state.json | n/a |
| `--dry-run` | Discovery only, no loop | n/a |
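To make the arguments and their defaults concrete, here is a minimal parsing sketch. It is an assumption-laden illustration, not the plugin's code: `CliOptions`, `parseArgs`, and the field names are hypothetical, an empty `scope` stands in for auto-discover, and a `null` profile stands in for interactive selection.

```typescript
// Hypothetical option shape; field names are illustrative, not taken from src/types.ts.
interface CliOptions {
  scope: string[];        // empty means auto-discover
  profile: string | null; // null means interactive selection
  maxIterations: number;
  timeBoxSeconds: number;
  resume: boolean;
  dryRun: boolean;
}

const PROFILES = ["quality", "performance", "coverage"];

function parseArgs(argv: string[]): CliOptions {
  const opts: CliOptions = {
    scope: [],
    profile: null,
    maxIterations: 20,   // default from the arguments table
    timeBoxSeconds: 120, // default from the arguments table
    resume: false,
    dryRun: false,
  };
  const queue = [...argv];

  // Reads the value that must follow a flag such as --profile.
  const takeValue = (flag: string): string => {
    const value = queue.shift();
    if (value === undefined) throw new Error(`${flag} requires a value`);
    return value;
  };
  const takePositiveInt = (flag: string): number => {
    const n = Number(takeValue(flag));
    if (!Number.isInteger(n) || n <= 0) {
      throw new Error(`${flag} expects a positive integer`);
    }
    return n;
  };

  for (let arg = queue.shift(); arg !== undefined; arg = queue.shift()) {
    switch (arg) {
      case "--profile": {
        const name = takeValue(arg);
        if (!PROFILES.includes(name)) throw new Error(`unknown profile: ${name}`);
        opts.profile = name;
        break;
      }
      case "--max-iterations":
        opts.maxIterations = takePositiveInt(arg);
        break;
      case "--time-box":
        opts.timeBoxSeconds = takePositiveInt(arg);
        break;
      case "--resume":
        opts.resume = true;
        break;
      case "--dry-run":
        opts.dryRun = true;
        break;
      default:
        if (arg.startsWith("--")) throw new Error(`unknown flag: ${arg}`);
        opts.scope.push(arg); // anything else is treated as a scope path
    }
  }
  return opts;
}

console.log(parseArgs(["src/", "--profile", "quality"]));
```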
| Profile | Focus | Evaluators | Time Box |
|---|---|---|---|
| quality | Code quality, type safety, naming | lint 25%, types 20%, tests 25%, LLM 30% | 120s |
| performance | Bundle size, algorithms, hot paths | lint 15%, tests 20%, benchmark 35%, LLM 30% | 180s |
| coverage | Test coverage, edge cases | coverage 35%, tests 25%, lint 10%, LLM 30% | 150s |
Each axis is grounded in ISO 25010 quality characteristics with documented weight rationale and pre-computed orthogonality analysis.
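As a rough illustration of how the evaluator weights in the profile table could combine into a single score, the sketch below uses a plain weighted arithmetic mean. The weights come from the table; everything else is an assumption: `PROFILE_WEIGHTS` and `compositeScore` are hypothetical names, and each evaluator is assumed to report a score from 0 to 100. The composite in src/scoring.ts may be computed differently.

```typescript
type Weights = Record<string, number>;
type ProfileName = "quality" | "performance" | "coverage";

// Percent weights copied from the profile table.
const PROFILE_WEIGHTS: Record<ProfileName, Weights> = {
  quality: { lint: 25, types: 20, tests: 25, llm: 30 },
  performance: { lint: 15, tests: 20, benchmark: 35, llm: 30 },
  coverage: { coverage: 35, tests: 25, lint: 10, llm: 30 },
};

// Weighted arithmetic mean; assumes every score lies on the same 0-100 scale.
function compositeScore(scores: Record<string, number>, weights: Weights): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const [axis, weight] of Object.entries(weights)) {
    const score = scores[axis];
    if (score === undefined) throw new Error(`missing score for axis "${axis}"`);
    weightedSum += weight * score;
    totalWeight += weight;
  }
  if (totalWeight === 0) throw new Error("weights must not sum to zero");
  return weightedSum / totalWeight;
}

// (25*80 + 20*100 + 25*60 + 30*50) / 100 = 70
console.log(
  compositeScore({ lint: 80, types: 100, tests: 60, llm: 50 }, PROFILE_WEIGHTS.quality),
);
```

Dividing by the total weight rather than by a fixed 100 keeps the result on the same scale even if a custom weight set does not sum to 100.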
- .autoresearch/state.json: Loop state for resume (includes token breakdown, volatility, eval decisions)
- .autoresearch/report.md: Full report with token dashboard, confidence intervals, trajectory analysis, learning summary
- autoresearch/<timestamp>-<scope> with per-iteration commits

The TypeScript modules in src/ provide structured reference implementations:
| Module | Purpose |
|---|---|
| src/types.ts | Type definitions and defaults |
| src/loop.ts | Core loop state machine |
| src/discovery.ts | Codebase introspection + constraint pipeline |
| src/report.ts | Summary report generation |
| src/permissions.ts | Permission manifest + pre-flight verification |
| src/scoring.ts | Phase-adaptive composite scoring (arithmetic/harmonic/geometric) |
| src/analytics.ts | Token dashboard, confidence intervals, trajectory prediction |
| src/scheduling.ts | Adaptive LLM eval scheduling + volatility detection |
| src/evaluators/ | Static, test, LLM, custom, and fallback evaluators |
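The module table mentions arithmetic, harmonic, and geometric scoring for src/scoring.ts. The sketch below shows only the three standard means, to illustrate why the choice matters: for the same inputs, the harmonic and geometric means penalize a single weak axis more than the arithmetic mean does. The function names are hypothetical, the means are unweighted for brevity, and how the module selects a mean per phase is not described here, so none of this should be read as its actual implementation.

```typescript
// Hypothetical helpers; assumes scores are non-negative numbers on a common scale.
function requireNonEmpty(scores: number[]): void {
  if (scores.length === 0) throw new Error("need at least one score");
}

function arithmeticMean(scores: number[]): number {
  requireNonEmpty(scores);
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}

// Computed in log space to avoid overflow when multiplying many values.
function geometricMean(scores: number[]): number {
  requireNonEmpty(scores);
  const logSum = scores.reduce((sum, s) => sum + Math.log(s), 0);
  return Math.exp(logSum / scores.length);
}

function harmonicMean(scores: number[]): number {
  requireNonEmpty(scores);
  const reciprocalSum = scores.reduce((sum, s) => sum + 1 / s, 0);
  return scores.length / reciprocalSum;
}

// Two strong axes and one weak axis.
const axes = [0.9, 0.9, 0.3];
console.log({
  arithmetic: arithmeticMean(axes),
  geometric: geometricMean(axes),
  harmonic: harmonicMean(axes),
});
```

For these inputs the three means come out at roughly 0.70, 0.62, and 0.54, so the stricter means make it harder for two strong axes to mask a weak one.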
npx claudepluginhub dhanesh/autoresearch --plugin autoresearch