autoresearch-marketplace

Autonomous codebase improvement loop inspired by Karpathy's autoresearch.

npx claudepluginhub dhanesh/autoresearch

1 Plugin

autoresearch

Autonomous codebase improvement loop inspired by Karpathy's autoresearch. Iteratively improves code quality, test coverage, performance, and architecture using multi-metric evaluation with diminishing returns detection.

v1.0.0

dhanesh

Stats

Plugins: 1

Updated: Apr 6, 2026

Autoresearch

Autonomous codebase improvement loop for Claude Code, inspired by Karpathy's autoresearch.

Runs a tight improve-evaluate-iterate loop that converges on measurable codebase improvements across code quality, test coverage, performance, and architecture.

Install

As a Claude Code plugin

Add the plugin to your project's .claude/settings.json (JSON does not allow comments, so keep this note out of the file):

{
  "plugins": ["github:dhanesh/autoresearch#plugin"]
}

Local development

claude --plugin-dir /path/to/autoresearch/plugin

Via install script

# From clone
bash install/install.sh

# From remote
curl -fsSL https://raw.githubusercontent.com/dhanesh/autoresearch/main/install/install.sh | bash

Usage

/autoresearch                              # Interactive — discover constraints from your codebase
/autoresearch src/ --profile quality       # Quality-focused improvement on src/
/autoresearch --profile coverage           # Maximize test coverage
/autoresearch --profile performance        # Optimize performance
/autoresearch --max-iterations 10          # Limit iterations
/autoresearch --resume                     # Resume a previous run
/autoresearch --dry-run                    # Preview what would be evaluated

How It Works

DISCOVER → BASELINE → LOOP → REPORT

Discover — Analyzes your codebase tooling (linters, test runners, type checkers), proposes evaluation constraints, and interviews you via AskUserQuestion to accept/modify/add constraints
Baseline — Creates a git branch, runs all evaluators, captures baseline scores (0-100 per axis)
Loop — Each iteration: improve code → evaluate across all axes → keep if composite improves, revert if it regresses → auto-stop on diminishing returns
Report — Full LLM evaluation explaining WHY each change was made, improvement tables, convergence analysis
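
The keep/revert/stop logic of the Loop step can be sketched as below. This is a minimal illustration, not the plugin's actual implementation: `runLoop`, `LoopOptions`, `minGain`, and `patience` are hypothetical names, and the diminishing-returns rule (stop after `patience` consecutive iterations whose gain is below `minGain`) is an assumption. A candidate that ties the best composite is treated as a revert here, which the README does not specify.

```typescript
// Hypothetical sketch of the improve -> evaluate -> keep/revert loop.
interface LoopOptions {
  maxIterations: number; // hard iteration cap
  minGain: number;       // composite gain below this counts as "stalled" (assumed rule)
  patience: number;      // consecutive stalled iterations tolerated before stopping
}

interface LoopResult {
  finalComposite: number;
  kept: number;
  reverted: number;
  stoppedEarly: boolean;
}

function runLoop(
  baseline: number,
  attempt: (iteration: number) => number, // applies a change, returns the new composite
  revert: () => void,                      // undoes the last change
  opts: LoopOptions,
): LoopResult {
  let best = baseline;
  let kept = 0;
  let reverted = 0;
  let stalled = 0;

  for (let i = 1; i <= opts.maxIterations; i++) {
    const candidate = attempt(i);
    const gain = candidate - best;
    if (gain > 0) {
      best = candidate; // composite improved: keep the change
      kept++;
    } else {
      revert();         // no improvement: roll the change back
      reverted++;
    }
    // Diminishing returns: count consecutive iterations with too little gain.
    stalled = gain >= opts.minGain ? 0 : stalled + 1;
    if (stalled >= opts.patience) {
      return { finalComposite: best, kept, reverted, stoppedEarly: true };
    }
  }
  return { finalComposite: best, kept, reverted, stoppedEarly: false };
}

// Usage with simulated composite scores instead of real evaluators.
const simulated = [62, 60, 62.2, 62.3, 62.3];
const result = runLoop(
  60,
  (i) => simulated[i - 1] ?? 0,
  () => {},
  { maxIterations: 5, minGain: 0.5, patience: 2 },
);
console.log(result);
// { finalComposite: 62.2, kept: 2, reverted: 1, stoppedEarly: true }
```

In this run the second attempt regresses and is reverted, and the third attempt improves by less than `minGain`, so two stalled iterations in a row end the loop before the iteration cap.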

Evaluation Axes

| Axis | What it measures | Examples |
| --- | --- | --- |
| Static Analysis | Lint warnings, type errors, complexity | ESLint, TSC, Biome, Ruff |
| Test Suite | Pass rate, coverage percentage | Jest, Vitest, pytest |
| LLM Rubric | Readability, architecture, maintainability | 4-dimension weighted rubric |
| Custom | User-defined metrics | Bundle size, benchmarks, custom scripts |

All scores are normalized to 0-100 and combined into a single weighted composite.
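
As a rough illustration of normalization and the weighted composite, here is a sketch under stated assumptions: the helper names (`clampScore`, `normalizeCount`, `weightedComposite`) are hypothetical, the linear rule for turning a warning count into a score is one possible choice rather than the plugin's documented formula, and the composite is assumed to be a weighted mean.

```typescript
// Hypothetical helpers; the actual normalization rules are not documented here.
type AxisScores = Record<string, number>;  // axis name -> score in 0-100
type AxisWeights = Record<string, number>; // axis name -> relative weight

// Clamp a value into the 0-100 range.
function clampScore(value: number): number {
  return Math.min(100, Math.max(0, value));
}

// Assumed rule for a "lower is better" count such as lint warnings:
// 0 warnings scores 100, `worst` or more warnings scores 0.
function normalizeCount(count: number, worst: number): number {
  if (worst <= 0) throw new Error("worst must be positive");
  return clampScore(100 * (1 - count / worst));
}

// Weighted mean of the axis scores; weights are divided by their sum,
// so they may be given as percentages or as arbitrary positive numbers.
function weightedComposite(scores: AxisScores, weights: AxisWeights): number {
  let total = 0;
  let weightSum = 0;
  for (const [axis, weight] of Object.entries(weights)) {
    const score = scores[axis];
    if (score === undefined) throw new Error(`missing score for axis "${axis}"`);
    total += clampScore(score) * weight;
    weightSum += weight;
  }
  if (weightSum <= 0) throw new Error("weights must sum to a positive value");
  return total / weightSum;
}

const composite = weightedComposite(
  { lint: normalizeCount(20, 100), types: 90, tests: 70, llm: 60 },
  { lint: 25, types: 20, tests: 25, llm: 30 },
);
console.log(composite); // 73.5
```

Dividing by the weight sum keeps the composite on the same 0-100 scale as the individual axes, so baseline and per-iteration composites stay directly comparable.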

Preset Profiles

| Profile | Best for | Weights |
| --- | --- | --- |
| `quality` | Reducing complexity, improving naming, strengthening types | lint 25%, types 20%, tests 25%, LLM 30% |
| `performance` | Bundle size, algorithmic complexity, hot paths | lint 15%, tests 20%, benchmark 35%, LLM 30% |
| `coverage` | Adding tests, covering edge cases, assertion quality | coverage 35%, tests 25%, lint 10%, LLM 30% |
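
The preset weights can be expressed as plain data, which makes it easy to check that each profile adds up to 100%. The sketch below copies the percentages from the table; the names `PROFILE_WEIGHTS`, `weightTotal`, and `assertValidProfile` and the lowercase axis keys are hypothetical, and the files under profiles/ may use a different shape.

```typescript
// Weights copied from the profile table; type and object names are hypothetical.
type ProfileName = "quality" | "performance" | "coverage";

const PROFILE_WEIGHTS: Record<ProfileName, Record<string, number>> = {
  quality: { lint: 25, types: 20, tests: 25, llm: 30 },
  performance: { lint: 15, tests: 20, benchmark: 35, llm: 30 },
  coverage: { coverage: 35, tests: 25, lint: 10, llm: 30 },
};

// Sum of the percentage weights in one profile.
function weightTotal(weights: Record<string, number>): number {
  return Object.values(weights).reduce((sum, w) => sum + w, 0);
}

// Throws if a profile's weights do not add up to exactly 100.
function assertValidProfile(name: ProfileName): void {
  const total = weightTotal(PROFILE_WEIGHTS[name]);
  if (total !== 100) {
    throw new Error(`profile "${name}" weights sum to ${total}, expected 100`);
  }
}

// All three presets pass this check.
for (const name of Object.keys(PROFILE_WEIGHTS) as ProfileName[]) {
  assertValidProfile(name);
}
```

A check like this is mainly useful when adding a custom profile, where a typo in one weight would otherwise silently skew the composite.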

Safety

Git branch isolation — never touches main/master
Command sandboxing — SHA-256 hash verification on all registered commands
Scope enforcement — reads anything, writes only within declared scope
Circuit breaker — auto-stops on >10% regression in any metric
Non-destructive git — never force-push, delete branches, or rewrite history
Iteration cap + wall-clock timeout — hard limits prevent runaway loops
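
The circuit breaker from the list above could look roughly like this. It is a sketch under assumptions: `regressedMetrics` is a hypothetical name, and ">10% regression" is read as a relative drop of more than 10% compared with the previous score for that metric. The README does not say whether the plugin measures the drop relatively or in absolute points.

```typescript
// Hypothetical circuit-breaker check; "regression" is assumed to be a relative drop.
type MetricScores = Record<string, number>; // metric name -> score in 0-100

// Returns the metrics whose score fell by more than `threshold` (default 10%).
function regressedMetrics(
  previous: MetricScores,
  current: MetricScores,
  threshold = 0.10,
): string[] {
  const tripped: string[] = [];
  for (const [metric, before] of Object.entries(previous)) {
    const after = current[metric];
    // Skip metrics with no current value or no meaningful previous score.
    if (after === undefined || before <= 0) continue;
    const drop = (before - after) / before;
    if (drop > threshold) tripped.push(metric);
  }
  return tripped;
}

const tripped = regressedMetrics(
  { lint: 80, tests: 90 },
  { lint: 70, tests: 89 },
);
console.log(tripped); // [ 'lint' ]  (80 -> 70 is a 12.5% drop)
```

Checking every metric separately, rather than only the composite, catches the case where a large gain on one axis hides a sharp regression on another.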

Project Structure

autoresearch/
├── plugin/                    # Claude Code plugin (distributable)
│   ├── plugin.json            # Plugin metadata
│   ├── commands/              # /autoresearch command
│   ├── skills/autoresearch/   # Overview skill
│   ├── hooks/                 # SessionStart + PreCompact hooks
│   ├── lib/                   # TypeScript reference implementations
│   ├── profiles/              # Preset evaluation profiles
│   └── README.md              # Plugin documentation
├── install/                   # Installation scripts
│   ├── install.sh             # Multi-agent installer
│   └── uninstall.sh           # Cleanup
├── src/                       # Source (canonical)
│   ├── types.ts               # Core types and defaults
│   ├── loop.ts                # Loop state machine
│   ├── discovery.ts           # Codebase introspection
│   ├── report.ts              # Report generation
│   └── evaluators/            # Multi-axis evaluation engine
├── profiles/                  # Preset profiles (canonical)
├── SKILL.md                   # Main skill definition
├── package.json               # Project metadata
└── .manifold/                 # Constraint manifold (design docs)

License

MIT
