By shawnroos
Your overnight research assistant. Obsessively analyzes codebases for tunable parameters, designs experiments, runs them in worktrees while you sleep, and delivers findings with recommendations.
Train a local LLM to handle lightweight research tasks. Your intern starts dumb (really dumb) but learns from every nerd run and earns responsibility through demonstrated competence. Requires a local LLM serving stack (Ollama, MLX-LM, llama.cpp, or vLLM).
Run a continuous self-improvement loop on a specific aspect of your codebase. The agent edits code, runs it, measures the result, keeps improvements, discards regressions, and repeats indefinitely. Like Karpathy's autoresearch but for any codebase feature. Use: /nerd-loop 'search relevance' or /nerd-loop 'api response time'
Schedule nerd experiments to run at specific times (e.g., overnight). Uses macOS LaunchAgent for scheduling.
One-time global setup for the nerd plugin. Detects hardware, installs the training variant (MLX for Apple Silicon, original for NVIDIA), runs calibration benchmarks, and saves a hardware profile. Only needs to run once per machine — projects auto-initialize on first /nerd run.
Check the status of the nerd queue, running experiments, and backlog. Shows progress, completed findings, and pending proposals.
Scans a scoped set of files for tunable parameters and clusters results into research themes. Used by /nerd-this for context-scoped experiment discovery.
Executes nerd experiment plans in isolated worktrees. Builds evaluation harnesses, runs parameter sweeps, captures results. Use when an experiment plan is ready and needs to be implemented and run.
Runs aptitude tests and ongoing evaluation of the local LLM intern. Calls the intern endpoint with benchmark examples for each task type (parameter-detection, result-classification, context-extraction), scores against expected outputs, and returns structured results with accuracy per task and mode recommendations. Use during /nerd-intern setup or when re-evaluating intern capability.
Pre-flight validation agent that checks whether the lab is ready before experiments run. Verifies data access (WAL-mode, file permissions, exports), confirms config fields are actually wired in execution paths, scaffolds missing eval infrastructure (export scripts, test fixtures, datasets), and reports readiness. Use before experiment execution or before starting a nerd-loop to confirm the environment can produce valid results.
Analyzes nerd research findings, experiment reports, and backlog proposals to identify the best candidates for deep /nerd-loop continuous improvement. Looks for areas with high improvement potential, measurable metrics, and clear scope boundaries. Use after /nerd completes or when deciding what to loop on.
Reference for identifying tunable parameters in codebases. Use when scanning for research targets — hardcoded thresholds, magic numbers, heuristic weights, prompt templates, pipeline budgets.
Reference for designing nerd experiments — competing theories, sweep harnesses, ground truth strategies, metric selection, and feasibility checks. Use when creating or reviewing experiment plans.
Canonical delegation protocol for the nerd intern. Reference this when delegating tasks to the local LLM in /nerd or /nerd-this orchestrators. Defines health checks, timeouts, confidence gating, shadow comparison, fallback, and logging.
Reference for intern training data formats, benchmark structure, and evaluation protocol. Use when running aptitude tests, collecting training data, or evaluating intern performance.
Reference card for performance research — anti-pattern catalog, profiling tool reference, metric command templates, and measurability gate criteria. Use when writing performance experiment plans or analyzing performance findings.
Modifies files
Hook triggers on file write and edit operations
Uses power tools
Uses Bash, Write, or Edit tools
Your codebase has hundreds of hardcoded thresholds, magic numbers, and untested heuristics. You don't know which ones matter. The nerd does.
A Claude Code plugin that obsessively researches your codebase overnight — finding every tunable parameter, designing rigorous experiments with competing theories, running them in isolated worktrees, and delivering findings that tell you what to keep, what to change, and what to rearchitect. It remembers what it learned, so it never wastes time re-testing what it already proved.
Most "optimization" is guessing. You tweak a threshold, eyeball the result, ship it. The nerd treats your codebase like a research problem.
The most valuable findings aren't parameter tweaks. They're architectural discoveries that only emerge when you test competing explanations.
claude plugin install nerd
/nerd-setup # One-time hardware calibration
/nerd # Let the nerd loose on your codebase
/nerd-loop "search relevance" # Deep continuous iteration on one area
/nerd-schedule tonight # Run experiments overnight
/nerd-setup runs once per machine. Projects auto-initialize on first /nerd run.
/nerd: Broad Research
Scans your codebase for every tunable parameter, designs experiments with competing theories, validates the lab environment, runs the experiments in parallel, and delivers structured findings.
/nerd "search ranking"
├─ parameter-scanner finds 12 tunable parameters
├─ plan-reviewer generates 3 competing theories per experiment
├─ lab-tech validates data access, config wiring, build cache
├─ experiment-executor runs experiments in parallel worktrees
├─ report-compiler evaluates which theories held up
└─ loop-scout recommends the best target for deep iteration
/nerd-loop: Deep Iteration
Karpathy's autoresearch pattern applied to your code. Reads the code, hypothesizes an improvement, makes the change, measures, keeps if better, reverts if not, and repeats until it hits a local maximum.
/nerd-loop "search relevance"
├─ Establishes baseline metric (e.g., nDCG@10)
├─ Loops: edit → test → measure → keep/discard
├─ Pivots strategy after 5 consecutive failures
├─ Escalates after another 5
└─ Stops at local maximum (15 failures across 3 strategies)
It doesn't just sweep parameters — it rewrites algorithms, restructures logic, removes unnecessary code. Anything within the scoped files is fair game.
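The keep/discard rule and the failure thresholds listed above can be sketched in a few lines. This is a minimal illustration, not the plugin's implementation: `improvement_loop` and `scripted` are hypothetical names, each strategy is reduced to a callable that returns the metric for one candidate change, and the sketch assumes that higher scores are better and that a kept change resets the failure counter.

```python
from typing import Callable, Iterable

# From the text: a strategy is abandoned after 5 consecutive failures.
MAX_CONSECUTIVE_FAILURES = 5


def scripted(scores: Iterable[float]) -> Callable[[float], float]:
    """Hypothetical stand-in for 'edit, test, measure': replays fixed scores.

    Once the script runs out it returns the current best (no improvement).
    """
    remaining = iter(scores)
    return lambda current_best: next(remaining, current_best)


def improvement_loop(baseline, strategies, max_failures=MAX_CONSECUTIVE_FAILURES):
    """Run each strategy until it fails `max_failures` times in a row.

    ASSUMPTION: a real loop would edit files, run the tests, and revert
    discarded changes; here a strategy is just a callable returning a score.
    """
    best = baseline
    log = []
    for index, measure_candidate in enumerate(strategies):
        failures = 0
        while failures < max_failures:
            score = measure_candidate(best)
            if score > best:
                # Improvement: keep it and reset the failure streak.
                best, failures = score, 0
                log.append((index, "keep", score))
            else:
                # Regression or no change: discard and count the failure.
                failures += 1
                log.append((index, "discard", score))
    return best, log


best, log = improvement_loop(
    baseline=0.50,
    strategies=[scripted([0.52, 0.51, 0.55]), scripted([]), scripted([0.54, 0.60])],
)
kept = [entry for entry in log if entry[1] == "keep"]
print(f"best={best:.2f}, kept={len(kept)}, discarded={len(log) - len(kept)}")
```

With three strategies that each eventually stall, the loop ends once every strategy has failed five times in a row, which lines up with the 15-failure stopping rule described above.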
/nerd-this: Context-Scoped Research
Research just what you're working on right now. Infers scope from your current branch, session files, and conversation topics, then groups findings into research themes.
/nerd-this auth flow
├─ Infers scope from git diff + session context
├─ Groups parameters into research themes
└─ Runs the full experiment pipeline on selected themes
This is the core insight. Most experiment tools ask "is this parameter optimal?" The nerd asks "what's actually going on?" by generating 3+ competing theories per experiment:
| Theory Type | What It Tests |
|---|---|
| Parameter is wrong | A different value would improve the metric |
| Model is wrong | The mathematical model is inappropriate — try a different one entirely |
| Feature is unnecessary | Removing the feature causes no degradation |
| Data is the bottleneck | The parameter doesn't matter because the input data is the real problem |
| Architecture is the bottleneck | No parameter value can fix this — the architecture needs to change |
Reports evaluate each theory as SUPPORTED / REFUTED / INCONCLUSIVE and recommend: KEEP, CHANGE, REMOVE, REARCHITECT, or INVESTIGATE.
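One way to picture how per-theory verdicts could roll up into a single recommendation is sketched below. The verdict and recommendation labels come from the text. The mapping from theory type to recommendation, the priority order, and the `recommend` function are assumptions made for illustration and are not taken from the plugin.

```python
from enum import Enum


class TheoryType(Enum):
    PARAMETER_WRONG = "Parameter is wrong"
    MODEL_WRONG = "Model is wrong"
    FEATURE_UNNECESSARY = "Feature is unnecessary"
    DATA_BOTTLENECK = "Data is the bottleneck"
    ARCHITECTURE_BOTTLENECK = "Architecture is the bottleneck"


class Verdict(Enum):
    SUPPORTED = "SUPPORTED"
    REFUTED = "REFUTED"
    INCONCLUSIVE = "INCONCLUSIVE"


class Recommendation(Enum):
    KEEP = "KEEP"
    CHANGE = "CHANGE"
    REMOVE = "REMOVE"
    REARCHITECT = "REARCHITECT"
    INVESTIGATE = "INVESTIGATE"


# ASSUMPTION: this mapping and its ordering (structural findings outrank
# parameter tweaks) are hypothetical, not the plugin's documented behavior.
PRIORITY = [
    (TheoryType.ARCHITECTURE_BOTTLENECK, Recommendation.REARCHITECT),
    (TheoryType.MODEL_WRONG, Recommendation.REARCHITECT),
    (TheoryType.DATA_BOTTLENECK, Recommendation.INVESTIGATE),
    (TheoryType.FEATURE_UNNECESSARY, Recommendation.REMOVE),
    (TheoryType.PARAMETER_WRONG, Recommendation.CHANGE),
]


def recommend(verdicts: dict) -> Recommendation:
    """Pick one recommendation from a {TheoryType: Verdict} mapping."""
    for theory, recommendation in PRIORITY:
        if verdicts.get(theory) is Verdict.SUPPORTED:
            return recommendation
    # Nothing supported: unresolved theories call for more investigation.
    if Verdict.INCONCLUSIVE in verdicts.values():
        return Recommendation.INVESTIGATE
    # Every theory was refuted, so the current code stands.
    return Recommendation.KEEP


example = recommend({
    TheoryType.PARAMETER_WRONG: Verdict.SUPPORTED,
    TheoryType.FEATURE_UNNECESSARY: Verdict.REFUTED,
    TheoryType.DATA_BOTTLENECK: Verdict.REFUTED,
})
print(example.value)
```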
Every theory, verdict, and finding is persisted in a JSON knowledge graph. The nerd gets smarter with every run.
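As a rough illustration of persisting verdicts so that settled theories are not re-tested, the sketch below stores them in a JSON file. The schema (`nodes`, `edges`), the file name `knowledge.json`, the parameter name `search.min_score`, and all function names are hypothetical; the plugin's actual format is not described here.

```python
import json
import tempfile
from pathlib import Path


def load_graph(path: Path) -> dict:
    """Load the graph from disk, or start an empty one (hypothetical schema)."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"nodes": {}, "edges": []}


def record_verdict(graph: dict, parameter: str, theory: str, verdict: str) -> None:
    """Add parameter and theory nodes plus an edge carrying the verdict."""
    graph["nodes"].setdefault(parameter, {"kind": "parameter"})
    graph["nodes"].setdefault(theory, {"kind": "theory"})
    graph["edges"].append(
        {"parameter": parameter, "theory": theory, "verdict": verdict}
    )


def is_settled(graph: dict, parameter: str, theory: str) -> bool:
    """True if this theory already has a decisive verdict for the parameter."""
    return any(
        edge["parameter"] == parameter
        and edge["theory"] == theory
        and edge["verdict"] in ("SUPPORTED", "REFUTED")
        for edge in graph["edges"]
    )


def save_graph(graph: dict, path: Path) -> None:
    path.write_text(json.dumps(graph, indent=2), encoding="utf-8")


# Round-trip through a temporary file to show that verdicts survive a reload.
with tempfile.TemporaryDirectory() as tmp:
    graph_path = Path(tmp) / "knowledge.json"
    graph = load_graph(graph_path)
    record_verdict(graph, "search.min_score", "Parameter is wrong", "REFUTED")
    save_graph(graph, graph_path)
    reloaded = load_graph(graph_path)

settled = is_settled(reloaded, "search.min_score", "Parameter is wrong")
unsettled = is_settled(reloaded, "search.min_score", "Data is the bottleneck")
print(settled, unsettled)
```

A later run could call `is_settled` before scheduling an experiment and skip any theory that already has a decisive verdict, leaving INCONCLUSIVE ones eligible for another attempt.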
npx claudepluginhub shawnroos/nerd