By surahli123
Iteratively improve any skill by running eval-grounded autoresearch, design audits, and mutation optimization to assess quality, build evals, and perform error analysis.
Karpathy's autoresearch + Hamel's Three Gulfs, applied to Claude Code skills.
AutoRefine is a guided workflow that takes a SKILL.md file from vague instructions through eval-grounded, trust-checked, iterative optimization. It is designed for skill authors who want more than prompt tweaking: it adds human error analysis, judge validation, and a disciplined mutation loop before claiming a skill got better.
Most skill iteration tools are strong at fast optimization but weak at proving that the optimization is real.
AutoRefine's core idea is simple:
That turns “prompt iteration” into a more reliable research loop.
Clone the repo and copy the shipped bundle into your Claude Code skills directory:
git clone https://github.com/surahli123/autorefine-skill-improvement.git
cp -r autorefine-skill-improvement/autorefine ~/.claude/skills/autorefine
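After copying, it can help to confirm the bundle landed where Claude Code looks for skills. The sketch below is not part of AutoRefine; `bundle_installed` is a hypothetical helper that only checks for `autorefine/SKILL.md` under a skills directory, assuming the default `~/.claude/skills` location used by the `cp` command above.

```python
from pathlib import Path


def bundle_installed(skills_dir: Path) -> bool:
    """Return True if an autorefine bundle with a SKILL.md sits under skills_dir.

    Hypothetical helper for illustration; AutoRefine does not ship it.
    """
    return (skills_dir / "autorefine" / "SKILL.md").is_file()


if __name__ == "__main__":
    # Assumed default location, matching the cp command above.
    skills_dir = Path.home() / ".claude" / "skills"
    print("installed" if bundle_installed(skills_dir) else "SKILL.md not found")
```

If the check fails, the most likely cause is that `cp -r` was run from a different directory than the clone.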
If you want the bundle-level file breakdown, see autorefine/README.md.
The repo also includes .claude-plugin/ and .codex-plugin/ metadata for plugin/discovery surfaces. Both plugin manifests mirror the same public package identity and point at the shipped runtime bundle under ./autorefine.
/autorefine /full/path/to/your-skill
AutoRefine creates a workspace, copies your target skill into it, and then guides you through the full refinement pipeline.
Phase 1 audits the skill across gotchas, voice, progressive disclosure, anti-railroading, description quality, and scripts before later gulfs build evals from those findings.
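The workspace step above matters because later edits are applied to a copy and never to the original skill. AutoRefine's actual workspace layout is not documented here, so the following is only a minimal sketch of the idea; `copy_skill_to_workspace` and the directory names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def copy_skill_to_workspace(skill_dir: Path, workspace_root: Path) -> Path:
    """Copy skill_dir into workspace_root and return the path of the copy.

    Hypothetical illustration, not AutoRefine's own implementation.
    """
    target = workspace_root / skill_dir.name
    shutil.copytree(skill_dir, target)  # raises FileExistsError if target exists
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a throwaway skill so the demo is self-contained.
        skill = Path(tmp) / "your-skill"
        skill.mkdir()
        (skill / "SKILL.md").write_text("original instructions\n")

        workspace = Path(tmp) / "workspace"
        workspace.mkdir()
        copy = copy_skill_to_workspace(skill, workspace)

        # Editing the copy leaves the source skill untouched.
        (copy / "SKILL.md").write_text("mutated instructions\n")
        print((skill / "SKILL.md").read_text().strip())
```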
The installable runtime lives under autorefine/. That bundle contains:
SKILL.md
references/gulf1-comprehension.md
references/gulf2-specification.md
references/gulf3-generalization.md
references.md
dashboard.html
scripts/record.py
scripts/records-to-gulf1.py
validate-host.sh

Repo-maintenance tests and internal design material are intentionally kept outside the shipped bundle.
The optional Phase 0.5 live-trace recorder path uses scripts/record.py, which requires Python with httpx available in the runtime environment. The converter script is stdlib-only.
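Before using the recorder path, a quick check that httpx is importable can save a failed run. This is a minimal sketch using only the standard library; `has_module` is a hypothetical helper and is not shipped with AutoRefine.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("httpx"):
        print("httpx found: scripts/record.py should be able to import it")
    else:
        print("httpx missing: install it before using the Phase 0.5 recorder")
```

Run it with the same Python interpreter that will execute `scripts/record.py`, since availability depends on the active environment.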
This public repo is user-facing:
autorefine/ is the shipped bundle
docs/ is curated user documentation

Internal engineering material lives in the dev submodule and is not part of the public install surface.
MIT