From cc-ar
This skill should be used when the user asks to "improve a skill", "create an autoresearch loop", "iteratively improve", "optimize a skill", "run an improvement loop on skill X", "autoresearch skill", "autonomously improve", "evaluate and improve a skill", "benchmark a skill", or wants to set up an autonomous agent loop that iteratively experiments with and improves a Claude Code skill against user-defined criteria.
Slash command: /cc-ar:autoresearch-skill
Generate a self-contained shell script that autonomously improves a Claude Code
skill through iterative cycles. Each iteration: improve the skill, actually
execute it against test prompts, evaluate the execution output, keep or discard.
Deterministic steps (git, scoring, cleanup) run as plain bash. Subjective steps
(improving, executing, evaluating) run via claude -p calls with generated
prompts.
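A minimal sketch of the cycle described above. The three claude -p steps are replaced by hypothetical stubs (improve_skill, execute_skill, evaluate_outputs) that return canned scores, so the control flow runs offline. The git commit and revert steps are only indicated in comments, and awk stands in for the bc -l comparison to keep the sketch light on dependencies.

```shell
# Canned evaluator scores, one per iteration (made-up values for illustration).
FAKE_SCORES=(3.0 2.5 3.8)

# Hypothetical stubs: the generated script would call `claude -p` in each.
improve_skill()    { :; }                                      # would edit skill files
execute_skill()    { echo "output for iteration $1"; }         # would run test prompts
evaluate_outputs() { echo "AGGREGATE: ${FAKE_SCORES[$1]}"; }   # would judge the outputs

run_loop() {
  local iterations=$1 best=0 i score outputs
  for ((i = 0; i < iterations; i++)); do
    improve_skill
    outputs=$(execute_skill "$i")
    # Extract the number that follows "AGGREGATE:" (needs grep with -P support).
    score=$(evaluate_outputs "$i" <<<"$outputs" | grep -oP 'AGGREGATE:\s*\K[0-9.]+')
    # Keep only strict improvements over the best score so far.
    if awk -v s="$score" -v b="$best" 'BEGIN { exit !(s > b) }'; then
      best=$score
      echo "iter $i: keep ($score)"      # real script: commit the change
    else
      echo "iter $i: discard ($score)"   # real script: revert the change via git
    fi
  done
  echo "best: $best"
}

run_loop 3
```

With the canned scores, the first and third iterations are kept and the second is discarded.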
Read all files in the target skill directory. Build two things:
File inventory -- what files exist and can be modified:
Skill: mermaid-svg
Files:
- SKILL.md (2,100 words)
- references/layout-patterns.md (3,200 words)
- references/style-guide.md (1,800 words)
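An inventory in this shape could be produced with a small helper like the one below. This is a sketch only: inventory_skill is a hypothetical name, and it prints raw word counts without thousands separators.

```shell
# Print a file inventory for a skill directory: one line per file with its
# word count. `inventory_skill` is a hypothetical helper, not part of the skill.
inventory_skill() {
  local dir=${1%/} file words    # strip a trailing slash so prefixes match
  echo "Skill: $(basename "$dir")"
  echo "Files:"
  # -print0 / read -d '' tolerate spaces in names; LC_ALL=C gives a stable order.
  while IFS= read -r -d '' file; do
    words=$(wc -w < "$file" | tr -d '[:space:]')
    echo "- ${file#"$dir"/} ($words words)"
  done < <(find "$dir" -type f -print0 | LC_ALL=C sort -z)
}
```

Calling inventory_skill path/to/skill prints the Skill: and Files: header followed by one bullet per file, with paths relative to the skill directory.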
Execution profile -- how the skill behaves when used:
- Tools the claude -p call needs to execute the skill (e.g., Read,Write,Glob for a skill that reads input and writes files; Read,Edit,Write,Glob,Bash for a skill that runs scripts; Read,WebSearch,WebFetch for a research skill)
- Artifacts the skill produces (e.g., *.svg files, *.ts files, text-only output to stdout)
- Cleanup commands that remove those artifacts (e.g., rm -f *.svg, rm -rf output/). Empty if the skill produces only text output.

If the user has not provided optimization criteria, ask for them. Criteria are specific, measurable aspects of the skill to improve.
Also ask whether the user wants to provide test prompts -- concrete user queries that exercise the skill. If not provided, generate 3-5 from the criteria. Test prompts should be realistic requests a user would make that trigger the skill.
Convert each criterion into a 1-5 scoring rubric with observable level
descriptions. See references/prompt-templates.md for rubric construction
guidance. Each level describes concrete, observable qualities of the
execution output (not the skill files themselves).
Build three system prompts embedded in the script:
Improver prompt -- Instructs claude -p to read the skill files,
analyze past results, then propose and apply ONE improvement. Includes: the
rubric, improvement strategies (from references/improvement-strategies.md),
file constraints, and a required HYPOTHESIS: output line.
Executor prompt -- Instructs claude -p to act as Claude with the
skill loaded. The skill content is injected dynamically at runtime (the
script reads the current SKILL.md + references each iteration). The
executor receives a test prompt and responds as Claude would, following
the skill's guidance.
Evaluator prompt -- Instructs claude -p to judge the execution
outputs against the rubric. Receives the collected execution outputs
from all test prompts plus the rubric, and outputs a structured score
with an AGGREGATE: line.
See references/prompt-templates.md for the full templates.
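The dynamic injection described for the executor prompt might look like the sketch below. It assumes the skill directory holds SKILL.md plus Markdown files under references/. The "=== path ===" separators are an assumption of this sketch, not something the templates prescribe.

```shell
# Concatenate the current SKILL.md and every file under references/ so the
# result can be injected into the executor's system prompt. Re-reading on
# every call is what lets the executor see the improver's latest edits.
read_skill() {
  local dir=${1%/} file
  printf '=== SKILL.md ===\n'
  cat "$dir/SKILL.md"
  for file in "$dir"/references/*.md; do
    [ -e "$file" ] || continue   # glob did not match: no reference files
    printf '\n=== references/%s ===\n' "$(basename "$file")"
    cat "$file"
  done
}
```

Because the function reads from disk each time it is called, capturing its output once per iteration (for example skill_content=$(read_skill "$skill_dir"), with skill_dir a placeholder variable) picks up whatever the improver changed in the previous step.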
Generate a bash script following references/script-structure.md. Sections:
- read_skill (reads current skill content at runtime)
- execute_skill (runs all test prompts, collects outputs)
- cleanup_artifacts (removes execution artifacts)
- summary (exit trap)

Key design rules:
- The executor's claude -p call receives the current skill content (read at runtime via read_skill), not a fixed heredoc -- the skill changes each iteration
- cleanup_artifacts removes any files the skill created, so the next iteration starts clean
- The score is extracted with grep -oP 'AGGREGATE:\s*\K[0-9.]+'
- A change is kept only when score > best_score, compared via bc -l

Write the script to improve-<skill-name>.sh in the repo root. Make it
executable. Show the user:
- How to run: bash improve-<skill-name>.sh
- How to inspect results: cat improvements.tsv and git log skill-improve/<tag>
- Requirements: claude, git, bc, grep
- improvements.tsv stays untracked by git

Reference files:
- references/prompt-templates.md -- System prompt templates for all three claude -p calls (improve, execute, evaluate), rubric guide
- references/script-structure.md -- Complete bash script structure with execute step, cleanup, and dynamic skill loading
- references/improvement-strategies.md -- Catalog of skill improvement strategies, embedded in the improver system prompt