Skill

autoresearch-skill

This skill should be used when the user asks to "improve a skill", "create an autoresearch loop", "iteratively improve", "optimize a skill", "run an improvement loop on skill X", "autoresearch skill", "autonomously improve", "evaluate and improve a skill", "benchmark a skill", or wants to set up an autonomous agent loop that iteratively experiments with and improves a Claude Code skill against user-defined criteria.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/cc-ar:autoresearch-skill

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Generate a self-contained shell script that autonomously improves a Claude Code

Supporting Files

references/improvement-strategies.md
references/prompt-templates.md
references/script-structure.md

SKILL.md

156 lines · ~1.7k tokens

Stats

Stars: 0

Maintenance: Good

Last Commit: Mar 23, 2026

Autoresearch Skill Improvement Loop

Generate a self-contained shell script that autonomously improves a Claude Code skill through iterative cycles. Each iteration improves the skill, actually executes it against test prompts, evaluates the execution output, and then keeps or discards the change. Deterministic steps (git, scoring, cleanup) run as plain bash. Subjective steps (improving, executing, evaluating) run via claude -p calls with generated prompts.

Workflow

Step 1: Analyze the Target Skill

Read all files in the target skill directory. Build two things:

File inventory -- what files exist and can be modified:

Skill: mermaid-svg
Files:
  - SKILL.md (2,100 words)
  - references/layout-patterns.md (3,200 words)
  - references/style-guide.md (1,800 words)
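
The inventory above could be produced with a few lines of bash. This is a minimal sketch, not part of the skill itself: inventory_skill is a hypothetical helper name, and it prints plain word counts without thousands separators.

```bash
#!/usr/bin/env bash
# Sketch: list every file in a skill directory with its word count.
# inventory_skill is a hypothetical helper, introduced for illustration only.
inventory_skill() {
  local skill_dir="$1"
  echo "Skill: $(basename "$skill_dir")"
  echo "Files:"
  # -type f skips directories; sort keeps the listing stable between runs.
  find "$skill_dir" -type f | sort | while IFS= read -r file; do
    # wc -w pads its output on some systems, so strip the spaces.
    words=$(wc -w < "$file" | tr -d ' ')
    echo "  - ${file#"$skill_dir"/} ($words words)"
  done
}
```

Calling inventory_skill with the path to the skill directory prints the listing in the same shape as the example, with paths relative to the skill directory.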

Execution profile -- how the skill behaves when used:

Purpose: What the skill does when triggered (e.g., "converts Mermaid diagram syntax into hand-crafted SVG markup")
Required tools: What tools a claude -p call needs to execute the skill (e.g., Read,Write,Glob for a skill that reads input and writes files; Read,Edit,Write,Glob,Bash for a skill that runs scripts; Read,WebSearch,WebFetch for a research skill)
Artifacts: What files or output the skill produces when executed (e.g., *.svg files, *.ts files, text-only output to stdout)
Cleanup commands: Bash commands to remove artifacts between executions (e.g., rm -f *.svg, rm -rf output/). Empty if the skill produces only text output.
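
One way the execution profile might end up in the generated script is as a pair of configuration variables plus the cleanup_artifacts helper. The sketch below assumes a skill that writes SVG files; the variable names EXEC_TOOLS and CLEANUP_COMMANDS are illustrative, not taken from references/script-structure.md.

```bash
#!/usr/bin/env bash
# Hypothetical execution profile for a skill that writes SVG files.
EXEC_TOOLS="Read,Write,Glob"
CLEANUP_COMMANDS=("rm -f *.svg" "rm -rf output/")

# Run every cleanup command inside the given working directory.
cleanup_artifacts() {
  local work_dir="$1" cmd
  for cmd in "${CLEANUP_COMMANDS[@]}"; do
    # The subshell keeps the cd local; eval lets globs such as *.svg expand.
    (cd "$work_dir" && eval "$cmd")
  done
}
```

For a skill that only prints text, CLEANUP_COMMANDS would be an empty array and the loop body would never run.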

Step 2: Gather Criteria

If the user has not provided optimization criteria, ask for them. Criteria are specific, measurable aspects of the skill to improve. Examples:

"SVG output should use logical coordinate-based layout, not magic numbers"
"Research responses should include source quality evaluation"
"TypeScript guidance should cover monorepo project patterns"

Also ask whether the user wants to provide test prompts -- concrete user queries that exercise the skill. If not provided, generate 3-5 from the criteria. Test prompts should be realistic requests a user would make that trigger the skill.

Step 3: Build the Evaluation Rubric

Convert each criterion into a 1-5 scoring rubric with observable level descriptions. See references/prompt-templates.md for rubric construction guidance. Each level describes concrete, observable qualities of the execution output (not the skill files themselves).
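
To make "observable level descriptions" concrete, here is how a rubric for the first example criterion might be stored in the script. The level wording is invented for illustration and is not taken from references/prompt-templates.md; the RUBRIC variable name is also an assumption.

```bash
#!/usr/bin/env bash
# Illustrative rubric for one criterion. The level descriptions are
# hypothetical; a real rubric comes from the user's criteria.
RUBRIC=$(cat <<'EOF'
Criterion: SVG output uses logical coordinate-based layout, not magic numbers
1 - Coordinates are unexplained literals; no grid, spacing, or origin is stated.
2 - A few named values appear, but most positions are still hard-coded.
3 - A grid or spacing unit is stated; about half of the positions derive from it.
4 - Nearly all positions derive from the stated grid; rare literals are explained.
5 - Every position derives from named layout values, and the derivation is shown.
EOF
)
```

Each level describes something a reader can check in the execution output, which is what lets the evaluator score it consistently.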

Step 4: Compose the Prompts

Build three system prompts embedded in the script:

Improver prompt -- Instructs claude -p to read the skill files, analyze past results, and propose and apply ONE improvement. Includes: the rubric, improvement strategies (from references/improvement-strategies.md), file constraints, and a required HYPOTHESIS: output line.
Executor prompt -- Instructs claude -p to act as Claude with the skill loaded. The skill content is injected dynamically at runtime (the script reads the current SKILL.md + references each iteration). The executor receives a test prompt and responds as Claude would, following the skill's guidance.
Evaluator prompt -- Instructs claude -p to judge the execution outputs against the rubric. Receives the collected execution outputs from all test prompts and the rubric; outputs a structured score with an AGGREGATE: line.

See references/prompt-templates.md for the full templates.
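
The executor's dynamic skill injection can be sketched as follows. This is an assumption-laden outline rather than the template from references/prompt-templates.md: the CLAUDE_BIN override, the separator lines, and the system prompt wording are made up for illustration, and the --append-system-prompt and --allowedTools flags reflect the Claude Code CLI as I understand it, so check claude --help before relying on them.

```bash
#!/usr/bin/env bash
# CLAUDE_BIN is a hypothetical override so the CLI can be stubbed in tests.
CLAUDE_BIN="${CLAUDE_BIN:-claude}"

# Print the current SKILL.md followed by every reference file.
read_skill() {
  local skill_dir="$1" f
  cat "$skill_dir/SKILL.md"
  for f in "$skill_dir"/references/*.md; do
    [ -e "$f" ] || continue   # the glob stays literal when nothing matches
    printf '\n--- %s ---\n' "${f#"$skill_dir"/}"
    cat "$f"
  done
}

# Run each test prompt against the skill as it exists right now.
# Usage: execute_skill <skill_dir> <tools> <prompt>...
execute_skill() {
  local skill_dir="$1" tools="$2"
  shift 2
  local system_prompt prompt
  # Re-read on every call so improvements from this iteration are included.
  system_prompt="Act as Claude with the skill below loaded and follow its guidance.

$(read_skill "$skill_dir")"
  for prompt in "$@"; do
    printf '=== PROMPT: %s ===\n' "$prompt"
    "$CLAUDE_BIN" -p "$prompt" \
      --append-system-prompt "$system_prompt" \
      --allowedTools "$tools"
  done
}
```

Because read_skill runs inside execute_skill, the executor never sees a copy of the skill frozen at script generation time.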

Step 5: Generate the Script

Generate a bash script following references/script-structure.md. Sections:

Configuration -- skill dir, branch, TSV path, test prompts array, cleanup commands, all system prompts as heredocs
Helper functions -- read_skill (reads current skill content at runtime), execute_skill (runs all test prompts, collects outputs), cleanup_artifacts (removes execution artifacts), summary (exit trap)
Setup -- create branch, initialize TSV
Baseline -- execute skill as-is against test prompts, evaluate outputs, record baseline score
Main loop -- improve, commit, execute, evaluate, decide, cleanup, log
Summary -- print results table on exit
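
A stripped-down sketch of the setup and summary sections might look like this. The TSV column names and the log_result helper are assumptions for illustration; references/script-structure.md defines the actual layout.

```bash
#!/usr/bin/env bash
# TSV path; the column layout below is hypothetical.
TSV="${TSV:-improvements.tsv}"

# Write the header only once so a rerun appends to earlier results.
init_tsv() {
  [ -f "$TSV" ] || printf 'iteration\tscore\tstatus\thypothesis\n' > "$TSV"
}

# Append one row: iteration, score, keep/discard status, hypothesis text.
log_result() {
  printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4" >> "$TSV"
}

# Print the results table; registered below so it runs on any exit.
summary() {
  echo "=== Results ==="
  if [ -f "$TSV" ]; then
    cat "$TSV"
  fi
}
trap summary EXIT
```

In bash the EXIT trap also fires when the script is interrupted with Ctrl+C, which is why a single trap is enough to cover both normal completion and manual stops.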

Key design rules:

The executor claude -p call receives the current skill content (read at runtime via read_skill), not a fixed heredoc -- the skill changes each iteration
The executor gets tools from the execution profile (Step 1)
After each execution+evaluation cycle, cleanup_artifacts removes any files the skill created, so the next iteration starts clean
The evaluator judges execution output quality, not skill file content
Score extraction: grep -oP 'AGGREGATE:\s*\K[0-9.]+' (the -P flag requires GNU grep)
Keep/discard: score > best_score via bc -l
Plateau: 3 consecutive discards trigger exit
Trap EXIT for summary on Ctrl+C
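
The scoring rules above fit together roughly as shown below. The function and variable names (extract_score, is_better, decide, plateaued, MAX_DISCARDS) are illustrative, and the sketch leaves out the git operations that a real loop would perform when it keeps or discards a change.

```bash
#!/usr/bin/env bash
# Pull the number after "AGGREGATE:" from evaluator output on stdin.
# grep -P needs GNU grep; head guards against a repeated AGGREGATE line.
extract_score() {
  grep -oP 'AGGREGATE:\s*\K[0-9.]+' | head -n 1
}

# Floating-point comparison: bc prints 1 when $1 > $2, otherwise 0.
is_better() {
  [ "$(echo "$1 > $2" | bc -l)" -eq 1 ]
}

best_score="${BASELINE_SCORE:-0}"   # hypothetical variable for the baseline
discards=0
MAX_DISCARDS=3

# Update loop state for one iteration; sets $decision to keep or discard.
decide() {
  local score="$1"
  if is_better "$score" "$best_score"; then
    best_score="$score"
    discards=0
    decision=keep
  else
    discards=$((discards + 1))
    decision=discard
  fi
}

# True once MAX_DISCARDS consecutive iterations failed to beat the best score.
plateaued() {
  [ "$discards" -ge "$MAX_DISCARDS" ]
}
```

Note that decide sets a variable instead of echoing its result: calling it inside $(...) would run it in a subshell and lose the updates to best_score and discards.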

Step 6: Write and Confirm

Write the script to improve-<skill-name>.sh in the repo root. Make it executable. Show the user:

The execution profile (tools, artifacts, cleanup)
The generated rubric
The test prompts
How to run: bash improve-<skill-name>.sh
How to stop: Ctrl+C (prints summary)
How to review: cat improvements.tsv and git log skill-improve/<tag>

Constraints

The generated script must be self-contained -- prompts baked in as heredocs, no external dependencies beyond claude, git, bc, grep
The executor must load the current skill content each iteration (not a stale copy from generation time)
Cleanup must remove all execution artifacts between iterations
The evaluator prompt and test prompts are fixed for the run
improvements.tsv stays untracked by git
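
One possible way to keep improvements.tsv untracked without touching the repository's committed .gitignore is the per-clone exclude file, .git/info/exclude. The original does not say which mechanism the script uses, so treat this as one option; ignore_locally is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Add a pattern to the local exclude file unless it is already listed.
# Usage: ignore_locally <pattern> [exclude_file]
ignore_locally() {
  local pattern="$1"
  # Default to this clone's exclude file; git is only invoked when no
  # explicit path is given.
  local exclude_file="${2:-$(git rev-parse --git-dir)/info/exclude}"
  mkdir -p "$(dirname "$exclude_file")"
  touch "$exclude_file"
  # -x matches whole lines and -F treats the pattern as a literal string.
  grep -qxF "$pattern" "$exclude_file" || echo "$pattern" >> "$exclude_file"
}
```

Running ignore_locally improvements.tsv during setup would keep the results log out of git status and out of the commits the loop makes on the improvement branch.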

Additional Resources

Reference Files

references/prompt-templates.md -- System prompt templates for all three claude -p calls (improve, execute, evaluate), rubric guide
references/script-structure.md -- Complete bash script structure with execute step, cleanup, and dynamic skill loading
references/improvement-strategies.md -- Catalog of skill improvement strategies, embedded in the improver system prompt
