From rosary
Continuous improvement loop — assesses open beads, classifies by effort, dispatches generator→evaluator pipeline with file-based handoff. No per-agent commits — evaluator gates all changes. Post-mortem refines agent prompts. Modes: --simplify (refactor only, no net new code), --security, --fe, --prune. Composable with /loop for cron scheduling.
How this skill is triggered — by the user, by Claude, or both
Slash command
/rosary:evolve [--dry-run] [--simplify|--security|--fe|--prune] [--focus <area>]
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Autonomous improvement for any ART repo. Reads beads, picks up work, ships fixes, closes beads. Humans review the summary, not permission prompts.
Composable: /loop 10m /rosary:evolve --simplify for continuous refactoring.
| Mode | Constraint | Goal |
|---|---|---|
| (default) | Fix closable beads | Ship work |
| --simplify | No net new code, refactor only | Extract abstractions, reduce LOC |
| --security | Security findings only | Audit + fix vulnerabilities |
| --fe | FE consistency only | Design language, UX |
| --prune | Delete only: dead code, stale beads | Reduce surface area |
| --dry-run | Assess + plan, no changes | Preview what would happen |
Evolve operates at three scales. The assessment phase determines which:
| Scale | Scope | When | Intel source |
|---|---|---|---|
| File | Single file refactor | --simplify on small beads | LSP, mache get_impact |
| Repo | Cross-file coherence | Default mode, most beads | mache get_communities, dependency graph |
| Ecosystem | Cross-repo dependencies | Beads touching shared types/APIs | rosary.toml repo list, ../ traversal |
Ecosystem scale applies when:
Scale detection is DERIVED, not configured:
- Check whether rosary.toml exists -- if yes, read the repo map
- Scan ../ for sibling repos (common monorepo/workspace pattern)

At ecosystem scale, the scoping-agent maps dependencies (rosary.toml OR ../ scan),
then checks if the bead's files touch any cross-repo seams (use rosary:seam-discovery).
If yes, plan.md must document which repos are affected and in what order.
The skill works with zero config. rosary.toml is a hint, not a requirement.
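The derived detection described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function names are hypothetical, and treating a sibling directory as a repo only when it contains `.git` is an assumption made for the example.

```python
from pathlib import Path


def find_sibling_repos(repo: Path) -> list[Path]:
    """Return sibling directories of `repo` that look like git repos.

    Assumption: a directory counts as a repo if it contains `.git`.
    """
    here = repo.resolve()
    return sorted(
        p for p in here.parent.iterdir()
        if p.is_dir() and p != here and (p / ".git").exists()
    )


def discover_repo_map(repo: Path) -> dict:
    """Derive the repo map: prefer rosary.toml, fall back to a ../ scan."""
    manifest = repo / "rosary.toml"
    if manifest.is_file():
        # rosary.toml is a hint: a caller would read the repo list from it.
        return {"source": "rosary.toml", "manifest": manifest}
    # Zero-config path: look at the parent directory for sibling repos.
    return {"source": "../ scan", "siblings": find_sibling_repos(repo)}
```

Either branch produces a repo map, which matches the zero-config claim: the manifest only changes where the map comes from.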
Agents are personas (perspective + judgment) with abilities (skills + tools):
| Persona | Perspective | Skills | Constraints |
|---|---|---|---|
| scoping-agent | Planner — enriches before expensive work | note | Haiku model, cheap |
| dev-agent | Implementer — finds complexity, fixes it | simplify | Full access |
| principal-agent | Does what is right, not what was asked | evolve, simplify, seam-discovery, note | Full access, Opus |
| skeptic-agent | Distrusts AI output, assumes wrong until proven | note | Read-only, no Write/Edit |
| staging-agent | Adversarial tester: do the tests test real behavior? | | Read-only |
| prod-agent | Finds resource leaks, error swallowing, concurrency | note | Read-only |
| pm-agent | Strategic — cross-repo overlap, scope creep, retro | note, evolve | Full access |
Mode determines which personas deploy:
| Mode | Generator | Evaluator |
|---|---|---|
| default | dev-agent | staging-agent |
| --simplify | principal-agent | skeptic-agent |
| --security | prod-agent | skeptic-agent |
| --fe | dev-agent | staging-agent |
| --prune | janitor-agent | skeptic-agent |
The scoping-agent determines team composition. Mode selects which AGENTS, scale selects HOW MANY:
| Bead scope | Team | Rationale |
|---|---|---|
| 1 file | Generator solo | No coordination overhead |
| 2-5 files, same module | Generator + evaluator | Minimal viable pipeline |
| >5 files OR cross-module | Scoping + generator + evaluator + skeptic | Full pipeline |
| --simplify (any size) | principal-agent + skeptic | Simplify always needs adversarial check |
| --security (any size) | prod-agent + skeptic | Security always needs adversarial check |
How to determine scale:
- Mode flags (--simplify, --security) override file-count logic
- Team composition goes in plan.md's ## Team section

For 1-file scale: the generator agent does its own verification (no separate evaluator). This avoids spawning a second agent just to re-run one lint command.
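The team-composition table can be read as a small decision function. The sketch below is one hedged interpretation: `select_team` and its parameters are hypothetical names, and the role labels are copied from the table, not taken from any real API.

```python
# Modes that always deploy an adversarial pair, regardless of bead size.
ADVERSARIAL_MODES = {
    "simplify": ["principal-agent", "skeptic"],
    "security": ["prod-agent", "skeptic"],
}


def select_team(file_count: int, cross_module: bool = False,
                mode: str = "default") -> list[str]:
    """Pick the team for a bead from its scope and the run mode."""
    # Mode flags override file-count logic.
    if mode in ADVERSARIAL_MODES:
        return list(ADVERSARIAL_MODES[mode])
    # More than 5 files OR cross-module: full pipeline.
    if file_count > 5 or cross_module:
        return ["scoping", "generator", "evaluator", "skeptic"]
    # 2-5 files in the same module: minimal viable pipeline.
    if file_count >= 2:
        return ["generator", "evaluator"]
    # Single file: the generator verifies its own change.
    return ["generator"]
```

For example, `select_team(3, cross_module=True)` returns the full pipeline even though only three files are touched.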
Separate generator from evaluator. Self-evaluation creates confirmation bias; external evaluation enables skepticism. File-based handoff between stages.
bead (contract: "what done looks like")
-> scoping-agent (reads bead + files, writes plan.md)
-> dev-agent (implements, stages changes -- does NOT commit)
-> evaluator (runs typecheck + tests, writes eval.md)
-> PASS: team lead commits + closes bead
-> FAIL: feedback.md -> dev-agent retries (max 3)
-> pm-agent (post-mortem, updates operating principles)
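The generator/evaluator loop in the diagram above might look like this in outline. It is a sketch under stated assumptions: `generate` and `evaluate` are hypothetical callables standing in for the agents, and "max 3" is read here as one initial attempt plus up to three retries, which the source does not spell out.

```python
from typing import Callable, Optional

MAX_RETRIES = 3  # assumption: retries allowed after the first attempt


def run_bead(
    generate: Callable[[Optional[str]], str],
    evaluate: Callable[[str], tuple[bool, str]],
    max_retries: int = MAX_RETRIES,
) -> dict:
    """Run generator then evaluator, feeding failures back as feedback."""
    feedback: Optional[str] = None
    attempts = 1 + max_retries
    for attempt in range(1, attempts + 1):
        changes = generate(feedback)        # stages changes, never commits
        passed, output = evaluate(changes)  # separate agent judges the result
        if passed:
            return {"status": "PASS", "attempts": attempt}
        feedback = output                   # stands in for feedback.md
    return {"status": "FAIL", "attempts": attempts, "feedback": feedback}
```

The generator never sees its own verdict. It only receives the evaluator's output on a retry, which is what keeps the two roles separate.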
Critical rules:
rsry bead list -> open beads for this repo
mache get_overview -> structural health (if available)
git log --oneline -20 -> recent momentum
Classify each open bead:
For --simplify mode: ignore beads. Scan codebase for:
For each closable bead, the scoping-agent DERIVES the lifecycle from the code. Do not ask the human to define these -- read the codebase and figure it out. If something is NOT derivable, ASK the human before proceeding.
Starting point (derive from bead + code):
Stopping point (derive from existing tests + types):
Validation (derive from the dependency graph):
--prune.
Verification and attestation (cite sources):
The scoping-agent writes all of this into plan.md:
## Plan: {bead-id} -- {title}
### Starting point
- Problem: {derived from bead description}
- Files: {list with line numbers}
- Current behavior: {what the code does now, with citations}
### Stopping point
- Done when: {test name} passes
- Acceptance: typecheck + {N} existing tests + 1 new test
- LOC delta: {estimate, negative for simplify}
### Validation
- Callers: {list from mache/grep, with file:line}
- Risk: {low|medium|high} because {reason}
### Verification
- Citations: {every claim has a file:line or bead-id}
- Questions for human: {anything not derivable}
If the Questions for human section is non-empty, STOP and ask before executing.
An empty questions section means the plan is self-contained and can proceed autonomously.
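One way to enforce the stop-and-ask rule mechanically is to read the `Questions for human` entry out of plan.md before executing. The sketch below assumes the entry is a single bullet line, as in the template above; the function names are hypothetical.

```python
import re


def open_questions(plan_md: str) -> str:
    """Return the text after '- Questions for human:' in plan.md."""
    match = re.search(
        r"^- Questions for human:[ \t]*(.*)$", plan_md, flags=re.MULTILINE
    )
    if match is None:
        # A plan without the entry is malformed, not "empty".
        raise ValueError("plan.md has no 'Questions for human' entry")
    return match.group(1).strip()


def may_proceed(plan_md: str) -> bool:
    """True only when the questions entry is present and empty."""
    return open_questions(plan_md) == ""
```

Treating a missing entry as an error, not as "no questions", is a deliberate choice in this sketch: it avoids proceeding autonomously on a plan the scoping-agent never finished.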
The scoping-agent discovers verification commands for the repo. This replaces any hardcoded assumptions about what tools a repo uses.
Principle: One path, multiple callers. Evolve uses the same commands a developer runs locally, CI runs remotely, and pre-commit runs on commit. It never invents its own verification commands.
Probe hierarchy (scoping-agent checks in order, uses first match per category):
| Priority | File | Extract |
|---|---|---|
| 1 | Taskfile.yml | Parse task names — lint, test, check, typecheck, build |
| 2 | Makefile | Parse targets — lint, test, check |
| 3 | package.json | Parse scripts — test, lint, typecheck, build |
| 4 | Cargo.toml | cargo test, cargo clippy, cargo build |
| 5 | go.mod | go test ./..., go vet ./... |
| 6 | mix.exs | mix test, mix format --check-formatted |
| 7 | pyproject.toml | pytest, ruff check . |
Fast-fail layer: Before running any of the above, the evaluator runs
mache get_diagnostics on changed files. Type/syntax errors caught in
milliseconds — no point running a 30-second test suite if the code doesn't parse.
Commit gate: If .pre-commit-config.yaml exists, run pre-commit run --all-files
as a final gate. If not, suggest bootstrapping from the reference template
in docs/pre-commit-reference.yaml.
Hard stop: If NO verification tooling is found, STOP and ask the human: "No verification tooling found in {repo}. What should I run?" Do not guess. Do not proceed without verification.
The scoping-agent writes all discovered commands into plan.md's ## Verification
section. Downstream agents read this section — they never probe on their own.
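The probe order and the hard stop could be sketched as below. This only detects which marker files exist, in priority order; parsing task names out of Taskfile.yml, Makefile, or package.json is left out. `probe_files`, `require_tooling`, and `NoVerificationTooling` are hypothetical names for illustration.

```python
from pathlib import Path

# Priority order from the probe table.
PROBE_ORDER = [
    "Taskfile.yml", "Makefile", "package.json",
    "Cargo.toml", "go.mod", "mix.exs", "pyproject.toml",
]

# Rows whose commands the table lists directly (no parsing needed).
FIXED_COMMANDS = {
    "Cargo.toml": ["cargo test", "cargo clippy", "cargo build"],
    "go.mod": ["go test ./...", "go vet ./..."],
    "mix.exs": ["mix test", "mix format --check-formatted"],
    "pyproject.toml": ["pytest", "ruff check ."],
}


class NoVerificationTooling(Exception):
    """Raised so the run stops and asks the human instead of guessing."""


def probe_files(repo: Path) -> list[str]:
    """Return the marker files present in `repo`, highest priority first."""
    return [name for name in PROBE_ORDER if (repo / name).is_file()]


def require_tooling(repo: Path) -> list[str]:
    """Return the probe hits, or stop hard when nothing is found."""
    found = probe_files(repo)
    if not found:
        raise NoVerificationTooling(
            f"No verification tooling found in {repo}. What should I run?"
        )
    return found
```

A fuller version would resolve one command per category (lint, test, typecheck, build) from the first file that defines it, then write the result into plan.md's `## Verification` section.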
Generator (dev-agent, per bead):
- Reads plan.md (## Verification section)
- Stages changes (git add)
- Writes changes.md with:
Evaluator (staging-agent, after ALL generators finish):
- Runs mache get_diagnostics on changed files (fast-fail, ms)
- Reads changes.md
- Writes eval.md per bead: PASS or FAIL with:
On FAIL: writes feedback.md with the exact error output (command, exit code,
stderr). Generator retries with this specific error context (max 3). If all retries
fail, bead stays open with error details as a comment via rsry_bead_comment.
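Capturing "the exact error output" for feedback.md could be done along these lines. This is an illustrative sketch: the markdown layout of the feedback file and the helper names are assumptions, and only the three recorded fields (command, exit code, stderr) come from the text above.

```python
import shlex
import subprocess
import sys


def run_check(command: list[str]) -> dict:
    """Run one verification command and keep what feedback.md needs."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return {
        "command": shlex.join(command),
        "exit_code": proc.returncode,
        "stderr": proc.stderr,
    }


def render_feedback(result: dict) -> str:
    """Format a failed check as feedback text (layout is hypothetical)."""
    return (
        "## Feedback\n"
        f"- Command: {result['command']}\n"
        f"- Exit code: {result['exit_code']}\n"
        "- Stderr:\n"
        f"{result['stderr']}"
    )


if __name__ == "__main__":
    # Demo: a command that fails on purpose, using the current interpreter.
    failing = [sys.executable, "-c", "import sys; sys.exit('typecheck failed')"]
    print(render_feedback(run_check(failing)))
```

Passing the raw stderr through unchanged gives the generator the same error a developer would see locally, which is what makes the retry targeted.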
On PASS: team lead commits all passing beads as one batch.
Only after evaluator passes ALL changes:
- Commit message format: [bead-id] type(scope): description
- git push origin main

After every run, pm-agent writes postmortem.md:
## /evolve post-mortem -- {date}
### What worked
- dev-agent extracted middleware correctly on first try
### What failed
- api-cleanup left unused imports (evaluator caught on retry)
### Operating principles (updated)
1. Always run typecheck before reporting done
2. JSX components: verify rendered, not just imported
3. as-any casts: document WHY, link upstream issue
### Agent prompt improvements
- dev-agent: add "run typecheck before done"
- staging-agent: add "check unused imports specifically"
### Metrics
- Beads closed: 4 | Retries: 1 | Human interventions: 0
Operating principles feed back into agent prompts. Each run makes agents better.
## /evolve run -- {date} ({mode})
### Closed
- rig-abc123: Fixed X (3 files, 2 tests)
### Needs human
- rig-967ae1: Needs pricing decision
### Blocked
- rig-bf1493: Waiting on CF beta
### Health
- Typecheck: pass | Tests: 52/52 | Deploy: v{id}
- Beads: 39 -> 35 | LOC delta: -47
### Post-mortem
- 1 retry, 0 human interventions, 2 principles updated
/rosary:evolve # full sweep
/rosary:evolve --simplify # refactor pass
/rosary:evolve --dry-run # preview only
/loop 30m /rosary:evolve --simplify # continuous refactoring
/loop 4h /rosary:evolve # full sweep every 4 hours
START: bead open + files specified + description clear + verification discovered
STOP: evaluator passes + ALL discovered verify commands green (with evidence)
ESCALATE: critical issue found, or >3 files without tests, or no verification tooling found
RETRY: evaluator fails with specific error output (max 3 per bead)
NEVER: commit without eval pass, push with failing tests, proceed without verification
Commits: [bead-id] type(scope): description
PRs: bead link + test results + files + agent name
Post-mortem: what agents learned + principles updated
Future: BDR-configurable output templates per repo (rig-d2b660)
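The commit convention above, `[bead-id] type(scope): description`, is regular enough to check with a pattern. The regex below is an assumption about what counts as a valid bead id, type, and scope. The source only shows the overall shape and ids such as `rig-abc123`.

```python
import re

# Hypothetical pattern for "[bead-id] type(scope): description".
COMMIT_RE = re.compile(
    r"^\[(?P<bead>[^\]\s]+)\] "
    r"(?P<type>[a-z]+)\((?P<scope>[^)]+)\): "
    r"(?P<description>\S.*)$"
)


def parse_commit_subject(subject: str) -> dict:
    """Split a commit subject into its parts, or raise if malformed."""
    match = COMMIT_RE.match(subject)
    if match is None:
        raise ValueError(f"not in '[bead-id] type(scope): description' form: {subject!r}")
    return match.groupdict()
```

A check like this could run just before the team lead commits, so a malformed subject fails early instead of breaking the link between commit and bead.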
npx claudepluginhub agentic-research/rosary --plugin rosary