From ev
Execution loop for tiered-transform work. Runs a phase as a sequence of tiers, each tier processing a batch of files under a tier contract, gated by evaluator verdicts and pre-flight checks. Writes tactical retros between tiers. Dispatches to the bin/loom CLI directly; composes /guild-validate; composes no other loop. Use when a phase is a bulk transform, audit, or find-replace-style operation across many files.
How this skill is triggered — by the user, by Claude, or both
Slash command
/ev:ev-loop-confidence <project-slug-or-path> <phase-number><project-slug-or-path> <phase-number>This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Execute one phase of a project as a confidence loop: tiered transforms,
Execute one phase of a project as a confidence loop: tiered transforms, ratcheting from small/safe to large/risky, with an evaluator verdict per unit and a tactical retro per tier.
Composes: bin/loom CLI (via Bash) for substrate
operations; /guild-validate (via the Skill tool) for the antagonist
panel.
Does not compose: other loops. Peer loops are invoked by the router,
not by each other. No ambient /loom-* or /draft-revise skills —
substrate plumbing dispatches directly to the CLIs (see § Substrate
compositions).
Format reference: docs/LOOM-CONVENTIONS.md (plugin-relative
path; present in every install of the loom plugin).
Skill invocations like /guild-validate below mean
Skill(skill: <name>, args: "…"). CLI invocations like
loom phase update mean Bash("loom phase update <args>").
Antagonist evaluation runs through /guild-validate, which spawns
evaluator agents in parallel via /guild-spawn; the loop itself never
calls the Agent tool directly.
Before doing anything else, verify the substrate CLIs are on PATH.
The marketplace dependencies cascade handles install-time + enable-
time correctness; this skill-body check catches the runtime case
where a user disabled a dep plugin mid-session.
Run:
Bash("command -v loom guild griot >/dev/null 2>&1 || { echo 'ev-loop-confidence requires loom + guild + griot plugins on PATH. Enable them with: claude plugin enable loom@krambuhl guild@krambuhl griot@krambuhl' >&2; exit 1; }")
If exit code is non-zero, stop and surface the message to the operator verbatim — do not proceed with any other step.
Every substrate operation this loop performs dispatches directly to
bin/loom, bin/griot, or bin/guild — no ambient
skills, no trout scripts. The unit/tier loop steps below cite recipes
by name (e.g. "checkpoint per § Compose PR"). All § <Recipe>
references in this body resolve in docs/SUBSTRATE-COMPOSITIONS.md.
For loom verb shapes and event vocabulary, see
docs/LOOM-CONVENTIONS.md.
<project-slug-or-path> — resolved by loom's standard slug resolution
(full slug → date-less suffix → relative or absolute path).<phase-number> — which phase of the project to run. Must exist in
the manifest and not be in completed state.If <phase-number> is missing, already completed, or the slug does
not resolve, stop and ask the user rather than guessing.
This loop creates and uses a scope directory at
./projects/<slug>/<phase-slug>/ where <phase-slug> is a kebab-case
form of the phase name. Inside:
inventory.md — the full list of files in scope, generated in step 1Create this directory if it doesn't exist. Phase state lives in the
project's manifest.json (via § Phase update); tactical retros live
in the project's retros/ directory (via § Retro write). The scope
directory is now just the inventory home.
Every phase runs a multi-engineer design pass once before Step 0
(pre-flight). The whiteboard output becomes shared reference material
for every tier in the phase (cited in each tier contract's Inputs:
line). This step is always-on: the loop invokes
/guild-whiteboard at phase start regardless of explicit
configuration; an optional PLAN.md block overrides defaults.
Default behavior (no **Whiteboard**: block in PLAN.md):
engineers = all currently registered whiteboard-* agents,
resolved via glob of .claude/agents/whiteboard-*.md.topic = the phase name.rounds = 1.Override — optional PLAN.md block, placed immediately under the phase's prose paragraph:
**Whiteboard**: engineers=<comma-separated names>; topic=<one-line topic>; rounds=<N>
Any field in the block overrides the corresponding default. Partial blocks are allowed.
Whiteboard artifact path:
projects/<slug>/whiteboards/<phase-number>-<topic-slug>.md. Create
the parent directory if it doesn't exist.
Per-round invocation: for each round 1..N, invoke
/guild-whiteboard via the Skill tool with engineers=<list>,
brief=<topic + any phase context>, whiteboard=<path>. The skill
auto-detects round number from existing file state, so re-running is
idempotent (a re-invocation with the same whiteboard file detects
existing rounds and appends a NEW round). For round 2+, the skill
constructs per_agent_context from prior round state so engineers
can address contradictions.
Bootstrapping case (no engineers registered): if the
.claude/agents/whiteboard-*.md glob returns zero matches AND no
explicit engineers= override is given, log a one-line note ("no
whiteboard engineers registered — skipping whiteboard step") and
proceed directly to Step 0.
L-004 session-boundary: if any of the resolved whiteboard-*
engineers were authored in the current session, drop them from the
effective list manually and surface the override in the next tier's
first checkin Notes for the PR. The runtime registry is loaded once
per Claude Code process start; /clear is NOT a session boundary.
Before any work:
git status --porcelain). If not,
stop and ask the user to commit or stash.branch yet (first
tier in this phase), cut a fresh branch from updated main —
git checkout main && git pull --ff-only origin main && git checkout -b <branch-name> — using the naming convention from
docs/LOOM-CONVENTIONS.md § Branch naming:
<project-name>.<phase-lazy-name> (e.g.
loom-absorb-draft.phase-7-griot-writes). Otherwise confirm the
current branch matches the phase's recorded branch in the manifest;
if not, stop and ask whether to switch.config.json as a baseline.
Record exit status. A red baseline before any work means the loop
stops — you are not making a red build redder.Build inventory.md listing every file or item in scope for this phase.
The phase description in PLAN.md or config.json tells you the pattern
(e.g. "all .ts/.tsx files using ESLint disable comments"). Use git ls-files,
grep, find, or equivalent to enumerate. Include counts.
Format:
# Scope inventory — Phase <N> <name>
**Generated**: YYYY-MM-DD HH:MM
**Total items**: <count>
**Pattern**: <description of what makes an item in-scope>
## Items
- [ ] path/to/file-1.ts (tier: <tier>)
- [ ] path/to/file-2.tsx (tier: <tier>)
Tier assignment is a judgment call (see Tier assignment below). Assign tiers as you build the inventory, or leave them unassigned and prompt the user.
Do not begin transforms until the inventory is complete. Partial inventory = unknown blast radius.
Divide inventory items into tiers of increasing risk/complexity:
If tier 4+ items appear, consider routing them to /ev-loop-interactive
rather than handling here. Surface this to the user and ask.
For each tier, run a tier loop (see Tier-level process below). Between tiers, write a tactical retro and re-run pre-flight.
When all tiers in this phase are complete:
--status=completed.Each tier runs as a sequence of units. A unit is one batch of inventory items transformed together — sized so that one checkin covers it cleanly.
Size down if verification grows slow or evaluator flags pile up. Size up if you're burning checkins on trivial changes.
Before the first unit of a tier, write a tier contract as the first checkin of that tier. The Contract section includes tier-wide rules that every unit in the tier must satisfy:
## Contract
- **Goal**: apply <transform> to all Tier <N> items
- **Acceptance criteria**:
- Every item in this tier is updated
- `<verification command>` passes after each batch
- No unrelated files modified
- **Rules applied**:
- <style/lint rules>
- Verification: `<command>`
- **Disqualifiers**:
- Any regression in <area>
- Any file in scope left untouched
- **Inputs**: inventory.md Tier <N> items
Subsequent units inside the tier can reference the tier contract instead
of restating it, as long as the unit's checkin contract says
Rules applied: tier-<N> contract (see checkin <NN>) and lists only
unit-specific deltas.
For each unit inside a tier:
Negotiate. Compose a Checkin JSON with just the Contract substructure populated and write it per § Checkin write. Pick the items for this batch from inventory.md (mark them with a tier tag if not already). Execution / Verdict / Notes-for-PR substructures stay empty for now — they're filled in by a later checkin once the work resolves. (Loom checkins are immutable, so the "fill it in later" pattern is "write a new numbered checkin," not "edit the existing one." The negotiation checkin and the resolution checkin together tell the story.)
Execute. Do the transform on the batch. Keep to scope.
Evaluate. Invoke /guild-validate via the Skill tool to run
the antagonist panel against this unit. Compose the panel by
auto-derivation from the unit's file list (see § Panel
auto-derivation below) — the result is contextual to the artifact
rather than a fixed list. evaluator-contract-fit is always
included as the baseline. The spec (file-type → evaluator mapping,
precedence list, tokens-vs-naming boundary) lives in
docs/PANEL-COMPOSITION.md; the derivation logic is
§ Derive panel.
agents: comma-separated output of § Derive panel (paths
composed per § Panel auto-derivation below).packet: build a dense packet (see shape below). The substrate
default is dense — verbose packets correlate with budget-exhaustion
failures under evaluator-*'s maxTurns=5. Live examples in
PR #13's checkins 02-06.Dense packet shape (three sections, in this order):
## How to evaluate efficiently
You have a tight tool-use budget (maxTurns=5). Pre-computed
verification below is authoritative — do not re-run lint/build/
test/grep unless you find specific evidence the artifact summary
contradicts itself. Spot-check at most ONE or TWO criteria with
targeted reads, then emit `VERDICT:`. If you cannot reach a verdict
within budget, emit `VERDICT: flagged` with `parse-failure:
budget-exhausted` so the loop escalates rather than no-ops.
## Contract (paraphrased)
<Goal in 1-3 sentences. Acceptance criteria as a numbered list,
condensed (full text in <checkin path>). Disqualifiers as a
single-line summary. Inputs as a bulleted list of paths.>
## Artifact
**Files** (in scope for this batch): <bulleted paths>
**Pre-computed verification (authoritative — do not re-run)**:
- `npm run lint` → <result>
- `npm run build` → <result>
- `npm run test` → <result>
- <other verification: tier-specific checks, codemod diff samples, etc.>
**Direct mappings to acceptance criteria** (for spot-check
efficiency): <AC N → file:line ranges or section pointers>
**Iteration story** (if applicable): <prior panel runs and what
was addressed; helps the evaluator avoid re-flagging fixed issues>
## Original ask
<verbatim from PLAN.md phase description or the triggering message>
## Suggested spot-check (one tool use)
<the most efficient single read for confirming the most-suspicious
criterion; optional but reduces investigation thrashing>
Pass the contract as a paraphrased summary plus the checkin file path link, not verbatim — the checkin file is in the repo and renders one click away. The packet's job is orientation; the depth is one click away.
The skill returns a structured verdict (approved | flagged |
flagged-conflict) with blocking_findings, advisory_findings,
cli_runs, and conflicts lists. See
.claude/agents/evaluator-base.md for the per-evaluator verdict
shape that /guild-validate parses and aggregates.
Iterate or commit.
/guild-validate. Maximum 2 re-iterations per unit —
on the third flag, stop and escalate to the user.approved / Notes-for-PR populated. Check off the inventory
items.Scope-shift detection (restrictive default). Runs only on approved units (flagged-and-iterating units skip this step). Look for signals that PLAN.md is stale; offer a plan revision (per § Revise PLAN.md) ONLY on two-signal concurrence.
Signal sources:
Two-signal-concurrence rule: offer a plan revision only when 2+ signal sources fire for the same shift. Single signals get a note (see below); the loop does NOT interrupt.
Offer flow: surface a short paragraph naming the two signals
and a proposed one-line rationale. Use AskUserQuestion (or
natural-language confirm) for accept/decline/defer. Default: decline
(no interrupt unless the user explicitly accepts).
On accept: integrate the change per § Revise PLAN.md. After the revision lands, proceed to step 6 (Phase update). Do not re-execute the unit.
On single signal (no concurrence): append the signal to the
unit's notes_for_pr array in the checkin JSON:
signal: <signal type>: <one-line description> (single signal; no revise offered)
Loop continues normally.
On zero signals: no action. Loop continues.
Phase update. After a checkin lands, the checkin-created event
auto-fires from § Checkin write. Then update phase state per
§ Phase update with --status=in-progress --branch=<branch> (no
need to set --pr here; the PR reference lives on the phase only
after § Compose PR runs).
Checkpoint? Evaluate the should-checkpoint policy (below). If any condition holds, refresh the PR per § Compose PR so it tracks the latest state. Otherwise continue to the next unit.
The agents list passed to /guild-validate is computed from the
unit's file list at evaluation time, not hardcoded. The composition
rules (file-type → evaluator mapping, precedence ordering, conflict
policy) live in docs/PANEL-COMPOSITION.md and are the
source of truth.
git status --short minus
deletions and substrate carryovers, plus any freshly-authored
untracked paths.agents= argument
verbatim./guild-validate. Use the verb's stdout as the
agents= argument verbatim. Confidence-loop tiers tend to touch
a single file family (a codemod over .module.css, a rename over
.tsx imports, etc.), so the derived panel is typically narrower
than the interactive loop's — common case is contract-fit plus
one or two domain lenses.Edge cases follow the same shape as /ev-loop-interactive's § Panel
auto-derivation:
agents=
manually and note the override in the checkin's Notes section.When a tier's panel includes a specialist evaluator paired with
a generator-* agent (e.g. evaluator-css-architecture paired
with generator-css-codemod), the specialist runs as part of the
parallel panel with elevated precedence per PANEL-COMPOSITION.
No control-flow change is needed.
Fail-fast on specialist rejection: when the aggregated tier
verdict shows a specialist's finding in blocking_findings, treat
that as a strong fail-fast signal — surface the specialist's
remedy more prominently in the tier retro and consider whether the
tier's overall approach needs reshaping. The loop's verdict-
handling already does the right thing structurally (any blocking
finding → flagged); this section documents the why for tier
retros and future per-specialist tier budgets.
Checkpoint (refresh the PR per § Compose PR) when any of the following hold. All are read off state — there is no callable function.
Do not checkpoint mid-tier unless verification is green.
Immediately after the last unit of a tier is approved, before moving to the next tier:
items_processed, units,
verification_at_close, what_went_smoothly, what_bit_us,
adjustment_for_next) and write per § Retro write with
--type=session --phase=<N> --tier=<M>.Tactical retros are short and specific. Strategic retrospection happens
at /loom-archive, not here.
Before starting tier N+1, the gate closes if:
A closed gate stops the loop and reports to the user. The user decides whether to resume, re-tier, or bail.
If the router passes a message like "address feedback on #14", this loop:
blocker comment as a new unit in the current tier (or
a new tier if the feedback rewrites scope).execution.corrections[] array. The session handoff
(§ Save session) surfaces unresolved corrections into open_threads;
§ Capture finding (from-checkin pathway) promotes notable ones into
learnings/session-notes/ at session close, and /griot-compact
decides which get promoted further. The loop itself never writes
to learnings/.On any termination — phase close, closed gate, or escalation — return:
<N> <name>1: 3 files, 2: 7 files)<NN><url> or noneCreates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub krambuhl/agents --plugin ev