Run a task as a supervised verification loop instead of a one-shot prompt. Establishes a loop contract (3 gates: Pass/Fail, Quantitative, Qualitative), then iterates Work → Verify → Fix until every gate passes — emitting an objective evidence report before declaring done. Stops and escalates to a human when an autonomy boundary is crossed (schema change, data-loss migration, auth/payment/security, or a change that conflicts with the spec). Implements the "Ralph loop" technique — the iterative-refinement pattern that agent-orchestrate selects as its Loop pattern. Use when: "/loop", "loop", "run until it passes", "iterate until tests pass", "supervise this until done", "loop.md", "verification loop", "ralph loop", "don't stop until the gates pass", "keep going until criteria met".
Slash command: /harness-ops:loop [task or goal]
Run a task as a **loop**, not a prompt.
A prompt says "do X" once and trusts the reply. A loop says "do X, then prove it passes the gates; if it doesn't, fix it and re-verify — repeat until the gates pass or you hit a boundary that requires a human." You stop being the remote control issuing "fix this / change that" and become the supervisor who defined what "done" means up front.
The model marks its own homework generously, so this skill never trusts a bare "done." It forces an evidence report — gate results, numbers, and, for every subjective judgement, a score plus objective grounds plus a corrective action.
Every loop runs against a contract (loop.md) with three gate types. Before any work starts, this contract must exist and be approved.
| Gate | Question | Rule |
|---|---|---|
| 1. Pass/Fail | Does it build / typecheck / lint / test? | 100% required. Binary. One failure = not done. |
| 2. Quantitative | Do the numbers clear the thresholds? | Each metric has a stated threshold. Below threshold = not done. |
| 3. Qualitative | Is the design/flow actually good? | Score + objective grounds + corrective action. A bare high score is rejected. |
Concrete commands whose exit code decides the gate. Typical members: build succeeds, type check clean, linter clean, test suite green. If a command does not exist for this project, say so — do not silently skip it.
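As a minimal sketch of a Gate 1 runner, the Python below treats exit code 0 as pass and anything else as fail, and reports commands that do not exist instead of skipping them. The function name `run_pass_fail_gate`, the `"missing"` status, and the demo commands are hypothetical; whether a missing command should also fail the gate is left to the contract, so this sketch only reports it.

```python
import subprocess
import sys


def run_pass_fail_gate(commands):
    """Run each labelled command and let its exit code decide pass or fail.

    `commands` maps a label to an argv list, or to None when the project
    has no such command. Missing commands are reported, never skipped.
    """
    results = {}
    for label, argv in commands.items():
        if argv is None:
            results[label] = "missing"
            continue
        try:
            completed = subprocess.run(argv, capture_output=True, text=True)
        except FileNotFoundError:
            # The executable itself is not installed on this machine.
            results[label] = "missing"
            continue
        results[label] = "pass" if completed.returncode == 0 else "fail"
    # Binary rule: a single failing command means the gate is not done.
    gate_ok = all(status != "fail" for status in results.values())
    return gate_ok, results


# Hypothetical stand-in commands so the sketch runs on any machine.
demo_commands = {
    "build": [sys.executable, "-c", "raise SystemExit(0)"],
    "tests": [sys.executable, "-c", "raise SystemExit(1)"],
    "lint": None,  # this imaginary project has no linter configured
}
gate_ok, results = run_pass_fail_gate(demo_commands)
print(gate_ok, results)
```

In a real project the argv lists would come from the approved contract rather than being hard-coded.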
Measured values, each with a threshold agreed in the contract. Examples: test coverage ≥ N%, p95 latency ≤ N ms, error-log rate ≤ N, bundle size ≤ N. Report the measured number next to its threshold every iteration.
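The threshold check can be sketched as follows, assuming each metric carries its own comparison direction (coverage must be at least a value, latency at most). The `Metric` class, the `quantitative_gate` function, and all numbers are illustrative only and do not come from any real measurement.

```python
import operator
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    measured: float
    comparator: str  # ">=" for floors such as coverage, "<=" for ceilings
    threshold: float

    def ok(self) -> bool:
        compare = {">=": operator.ge, "<=": operator.le}[self.comparator]
        return compare(self.measured, self.threshold)


def quantitative_gate(metrics):
    """Return (gate_ok, rows); each row shows the value next to its threshold."""
    rows = [
        f"{m.name}: {m.measured} (threshold {m.comparator} {m.threshold}) "
        f"-> {'ok' if m.ok() else 'FAIL'}"
        for m in metrics
    ]
    return all(m.ok() for m in metrics), rows


# Made-up values purely for illustration.
metrics = [
    Metric("test coverage %", 82.5, ">=", 80.0),
    Metric("p95 latency ms", 240.0, "<=", 200.0),
]
passed, rows = quantitative_gate(metrics)
for row in rows:
    print(row)
```

Printing every row on every iteration, including the passing ones, keeps the measured number visible next to its threshold.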
Things that need judgement: architecture fit, naming clarity, naturalness of the user flow, spec alignment. The model inflates its own scores, so each qualitative item MUST be written as:
<item>: <score>/10
grounds: <specific, checkable observation — file/line, a measured fact, a comparison>
action: <the concrete change that would raise it, or "none — meets bar"
and why no change is needed>
A score with no grounds and no action is invalid and the gate fails.
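One way to enforce this rule mechanically is to validate the structure of each item before accepting its score. The sketch below is an assumption about how that could look; `QualitativeItem` and `validate_item` are hypothetical names, and the check covers only the presence of grounds and action, not whether the grounds are true.

```python
from dataclasses import dataclass


@dataclass
class QualitativeItem:
    name: str
    score: int
    grounds: str = ""
    action: str = ""


def validate_item(item):
    """Return a list of problems; an empty list means the item is well formed."""
    problems = []
    if not 0 <= item.score <= 10:
        problems.append("score must be between 0 and 10")
    if not item.grounds.strip():
        problems.append("missing grounds")
    if not item.action.strip():
        problems.append("missing action")
    return problems


# A bare high score is rejected even though the number looks good.
bare = QualitativeItem("naming clarity", 9)
# Hypothetical example of a complete entry.
grounded = QualitativeItem(
    "naming clarity",
    7,
    grounds="two helper functions share the prefix 'handle' with different roles",
    action="rename one helper so each prefix maps to a single role",
)
bare_problems = validate_item(bare)
grounded_problems = validate_item(grounded)
print(bare_problems, grounded_problems)
```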
Inside the loop the model fixes things on its own. But unbounded autonomy lets it "improve" its way into wrecking the design or doing something irreversible. So the loop has a hard fence.
When a boundary is hit: stop the loop, write what you found, why it crossed the fence, and the options — then ask via AskUserQuestion. Never push through it.
Look for an existing loop.md (repo root, the spec/feature dir, or a path the user named). If one exists, read it and use its gates. If not, derive a draft from the task and the project (package.json scripts, Makefile, pyproject.toml, CI config), using references/loop-template.md as the skeleton.
If the task is trivial and the user just wants it run, you may present a minimal contract (Gate 1 only) and proceed on approval, but always state the gates.
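The lookup for an existing contract can be sketched in a few lines. The search order here (user-named path first, then repo root, then the spec directory) is an assumption, as is the function name `find_loop_contract`; the text only lists the places to look.

```python
import tempfile
from pathlib import Path


def find_loop_contract(repo_root, spec_dir=None, user_path=None):
    """Return the first existing loop.md among the candidate locations, or None."""
    candidates = []
    if user_path is not None:
        candidates.append(Path(user_path))
    candidates.append(Path(repo_root) / "loop.md")
    if spec_dir is not None:
        candidates.append(Path(spec_dir) / "loop.md")
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # the caller should draft a contract and ask for approval


# Demo in a throwaway directory: nothing is found until the file exists.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    before = find_loop_contract(root)
    (root / "loop.md").write_text("# contract\n")
    found_name = find_loop_contract(root).name
print(before, found_name)
```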
Repeat until exit (all gates pass) or escalate (boundary hit):
Phase 1 WORK
Do the next increment of the task.
Stay strictly inside the Auto-fix list. The moment the work requires
something on the STOP list → jump to ESCALATE.
Phase 2 VERIFY (run the gates, top to bottom)
Gate 1: run each Pass/Fail command, record exit status.
Gate 2: measure each metric, record value vs threshold.
Gate 3: score each item with grounds + action.
Phase 3 DECIDE
IF every gate passes → EXIT → emit Evidence Report (done)
ELIF the fix is Auto-fix → apply it, loop back to Phase 1
ELIF boundary hit → ESCALATE (stop, ask the human)
ELSE (can't fix within bounds, or no progress two iterations running)
→ ESCALATE with the blocker
Anti-spin rule: if an iteration makes no gate go from fail→pass, do not loop again blindly — report the stuck gate and escalate. Loops fix; they don't thrash.
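The three phases and the anti-spin rule can be sketched as a small driver. This is an illustration under assumptions, not the skill's implementation: the callbacks `work`, `verify`, and `hits_boundary` are hypothetical, auto-fixes are folded into `work`, `max_stalled=2` follows the "no progress two iterations running" wording, and `max_iterations` is an extra safety cap that the text does not mention.

```python
from typing import Callable, Dict, NamedTuple, Optional


class LoopResult(NamedTuple):
    verdict: str  # "passed" or "escalated"
    iterations: int
    reason: Optional[str]


def run_loop(
    work: Callable[[], None],
    verify: Callable[[], Dict[str, bool]],
    hits_boundary: Callable[[], Optional[str]],
    max_stalled: int = 2,
    max_iterations: int = 20,
) -> LoopResult:
    previous = None
    stalled = 0
    for iteration in range(1, max_iterations + 1):
        work()  # Phase 1: next increment, including any auto-fix
        reason = hits_boundary()
        if reason is not None:
            return LoopResult("escalated", iteration, reason)
        gates = verify()  # Phase 2: run every gate
        if all(gates.values()):  # Phase 3: decide
            return LoopResult("passed", iteration, None)
        if previous is not None:
            # Progress means at least one gate moved from fail to pass.
            progressed = any(
                ok and not previous.get(name, False) for name, ok in gates.items()
            )
            stalled = 0 if progressed else stalled + 1
            if stalled >= max_stalled:
                failing = sorted(name for name, ok in gates.items() if not ok)
                return LoopResult(
                    "escalated", iteration, "no progress on: " + ", ".join(failing)
                )
        previous = gates
    return LoopResult("escalated", max_iterations, "iteration cap reached")


# Simulated task: each unit of work fixes one more gate.
state = {"fixed": 0}


def fix_one():
    state["fixed"] += 1


def check():
    done = state["fixed"]
    return {"build": done >= 1, "tests": done >= 2, "coverage": done >= 3}


happy = run_loop(fix_one, check, lambda: None)
stuck = run_loop(lambda: None, lambda: {"tests": False}, lambda: None)
blocked = run_loop(lambda: None, lambda: {"tests": False}, lambda: "schema change")
print(happy, stuck, blocked)
```

Setting `max_stalled=1` gives the stricter reading of the anti-spin rule, where a single iteration without a fail-to-pass transition escalates.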
A loop never ends with "done." It ends with proof. Emit:
## Loop Report — <task>
**Verdict:** ✅ all gates passed | 🛑 escalated: <reason>
**Iterations:** <n>
### Gate 1 — Pass/Fail
- build: ✅ / ❌ (command)
- typecheck: ✅ / ❌
- lint: ✅ / ❌
- tests: ✅ / ❌ (<passed>/<total>)
### Gate 2 — Quantitative
| metric | measured | threshold | ok? |
|--------|----------|-----------|-----|
| ... | ... | ... | ✅/❌ |
### Gate 3 — Qualitative
- <item>: <score>/10 — grounds: <...> — action: <... | none>
### Boundary log
- <any STOP-list item encountered and how it was handled>
If escalating, the report ends at the boundary with the question for the human — do not fabricate passing gates to close the loop.
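A simple guard against fabricated verdicts is to derive the verdict from the recorded gate results instead of writing it by hand. The sketch below assumes a hypothetical `verdict_line` helper and plain-text verdict strings; it refuses to produce any verdict when gates are failing and no escalation reason was given.

```python
def verdict_line(gate_results, escalation_reason=None):
    """Derive the verdict from recorded results; `gate_results` maps label -> bool."""
    if escalation_reason is not None:
        return f"escalated: {escalation_reason}"
    failing = sorted(label for label, ok in gate_results.items() if not ok)
    if failing:
        # Neither passed nor escalated: the loop is not allowed to end here.
        raise ValueError("failing gates without an escalation: " + ", ".join(failing))
    return "all gates passed"


clean = verdict_line({"build": True, "tests": True})
escalated = verdict_line({"build": True, "tests": False}, "fix requires a schema change")
try:
    verdict_line({"build": True, "tests": False})
    refused = False
except ValueError:
    refused = True
print(clean, escalated, refused)
```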
loop.md is the source of truth: persist the approved contract so the loop is resumable and auditable.