Skill

pre-land-refuters

Runs fresh-context refuters to challenge a completion claim at shared-trunk pawls (mutate/delete/spend etc.) before landing. Catches misses that self-review overlooks.

developer-tools

Popularity

Stars

392

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/agentops:pre-land-refuters

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

> **Loop position:** move 6 (prove acceptance) of the [operating loop](../../docs/architecture/operating-loop.md) — the shared-trunk pawl: fresh-context refuters attack the completion claim before landing.

SKILL.md

225 lines · ~3.3k tokens

Stats

LanguageGo

Stars392

Forks40

MaintenanceExcellent

Last CommitJun 18, 2026

Actions

View Source View Plugin View on GitHub View README

/pre-land-refuters — unbiased dual-model validation before landing

Loop position: move 6 (prove acceptance) of the operating loop — the shared-trunk pawl: fresh-context refuters attack the completion claim before landing.

Proven in the ag-s43tg prune landing (2026-06-12): the refuter panel caught 9 real misses self-review passed over — a silently-failed edit, a CI-breaking test, stale image manifests, gate-weakening test retirements, and an upstream delete/modify conflict. Self-review is biased toward "looks good"; refuters are prompted to win by finding what's wrong.

When to fire

Fire at a pawl — a one-way door on the canonical static list (docs/contracts/pawls.md): mutate shared trunk (push/merge to main or rewrite a shared ref), delete, external-send / shared-state mutation, schema/contract change, credential/authority change, spend. The pawl is the only place the refuter panel runs. This is the ratchet's Filter: gate at the irreversible door, nowhere else. (pawls.md is the source of truth — if it changes, this list follows it.)

NOT on a tread. Routine edits, builds, tests, drafts, intermediate RPI slices, mock→real swaps, throwaway experiments — all run as chaos, ungated. The panel costs two agent runs; spend it at the door, never per-step. A pawl on every step is waterfall (validate every tread) — exactly the thing the ratchet exists to avoid. Check the action against the pawl list (a lookup); if it isn't there, just run it.

Constraints

Pin acceptance BEFORE the work. The claim under test must be mechanical: grep-able fixtures (pinned phrases, counts, ledger states) frozen before implementation, not chosen post-hoc. No pins → write them first.
Refuters are read-only and stake-free. Fresh context, no session history, no authorship of the change. Prompt them to REFUTE, default to skepticism.
Diversity is mode-based; the default fires ONE fresh-context refuter. The default mode is fresh-context: a single refuter in a separate invocation (its context_id != the author's author_context_id) — no shared accumulated context, model-agnostic (same model in a fresh context is fine). A fresh-context reviewer catches the author's tunnel-vision / accumulated-context errors — the dominant landing failure. Opt a pawl up to multi-model (the cross-family panel: one Fable/Claude subagent + one codex exec --sandbox read-only validator, ≥2 distinct families) only for the highest-irreversibility doors (shared-ref rewrite, schema/contract change), where a model's systematic blind spot would be catastrophic. Mode is per-pawl and operator-tunable — see docs/contracts/pawls.md "Diversity mode".
Findings are fixed forward, never disarmed. A refuted contract test gets an honest repoint to the surviving surface or a real fix — not deletion.
Orchestrator stays the single writer. Refuters report; only the orchestrator edits. Run the panel concurrently with the final full gate.
Re-verify pins on the landed tree after merge/push, not just pre-commit.

Workflow

Freeze the claim. State it in one sentence with mechanical acceptance (e.g. "all N pinned phrases grep green; ledger has N terminal rows; staged set is one revert unit").
Dispatch the fresh-context refuter (background subagent, fresh context — a separate invocation with no shared session history; record its context_id): verify counts, sweep every pinned fixture, audit the ledger, hunt stragglers referencing removed paths, spot-check routing, check revert-unit coherence and upstream drift (git fetch + behind-count). Output: VERDICT CONFIRMED/REFUTED + numbered findings with evidence. In the default fresh-context mode this ONE fresh refuter satisfies the diversity floor (it need not be a different model family).
multi-model mode only — also dispatch the codex refuter (codex exec --sandbox read-only -C <repo>): for pawls opted up to multi-model, add a second, different-family refuter — focus on judgment-sensitive edits: for each contract-test/canary/validator change in the diff, judge honest repoint vs gate-weakening. Same verdict shape. (Skip in the default mode; spend the second family only at the highest-irreversibility doors.)
Run the full local gate concurrently (it is the third, mechanical refuter).
Triage findings: fix each forward; classify pre-existing vs introduced; re-run only the affected validators.
Write the machine-checkable verdict, THEN land. Before the merge/push, record the panel result as the pawl verdict the merge path enforces against:
```
head_sha="$(gh pr view <pr> --json headRefOid -q .headRefOid)"
# DEFAULT fresh-context mode: one fresh-context refuter (model-agnostic).
# --author-context is the AUTHORING session id; each refuter token is
# family:verdict:context_id[:evidence] — the refuter's context_id must DIFFER
# from --author-context to count as a fresh red-team.
scripts/pawl-verdict.sh write <bead> <pr> \
  --disposition CONFIRMED \
  --head "$head_sha" \
  --author-context "$AUTHOR_SESSION_ID" \
  --refuter claude:CONFIRMED:"$REFUTER_SESSION_ID":.agents/council/$(date +%F)-pre-land-<slug>-claude.md \
  --council .agents/council/$(date +%F)-pre-land-<slug>.md
# OPT-IN multi-model mode (highest-irreversibility doors): add --mode
# multi-model and a second, DIFFERENT-FAMILY refuter:
#   --mode multi-model \
#   --refuter codex:CONFIRMED:"$CODEX_SESSION_ID":.agents/council/$(date +%F)-pre-land-<slug>-codex.md
```
The verdict is EVIDENCE-BOUND, COMMIT-BOUND, and CONTEXT-BOUND: --head pins it to the commit the panel actually reviewed (a new push makes it STALE and the gate fail-closes); each --refuter family:verdict:context_id[:evidence] carries a context_id (the default fresh-context mode requires ≥1 refuter whose context_id != --author-context) and must point at a real, non-empty reviewer-run transcript (or supply --council as the shared evidence anchor). check refuses a verdict with no reviewer evidence, or one whose only refuter ran in the author's own context — a self-asserted stamp is not a review. (disposition REFUTED on any refuted refuter — the loop auto-redoes on REFUTED, no human; ESCALATE/HOLD only when a circuit breaker trips — those make the merge path HOLD, exit 5.) scripts/reconcile-pr.sh reads this with scripts/pawl-verdict.sh check <bead> <pr> and refuses to merge without a CONFIRMED, this-bead+PR verdict that meets the pawl's diversity mode — green CI alone never authorizes the door. Then land (commit → merge upstream if it moved → gate → push), re-run the pinned sweep on the landed tree, and write the free-form narrative in .agents/council/YYYY-MM-DD-pre-land-<slug>.md (the human-readable companion to the checkable verdict).

Escalation — the circuit-breaker model (auto-redo by default)

The panel runs autonomously: model reviews model. The human is NOT a checkpoint at the pawl by default — they are the exception a circuit breaker trips into. See docs/contracts/pawls.md "Escalation — the circuit-breaker model".

Every refuter CONFIRMED + diversity floor met (+ green gate) → land. No human.
Any REFUTED → AUTO-REJECT → AUTO-REDO (the default, no human). A REFUTED verdict means the gate rejected; the orchestrator automatically fixes the findings forward and re-dispatches the panel. The loop redoes on REFUTED on its own — continuous self-correction, no human in the loop. A plain REFUTED is never an escalation.
ESCALATE to a human — ONLY when a tunable CIRCUIT BREAKER trips. The breakers are plural and operator-tunable, and are the same governor the autonomous loop already runs (the evolve circuit breakers: time-based "no productive work" + oscillation quarantine, scripts/evolve/halt-check.sh): max-attempts (N re-gate cycles still REFUTED, default 3, tunable) · time budget (wall-clock with no forward progress) · cost / quota budget · oscillation / no-forward-progress (the same failure repeating; covers reviewer deadlock) · an explicit judgment flag a reviewer raises (value / irreversibility). This is the andon ("Hey! Listen!") — rare, earned, never the default.

REFUTED → auto-redo (loop). Breaker-trip → HOLD/escalate. Set the verdict disposition accordingly: a plain REFUTED carries REFUTED (the loop re-works); flip it to ESCALATE / HOLD only when a breaker trips, and do not land — a breaker-tripped pawl is never auto-merged. The enforcing merge path (scripts/reconcile-pr.sh → scripts/pawl-verdict.sh check) exits 5 (HOLD: no merge, no close) on any disposition that is not CONFIRMED (so a bare REFUTED also correctly refuses the merge while the loop redoes). Only all-refuters-CONFIRMED, the pawl's diversity mode met (default fresh-context: ≥1 refuter whose context_id != author_context_id; opt-in multi-model: ≥2 distinct canonical families), real non-empty reviewer evidence, and head_sha == the PR's current head, tied to this bead+PR, opens the door (fail-closed by construction).

Even fully unattended, the gate fires at every pawl and auto-redoes on REFUTED. Human escalation is the exception a circuit breaker trips into, not the gate.

Scope note. This verdict is an evidence-bound, commit-bound verdict that requires real reviewer runs (fresh-context default; multi-model opt-in) — it defends against a sloppy agent self-stamping CONFIRMED, NOT a hostile forger. No signatures / peercred / OS writer-separation; cryptographic un-forgeability is intentionally out of scope (single-operator trusted loop — the cut cathedral).

Output Specification

Format: a council artifact at .agents/council/YYYY-MM-DD-pre-land-<slug>.md containing: the frozen claim, every refuter's verdict (verbatim findings) — the fresh-context default fires ≥1 refuter; multi-model opt-in fires ≥2 across distinct families — the fix-forward disposition per finding, and the post-land pin re-verification.

Quality Rubric

Claim frozen with mechanical acceptance before refuters dispatched
Two model families, both read-only, both prompted to refute
Every REFUTED finding has a fix-forward disposition (none ignored, none disarmed)
Pins re-verified green on the landed tree
Council artifact persisted

Examples

User says: "land this prune, don't cut corners" Do: freeze the pinned-manifest claim → dispatch Fable refuter (Agent tool, fresh context) + codex refuter (codex exec --sandbox read-only "...judge each contract-test edit: honest repoint vs gate-weakening...") + full gate, all in parallel → fix findings forward → land → re-sweep pins.

Troubleshooting

Problem	Cause	Solution
Refuter says CONFIRMED instantly	Prompt lacked mechanical checks	Re-dispatch with explicit per-fixture commands; "try to refute" + checklist
Findings contradict each other	Different scopes	Triage per finding with evidence; the diff is the arbiter
Panel too slow	Run was serial	Dispatch all refuters + gate concurrently; they are read-only

pre-land-refuters

Popularity

Invocation

Context Preview

SKILL.md

pre-land-refuters

Popularity

Invocation

Context Preview

SKILL.md

/pre-land-refuters — unbiased dual-model validation before landing

When to fire

Constraints

Workflow

Escalation — the circuit-breaker model (auto-redo by default)

Output Specification

Quality Rubric

Examples

Troubleshooting

See Also

Similar Skills

/pre-land-refuters — unbiased dual-model validation before landing

When to fire

Constraints

Workflow

Escalation — the circuit-breaker model (auto-redo by default)

Output Specification

Quality Rubric

Examples

Troubleshooting

See Also

Similar Skills