From agentops
Validates artifacts, plans, code, PRs, or gates with PASS/WARN/FAIL verdicts, including pre-commit sanity checks and completion audits.
How this skill is triggered — by the user, by Claude, or both
Slash command
/agentops:validateThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
> **Loop position:** move 6 (prove acceptance) of the [operating loop](../../docs/architecture/operating-loop.md) — the driving adapter for the `validate_acceptance` port: every Given/When/Then must map to a passing test before a bead closes.
references/complexity-analysis.mdreferences/deep-audit-protocol.mdreferences/deep-checks.mdreferences/examples.mdreferences/go-patterns.mdreferences/go-standards.mdreferences/json-standards.mdreferences/markdown-standards.mdreferences/patterns.mdreferences/post-verdict-actions.mdreferences/python-standards.mdreferences/quick-mode-vibe.mdreferences/report-format.mdreferences/rust-standards.mdreferences/shell-standards.mdreferences/test-pyramid-inventory.mdreferences/test-pyramid-weighting.mdreferences/typescript-standards.mdreferences/validate.featurereferences/verification-report.mdLoop position: move 6 (prove acceptance) of the operating loop — the driving adapter for the
validate_acceptanceport: every Given/When/Then must map to a passing test before a bead closes.
Role: validator. Input = artifact (plan, spec, code, PR, fitness gate). Output =
verdict.v1(PASS / WARN / FAIL with rationale + findings).
Status (2026-05-08): introduced ADDITIVE in Phase 1 (m6v5.D.1 / soc-78s2v). Existing validators (councilvibe/pre-mortem/red-team/review/eval-outcomes plus retired pr-validate and validation lanes) stayed until Phase 2 shim conversion (m6v5.D.2). Fix-C smoke (
soc-wb2aa) gates Phase 2.
/validate is a driving adapter for the validate_acceptance port in the
Intent-to-Loop Hexagon.
When the artifact contains a hexagon: block, preserve the bounded context,
context packet, guard adapters, and done state in the verdict.
When the artifact claims DONE/closed/green, apply the
Completion-Claim Kernel
before returning PASS.
A verdict is re-plan evidence, not just a retry trigger. Under /rpi, a
FAIL/WARN (and its findings) surfaces UP to the orchestrator's
Agile Re-Plan Loop:
the remaining waves may be refactored, inserted, dropped, or reordered in
response — not only the failed objective re-cranked. Under --auto that pivot is
autonomous. Looping a failed objective forever without asking whether the plan
should change is the waterfall anti-pattern.
| Mode | Purpose | Replaces (post-Phase 2) |
|---|---|---|
| (default) | 2-judge multi-judge consensus on any artifact | /council default |
--quick | Inline single-agent structured review | /council --quick |
--deep | 4-judge thorough review | /council --deep |
--mixed | Cross-vendor (Claude + Codex), N×2 judges | /council --mixed |
--debate | Adversarial 2-round refinement | /council --debate, /red-team |
--mode=post-impl | Code-readiness pipeline (complexity → bug-hunt → council) | vibe |
--mode=pre-impl [--target=X] | Plan/spec validation; target ∈ {scenario,fitness,ratchet,scope,skill,health} | /pre-mortem, /eval-outcomes, /goals measure, /flywheel, /scope, /skill-auditor, ao doctor |
--mode=pr | PR-shape verdict (diff review + acceptance check) | /review |
Mode-budget assertion: 8 modes. Adding a 9th requires demoting an existing one OR refusing the addition (per Fix-F § continuous CI gate).
validation + pr-validate retired into these modesThe retired validation and pr-validate lanes were the Phase-1 placeholders for --mode=post-impl
and --mode=pr; both are now retired (cp-ki8) and their load-bearing contract folded
here so no capability is lost:
--mode=post-impl (was the validation lane) — full close-out + no-self-grading invariant.
Beyond the inline complexity → bug-hunt → council pipeline, this mode owns the
validate_acceptance port: every Given/When/Then from the intent issue must map to a
passing test (criterion→test roll-up; activity logs do not close beads), and the
acceptance verdict must be produced by a blind, context-isolated sub-agent judge that
did not author the code (author ≠ validator — ag-9jle.5 / ag-lmdx.4). Refuse to
certify acceptance when judge_id == author_id; the only escape is an inline-fallback
self-grade that is stamped as waived, not independently validated. Apply the
Completion-Claim Kernel
before accepting any DONE/closed/green claim. For epic-scope close-out this mode may
delegate to vibe, /post-mortem, and /forge rather than inlining them.--mode=pr (was the pr-validate lane) — submission-readiness checks. In addition to the
diff/acceptance verdict, run, in order: (1) upstream alignment FIRST (BLOCKING —
git rev-list --count HEAD..origin/main; fail if many commits behind or merge would
conflict), (2) CONTRIBUTING.md compliance (BLOCKING), (3) isolation — single commit
type + thematic files + atomic scope, (4) scope-creep containment, (5) quality gate
(tests/lint, non-blocking). On FAIL, emit remediation steps (split-by-type cherry-pick,
rebase-on-upstream) so the verdict is actionable.vibe + bead-completion-audit route herevibe → --mode=post-impl. Use when doing a quick readiness or sanity check
that code is ready to commit or ship, short of a full review — the post-impl
pipeline (complexity → bug-hunt → council) is the vibe check.bead-completion-audit → --mode=post-impl close-out. Use when
auditing closed beads for real shipped evidence, acceptance proof, and truthful
closeout — the Completion-Claim Kernel and the no-self-grading invariant above
own this audit./validate path/to/plan.md # default 2-judge consensus
/validate --quick path/to/plan.md # inline single-agent
/validate --deep path/to/spec.md # 4-judge thorough
/validate --mode=pre-impl path/to/plan.md # pre-mortem mode
/validate --mode=post-impl recent # vibe mode (post-implement)
/validate --mode=pr 123 # PR review by PR number
/validate --mode=pre-impl --target=fitness # fitness gate against GOALS.md
Default uses runtime-native subagent spawning. Falls back to --quick (inline) when no multi-agent capability detected.
Parse --mode and --target. Default mode is multi-judge. Validate combinations:
| Mode | Allowed --target |
|---|---|
| default, --quick, --deep, --mixed, --debate | n/a |
| --mode=post-impl | n/a (pipeline scope is recent code changes) |
| --mode=pre-impl | scenario, fitness, ratchet, scope, skill, health (default: pre-mortem on plan) |
| --mode=pr | n/a (PR ID/path is positional) |
Reject invalid combinations (e.g., --mode=pr --target=fitness).
# resolve artifact:
ARTIFACT="${1:-recent}" # path, PR ID, or "recent"
# load FAIL patterns:
# (folded into skill body; not a separate hook)
For --mode=pre-impl, also load:
.agents/planning-rules/*.md (compiled planning rules).agents/findings/registry.jsonl (active findings).agents/pre-mortem-checks/*.md (compiled prevention)For --mode=post-impl, run pre-checks:
/review skill needed)For --mode=pr, fetch the PR diff (gh pr diff <id> or path).
spawn_agent available → Codex sub-agentTeamCreate available → Claude native teamtask (read-only skill tool, OpenCode) → opencode subagent--quick (inline single-agent)Log selected backend in the verdict frontmatter.
| Mode | Judges | Perspectives |
|---|---|---|
| default | 2 | independent (no labeled perspectives) |
| --deep | 4 | missing-requirements, feasibility, scope, spec-completeness |
| --mixed | 2N (default N=3) | same N perspectives across Claude + Codex |
| --debate | 2+ rounds | adversarial; 2 rounds with critique-rebuttal |
| --quick | 0 (inline self) | structured review |
| --mode=post-impl | 2 + pipeline | complexity → bug-hunt → 2-judge council |
| --mode=pre-impl | 2-4 | per target preset |
| --mode=pr | 2 | diff-review + acceptance-check |
Each judge gets:
For --mode=pre-impl --target=plan:
For --mode=post-impl:
For --mode=pre-impl --target=fitness:
Each judge returns a per-judge result. Consolidate:
Output path: .agents/council/YYYY-MM-DD-validate-<topic-slug>.md
---
id: validate-YYYY-MM-DD-<slug>
type: verdict
date: YYYY-MM-DD
mode: <mode>
target: <target or n/a>
artifact: <path>
backend: <codex-subagents | claude-teams | opencode | inline>
---
# Validate Verdict — <topic>
## Council Verdict: PASS / WARN / FAIL
| Failure mode | Risk | Severity | Addressed? |
|---|---|---|---|
| ... | ... | ... | ... |
## Pseudocode Fixes (when WARN/FAIL)
(copy-pastable into affected issues per pre-mortem 4.6 contract)
## FAIL Pattern Check
(top 8 patterns — status per pattern)
## Verdict
PASS — proceed
WARN — review concerns, accept risk, or apply fixes
FAIL — block; revise artifact and rerun
The exact heading ## Council Verdict: PASS / WARN / FAIL is mandatory — ao rpi phased (when present) parses with anchored regex.
For --mode=pre-impl reusable findings: append to .agents/findings/registry.jsonl (atomic temp+rename).
--target | What gets graded | Replaces |
|---|---|---|
| (default) | Plan/spec for an upcoming /implement | /pre-mortem |
| scenario | Holdout scenario gate | /eval-outcomes |
| fitness | GOALS.md fitness gates | /goals measure, ao goals measure |
| ratchet | Brownian Ratchet checkpoint | /flywheel, ao ratchet status |
| scope | Frozen-dirs declaration | /scope |
| skill | SKILL.md hygiene + audit | /skill-auditor, /heal-skill (audit half) |
| health | Repo health probe | ao doctor |
Each target has its own inline check rubric until Phase 2 extraction.
VERDICT: PASS
(blank line)
COMMANDS RUN:
<actual commands + verbatim output snippets>
REASONS:
- bullet citing a COMMANDS RUN line
A verdict with no COMMANDS RUN: section is unverified — reject it and
dispatch a fresh validator. A verdict whose COMMANDS RUN: lists only commands
the author ran (not the judge) is a counterfeit judge — treat as FAIL and
re-route to a genuinely independent validator. No ## headings or parentheticals
on VERDICT: or COMMANDS RUN: lines; the gate parses them anchored.
For assurance closes (the control-plane verdict-gate, cp-icb6), the floor is
≥2 verdicts from ≥2 distinct model families, author family excluded, fail-closed.
This skill supports that policy via --mixed mode and the verdict form above;
the policy itself lives in the gate, not here. A same-model council is valid for
non-assurance decisions (design brainstorms, quick checks) — do not refuse those.
Tier mapping:
The A7 ruling (2026-06-09, memory validation-family-policy-risk-tiered):
Gemini is currently benched for STRICT validation — use Codex + Fable for A1/A2
tiers. Gemini may return for STRICT when Bo graduates it from the bench. Do not
present Gemini paths as live for tier A1/A2 until then.
When two implementations of the same intent exist, do NOT award based on authorship or surface aesthetics. Run both on a differentiating fixture (an input that exposes their behavioral difference), record the outputs verbatim, and graft the loser's unique assets onto the winner. "My worker wrote it" is not evidence.
Before dispatching a validator, register intent on the bead graph (update status, assign actor). Two parallel validators on the same bead produce a dedup incident, not a cross-family quorum. Check for an existing actor before spawning.
A judge re-runs the cited commands on the actual artifacts. It does not read the
author's evidence file and agree. Attest judge_source: <model> inside COMMANDS RUN
so the gate can confirm the judge identity. A judge that ran nothing is a reader, not
a verifier — discard its verdict.
A worker's evidence file may only contain numbers and outputs that were captured — pasted verbatim from a command's output — never reconstructed from memory. The canonical failure: "36 checks — 35 pass" stated with confidence was inference; the measured reality was 36 run / 34 pass / 1 fail / 1 skip, on a different commit.
/implement for fixes).## Council Verdict: ... text format.skills/rpi/SKILL.md — orchestrator that fires /validate --mode=pre-impl after /planskills/curate/SKILL.md — miner role (paired canonical skill)schemas/verdict.v1.schema.json — output contractpre-land-refuters — the mutate-shared-trunk pawl: a self-administered DONE is a claim, so an unbiased fresh-context refuter (model-agnostic by default; multi-model ≥2 families opt-in for the highest-irreversibility doors) attacks it at the push boundary regardless of complexity (complexity scales the panel's depth, never exempts the gate); its CONFIRMED verdict is enforced executably by scripts/reconcile-pr.shnpx claudepluginhub boshu2/agentops --plugin agentopsRuns multi-agent verification loop post-implementation, dispatching specialized agents for review with autonomous subagent fixes and retries until unanimous approval.
Validates AI agent claims against evidence trail in coding workflows. Catches unsubstantiated 'done', 'tests pass', 'fixed' without proof like outputs, diffs, or logs. Auto-triggers on completion keywords.
Executes the autonomous verification suite (agent-validate) on explicit requests like 'run the validator' or 'validate before commit/push/PR'. Handles status checking, failure extraction, and retry logic.