From qe-framework
Spawns adversarial sub-agents to stress-test specs, implementations, and merge readiness at SIVS stages. Produces structured PASS/WARN/FAIL verdicts with cross-model mode for high-stakes reviews.
Slash command: /qe-framework:Qcritical-review
Stress-tests artifacts at each SIVS stage through adversarial sub-agents. Produces a structured PASS/WARN/FAIL verdict. Designed to be called standalone or auto-invoked by other SIVS skills.
/Qcritical-review --stage spec # Review a spec document
/Qcritical-review --stage verify # Review an implementation
/Qcritical-review --stage supervise # Review merge readiness
/Qcritical-review --mode cross-model # Use both Claude + Codex as reviewers
/Qcritical-review --stage verify --mode cross-model # Combine stage + mode
/Qcritical-review <file> # Auto-detect stage from file type
/Qcritical-review # Auto-detect from recent SIVS context
| Mode | Agents | When to Use |
|---|---|---|
| claude-only (default) | 3 Claude sub-agents | Fast, low-cost reviews |
| cross-model | 2 Claude + 1 Codex | High-stakes reviews needing independent model perspectives |
In cross-model mode, the most adversarial agent for each stage is routed to Codex: Critical Reviewer for spec, Devil's Advocate for verify, and Merge Blocker for supervise.
This gives the strongest critic a genuinely independent model and reduces same-model confirmation bias.
Each review session runs a structured sequence of up to 9 steps drawn from the OMC critic protocol. Not every SIVS stage runs all 9 steps — see the stage mapping column in the Stage Detection table below.
| # | Step | Summary |
|---|---|---|
| 1 | Pre-commitment Prediction | Before reading, commit to 3–5 predicted problem areas |
| 2 | Multi-perspective Review | Examine through SE / Junior / Ops lenses in parallel |
| 3 | Pre-Mortem | Generate 5–7 failure scenarios assuming exact execution |
| 4 | Ambiguity Scan | Identify steps with two valid but conflicting interpretations |
| 5 | Devil's Advocate | Argue the implementation is wrong; hunt for crashes and silent failures |
| 6 | Self-audit | Re-examine each CRITICAL/MAJOR finding for confidence and bias |
| 7 | Realist Check | Pressure-test severity against realistic worst-case and mitigations |
| 8 | Adversarial Escalation | Trigger max-adversarial mode on CRITICAL findings or 3+ MAJOR |
| 9 | Explicit Gap Analysis | Catalog what is missing — requirements, assumptions, omitted context |
Full definitions (trigger conditions, output schemas, examples): ./reference/nine-step-protocol.md
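The Step 8 trigger summarized in the table (any CRITICAL finding, or three or more MAJOR findings) could be checked roughly as below. This is only a sketch: the function name `shouldEscalate` and the `{ severity }` finding shape are assumptions made for illustration, and the authoritative trigger conditions are the ones in ./reference/nine-step-protocol.md.

```javascript
// Sketch of the Step 8 (Adversarial Escalation) trigger.
// Assumption: each finding is an object with a `severity` string.
function shouldEscalate(findings) {
  const critical = findings.filter((f) => f.severity === "CRITICAL").length;
  const major = findings.filter((f) => f.severity === "MAJOR").length;
  // Escalate on any CRITICAL finding, or on three or more MAJOR findings.
  return critical > 0 || major >= 3;
}

console.log(shouldEscalate([{ severity: "MAJOR" }, { severity: "MAJOR" }])); // false
console.log(shouldEscalate([{ severity: "CRITICAL" }])); // true
```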
| Stage | Detected From | 9-Step Mapping |
|---|---|---|
| spec | TASK_REQUEST*.md or spec file | Steps 1, 2, 4, 9 (Pre-commitment, Multi-perspective, Ambiguity Scan, Gap Analysis) |
| verify | Source code or diff | Steps 3, 5, 6 (Pre-Mortem, Devil's Advocate, Self-audit) |
| supervise | PR or merge context | Steps 7, 8 (Realist Check, Adversarial Escalation) |
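The stage-to-step mapping in the table above can be expressed as a small lookup. The sketch below copies the step numbers from the table; the names `STAGE_STEPS` and `stepsForStage` are hypothetical and not part of the skill.

```javascript
// Stage-to-step mapping, copied from the table above.
const STAGE_STEPS = new Map([
  ["spec", [1, 2, 4, 9]],
  ["verify", [3, 5, 6]],
  ["supervise", [7, 8]],
]);

// Hypothetical helper: return the protocol steps a given SIVS stage runs.
function stepsForStage(stage) {
  const steps = STAGE_STEPS.get(stage);
  if (!steps) {
    throw new Error(`Unknown SIVS stage: ${stage}`);
  }
  return steps;
}

console.log(stepsForStage("verify")); // [ 3, 5, 6 ]
```

Taken together, the three stages cover each of the nine steps exactly once.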
Detection order:
- .qe/state/unified-state.json for last SIVS stage
- AskUserQuestion

| Stage | What to Read |
|---|---|
| spec | TASK_REQUEST file, VERIFY_CHECKLIST, any referenced design docs |
| verify | git diff of implementation, test results, checklist status |
| supervise | Full PR diff (git diff main...HEAD), CI status, review comments |
Spawn 3 sub-agents in parallel via the Agent tool. Each adopts a distinct critical lens. Agents must NOT see each other's output.
The Spec stage runs two cognitive modes, Structural (structural thinking) and Critical (critical thinking), plus a boundary-focused finder. Full mode definitions (posture, key questions, adversarial instruction, must-nots) live in ./reference/thinking-modes.md. These agents implement the mandatory Spec self-reference gate; see ./reference/spec-gate-protocol.md.
| Agent | Mode | Role | Key Questions |
|---|---|---|---|
| Structural Reviewer | Structural | Stress-test the spec's structure for completeness & internal consistency | "Does every goal map to an item and vice versa? Any contradictory requirements? Dangling dependencies? Subjective/unverifiable items? Whole sub-problems missing?" |
| Critical Reviewer | Critical | Devil's advocate on the spec's substance | "What false assumption is this built on? What error case / production scenario is absent? Where will this spec lead the implementer wrong?" |
| Edge Case Finder | Critical (boundary) | Identify boundary conditions | "What happens at zero? At max? With concurrent access? With malformed input? With network failure?" |
The Critical Reviewer is the designated most-adversarial agent and is the one auto-upgraded to a cross-model engine when codex is reachable (see Engine Routing per Mode below).
Cognitive mode: Critical (critical thinking); see ./reference/thinking-modes.md, Mode 2. These three agents implement the mandatory Verify gate (./reference/verify-gate-protocol.md). Devil's Advocate is the cross-model-upgrade target.
| Agent | Role | Key Questions |
|---|---|---|
| Devil's Advocate | Argue the implementation is wrong | "Where does this break? What input crashes it? Which test is missing?" |
| Security Auditor | Find vulnerabilities | "Is there injection? Auth bypass? Data leak? OWASP Top 10 exposure?" |
| Performance Skeptic | Challenge efficiency | "What's the time complexity? Does it scale? Are there N+1 queries? Memory leaks?" |
Cognitive mode: Meticulous (meticulous thinking); see ./reference/thinking-modes.md, Mode 3. These three agents implement the mandatory Supervise gate (./reference/supervise-gate-protocol.md), which runs only after binary Verify passes. Merge Blocker is the cross-model-upgrade target.
| Agent | Role | Key Questions |
|---|---|---|
| Merge Blocker | Argue against merging | "What regression risk exists? Is test coverage sufficient? Are there unresolved TODOs?" |
| Merge Advocate | Argue for merging | "What's the cost of delay? Is the remaining risk acceptable? Does it meet the spec?" |
| Impartial Judge | Weigh both sides | "Which concerns are valid? Which are hypothetical? What's the actual risk level?" |
Each agent MUST return a structured analysis:
## [Agent Role]
### Findings
1. [Finding with severity: CRITICAL / HIGH / MEDIUM / LOW]
2. ...
### Evidence
- [Specific file:line or section reference for each finding]
### Verdict: [PASS | WARN | FAIL]
- FAIL: Found critical or high-severity issues that must be addressed
- WARN: Found medium issues worth discussing
- PASS: No significant concerns from this perspective
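The per-agent verdict rules above map finding severities to a verdict. A minimal sketch, assuming findings arrive as a list of severity strings and that LOW-only or empty lists count as "no significant concerns" (the source does not say so explicitly); `agentVerdict` is a hypothetical name.

```javascript
// Sketch: derive one agent's verdict from the severities of its findings.
// Assumption: LOW-only or empty finding lists are treated as PASS.
function agentVerdict(severities) {
  if (severities.some((s) => s === "CRITICAL" || s === "HIGH")) {
    return "FAIL"; // critical or high-severity issues must be addressed
  }
  if (severities.includes("MEDIUM")) {
    return "WARN"; // medium issues worth discussing
  }
  return "PASS";
}

console.log(agentVerdict(["MEDIUM", "LOW"])); // WARN
```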
Collect all 3 agent reports and produce a unified verdict:
Critical Review Report
══════════════════════
Stage: [spec | verify | supervise]
Target: [artifact name/path]
┌─ Structural Reviewer ────── WARN ─┐
│ 2 medium findings │
│ - Missing error handling for X │
│ - No mention of concurrent access │
└────────────────────────────────────┘
┌─ Critical Reviewer ──────── PASS ─┐
│ No significant concerns │
└────────────────────────────────────┘
┌─ Edge Case Finder ───────── FAIL ─┐
│ 1 critical finding │
│ - Division by zero when count = 0 │
└────────────────────────────────────┘
Overall: FAIL
Reason: 1 critical finding requires resolution before proceeding.
Action Items:
1. [CRITICAL] Handle division by zero in calculate_average()
2. [MEDIUM] Add error handling for timeout scenario
3. [MEDIUM] Document concurrent access behavior
| Condition | Overall Verdict |
|---|---|
| Any agent returns FAIL | FAIL |
| 2+ agents return WARN | WARN |
| 1 agent returns WARN, rest PASS | PASS (with notes) |
| All agents return PASS | PASS |
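The aggregation table above reduces to a few ordered checks. The sketch below is one way to write it; the function name `overallVerdict` is an assumption, and the input is taken to be the list of per-agent verdict strings.

```javascript
// Sketch: combine per-agent verdicts into the overall verdict,
// following the rows of the table above from top to bottom.
function overallVerdict(agentVerdicts) {
  const fails = agentVerdicts.filter((v) => v === "FAIL").length;
  const warns = agentVerdicts.filter((v) => v === "WARN").length;
  if (fails > 0) return "FAIL"; // any agent returns FAIL
  if (warns >= 2) return "WARN"; // 2+ agents return WARN
  if (warns === 1) return "PASS (with notes)"; // 1 WARN, rest PASS
  return "PASS"; // all agents return PASS
}

// Matches the sample report: WARN + PASS + FAIL gives FAIL.
console.log(overallVerdict(["WARN", "PASS", "FAIL"])); // FAIL
```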
Display the full report, then ask the user how to proceed.
claude-only (default):
- subagent_type: "general-purpose"

cross-model:
node -e "(async()=>{const {pathToFileURL}=await import('url');const {join}=await import('path');const base=process.env.CLAUDE_PLUGIN_ROOT||join(process.env.HOME||process.env.USERPROFILE||'','.claude');const m=await import(pathToFileURL(join(base,'scripts','lib','codex_bridge.mjs')).href);const r=await m.getCodexPluginInfo();console.log(JSON.stringify(r))})()"
- installed: true: route the designated adversarial agent to Codex via subagent_type: "codex:codex-rescue"
- installed: false: fall back to claude-only mode with a notice

| Stage | Codex Agent | Why This One |
|---|---|---|
| spec | Critical Reviewer | The strongest spec critic should be a genuinely different engine |
| verify | Devil's Advocate | The strongest critic should be a different model |
| supervise | Merge Blocker | Merge opposition must be genuinely independent |
The remaining 2 agents always use Claude sub-agents.
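The routing rules above can be sketched as a pure function from stage, mode, and Codex availability to a sub-agent type per agent. The agent names and the two `subagent_type` values come from this document; `routeEngines`, `STAGE_AGENTS`, and `CODEX_AGENT` are hypothetical names, and `codexInstalled` stands in for the `installed` field reported by the availability check.

```javascript
// Agents per stage and the designated Codex target, as listed in this document.
const STAGE_AGENTS = {
  spec: ["Structural Reviewer", "Critical Reviewer", "Edge Case Finder"],
  verify: ["Devil's Advocate", "Security Auditor", "Performance Skeptic"],
  supervise: ["Merge Blocker", "Merge Advocate", "Impartial Judge"],
};
const CODEX_AGENT = {
  spec: "Critical Reviewer",
  verify: "Devil's Advocate",
  supervise: "Merge Blocker",
};

// Sketch: choose the sub-agent type for each of the three agents.
function routeEngines(stage, mode, codexInstalled) {
  if (!Object.prototype.hasOwnProperty.call(STAGE_AGENTS, stage)) {
    throw new Error(`Unknown SIVS stage: ${stage}`);
  }
  // Codex is used only in cross-model mode and only when it is installed;
  // otherwise every agent falls back to a Claude sub-agent.
  const useCodex = mode === "cross-model" && codexInstalled;
  return STAGE_AGENTS[stage].map((agent) => ({
    agent,
    subagentType:
      useCodex && agent === CODEX_AGENT[stage]
        ? "codex:codex-rescue"
        : "general-purpose",
  }));
}

console.log(routeEngines("verify", "cross-model", true));
```

In this sketch a missing Codex install silently yields three Claude agents; the notice mentioned above would be emitted by the caller.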
The manual --mode cross-model above is opt-in. The mandatory Spec self-reference gate (invoked by Qgenerate-spec Step 2.6) instead upgrades automatically and with zero configuration (DECISION_LOG D012):
- Claude sub-agents by default (subagent_type: "general-purpose"). Fully functional with no codex installed; independence comes from fresh context + adversarial role.
- Check getCodexPluginInfo() / isCodexReachable() from scripts/lib/codex_bridge.mjs. If reachable, route the Critical Reviewer to subagent_type: "codex:codex-rescue" for a truly independent engine.

This makes the strongest critic genuinely independent when possible, while guaranteeing the gate always runs even in an all-Claude (or all-Codex) homogeneous setup, which is exactly the self-reference case this gate exists to defend.
The same automatic upgrade applies to the Verify gate (cross-model target = Devil's Advocate) and the Supervise gate (cross-model target = Merge Blocker).
Cross-model failure fallback (all gates): a best-effort upgrade must never block a mandatory gate or silently pass as if it were cross-model.
- If the cross-model agent fails, record crossmodel=false + reason, re-run that one agent on Claude (general-purpose), and mark the gate result degraded → at least WARN (independence was reduced).
- If the re-run also fails, report reason=double-failure.

When a change touches Qcritical-review or its reference/*-gate-protocol.md files, the gate cannot trust its own (possibly-changed) behavior to review that change: a self-reference within the self-reference defense. In that case the review MUST run against the pre-change baseline of these files plus an explicit diff inspection of the proposed change, rather than the in-tree (modified) version. This prevents a broken gate edit from approving itself.
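The cross-model failure fallback described earlier in this section can be sketched as a wrapper around the adversarial agent's run. Everything named here is hypothetical: `runCodex` and `runClaude` are stand-in callbacks that return a verdict string or throw, and the result shape is an assumption. The double-failure case is left out of this sketch, so an error from the Claude re-run simply propagates.

```javascript
// Sketch: run the adversarial agent on Codex, falling back to Claude on failure.
// runCodex / runClaude are hypothetical callbacks returning "PASS" | "WARN" | "FAIL".
function runAdversarialAgent(runCodex, runClaude) {
  try {
    return { verdict: runCodex(), crossmodel: true, degraded: false };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    // Re-run this one agent on Claude and mark the result as degraded.
    const verdict = runClaude();
    return {
      // Degraded results are floored at WARN because independence was reduced.
      verdict: verdict === "PASS" ? "WARN" : verdict,
      crossmodel: false,
      reason,
      degraded: true,
    };
  }
}

const result = runAdversarialAgent(
  () => { throw new Error("codex unreachable"); },
  () => "PASS"
);
console.log(result.verdict, result.crossmodel, result.reason); // WARN false codex unreachable
```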
In cross-model mode, each agent box in the report shows the engine used:
┌─ Devil's Advocate [Codex] ─── FAIL ─┐
┌─ Security Auditor [Claude] ── WARN ─┐
┌─ Performance Skeptic [Claude]─ PASS ─┐
This skill is designed to be called by other SIVS skills:
| Caller Skill | When | Stage |
|---|---|---|
| Qgenerate-spec / Qgs | After spec generation | spec |
| Qcode-run-task | After verify loop passes | verify |
| Esupervision-orchestrator | Before final verdict | supervise |
Callers invoke via: /Qcritical-review --stage <stage>
9-Step Protocol adapted from oh-my-claudecode (MIT, © 2025 Yeachan Heo): https://github.com/Yeachan-Heo/oh-my-claudecode/blob/main/agents/critic.md
npx claudepluginhub inho-team/qe-framework --plugin qe-framework