Skill

critical-review-loop

From research-factory

Critical-review and auto-proceed mode handling, with fix-in-loop until APPROVED before next phase

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/research-factory:critical-review-loop

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

This skill defines the behavior of the `--critical-review` and `--auto-proceed` flags for **all conductors** (DA, DE, PW, Rev). Every conductor activates this skill instead of duplicating the protocol.

SKILL.md

136 lines · ~2.1k tokens

Stats

LanguagePython

Parent stars0

MaintenanceGood

Last CommitJun 10, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Critical Review + Fix Loop

This skill defines the behavior of the --critical-review and --auto-proceed flags for all conductors (DA, DE, PW, Rev). Every conductor activates this skill instead of duplicating the protocol.

Flag Parsing

Parse the user's first message for these flags (case-sensitive):

--critical-review — turn on adversarial review at every MANDATORY STOP
--auto-proceed — overnight mode: skip user approval if all gates pass
Both can be combined.

If neither flag is present → standard interactive behavior. SS-Critic is NOT invoked. Skip the rest of this skill.

`--critical-review` — Adversarial Review with Fix Loop

At every MANDATORY STOP, before presenting deliverables to the user:

Round 1 — Initial Critique

Invoke SS-Critic with the deliverable from the current phase.
- Pass file paths, NOT full file contents (token saver — Critic reads what it needs).
- For domain-substance reviews of empirical finance work, ALSO invoke SS-DomainReviewer in parallel.
Collect the verdict: READY (>=7) / ALMOST (5-6) / NOT READY (<=4).

Fix-In-Loop (NEW — required behavior)

If verdict is NOT READY or ALMOST, do NOT pass control back to the user yet. Run the fix loop:

Identify the fixer subagent based on the deliverable type:

Deliverable	Fixer
Paper section / prose	PW-Drafter
LaTeX formatting / tables	PW-Typesetter
Citations / references	PW-Bibliographer
Regression / table results	DA-Executor
Sample / data construction	DA-Builder
Extraction code / cleaned data	DE-Miner / DE-Refiner
Referee response document	Rev-Respondent
Figures	SS-Vizmaker

Delegate the EXACT critic findings to the fixer. Inline only the issues list — do NOT re-paste the full deliverable. Tell the fixer:
- "Implement only the fixes listed below. No independent changes. No scope creep."
- Pass the issues with file + line references and the critic's required fix.
Re-invoke SS-Critic in re-review mode. Critic checks each prior issue (FIXED / PARTIAL / NOT FIXED / REGRESSED) and may add new issues.
Loop until verdict = APPROVED (zero CRITICAL, zero MAJOR) or max 3 rounds (not 5 — token budget).
Cap by issue count: if a single round still has >5 CRITICAL issues, abort the loop and escalate to the user — the deliverable needs structural rework, not patching.

After Loop Exits

If APPROVED → present the (now-fixed) deliverable to the user with the round-count and a brief diff summary.
If max rounds reached without APPROVED → present current state, list remaining issues, and explicitly ask the user how to proceed. Never silently accept a non-APPROVED deliverable.

`--auto-proceed` — Overnight Mode

MANDATORY STOPs become conditional. Auto-approve only if ALL of these are true:

Phase-specific validator passes (e.g., SS-Sentinel APPROVED for DA Phase 1, PW-Compiler clean for PW Phase 3, Rev-Auditor all VERIFIED for Rev).
If --critical-review is also active: the fix-loop above ended in APPROVED.
No phase has produced new "open questions for the user".

If any condition fails → STOP and wait for the user.

Log every auto-approval to docs/_STATE.md:

Auto-approved: Phase {N} — {validator} APPROVED, Critic loop {rounds}r APPROVED
Timestamp: {date}

At pipeline end, present a summary of all auto-approved decisions for user acknowledgment.

Token Discipline (applies to every round)

Pass file paths, not file contents, to SS-Critic and SS-DomainReviewer.
Pass to the fixer ONLY the issues list (severity, location, required fix) — never the full deliverable.
Cap each agent invocation at 15 tool calls unless the conductor explicitly raises the budget.
Discard intermediate critic transcripts from your own context after each round; keep only the final verdict and remaining-issues count.
If the fix loop exceeds 3 rounds OR the conversation context exceeds ~70%, stop and hand back to the user.

Automatic Debug Escalation (ALWAYS ON — no flag required)

The flag-gated review above handles prose/deliverable QA. This section handles a different failure mode: subagents stuck patching the same error class repeatedly (e.g., 16 jobs all failing on different DuckDB columns with the same root cause). This loop activates automatically — the user never has to ask for it.

Error Class Tracking

An error class = the root cause category, not the surface symptom. Two failures are the same class if a single structural fix would resolve both.
- ❌ Symptom: column 'revenue' not found, column 'permno' not found → look like different errors
- ✅ Class: DuckDB fetchmany returns bytes for VARCHAR columns from this source → one structural fix resolves all

The conductor (DE/DA/PW/Rev) maintains a counter per class in docs/_STATE.md:

Active error classes (this phase):
  - duckdb-bytes-to-str-bse_daily_raw: attempts=2/3, last=2026-05-20T14:32, fixer=DE-Miner

MAX_RETRY_PER_CLASS = 3 (hard cap)

Round 1: Conductor dispatches fixer subagent with normal instructions. On failure, increment class counter.
Round 2: Before dispatching again, conductor MUST issue an audit-first instruction to the fixer:

"Scan ALL similar call sites in this script / phase in one pass. Do not patch only the failing line. Report structural patterns before fixing."

Round 3: If still failing with the same class, the conductor halts the inner loop automatically and invokes SS-Critic with the full failure history:

3 attempts on error class "{class label}" all failed.
Attempts: [round1 fix → outcome, round2 fix → outcome, round3 fix → outcome]
Request: structural diagnosis — what do all these failures share? Recommend root-level fix vs patch.

SS-Critic returns a structural diagnosis. Conductor presents it to the user with options:
- Apply structural fix (recommended) — delegate to fixer with explicit root-cause instructions
- Continue patching — user override, document risk in _STATE.md
- Halt phase — drop back to planning

Pattern Detection BEFORE First Dispatch

When the conductor sees ≥2 failures already in this phase that share a root cause (even without retrying), it MUST detect the pattern before any individual fix:

Group all current-phase errors by class label

If any class has ≥2 instances, report to user immediately:

Pattern detected before retry: N jobs failing with "{class}".
Individual patches will likely fail on the next similar call site.
Recommending: invoke SS-Critic for structural diagnosis NOW (skipping rounds 1-2).

User confirms; conductor jumps straight to SS-Critic.

Fixer Subagent Contract (SS-Debugger, DE-Miner, DE-Refiner, DA-Executor, DA-Builder)

When a fixer is dispatched and the conductor's message includes RETRY_ROUND: {N} and ERROR_CLASS: {label}, the fixer MUST:

Read the prior attempts and outcomes from _STATE.md before proposing a fix
Never suggest a fix that was already tried (verbatim or near-duplicate)
On round 2+, scan ALL similar call sites in one pass (not just the failing line)
Label its fix as patch (local) or structural (eliminates the class) — conductors prefer structural at round 2+
If the fixer cannot find a different fix, return ESCALATE: same root cause, no new fix available instead of repeating

State Persistence

Every attempt must be logged to docs/_STATE.md under Active error classes. This survives context-window resets and prevents losing retry history. The retry count is the total across the session, not per-subagent-call — switching subagents does NOT reset the counter.

Auto-Proceed Interaction

--auto-proceed does NOT bypass this escalation. At round 3, the conductor halts and waits for user input even in overnight mode (escalation is a structural decision, not a routine validation gate).

critical-review-loop

Invocation

Context Preview

SKILL.md

critical-review-loop

Invocation

Context Preview

SKILL.md

Critical Review + Fix Loop

Flag Parsing

`--critical-review` — Adversarial Review with Fix Loop

Round 1 — Initial Critique

Fix-In-Loop (NEW — required behavior)

After Loop Exits

`--auto-proceed` — Overnight Mode

Token Discipline (applies to every round)

Automatic Debug Escalation (ALWAYS ON — no flag required)

Error Class Tracking

MAX_RETRY_PER_CLASS = 3 (hard cap)

Pattern Detection BEFORE First Dispatch

Fixer Subagent Contract (SS-Debugger, DE-Miner, DE-Refiner, DA-Executor, DA-Builder)

State Persistence

Auto-Proceed Interaction

Similar Skills

Critical Review + Fix Loop

Flag Parsing

`--critical-review` — Adversarial Review with Fix Loop

Round 1 — Initial Critique

Fix-In-Loop (NEW — required behavior)

After Loop Exits

`--auto-proceed` — Overnight Mode

Token Discipline (applies to every round)

Automatic Debug Escalation (ALWAYS ON — no flag required)

Error Class Tracking

MAX_RETRY_PER_CLASS = 3 (hard cap)

Pattern Detection BEFORE First Dispatch

Fixer Subagent Contract (SS-Debugger, DE-Miner, DE-Refiner, DA-Executor, DA-Builder)

State Persistence

Auto-Proceed Interaction

Similar Skills

critical-review-loop

Invocation

Context Preview

SKILL.md

critical-review-loop

Invocation

Context Preview

SKILL.md

Critical Review + Fix Loop

Flag Parsing

--critical-review — Adversarial Review with Fix Loop

Round 1 — Initial Critique

Fix-In-Loop (NEW — required behavior)

After Loop Exits

--auto-proceed — Overnight Mode

Token Discipline (applies to every round)

Automatic Debug Escalation (ALWAYS ON — no flag required)

Error Class Tracking

MAX_RETRY_PER_CLASS = 3 (hard cap)

Pattern Detection BEFORE First Dispatch

Fixer Subagent Contract (SS-Debugger, DE-Miner, DE-Refiner, DA-Executor, DA-Builder)

State Persistence

Auto-Proceed Interaction

Similar Skills

Critical Review + Fix Loop

Flag Parsing

--critical-review — Adversarial Review with Fix Loop

Round 1 — Initial Critique

Fix-In-Loop (NEW — required behavior)

After Loop Exits

--auto-proceed — Overnight Mode

Token Discipline (applies to every round)

Automatic Debug Escalation (ALWAYS ON — no flag required)

Error Class Tracking

MAX_RETRY_PER_CLASS = 3 (hard cap)

Pattern Detection BEFORE First Dispatch

Fixer Subagent Contract (SS-Debugger, DE-Miner, DE-Refiner, DA-Executor, DA-Builder)

State Persistence

Auto-Proceed Interaction

Similar Skills

`--critical-review` — Adversarial Review with Fix Loop

`--auto-proceed` — Overnight Mode

`--critical-review` — Adversarial Review with Fix Loop

`--auto-proceed` — Overnight Mode