Skill

progressive-harsh-review

Multi-persona adversarial review for non-code deliverables (plans, skills, documents, designs). Simulates 3 critic personas scoring on correctness, simplicity, blind spots, verifiability, and operational risk.

design

documentation

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/superpowers-plus:progressive-harsh-review

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

> **Wrong skill?** Code PR review → `progressive-code-review-gate`. File-protocol review → `code-review-respond`. Quick feedback → `providing-code-review`.

SKILL.md

251 lines · ~3.7k tokens

Stats

Language: Shell

Stars: 8

Forks: 1

Maintenance: Excellent

Last Commit: Jun 18, 2026

Progressive Harsh Review

Wrong skill? Code PR review → progressive-code-review-gate. File-protocol review → code-review-respond. Quick feedback → providing-code-review.

Purpose: Multi-persona adversarial review that catches what self-review cannot.

Pattern: Three escalating critic personas, each scoring independently.

Announce at start: "I'm using the progressive-harsh-review skill to red-team this work."

Companion Skills

progressive-code-review-gate: Code-level review (this skill reviews designs/plans)
brainstorming: Generating options before review
micro-harsh-review: Per-batch code review
providing-code-review: Code-specific review

When to Use

Intent-based (self-fire — do not wait to be asked):

About to present any non-code deliverable to the human — plans, specs, skill files, designs, documents
The trigger is the INTENT to present, not whether the human explicitly requested review
If there is even a 1% chance the human expects a solid artifact, run this skill (PHR, short for progressive-harsh-review) first

Explicit request:

When the user says "review this harshly", "find what's wrong", or "red team this"

NOT for:

Code PRs → use progressive-code-review-gate instead
Design comparison (choosing between options) → debate handles that
Initial brainstorming (too early — nothing to review yet)

The Three Personas

Persona 1: JuniorDevNitpicker (Surface Quality)

Focus: typos, formatting, naming, style, completeness, internal consistency.
Tone: eager, thorough, detail-oriented.
Full access: All codebase context is available. Every persona may follow any lead.
START FROM: Line-by-line reading of the artifact: every heading, table, and claim.
PRIORITIZE: Surface correctness, naming consistency, undefined terms, broken cross-references, missing steps, formatting errors.
Dimension weights: Correctness 35%, Simplicity 25%, Blind Spots 20%, Verifiability 15%, Operational Risk 5%.
Code-term equivalents: Blind Spots = Edge Cases, Verifiability = Testability, Operational Risk = Security/Perf.

Persona 2: SeniorArchCritic (Structural Quality)

Focus: structure, internal consistency, scope boundaries, coverage completeness.
Tone: experienced, skeptical, pattern-aware.
Full access: All codebase context is available. Every persona may follow any lead.
START FROM: Promises made vs. evidence provided: do all claims hold when traced against the rest of the document?
PRIORITIZE: Internal consistency, dependency assumptions, coverage completeness, reversibility of decisions, overlap with peer artifacts.
Dimension weights: Correctness 25%, Simplicity 15%, Verifiability 25%, Blind Spots 15%, Operational Risk 20%.
Code-term equivalents: Verifiability = Testability, Blind Spots = Edge Cases, Operational Risk = Security/Perf.

Persona 3: ProdOpsHardass (Operational Quality)

Focus: failure scenarios, blind spots, adverse conditions, adoption risk.
Tone: battle-scarred, worst-case thinker, "what breaks at 3am?"
Full access: All codebase context is available. Every persona may follow any lead.
START FROM: Failure scenarios: what happens if an assumption is wrong, a step is skipped, or the context changes mid-execution?
PRIORITIZE: Unrecoverable failure paths, missing rollback/fallback, global activation risk, dependency on absent tooling, 3 AM resilience.
Dimension weights: Correctness 25%, Simplicity 10%, Blind Spots 25%, Verifiability 10%, Operational Risk 30%.
Code-term equivalents: Blind Spots = Edge Cases, Verifiability = Testability, Operational Risk = Security/Perf.

The Process

Step 0: Fresh-Reader Pre-Check (author-noise audit)

Before dispatching personas, grep the artifact for author-noise leakage — content only the author would recognize that will confuse a fresh reader:

Machine-local paths (e.g., /Users/matt/, /home/runner/, /tmp/build-123/)
Invented identifiers not defined in the artifact (e.g., referencing a function that doesn't exist in the diff)
Process commentary left in ("I added this because...", "TODO from our discussion", "per the Slack thread")
Internal ticket/PR references a public reader cannot resolve

If any are found, flag them as Minor author-noise findings in the report. Do NOT score down for these — they are editorial, not correctness failures. Remove them before shipping if found.
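The mechanical part of this pre-check can be sketched as a small scanner. This is a minimal illustration, not part of the skill's tooling: the pattern list and the `find_author_noise` name are hypothetical, and it only covers machine-local paths and the sample commentary phrases. Invented identifiers and unresolvable ticket references still need a human or persona read.

```python
import re

# Hypothetical patterns built from the examples above; extend for your environment.
NOISE_PATTERNS = {
    "machine-local path": re.compile(r"(/Users/|/home/|/tmp/)[\w.\-]+/"),
    "process commentary": re.compile(
        r"I added this because|TODO from our discussion|per the Slack thread",
        re.IGNORECASE,
    ),
}


def find_author_noise(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, category, line) for every line that matches a pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for category, pattern in NOISE_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, category, line.strip()))
    return findings


sample = (
    "Run the build.\n"
    "Output lands in /tmp/build-123/out.\n"
    "TODO from our discussion: rename."
)
for number, category, line in find_author_noise(sample):
    # Reported as Minor; these findings do not lower any dimension score.
    print(f"Minor author-noise, line {number} ({category}): {line}")
```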

Step 1: Dispatch Review

HARD GATE: Author ≠ Reviewer. Use a sub-agent or explicit role switch.

For each persona, answer ALL scoring dimensions:

| Dimension | Default Weight | Question |
|-----------|----------------|----------|
| Correctness | 30% | Does it do what it claims? Are there errors or false assertions? |
| Simplicity | 20% | Is it the simplest approach? Over-engineered or redundant? |
| Verifiability | 15% | Can each claim or step be independently verified or audited? |
| Blind Spots | 20% | What scenarios or failure paths were not addressed? |
| Operational Risk | 15% | What breaks under adverse conditions? Misuse vectors, adoption failure? |

Weight precedence: The per-persona weights defined in The Three Personas section govern each persona's scoring. The table above is the fallback applied only when a persona has no explicit weight definition. Never average per-persona weights together into a single global pass — each persona uses its own weights independently.
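One way to picture the precedence rule is a lookup with a fallback. The sketch below copies the weights from this document; the `weights_for` helper and the "GuestCritic" persona are hypothetical names used only for illustration.

```python
# Fallback weights from the Step 1 table (percentages).
DEFAULT_WEIGHTS = {
    "Correctness": 30, "Simplicity": 20, "Verifiability": 15,
    "Blind Spots": 20, "Operational Risk": 15,
}

# Per-persona weights from The Three Personas section (percentages).
PERSONA_WEIGHTS = {
    "JuniorDevNitpicker": {
        "Correctness": 35, "Simplicity": 25, "Blind Spots": 20,
        "Verifiability": 15, "Operational Risk": 5,
    },
    "SeniorArchCritic": {
        "Correctness": 25, "Simplicity": 15, "Verifiability": 25,
        "Blind Spots": 15, "Operational Risk": 20,
    },
    "ProdOpsHardass": {
        "Correctness": 25, "Simplicity": 10, "Blind Spots": 25,
        "Verifiability": 10, "Operational Risk": 30,
    },
}


def weights_for(persona: str) -> dict[str, int]:
    """Per-persona weights win; the default table applies only when none exist."""
    weights = PERSONA_WEIGHTS.get(persona, DEFAULT_WEIGHTS)
    if sum(weights.values()) != 100:
        raise ValueError(f"weights for {persona} do not sum to 100")
    return weights


print(weights_for("ProdOpsHardass")["Operational Risk"])
print(weights_for("GuestCritic")["Operational Risk"])  # hypothetical persona, uses fallback
```

Note that nothing here blends the three weight sets: each persona is looked up and scored on its own.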

Step 2: Score and Aggregate

Each persona scores 1-10 on each dimension, using per-persona weights. Aggregation rule: compute each persona's weighted dimension score, then take an equal-weight average across the three personas. This replaces the previous MINIMUM rule, which was overly pessimistic when one persona was mismatched to the task.
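As a rough sketch of the aggregation rule, the snippet below computes one weighted score per persona and then averages them with equal weight. The function names and the Nitpicker and ProdOps dimension scores are made up for illustration; the ArchCritic scores mirror the sample in Scoring Output Format.

```python
def weighted_score(scores: dict[str, int], weights: dict[str, int]) -> float:
    """Weighted 1-10 score for one persona; weights are percentages summing to 100."""
    return sum(scores[dim] * weights[dim] for dim in weights) / 100


def aggregate(persona_scores: list[float]) -> float:
    """Equal-weight mean across personas (not the minimum)."""
    return sum(persona_scores) / len(persona_scores)


# C=Correctness, S=Simplicity, V=Verifiability, B=Blind Spots, OR=Operational Risk.
nitpicker = weighted_score(
    {"C": 8, "S": 9, "V": 7, "B": 8, "OR": 8},
    {"C": 35, "S": 25, "B": 20, "V": 15, "OR": 5},
)
arch_critic = weighted_score(
    {"C": 7, "S": 8, "V": 6, "B": 5, "OR": 7},
    {"C": 25, "S": 15, "V": 25, "B": 15, "OR": 20},
)
prod_ops = weighted_score(
    {"C": 8, "S": 8, "V": 7, "B": 7, "OR": 7},
    {"C": 25, "S": 10, "B": 25, "V": 10, "OR": 30},
)

per_persona = [nitpicker, arch_critic, prod_ops]
print(f"mean = {aggregate(per_persona):.2f}, minimum = {min(per_persona):.2f}")
```

With these illustrative numbers the mean lands in the PASS_WITH_FIXES band, whereas the retired MINIMUM rule would have pushed the same review into REJECT on the strength of one persona.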

Critical veto: If ANY persona scores Correctness or Operational Risk ≤4 AND cites a specific defect (not a general concern), that finding acts as a hard veto — automatic REJECT regardless of the weighted mean. This preserves safety without making the whole system hostage to the weakest persona on non-critical dimensions.

Blind Spots / unrecoverable failure findings: An unrecoverable-failure finding (e.g., "no rollback on global activation") MUST be scored on Operational Risk — not Blind Spots alone — so it is eligible for the critical veto. Scoring it only on Blind Spots bypasses the veto gate.
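The veto gate and the scoring rule above can be sketched together. The `Finding` type and `veto_fired` helper are hypothetical; whether a finding cites a specific defect is the reviewer's judgment and is passed in as a flag rather than derived.

```python
from dataclasses import dataclass

VETO_DIMENSIONS = {"Correctness", "Operational Risk"}
VETO_THRESHOLD = 4  # a score at or below this is veto-eligible


@dataclass
class Finding:
    persona: str
    dimension: str
    score: int
    cites_specific_defect: bool  # reviewer's judgment, not machine-derived


def veto_fired(findings: list[Finding]) -> bool:
    """True if any persona triggers the critical veto."""
    return any(
        f.dimension in VETO_DIMENSIONS
        and f.score <= VETO_THRESHOLD
        and f.cites_specific_defect
        for f in findings
    )


# The same unrecoverable-failure finding, scored two ways.
blind_spots_only = [Finding("ProdOpsHardass", "Blind Spots", 3, True)]
also_op_risk = blind_spots_only + [Finding("ProdOpsHardass", "Operational Risk", 3, True)]

print(veto_fired(blind_spots_only))  # the veto gate never sees it
print(veto_fired(also_op_risk))      # eligible for the veto
```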

Step 3: Verdict

| Weighted Mean | Verdict | Action |
|---------------|---------|--------|
| ≥8 | PASS | Ship it |
| 7 to <8 | PASS_WITH_FIXES | Fix all findings, re-score changed areas only. Exit only when Step 6 convergence is met; never exit solely because a round found no new issues. |
| <7 | REJECT | Root-cause analysis → remediate → full re-review |
| Any | REJECT (veto) | Critical veto fired: fix the cited defect, full re-review |

Project-min override: If the project specifies a minimum score floor (e.g., 9.2), that floor raises the PASS bar only. A weighted mean that meets the generic ≥8 PASS threshold but falls below the project min is treated as PASS_WITH_FIXES rather than PASS. The REJECT band (<7) and the critical veto are not affected.

Example: mean 8.3 under a 9.2 project floor → generic band says PASS, override says PASS_WITH_FIXES. Mean 6.8 under any project floor → REJECT (override does not apply to the REJECT band). Mean 9.5 under a 9.2 floor → PASS (floor is met).
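The verdict bands and the project-min override reduce to a few comparisons. The sketch below is one possible reading; the `verdict` function is hypothetical, and it assumes a project floor below 8 leaves the generic PASS bar unchanged, since the floor is described as only raising it.

```python
from typing import Optional


def verdict(weighted_mean: float, veto: bool = False,
            project_min: Optional[float] = None) -> str:
    """Map a weighted mean to a verdict, applying veto and project floor."""
    if veto:
        return "REJECT (veto)"
    if weighted_mean < 7:
        return "REJECT"  # the floor never touches the REJECT band
    # Assumption: the floor can raise the PASS bar above 8 but never lower it.
    pass_bar = 8.0 if project_min is None else max(8.0, project_min)
    return "PASS" if weighted_mean >= pass_bar else "PASS_WITH_FIXES"


# The three worked examples from the text, under a 9.2 project floor.
for mean in (8.3, 6.8, 9.5):
    print(mean, verdict(mean, project_min=9.2))
```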

Step 4: Remediation (if needed)

On REJECT:

1. Root-cause analysis: why did the issues exist? (missed requirement, wrong assumption, insufficient context)
2. Chain to remediation skills:
   - Design issues → debate (generate alternatives)
   - Stuck/circular → think-twice (fresh perspective)
   - Plan issues → plan-and-execute (replan)
3. Re-review: minimum 2 rounds. Round 2 reviews ONLY delta changes.

Step 5: Correlated-Failure Detection

After scoring, scan persona outputs for shared blind spots:

Evidence overlap: If all 3 personas cite the same evidence for their findings, flag ⚠️ CORRELATED EVIDENCE. At least one persona must re-examine from a different starting point (Nitpicker: line-by-line artifact reading, ArchCritic: promises-vs-evidence tracing, ProdOps: failure scenario enumeration).
Phrasing similarity: If 2+ personas use near-identical phrasing, flag ⚠️ ECHO REASONING. Require the echoing persona to restate the finding through their own analytical lens.
Clean-sweep suspicion: If ALL personas report no findings, verify each persona's output shows evidence of their distinct starting point (Nitpicker: line-by-line artifact reading, ArchCritic: promises-vs-evidence tracing, ProdOps: failure scenario enumeration). If any persona's output lacks starting-point-specific evidence, re-examine.

Flags trigger re-examination, not automatic verdict changes.
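Two of these checks lend themselves to a mechanical first pass. The sketch below uses Python's standard `difflib` for phrasing similarity and plain set comparison for evidence overlap. The 0.85 threshold, the function names, and the sample findings are assumptions chosen for illustration, and a flag here only prompts re-examination.

```python
from difflib import SequenceMatcher
from itertools import combinations

ECHO_THRESHOLD = 0.85  # hypothetical cutoff for "near-identical" phrasing


def echo_pairs(findings: dict[str, str],
               threshold: float = ECHO_THRESHOLD) -> list[tuple[str, str]]:
    """Return persona pairs whose finding text is near-identical (ECHO REASONING)."""
    pairs = []
    for (a, text_a), (b, text_b) in combinations(findings.items(), 2):
        ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs


def correlated_evidence(evidence: dict[str, set[str]]) -> bool:
    """True when every persona cites exactly the same, non-empty evidence."""
    cited = list(evidence.values())
    return bool(cited) and bool(cited[0]) and all(s == cited[0] for s in cited)


findings = {
    "JuniorDevNitpicker": "Heading 3 uses an undefined term",
    "SeniorArchCritic": "No rollback path for global activation",
    "ProdOpsHardass": "No rollback path for global activation.",
}
print(echo_pairs(findings))
print(correlated_evidence({p: {"section 3"} for p in findings}))
```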

Step 6: Convergence

Exit when: Final round weighted mean ≥8 (or project min if higher) AND no active Critical vetoes AND no correlated-failure flags AND no new material issues in latest round
Escalate when: 3 rounds without convergence → summarize blockers, escalate to human
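The exit and escalation conditions can be read as a single decision per round. This is a sketch under stated assumptions: `RoundResult`, `next_action`, and the EXIT/CONTINUE/ESCALATE labels are hypothetical names, and convergence is checked before the round limit so a review that converges in round 3 still exits.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ROUNDS = 3


@dataclass
class RoundResult:
    weighted_mean: float
    active_vetoes: int
    correlation_flags: int
    new_material_issues: int


def next_action(rounds: list[RoundResult],
                project_min: Optional[float] = None) -> str:
    """Decide what to do after the latest round."""
    bar = 8.0 if project_min is None else max(8.0, project_min)
    last = rounds[-1]
    converged = (
        last.weighted_mean >= bar
        and last.active_vetoes == 0
        and last.correlation_flags == 0
        and last.new_material_issues == 0
    )
    if converged:
        return "EXIT"
    if len(rounds) >= MAX_ROUNDS:
        return "ESCALATE"  # summarize blockers for the human; never auto-ship
    return "CONTINUE"


history = [RoundResult(7.4, 0, 0, 2), RoundResult(8.3, 0, 0, 0)]
print(next_action(history))
```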

Sentinel Write After PASS (MANDATORY)

When the final round verdict is PASS (weighted mean ≥ 8.0 per the verdict table above, AND ≥ the project minimum if one is set, AND no active critical vetoes, AND no correlated-failure flags), immediately run:

tools/run-phr.sh --verdict PASS --min-score <weighted-mean>

This writes .phr-cleared with format v1|<HEAD-SHA>|PASS|<UTC-TS>|min-score=<N>. The pre-push hook's Gate 4 reads this sentinel; without it, any push that touches skill/design .md files is refused at the local pre-push hook (developer-machine self-discipline, not a server-side security boundary).

Only PASS clears the gate. PASS_WITH_FIXES (mean 7 to <8 or below project-min) → another round, do NOT write sentinel. REJECT (<7 or critical veto) → root-cause, remediate, full re-review.

Run PHR AFTER git commit: the sentinel binds to the HEAD SHA. Any subsequent commit, amend, or rebase invalidates it (Gate 4 will report stale).

Why this is mandatory: PHR was discipline-only for too long, and skill changes repeatedly shipped without it being run. The sentinel plus Gate 4 closes the loop. Note that Gate 4 is a productivity guardrail (it catches forgetting), not a tamper-proof security control. Code review must still verify that PHR actually ran, not just that the sentinel is present.
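To make the sentinel format and the staleness rule concrete, here is a minimal parser sketch. It is not the Gate 4 implementation: the function names are hypothetical, the SHA and timestamp in the example are invented, and the timestamp is kept as an opaque string because its exact format is not specified here.

```python
from typing import NamedTuple


class Sentinel(NamedTuple):
    version: str
    head_sha: str
    verdict: str
    utc_timestamp: str  # kept opaque; the exact format is an assumption
    min_score: float


def parse_sentinel(line: str) -> Sentinel:
    """Parse 'v1|<HEAD-SHA>|PASS|<UTC-TS>|min-score=<N>'."""
    parts = line.strip().split("|")
    if len(parts) != 5 or parts[0] != "v1" or not parts[4].startswith("min-score="):
        raise ValueError(f"unrecognized sentinel: {line!r}")
    score = float(parts[4].split("=", 1)[1])
    return Sentinel(parts[0], parts[1], parts[2], parts[3], score)


def clears_gate(sentinel: Sentinel, current_head_sha: str) -> bool:
    """Only a PASS sentinel bound to the current HEAD clears the gate."""
    return sentinel.verdict == "PASS" and sentinel.head_sha == current_head_sha


# Invented values for illustration only.
sentinel = parse_sentinel("v1|0123abc|PASS|2026-06-18T12:00:00Z|min-score=8.4")
print(clears_gate(sentinel, "0123abc"))  # same HEAD
print(clears_gate(sentinel, "fedcba9"))  # HEAD moved after PHR ran, so stale
```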

Scoring Output Format

Header abbreviations: C=Correctness, S=Simplicity, V=Verifiability, B=Blind Spots, OR=Operational Risk.

### Persona: SeniorArchCritic (C25/S15/V25/B15/OR20)
| Dimension | Score | Finding |
|-----------|-------|---------|
| Correctness | 7 | Claim X contradicts evidence in section 3 |
| Simplicity | 8 | Bounded scope |
| Verifiability | 6 | No worked example for the edge case described |
| Blind Spots | 5 | Global activation with no staged rollout described |
| Operational Risk | 7 | Dependency on absent tooling not flagged |
Per-persona weighted score: 7(.25)+8(.15)+6(.25)+5(.15)+7(.20) = 6.60
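Because the report carries both the weights (in the header) and the scores (in the table), the weighted score line can be rechecked mechanically. The sketch below parses the sample report above; the parsing helpers are hypothetical and assume the exact header and table layout shown.

```python
import re

ABBREVIATIONS = {
    "C": "Correctness", "S": "Simplicity", "V": "Verifiability",
    "B": "Blind Spots", "OR": "Operational Risk",
}

REPORT = """\
### Persona: SeniorArchCritic (C25/S15/V25/B15/OR20)
| Dimension | Score | Finding |
|-----------|-------|---------|
| Correctness | 7 | Claim X contradicts evidence in section 3 |
| Simplicity | 8 | Bounded scope |
| Verifiability | 6 | No worked example for the edge case described |
| Blind Spots | 5 | Global activation with no staged rollout described |
| Operational Risk | 7 | Dependency on absent tooling not flagged |
"""


def parse_weights(spec: str) -> dict[str, int]:
    """Turn 'C25/S15/V25/B15/OR20' into {'Correctness': 25, ...}."""
    weights = {}
    for token in spec.split("/"):
        match = re.fullmatch(r"([A-Z]+)(\d+)", token)
        if match is None or match.group(1) not in ABBREVIATIONS:
            raise ValueError(f"bad weight token: {token!r}")
        weights[ABBREVIATIONS[match.group(1)]] = int(match.group(2))
    return weights


def parse_scores(report: str) -> dict[str, int]:
    """Read the Score column for each known dimension row."""
    scores = {}
    for line in report.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) >= 2 and cells[0] in ABBREVIATIONS.values() and cells[1].isdigit():
            scores[cells[0]] = int(cells[1])
    return scores


def recompute(report: str) -> float:
    """Recompute the per-persona weighted score from the report text."""
    spec = re.search(r"\(([^)]+)\)", report).group(1)
    weights = parse_weights(spec)
    scores = parse_scores(report)
    return sum(scores[dim] * weight for dim, weight in weights.items()) / 100


print(f"{recompute(REPORT):.2f}")  # 6.60, matching the report's own total
```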

Anti-Patterns

| Anti-Pattern | Detection | Correction |
|--------------|-----------|------------|
| Soft review | No score <7 given | Recalibrate with known-bad example |
| Same feedback loop | Same comment 3 iterations | Escalate to structural fix |
| Style over substance | All comments are formatting | Check logic, edge cases, error handling first |
| Perfection paralysis | 3+ rounds, no convergence | Hard limit: 3 rounds, then escalate to human. Do NOT ship |
| Missing context | Review without reading full file | Load surrounding context first |

Failure Modes

| Failure | Fix |
|---------|-----|
| Self-reviewed in same thinking pass | Use a sub-agent (preferred). An in-process role switch with no context isolation is significantly less reliable; if used, explicitly discard the author's reasoning and start fresh from the artifact text |
| All personas gave same feedback | Each persona must name ≥1 plausible failure mode unique to their lens, or cite a specific property of the change explaining why none exists (generic dismissal = rubber-stamp). Identical findings mean the lenses aren't distinct |
| Score inflated to avoid re-work | Findings with concrete issues MUST score ≤7 on that dimension |
| Remediation skipped after REJECT | REJECT means start over. No "fix one thing and call it done" |
| Only reviewed happy path | ProdOpsHardass must consider failure, rollback, 3am scenarios |
| Round N mean lower than Round N-1 | Remediation introduced new issues: flag REGRESSION, root-cause before Round N+1 |
| No output summary before presenting | Always emit PHR SUMMARY block (rounds, mean, verdict, project-min, vetoes) |
| Shipped at round 3 without convergence | 3 rounds = escalate to human with blocker list; never auto-ship |
| Unrecoverable finding scored only on Blind Spots | Must ALSO score Operational Risk to be veto-eligible; Blind Spots alone bypasses the veto gate |
| Skipped sentinel write after PASS | Pre-push Gate 4 refuses the push with "PHR sentinel missing." Run `tools/run-phr.sh --verdict PASS --min-score <N>` and retry. |
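A few of the checks in the Anti-Patterns and Failure Modes tables are simple enough to lint for after each round. The sketch below covers soft review, score inflation, and regression only; the `Row` type, the `lint_round` helper, and the warning labels other than REGRESSION are hypothetical, and whether a finding is a concrete issue remains the reviewer's call.

```python
from dataclasses import dataclass


@dataclass
class Row:
    dimension: str
    score: int
    concrete_issue: bool  # reviewer's judgment, passed in as a flag


def lint_round(rows: list[Row], round_means: list[float]) -> list[str]:
    """Return warnings for one round's scores plus the history of round means."""
    warnings = []
    # Soft review: no score below 7 anywhere in the round.
    if all(row.score >= 7 for row in rows):
        warnings.append("SOFT REVIEW: no score below 7 was given")
    # Score inflation: a concrete issue must cap that dimension at 7.
    for row in rows:
        if row.concrete_issue and row.score > 7:
            warnings.append(
                f"INFLATED: {row.dimension} scored {row.score} despite a concrete issue"
            )
    # Regression: the latest mean dropped below the previous round's mean.
    if len(round_means) >= 2 and round_means[-1] < round_means[-2]:
        warnings.append("REGRESSION: latest mean is lower than the previous round")
    return warnings


rows = [Row("Correctness", 8, True), Row("Simplicity", 9, False)]
for warning in lint_round(rows, round_means=[7.6, 7.2]):
    print(warning)
```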
