From superpowers-plus
Multi-persona adversarial review for non-code deliverables (plans, skills, documents, designs). Simulates 3 critic personas scoring on correctness, simplicity, blind spots, verifiability, and operational risk.
Slash command
/superpowers-plus:progressive-harsh-review
> **Wrong skill?** Code PR review → `progressive-code-review-gate`. File-protocol review → `code-review-respond`. Quick feedback → `providing-code-review`.
Purpose: Multi-persona adversarial review that catches what self-review cannot.
Pattern: Three escalating critic personas, each scoring independently.
Announce at start: "I'm using the progressive-harsh-review skill to red-team this work."
Intent-based (self-fire — do not wait to be asked):
Explicit request:
NOT for:
- … `code-review-battery` instead
- … `debate` handles that

## The Three Personas

### Persona: Nitpicker
- Focus: typos, formatting, naming, style, completeness, internal consistency.
- Tone: eager, thorough, detail-oriented.
- Full access: All codebase context is available. Every persona may follow any lead.
- START FROM: Line-by-line reading of the artifact: every heading, table, and claim.
- PRIORITIZE: Surface correctness, naming consistency, undefined terms, broken cross-references, missing steps, formatting errors.
- Dimension weights: Correctness 35%, Simplicity 25%, Blind Spots 20%, Verifiability 15%, Operational Risk 5%. (Code-term equivalents: Edge Cases, Testability, Security/Perf.)

### Persona: SeniorArchCritic
- Focus: structure, internal consistency, scope boundaries, coverage completeness.
- Tone: experienced, skeptical, pattern-aware.
- Full access: All codebase context is available. Every persona may follow any lead.
- START FROM: Promises made vs. evidence provided: do all claims hold when traced against the rest of the document?
- PRIORITIZE: Internal consistency, dependency assumptions, coverage completeness, reversibility of decisions, overlap with peer artifacts.
- Dimension weights: Correctness 25%, Simplicity 15%, Verifiability 25%, Blind Spots 15%, Operational Risk 20%. (Code-term equivalents: Testability, Edge Cases, Security/Perf.)

### Persona: ProdOpsHardass
- Focus: failure scenarios, blind spots, adverse conditions, adoption risk.
- Tone: battle-scarred, worst-case thinker, "what breaks at 3am?"
- Full access: All codebase context is available. Every persona may follow any lead.
- START FROM: Failure scenarios: what happens if an assumption is wrong, a step is skipped, or the context changes mid-execution?
- PRIORITIZE: Unrecoverable failure paths, missing rollback/fallback, global activation risk, dependency on absent tooling, 3 AM resilience.
- Dimension weights: Correctness 25%, Simplicity 10%, Blind Spots 25%, Verifiability 10%, Operational Risk 30%. (Code-term equivalents: Edge Cases, Testability, Security/Perf.)
Before dispatching personas, grep the artifact for author-noise leakage (content only the author would recognize that will confuse a fresh reader):
- … (`/Users/matt/`, `/home/runner/`, `/tmp/build-123/`)

If any are found, flag them as Minor author-noise findings in the report. Do NOT score down for these; they are editorial, not correctness failures. Remove them before shipping.
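The path check can be sketched as a small scan. This is a minimal sketch, assuming author noise shows up as machine-specific absolute paths like the three examples given; the regex and the name `find_author_noise` are illustrative and not part of the skill.

```python
import re

# Hypothetical pattern modeled on the example paths in the text:
# /Users/<name>/, /home/<name>/, /tmp/<dir>/
AUTHOR_PATH = re.compile(r"(?:/Users|/home|/tmp)/[\w.-]+/")

def find_author_noise(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_path_prefix) pairs, 1-indexed."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AUTHOR_PATH.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits

sample = "Run the build.\nOutput lands in /Users/matt/out.txt\nSee /tmp/build-123/log"
for lineno, path in find_author_noise(sample):
    # Reported as Minor; these findings do not lower any dimension score.
    print(f"Minor (author-noise) line {lineno}: {path}")
```

A pattern this loose will also match URL paths that happen to contain `/home/` or `/tmp/`, so hits are prompts for a human look, not automatic findings.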
HARD GATE: Author ≠ Reviewer. Use a sub-agent or explicit role switch.
For each persona, answer ALL scoring dimensions:
| Dimension | Default Weight | Question |
|---|---|---|
| Correctness | 30% | Does it do what it claims? Are there errors or false assertions? |
| Simplicity | 20% | Is it the simplest approach? Over-engineered or redundant? |
| Verifiability | 15% | Can each claim or step be independently verified or audited? |
| Blind Spots | 20% | What scenarios or failure paths were not addressed? |
| Operational Risk | 15% | What breaks under adverse conditions? Misuse vectors, adoption failure? |
Weight precedence: The per-persona weights defined in The Three Personas section govern each persona's scoring. The table above is the fallback applied only when a persona has no explicit weight definition. Never average per-persona weights together into a single global pass — each persona uses its own weights independently.
Each persona scores 1-10 on each dimension, using per-persona weights. Aggregation rule: compute each persona's weighted dimension score, then take an equal-weight average across the three personas. This replaces the previous MINIMUM rule, which was overly pessimistic when one persona was mismatched to the task.
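The aggregation rule can be sketched as follows. The weights are the ones stated for each persona and in the fallback table; the function names and the Nitpicker and ProdOpsHardass scores are hypothetical, chosen only to illustrate the arithmetic. The SeniorArchCritic scores are taken from the sample report later in this document.

```python
DIMENSIONS = ("Correctness", "Simplicity", "Verifiability", "Blind Spots", "Operational Risk")

# Fallback weights (percent), used only when a persona has no explicit definition.
DEFAULT_WEIGHTS = {"Correctness": 30, "Simplicity": 20, "Verifiability": 15,
                   "Blind Spots": 20, "Operational Risk": 15}

# Per-persona weights (percent) as listed in the persona definitions.
PERSONA_WEIGHTS = {
    "Nitpicker": {"Correctness": 35, "Simplicity": 25, "Verifiability": 15,
                  "Blind Spots": 20, "Operational Risk": 5},
    "SeniorArchCritic": {"Correctness": 25, "Simplicity": 15, "Verifiability": 25,
                         "Blind Spots": 15, "Operational Risk": 20},
    "ProdOpsHardass": {"Correctness": 25, "Simplicity": 10, "Verifiability": 10,
                       "Blind Spots": 25, "Operational Risk": 30},
}

def persona_score(persona: str, scores: dict[str, int]) -> float:
    """Weighted score for one persona, using that persona's own weights."""
    weights = PERSONA_WEIGHTS.get(persona, DEFAULT_WEIGHTS)
    return sum(weights[d] * scores[d] for d in DIMENSIONS) / 100

def weighted_mean(all_scores: dict[str, dict[str, int]]) -> float:
    """Equal-weight average of the per-persona weighted scores."""
    per_persona = [persona_score(p, s) for p, s in all_scores.items()]
    return sum(per_persona) / len(per_persona)

round_scores = {
    # Hypothetical scores for illustration.
    "Nitpicker": {"Correctness": 8, "Simplicity": 8, "Verifiability": 7,
                  "Blind Spots": 7, "Operational Risk": 8},
    # Scores from the sample SeniorArchCritic report.
    "SeniorArchCritic": {"Correctness": 7, "Simplicity": 8, "Verifiability": 6,
                         "Blind Spots": 5, "Operational Risk": 7},
    # Hypothetical scores for illustration.
    "ProdOpsHardass": {"Correctness": 7, "Simplicity": 8, "Verifiability": 7,
                       "Blind Spots": 6, "Operational Risk": 6},
}

print(f"{persona_score('SeniorArchCritic', round_scores['SeniorArchCritic']):.2f}")  # 6.60
print(f"{weighted_mean(round_scores):.2f}")
```

Weights are never blended across personas: each persona is scored with its own weights first, and only the three resulting scores are averaged.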
Critical veto: If ANY persona scores Correctness or Operational Risk ≤4 AND cites a specific defect (not a general concern), that finding acts as a hard veto — automatic REJECT regardless of the weighted mean. This preserves safety without making the whole system hostage to the weakest persona on non-critical dimensions.
Blind Spots / unrecoverable failure findings: An unrecoverable-failure finding (e.g., "no rollback on global activation") MUST be scored on Operational Risk — not Blind Spots alone — so it is eligible for the critical veto. Scoring it only on Blind Spots bypasses the veto gate.
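A minimal sketch of the veto check, assuming each finding is recorded with its persona, dimension, score, and a flag for whether it cites a specific defect. The `Finding` record and the function name are hypothetical; only the rule itself (Correctness or Operational Risk, score ≤4, specific defect) comes from the text.

```python
from dataclasses import dataclass

VETO_DIMENSIONS = {"Correctness", "Operational Risk"}

@dataclass
class Finding:
    persona: str
    dimension: str
    score: int
    cites_specific_defect: bool  # False for a general concern

def critical_vetoes(findings: list[Finding]) -> list[Finding]:
    """Return the findings that trigger an automatic REJECT."""
    return [
        f for f in findings
        if f.dimension in VETO_DIMENSIONS and f.score <= 4 and f.cites_specific_defect
    ]

# An unrecoverable-failure finding scored only on Blind Spots never reaches the veto.
blind_spot_only = [Finding("ProdOpsHardass", "Blind Spots", 3, True)]
print(len(critical_vetoes(blind_spot_only)))

# The same finding also scored on Operational Risk is veto-eligible.
scored_on_both = blind_spot_only + [Finding("ProdOpsHardass", "Operational Risk", 3, True)]
print(len(critical_vetoes(scored_on_both)))
```

The two cases show why the scoring rule matters: the defect is identical, but only the second recording of it can stop the release.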
| Weighted Mean | Verdict | Action |
|---|---|---|
| ≥8 | PASS | Ship it |
| 7 to <8 | PASS_WITH_FIXES | Fix all findings, re-score changed areas only. Exit only when Step 6 convergence is met — never exit solely because a round found no new issues. |
| <7 | REJECT | Root-cause analysis → remediate → full re-review |
| Any | REJECT (veto) | Critical veto fired — fix the cited defect, full re-review |
Project-min override: If the project specifies a minimum score floor (e.g., 9.2), that floor raises the PASS bar only. A weighted mean that meets the generic ≥8 PASS threshold but falls below the project min is treated as PASS_WITH_FIXES rather than PASS. The REJECT band (<7) and the critical veto are not affected.
Example: mean 8.3 under a 9.2 project floor → generic band says PASS, override says PASS_WITH_FIXES. Mean 6.8 under any project floor → REJECT (override does not apply to the REJECT band). Mean 9.5 under a 9.2 floor → PASS (floor is met).
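The verdict bands and the project-min override can be sketched as one function. The function name and signature are hypothetical; the thresholds and the three worked cases are the ones given above.

```python
from typing import Optional

def verdict(mean: float, project_min: Optional[float] = None, veto: bool = False) -> str:
    """Map a weighted mean to a verdict, applying the veto and the project floor."""
    if veto:
        return "REJECT (veto)"       # critical veto overrides any mean
    if mean < 7:
        return "REJECT"              # the project floor never affects this band
    if mean < 8:
        return "PASS_WITH_FIXES"
    if project_min is not None and mean < project_min:
        return "PASS_WITH_FIXES"     # meets generic PASS but misses the project floor
    return "PASS"

print(verdict(8.3, project_min=9.2))  # PASS_WITH_FIXES
print(verdict(6.8, project_min=9.2))  # REJECT
print(verdict(9.5, project_min=9.2))  # PASS
```

The floor is checked only after the generic bands, which is what keeps it from widening or narrowing the REJECT band.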
On REJECT:
- … `debate` (generate alternatives)
- … `think-twice` (fresh perspective)
- … `plan-and-execute` (replan)

After scoring, scan persona outputs for shared blind spots:
- … ⚠️ CORRELATED EVIDENCE. At least one persona must re-examine from a different starting point (Nitpicker: line-by-line artifact reading, ArchCritic: promises-vs-evidence tracing, ProdOps: failure scenario enumeration).
- … ⚠️ ECHO REASONING. Require the echoing persona to restate the finding through their own analytical lens.

Flags trigger re-examination, not automatic verdict changes.
When the final round verdict is PASS (weighted mean ≥ 8.0 per the verdict table above, AND ≥ the project minimum if one is set, AND no active critical vetoes, AND no correlated-failure flags), immediately run:
tools/run-phr.sh --verdict PASS --min-score <weighted-mean>
This writes `.phr-cleared` with format `v1|<HEAD-SHA>|PASS|<UTC-TS>|min-score=<N>`.
The pre-push hook's Gate 4 reads this sentinel; without it, any push
that touches skill/design `.md` files is refused at the local pre-push hook
(developer-machine self-discipline, not a server-side security boundary).
Only PASS clears the gate. PASS_WITH_FIXES (mean 7 to <8 or below project-min) → another round, do NOT write sentinel. REJECT (<7 or critical veto) → root-cause, remediate, full re-review.
Run PHR (progressive-harsh-review) AFTER `git commit`: the sentinel binds to the HEAD SHA. Any
subsequent commit, amend, or rebase invalidates it (Gate 4 will report it as stale).
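The sentinel format and the staleness rule can be illustrated with a small parser. This is a sketch under assumptions, not the actual Gate 4 logic: the status labels, function names, and the sample SHA and timestamp are hypothetical, and the timestamp is kept as an opaque string because its exact format is not specified here.

```python
from typing import NamedTuple, Optional

class Sentinel(NamedTuple):
    version: str
    head_sha: str
    verdict: str
    utc_ts: str       # opaque: the exact timestamp format is not specified
    min_score: float

def parse_sentinel(line: str) -> Sentinel:
    """Parse a line shaped like v1|<HEAD-SHA>|PASS|<UTC-TS>|min-score=<N>."""
    parts = line.strip().split("|")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    version, head_sha, verdict, utc_ts, score_field = parts
    key, _, value = score_field.partition("=")
    if version != "v1" or key != "min-score":
        raise ValueError("unrecognized sentinel format")
    return Sentinel(version, head_sha, verdict, utc_ts, float(value))

def gate_status(sentinel: Optional[Sentinel], current_head: str) -> str:
    """Hypothetical status labels: 'missing', 'stale', or 'cleared'."""
    if sentinel is None:
        return "missing"
    if sentinel.head_sha != current_head:
        return "stale"    # a later commit, amend, or rebase moved HEAD
    return "cleared"

# Hypothetical SHA and timestamp, for illustration only.
s = parse_sentinel("v1|abc1234|PASS|2024-01-01T00:00:00Z|min-score=8.4")
print(gate_status(s, "abc1234"))
print(gate_status(s, "def5678"))
```

Binding the sentinel to the HEAD SHA is what makes ordering matter: a review recorded before the final commit describes a tree that no longer exists.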
Why this is mandatory: PHR was discipline-only for too long, and skill changes repeatedly shipped without it being run. The sentinel plus Gate 4 closes the loop. Note that Gate 4 is a productivity guardrail (it catches forgetting), not a tamper-proof security control. Code review must still verify that PHR actually ran, not just that the sentinel is present.
Header abbreviations: C=Correctness, S=Simplicity, V=Verifiability, B=Blind Spots, OR=Operational Risk.
### Persona: SeniorArchCritic (C25/S15/V25/B15/OR20)
| Dimension | Score | Finding |
|-----------|-------|---------|
| Correctness | 7 | Claim X contradicts evidence in section 3 |
| Simplicity | 8 | Bounded scope |
| Verifiability | 6 | No worked example for the edge case described |
| Blind Spots | 5 | Global activation with no staged rollout described |
| Operational Risk | 7 | Dependency on absent tooling not flagged |
Per-persona weighted score: 7(.25)+8(.15)+6(.25)+5(.15)+7(.20) = 6.60
| Anti-Pattern | Detection | Correction |
|---|---|---|
| Soft review | No score <7 given | Recalibrate with known-bad example |
| Same feedback loop | Same comment 3 iterations | Escalate to structural fix |
| Style over substance | All comments are formatting | Check logic, edge cases, error handling first |
| Perfection paralysis | 3+ rounds, no convergence | Hard limit: 3 rounds then escalate to human — do NOT ship |
| Missing context | Review without reading full file | Load surrounding context first |
| Failure | Fix |
|---|---|
| Self-reviewed in same thinking pass | Use sub-agent (preferred) — in-process role switch with no context isolation is significantly less reliable; if used, explicitly discard the author's reasoning and start fresh from the artifact text |
| All personas gave same feedback | Each persona must name ≥1 plausible failure mode unique to their lens, or cite a specific property of the change explaining why none exists (generic dismissal = rubber-stamp) — identical findings means the lenses aren't distinct |
| Score inflated to avoid re-work | Findings with concrete issues MUST score ≤7 on that dimension |
| Remediation skipped after REJECT | REJECT means start over. No "fix one thing and call it done" |
| Only reviewed happy path | ProdOpsHardass must consider failure, rollback, 3am scenarios |
| Round N mean lower than Round N-1 | Remediation introduced new issues — flag REGRESSION, root-cause before Round N+1 |
| No output summary before presenting | Always emit PHR SUMMARY block (rounds, mean, verdict, project-min, vetoes) |
| Shipped at round 3 without convergence | 3 rounds = escalate to human with blocker list — never auto-ship |
| Unrecoverable finding scored only on Blind Spots | Must ALSO score Operational Risk to be veto-eligible — Blind Spots alone bypasses the veto gate |
| Skipped sentinel write after PASS | Pre-push Gate 4 refuses the push with "PHR sentinel missing." Run tools/run-phr.sh --verdict PASS --min-score <N> and retry. |
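
Two of the round-level rules in the tables above (flag REGRESSION when the mean drops, escalate to a human at three rounds) can be sketched as a check run after each round. This is a rough sketch: the function name and flag strings are hypothetical, and treating "latest verdict is not PASS" as the sign of non-convergence is an assumption, since the convergence criterion itself is defined elsewhere.

```python
def round_flags(round_means: list[float], latest_verdict: str, max_rounds: int = 3) -> list[str]:
    """Return process flags after the latest review round."""
    flags = []
    # Round N mean lower than Round N-1: remediation introduced new issues.
    if len(round_means) >= 2 and round_means[-1] < round_means[-2]:
        flags.append("REGRESSION")
    # Assumption: a non-PASS verdict at the round limit counts as no convergence.
    if len(round_means) >= max_rounds and latest_verdict != "PASS":
        flags.append("ESCALATE_TO_HUMAN")
    return flags

print(round_flags([7.4, 7.1], "PASS_WITH_FIXES"))
print(round_flags([7.2, 7.5, 7.6], "PASS_WITH_FIXES"))
```

Neither flag ever produces a ship decision; both only stop the loop and hand the blocker list to a person.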
npx claudepluginhub bordenet/superpowers-plus --plugin superpowers-plus