From devils-advocate
Run the full devil's advocate workflow end-to-end. Setup persona, evaluate baseline, then iterate corrections until residual risk is acceptable. Use when the user wants the complete analysis in one go.
How this skill is triggered — by the user, by Claude, or both
Slash command
/devils-advocate:runThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Critique a document from its toughest audience. Generate pushback, score responses, produce improvements. Visuals via `svg-infographics`.
Critique a document from its toughest audience. Generate pushback, score responses, produce improvements. Visuals via svg-infographics.
pip install stellars-claude-code-plugins
Ships svg-infographics CLI used for generating pushback-response visuals. Required when any concern triggers a visual option (cognitive load, number exhaustion, metric confusion). Verify: svg-infographics --help.
MANDATORY: Use TaskCreate/TaskUpdate for setup, evaluation, each iteration. Mark in_progress/completed.
Three artefacts in target document directory:
devils_advocate.md - persona, concerns, scoring, responsesfact_repository.md - verified claims from sources + user inputdevils_advocate.mdOptional: SVG infographics via svg-infographics skill.
MANDATORY: Persona BEFORE concerns. Three sources:
Seed document (evaluation, review) alongside target. Infer persona from tone, priorities. Confirm with user.
Ask:
MUST ask. Never proceed with generic concerns. Offer:
Group: merge priorities (union), take harshest bias per persona, weight likelihood by which persona raises each concern. Document each contributor and composite.
Top of devils_advocate.md:
# Devil's Advocate - [Project Name]
## The Devil
**Role**: [title and context]
**Cares about**: [2-3 priorities in order]
**Style**: [how they process information]
**Default bias**: [their starting stance]
**Triggers**: [what makes them react negatively]
**Decision**: [what they can do with the document]
**Source**: [seed-inferred / user-described / composite from N personas]
---
Composite: add subsection per contributing persona before merged profile.
Persona shapes every concern. Without it: generic concerns, meaningless scoring.
Scan sources, extract verifiable claims. Every fact MUST have source.
Structure:
# Fact Repository - [Project Name]
Verified claims sourced from contracts, test data, project history, and stakeholder input.
No interpretation - just facts.
## Contract clauses
> *"Exact quote from contract/SOW"*
- Source: [document name, section/line reference]
## User-provided facts
- [Fact as stated by user]
- Source: user input, [date or session reference]
## Data facts
- [Metric or measurement with source]
- Source: [test report, dataset, analysis output]
## Historical facts
- [Decision, event, or timeline fact]
- Source: [meeting notes, email, git history]
Rules:
Per concern, Fibonacci scale (1, 2, 3, 5, 8):
Risk adjustment: after initial catalogue, review full set. Adjust where concerns interact. Document as Risk: N (adjusted from L x I = M, reason: ...). Stay in 1-64.
Concern template:
## N. "[Concern as the devil would phrase it - in their voice]"
**Likelihood: N** | **Impact: N** | **Risk: N**
**Their take**: what devil thinks and feels. Write as them, using their priorities and triggers.
**Reality**: factual counter. Reference fact_repository.md entries.
**Response**: how to address - in document or verbally.
Categories to always evaluate (persona-weighted):
No negative risk scores. Concerns only. Strengths go in "Reality" and "Response". Well-addressed concern scores high on scorecard.
Scorecard = quality of addressing, not presence. 0-100% per concern.
| Score | Meaning | Devil's reaction |
|---|---|---|
| 95-100% | Fully addressed, exemplary | "I have no issue with this" |
| 80-94% | Well addressed, minor gaps | "Fine, but I noticed..." |
| 60-79% | Partially addressed, notable gaps | "This doesn't fully answer my question" |
| 40-59% | Weakly addressed, significant exposure | "This is a problem" |
| 20-39% | Poorly addressed, makes devil suspicious | "You're hiding something" |
| 0-19% | Not addressed or actively harmful | "This makes it worse" |
## Scorecard
| # | Concern | Risk | Score | Reasoning |
|---|---------|------|-------|-----------|
| 1 | [short name] | 25 | 85% | [specific text/element that addresses it + quality assessment + any side effects] |
Reasoning MUST quote specific text. No generic statements.
Minimise document score (total residual risk). Lower = better.
risk x (1 - score)Perfect = 0. Starting = total absolute risk.
File naming: <name>_v<NN>_<score>.md. Example: pcp_rnd_v02_54.md.
Biggest gaps: top 5 residuals. Next iteration targets.
Per high-residual concern, propose 2-4 options:
### Concern #N: [name] (residual: X.X)
**Option A**: [specific text change or addition]
- Expected effect: #N +15%, #12 -5% (net: +10%)
**Option B**: [structural change]
- Expected effect: #N +20%, #16 -10% (net: +10%)
**Option C**: [SVG infographic]
- Expected effect: #N +25%, #10 +10%, #14 +10% (net: +45%)
**Recommendation**: [which option and why]
Visual options: if concern relates to cognitive load (#10), number exhaustion (#14), or metric confusion (#13) - always include SVG option. Specify chart type, data, expected improvement, reference svg-infographics.
Each iteration: new versioned copy, fresh evaluation. Original untouched.
<name>_v<NN+1>.md<score> suffix<name>_v<NN+1>_<score>.mddevils_advocate.md with new scorecard (keep old ones)report.md # original (untouched)
report_v01_89.md # copy of original with embedded scorecard (residual 89)
report_v02_34.md # first correction pass (residual 34)
report_v03_12.md # second correction pass (residual 12)
devils_advocate.md # updated in place, contains all scorecards
fact_repository.md # updated in place
Score MUST decrease each iteration. If not: corrections are creating problems - stop and reassess.
Stop when:
Some fixes break others:
Aggression punishes. Confrontational, accusatory, blame-shifting language drops tone scores even if facts improve. Tone registers before facts.
Reader needs WHY. Facts without reasons = devil fills blanks, usually badly.
Transparency beats framing. Hiding bad numbers costs more trust than showing with context.
Quality over presence. Weak addressing scores worse than no addressing. Half-answers signal awareness without competence.
One visual replaces three paragraphs. SVG (via svg-infographics) = highest leverage for cognitive load:
MANDATORY: Every versioned document ends with embedded scorecard after horizontal rule.
Format:
---
## Devil's Advocate Scorecard
**Persona**: [devil role and key bias]
**Document score**: [total residual risk] (lower = better, max. [total absolute risk])
| # | Concern | Risk | Score | Residual | How addressed |
|---|---------|------|-------|----------|---------------|
| 1 | [short name] | [risk] | [0-100%] | [residual] | [specific text/section that addresses it + quality note] |
| ... | ... | ... | ... | ... | ... |
**Top gaps**: [list 3-5 highest residual concerns with brief note on what's missing]
Rules:
Alongside target:
target_document.md
devils_advocate.md
fact_repository.md
Versioned documents:
<name>_v<NN>_<score>.md
<NN> - two-digit (v01 = initial + scorecard, v02 = first correction)_<score> - MANDATORY rounded residual. Missing = INCOMPLETEExamples:
DESIGN_v01_89.md - original + first scorecard, residual 89DESIGN_v02_28.md - first correction, residual 28DESIGN_v03_12.md - second correction, residual 12WRONG: DESIGN_v02.md (no score suffix)
devils_advocate.md and fact_repository.md updated in place - never versioned. devils_advocate.md accumulates all scorecards for comparison.
npx claudepluginhub stellarshenson/claude-code-plugins --plugin devils-advocateProvides behavioral guidelines to reduce common LLM coding mistakes, focusing on simplicity, surgical changes, assumption surfacing, and verifiable success criteria.
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.