From tp-sadd
Multi-round debate between independent judges until consensus — for high-stakes evaluation where rigorous argumentation improves assessment accuracy
Slash command
/tp-sadd:judge-with-debate Solution path(s) and evaluation criteria
When to use
When user says 'debate this', 'multi-judge debate', 'reach consensus', 'adversarial evaluation', 'multiple perspectives on quality', 'high-stakes evaluation'. IMMEDIATELY when user asks for evaluation through structured debate between judges. FIRST when high-stakes decisions require arguing positions before consensus. DO NOT use for routine quality checks — use sadd-judge instead.
IF evaluating routine quality check → single judge (debate overhead not justified)
IF high-stakes evaluation requiring consensus → meta-judge then 3 debating judges
IF user provides evaluation focus → scope meta-judge to that dimension
IF judges cannot reach consensus after 3 rounds → report persistent disagreements for human review
IF no work to evaluate → ask what should be evaluated
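The IF rules above can be read as a small routing function. The following is only a sketch: the `Request` fields, the `route` name, and the returned dictionary keys are hypothetical, and the order in which the rules are checked is an assumption, since the skill does not state a precedence.

```python
from dataclasses import dataclass
from typing import Optional

MAX_DEBATE_ROUNDS = 3  # round limit taken from the rules above


@dataclass
class Request:
    # All field names are hypothetical, chosen for illustration only.
    work: Optional[str]          # solution path(s) to evaluate, or None
    high_stakes: bool = False    # whether the evaluation is high-stakes
    focus: Optional[str] = None  # user-provided evaluation focus, if any


def route(req: Request) -> dict:
    """Pick an action for a request (check order is an assumption)."""
    if not req.work:
        return {"action": "ask_user", "message": "What should be evaluated?"}
    if not req.high_stakes:
        # Routine quality check: debate overhead is not justified.
        return {"action": "single_judge"}
    return {
        "action": "debate",
        "judges": 3,
        "max_rounds": MAX_DEBATE_ROUNDS,
        # Scope the meta-judge to the user's focus when one is given.
        "meta_judge_scope": req.focus or "all quality dimensions",
    }
```

For example, `route(Request(work="src/auth", high_stakes=True, focus="security"))` selects the debate path with the meta-judge scoped to `security`.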
Evaluate solutions through multi-agent debate where independent judges analyze, challenge each other's assessments, and iteratively refine their evaluations until reaching consensus or maximum rounds.
For high-stakes evaluation where multiple perspectives and rigorous argumentation improve assessment accuracy. Structured debate forces judges to defend positions with evidence and consider counter-arguments.
Key benefits over single-judge:
Skip this skill when:
Dispatch one meta-judge with:
Meta-judge produces evaluation specification YAML covering all quality dimensions. This runs ONCE and is shared by all judges across all rounds.
Prompt template:
Generate an evaluation specification YAML for the following task. You will produce rubrics, checklists, and scoring criteria that judge agents will use through multi-round debate.
Task: {description}
Context: {any relevant context}
Artifact type: {code|documentation|configuration|etc.}
Evaluation mode: Multi-judge debate with consensus-seeking
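One way to fill in the template before dispatching the meta-judge is plain string formatting. This is a minimal sketch: `build_meta_judge_prompt` is a hypothetical helper, and the placeholder names `context` and `artifact_type` are shortened stand-ins for the template's `{any relevant context}` and `{code|documentation|configuration|etc.}` slots.

```python
META_JUDGE_PROMPT = """\
Generate an evaluation specification YAML for the following task. \
You will produce rubrics, checklists, and scoring criteria that judge \
agents will use through multi-round debate.
Task: {description}
Context: {context}
Artifact type: {artifact_type}
Evaluation mode: Multi-judge debate with consensus-seeking
"""


def build_meta_judge_prompt(description: str, context: str, artifact_type: str) -> str:
    """Fill the template once; the result is shared by all judges and rounds."""
    return META_JUDGE_PROMPT.format(
        description=description,
        context=context,
        artifact_type=artifact_type,
    )
```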
Launch 3 judges in parallel (Opus recommended).
Each receives:
Each produces an independent assessment, saved to .specs/reports/{solution-name}-{date}.[1|2|3].md:
Key principle: Independence in initial analysis prevents groupthink.
For each round, launch 3 judges in parallel. Each reads:
Each judge:
Appends "Debate Round {R}" section to their report file.
Orchestrator does not mediate. Judges communicate through filesystem only.
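The round structure can be sketched as a loop that launches three judges in parallel and checks for consensus after each round, stopping at three rounds. Both callables are assumptions: `dispatch_judge` stands in for however a judge agent is actually launched, and `has_consensus` stands in for the skill's consensus check, which would read the report files rather than receive data from the orchestrator.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

NUM_JUDGES = 3
MAX_ROUNDS = 3


def run_round(dispatch_judge: Callable[[int, int], None], round_no: int) -> None:
    """Launch all judges for one round in parallel and wait for them."""
    with ThreadPoolExecutor(max_workers=NUM_JUDGES) as pool:
        # list() forces completion and re-raises any judge failure.
        list(pool.map(lambda j: dispatch_judge(j, round_no),
                      range(1, NUM_JUDGES + 1)))


def run_debate(dispatch_judge: Callable[[int, int], None],
               has_consensus: Callable[[], bool]) -> dict:
    """Run debate rounds until consensus or the round limit is reached."""
    for round_no in range(1, MAX_ROUNDS + 1):
        run_round(dispatch_judge, round_no)
        if has_consensus():
            return {"consensus": True, "rounds": round_no}
    # No consensus: the caller reports persistent disagreements for human review.
    return {"consensus": False, "rounds": MAX_ROUNDS}
```

The orchestrator here only schedules rounds and checks the stop condition; it passes nothing between judges, which matches the filesystem-only communication described above.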
After each debate round:
Consensus achieved when:
If no consensus after 3 rounds:
If consensus:
Consensus Scores
| Criterion | Judge 1 | Judge 2 | Judge 3 | Final |
|-----------|---------|---------|---------|-------|
| {Name} | {X}/5 | {X}/5 | {X}/5 | {X}/5 |
Consensus Overall: {avg}/5.0
Debate Summary
- Rounds to consensus: {N}
- Initial disagreements: {list with specific criteria}
- How resolved: {explanation}
Final recommendation with justification
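The score table and overall line in the consensus template could be rendered as follows. This is a sketch under one stated assumption: `{avg}` is taken to be the unweighted mean of the per-criterion final scores, whereas the meta-judge specification may define weights. The function name and input shape are hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple

# criterion name -> (judge 1, judge 2, judge 3, final), each on a 1-5 scale
Scores = Dict[str, Tuple[float, float, float, float]]


def render_consensus_scores(scores: Scores) -> str:
    """Render the score table plus the overall line from the template."""
    lines = [
        "| Criterion | Judge 1 | Judge 2 | Judge 3 | Final |",
        "|-----------|---------|---------|---------|-------|",
    ]
    for name, (j1, j2, j3, final) in scores.items():
        lines.append(f"| {name} | {j1}/5 | {j2}/5 | {j3}/5 | {final}/5 |")
    # Assumption: unweighted mean of the final scores.
    overall = mean(final for *_, final in scores.values())
    lines.append(f"Consensus Overall: {overall:.1f}/5.0")
    return "\n".join(lines)
```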
If no consensus:
Create reports directory:
mkdir -p .specs/reports
Report naming: .specs/reports/{solution-name}-{YYYY-MM-DD}.[1|2|3].md
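The naming scheme can be generated programmatically. A minimal sketch, assuming an ISO `YYYY-MM-DD` date and one file per judge; `report_paths` is a hypothetical helper, not part of the plugin.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional

REPORTS_DIR = Path(".specs/reports")


def report_paths(solution_name: str,
                 on: Optional[date] = None,
                 base: Path = REPORTS_DIR) -> List[Path]:
    """Return the three per-judge report paths, creating the directory."""
    on = on or date.today()
    base.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    return [base / f"{solution_name}-{on.isoformat()}.{n}.md" for n in (1, 2, 3)]
```

For a solution named `auth-refactor` evaluated on 15 January 2025, the first path would be `.specs/reports/auth-refactor-2025-01-15.1.md`.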
npx claudepluginhub git-fg/taches-principled --plugin tp-sadd