From team-orchestrator
Use when debugging with unclear root cause and multiple plausible explanations that need parallel adversarial testing to converge on the answer
Slash command: /team-orchestrator:competing-hypotheses
When the root cause is unclear, a single investigator anchors on the first plausible explanation. Multiple investigators testing competing theories — and actively trying to disprove each other — converge faster and more accurately.
Core principle: Adversarial debate eliminates weak hypotheses. The theory that survives structured attack is most likely correct.
Management theory: Psychological Safety (agents MUST challenge each other), Tuckman's Storming (debate is the mechanism, not a problem), Belbin Monitor-Evaluator (devil-advocate role is mandatory).
Don't use when:
coordinator (lead, acts as judge)
├── debugger × 2-5 (each tests one hypothesis)
└── devil-advocate × 1 (challenges all hypotheses)
Belbin coverage:
Sizing by complexity:
| Bug Complexity | Debuggers | Notes |
|---|---|---|
| 2 plausible causes | 2 | Minimum viable debate |
| 3-4 theories | 3-4 | Standard case |
| Systemic/unknown | 4-5 + devil-advocate | Maximum investigation |
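The sizing table can be read as a small lookup. The following is a minimal sketch, assuming the coordinator already knows how many plausible hypotheses it has. The helper name `debugger_count` and the `systemic` flag are illustrative assumptions, not part of the skill itself.

```python
def debugger_count(num_hypotheses: int, systemic: bool = False) -> int:
    """Return how many debuggers to spawn, following the sizing table.

    Hypothetical helper: the table gives ranges, so this assigns one
    debugger per hypothesis and clamps the result to the 2-5 range.
    """
    if systemic:
        # Systemic/unknown bugs use the upper band of the table (4-5).
        return max(4, min(num_hypotheses, 5))
    if num_hypotheses < 2:
        # Below two plausible causes there is nothing to debate.
        raise ValueError("need at least 2 plausible causes")
    return min(num_hypotheses, 5)
```

For instance, `debugger_count(3)` gives 3 debuggers, while a systemic bug with only two initial theories still gets 4. The devil-advocate is counted separately, since the team structure lists exactly one.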
Coordinator:
Spawn prompt template for debugger:
You are testing the hypothesis: [HYPOTHESIS]
Bug symptoms: [SYMPTOMS]
Reproduction steps: [STEPS]
Your job:
1. Find evidence FOR your hypothesis (prove it)
2. Find evidence AGAINST your hypothesis (disprove it)
3. Be honest — if your hypothesis is wrong, say so
4. Report: evidence for, evidence against, confidence (0-100%)
You MUST report disconfirming evidence. Hiding evidence that
disproves your hypothesis is a critical failure.
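Filling the debugger template programmatically might look like the sketch below. It uses Python's standard `string.Template`; the function name `render_debugger_prompt` and the `$`-style placeholder names are assumptions made for illustration, and the prompt wording mirrors the template above.

```python
from string import Template

# Placeholders ($hypothesis, $symptoms, $steps) stand in for the
# bracketed fields of the spawn prompt template.
DEBUGGER_PROMPT = Template(
    "You are testing the hypothesis: $hypothesis\n"
    "Bug symptoms: $symptoms\n"
    "Reproduction steps: $steps\n"
    "Your job:\n"
    "1. Find evidence FOR your hypothesis (prove it)\n"
    "2. Find evidence AGAINST your hypothesis (disprove it)\n"
    "3. Be honest: if your hypothesis is wrong, say so\n"
    "4. Report: evidence for, evidence against, confidence (0-100%)\n"
    "You MUST report disconfirming evidence. Hiding evidence that\n"
    "disproves your hypothesis is a critical failure.\n"
)


def render_debugger_prompt(hypothesis: str, symptoms: str, steps: str) -> str:
    """Fill the template; substitute() raises KeyError if a field is missing."""
    return DEBUGGER_PROMPT.substitute(
        hypothesis=hypothesis, symptoms=symptoms, steps=steps
    )
```

Each debugger gets the same symptoms and reproduction steps but a different hypothesis, so only the first argument changes between spawns.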
Devil-advocate spawn prompt:
You will review each debugger's findings.
For EACH hypothesis:
1. What evidence would definitively prove it? Did they find it?
2. What evidence would definitively disprove it? Did they look?
3. Are there alternative explanations for their evidence?
4. What tests would distinguish this hypothesis from others?
Your success metric is finding flaws. Approving a weak hypothesis
without challenge is a failure.
Each debugger independently:
Critical: Debuggers must report disconfirming evidence. The prompt explicitly requires this to prevent confirmation bias.
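One way a coordinator could check this requirement mechanically is to give reports a fixed shape. This is a sketch under assumptions: the `DebuggerReport` class, its field names, and the rule that an empty `evidence_against` list means "did not look" are all hypothetical, not defined by the skill.

```python
from dataclasses import dataclass


@dataclass
class DebuggerReport:
    """Hypothetical report shape: evidence for, evidence against, confidence."""
    hypothesis: str
    evidence_for: list[str]
    evidence_against: list[str]
    confidence: int  # percentage, 0-100

    def __post_init__(self) -> None:
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")


def needs_disconfirmation_pass(report: DebuggerReport) -> bool:
    """Flag reports that list no disconfirming evidence at all.

    Assumption: an empty list is treated as "the debugger did not look",
    so the coordinator can send the report back before the debate starts.
    """
    return len(report.evidence_against) == 0
```

A debugger that searched and found nothing against its hypothesis could record what it checked (for example, "searched logs for socket close events, none found") so the list is still non-empty and reviewable.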
For each hypothesis:
1. Debugger presents: evidence for + against + confidence
2. Devil-advocate challenges: gaps, alternative explanations
3. Other debuggers challenge: "my evidence contradicts yours because..."
4. Coordinator scores: STRONG / WEAK / DISPROVEN
Elimination round:
- DISPROVEN hypotheses are discarded
- WEAK hypotheses get one more investigation round
- STRONG hypotheses proceed to verification
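The elimination round above can be sketched as a simple partition over the coordinator's scores. The `Verdict` enum and the `elimination_round` function are illustrative names, assumed here for the example.

```python
from enum import Enum


class Verdict(Enum):
    STRONG = "STRONG"
    WEAK = "WEAK"
    DISPROVEN = "DISPROVEN"


def elimination_round(
    scores: dict[str, Verdict],
) -> tuple[list[str], list[str]]:
    """Split scored hypotheses into (to_verify, to_reinvestigate).

    DISPROVEN hypotheses appear in neither list; they are discarded.
    """
    to_verify = [h for h, v in scores.items() if v is Verdict.STRONG]
    to_reinvestigate = [h for h, v in scores.items() if v is Verdict.WEAK]
    return to_verify, to_reinvestigate
```

This sketch does not track how many rounds a WEAK hypothesis has already had. A caller would need to enforce the "one more round" limit itself, for example by discarding hypotheses that are still WEAK after their second pass.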
The debate IS the value. Sequential investigation suffers from anchoring. Parallel adversarial investigation eliminates weak theories faster.
For the surviving hypothesis:
.claude.md: what theories were wrong and why (prevents future anchoring)
Coordinator identifies 5 hypotheses:
H1: WebSocket connection closing prematurely
H2: Event loop draining with no listeners
H3: Unhandled promise rejection causing exit
H4: Session timeout misconfigured
H5: Message handler throwing uncaught error
Team investigates in parallel:
Debugger 1 (H1): "WebSocket stays open — DISPROVEN"
Debugger 2 (H2): "Event loop has active listeners — DISPROVEN"
Debugger 3 (H3): "Found unhandled rejection in auth middleware — STRONG (80%)"
Debugger 4 (H4): "Timeout is 30min, app exits in 1s — DISPROVEN"
Debugger 5 (H5): "Message handler has try-catch — WEAK (30%)"
Devil-advocate: "H3 is strong but — does the rejection happen on EVERY
message or just the first? If first-only, the auth token refresh
might be the real cause, not the handler."
→ Follow-up reveals: auth token refresh throws on first use because
token is not yet set. H3 was close but the real root cause is
token initialization order.
Without adversarial debate, the team would have patched the rejection
handler without fixing the token initialization.
| Mistake | Fix |
|---|---|
| Debugger hides disconfirming evidence | Prompt explicitly requires both-sides reporting |
| All hypotheses are variations of one idea | Ensure hypotheses are truly independent |
| Skipping debate — just picking highest confidence | Debate reveals flaws that confidence scores don't |
| Devil-advocate too soft | Prompt: "approving a weak hypothesis is a failure" |
| Not recording eliminated hypotheses | They prevent future anchoring — record them |
| Fixing symptom, not root cause | Devil-advocate's final question prevents this |
Pre-requisite: team-orchestrator:orchestrating-work routes here
Post-requisite: team-orchestrator:session-reflection records learnings
Related: superpowers:systematic-debugging for single-agent debugging
npx claudepluginhub labrinyang/team-ochestractor