From superpowers-plus
Diagnoses LLM/prompt behavior issues including tool selection failures, prompt regressions, context window problems, and parsing failures. Dispatched by debug-conductor during forked debugging.
Slash command
/superpowers-plus:llm-behavior-investigator
> **Role:** Diagnose LLM-related failures: wrong tool selection, prompt regressions, context overflow, parsing errors.
>
> **Dispatched by:** `debug-conductor`; never invoked directly by the user.
>
> **Evidence type:** `LLMEvidence` (see `skills/_shared/evidence-schema.md`)
Dispatched by debug-conductor when the incident involves AI/LLM behavior — tool misselection, prompt regressions, context window pressure, or output parsing failures.
| Mode | Symptoms | Investigation Path |
|---|---|---|
| Tool selection failure | Wrong tool invoked; correct tool available | Step 2A: Tool call audit |
| Prompt regression | Behavior changed after prompt/template update | Step 2B: Prompt diff analysis |
| Context overflow | Degraded quality on long conversations | Step 2C: Context window analysis |
| Parsing failure | LLM output can't be parsed by downstream code | Step 2D: Output format audit |
| Hallucination in tool args | Tool called with fabricated parameters | Step 2A + Step 2C |
`{ section, before, after, impact }`

`failingAvg` vs. `succeedingAvg` → significant difference?

`usedTokens / maxTokens > 0.8` → "high utilization zone"

Return `LLMEvidence` to conductor:
{
"toolCalls": [
{ "tool": "send_email", "params": {"to": "customer"}, "success": true, "expected": "make_call" }
],
"promptDiffs": [
{ "section": "make_call description", "before": "Initiate outbound phone call", "after": "Reach out via voice channel", "impact": "Ambiguity increase" }
],
"contextUsage": { "promptTokens": 108000, "maxTokens": 128000, "utilization": 0.84 },
"parsingFailures": []
}
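
As a rough illustration of how the fields in this payload could be checked, the sketch below audits `toolCalls` for misselections and recomputes utilization against the 0.8 threshold. The function names `find_misselections` and `context_utilization` are hypothetical and not part of the skill; the evidence dict mirrors the example payload, with `promptDiffs` left empty for brevity.

```python
import json

HIGH_UTILIZATION = 0.8  # threshold taken from the skill text

# Example evidence, mirroring the payload shape shown in this document.
evidence = json.loads("""
{
  "toolCalls": [
    {"tool": "send_email", "params": {"to": "customer"}, "success": true, "expected": "make_call"}
  ],
  "promptDiffs": [],
  "contextUsage": {"promptTokens": 108000, "maxTokens": 128000, "utilization": 0.84},
  "parsingFailures": []
}
""")


def find_misselections(tool_calls):
    """Return calls whose invoked tool differs from the expected tool."""
    return [c for c in tool_calls if c.get("expected") not in (None, c["tool"])]


def context_utilization(usage):
    """Recompute promptTokens / maxTokens instead of trusting the stored value."""
    return usage["promptTokens"] / usage["maxTokens"]


misselected = find_misselections(evidence["toolCalls"])
utilization = context_utilization(evidence["contextUsage"])
in_high_zone = utilization > HIGH_UTILIZATION

print(f"misselections: {[(c['tool'], c['expected']) for c in misselected]}")
print(f"utilization: {utilization:.2f}, high utilization zone: {in_high_zone}")
```

Note that a call can have `"success": true` and still count as a misselection: the tool ran without error, but it was not the tool the situation called for.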
Plus standard evidence wrapper:
| Pattern | Evidence Shape |
|---|---|
| Ambiguous tool description | Misselections cluster around specific tool; description is vague |
| Context window pressure | Misselections correlate with high utilization (>80%) |
| Prompt regression | Behavior change correlates with prompt template deployment |
| Tool argument hallucination | Tool called with plausible but fabricated parameters |
| Format drift | Output structure degrades under high context load |
| Compound failure | 2+ factors required together (e.g., ambiguous description + high context) |
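
One way the "context window pressure" pattern might be tested is by comparing average utilization for failing and succeeding calls. The sketch below assumes that `failingAvg` and `succeedingAvg` refer to mean context utilization per group, which the source does not state explicitly. The per-call records, the `compare_utilization` helper, and the `MIN_GAP` cutoff are all hypothetical, and the data values are made up for illustration.

```python
from statistics import mean

# Hypothetical per-call records: context utilization at call time and whether
# the expected tool was selected. Values are invented for illustration only.
calls = [
    {"utilization": 0.41, "correct": True},
    {"utilization": 0.55, "correct": True},
    {"utilization": 0.62, "correct": True},
    {"utilization": 0.83, "correct": False},
    {"utilization": 0.88, "correct": False},
    {"utilization": 0.79, "correct": True},
    {"utilization": 0.91, "correct": False},
]

MIN_GAP = 0.15  # assumed cutoff for a "significant" gap; not from the skill


def compare_utilization(records, min_gap=MIN_GAP):
    """Compare mean utilization of failing vs. succeeding calls."""
    failing = [r["utilization"] for r in records if not r["correct"]]
    succeeding = [r["utilization"] for r in records if r["correct"]]
    if not failing or not succeeding:
        return None  # cannot compare without both groups
    failing_avg = mean(failing)
    succeeding_avg = mean(succeeding)
    return {
        "failingAvg": failing_avg,
        "succeedingAvg": succeeding_avg,
        "significant": failing_avg - succeeding_avg >= min_gap,
    }


result = compare_utilization(calls)
print(result)
```

A fixed gap is a crude stand-in for a real significance test; with enough samples, a permutation test or similar would give a more defensible answer to "significant difference?".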
| Mode | Symptom | Recovery |
|---|---|---|
| Prompt red herring | Blaming prompt when model changed | Check model version and deployment first |
| Context window overflow | Subtle truncation not detected | Measure actual token count vs limit |
| Non-determinism | Cannot reproduce intermittent failure | Run multiple trials, report distribution |
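
The non-determinism recovery ("run multiple trials, report distribution") could look roughly like the sketch below. `run_trials`, `report`, and `fake_model_call` are hypothetical names; the fake call is a seeded random stand-in for whatever model invocation is actually under investigation.

```python
import random
from collections import Counter


def run_trials(run_once, n_trials=20):
    """Call run_once() n_trials times and tally the outcomes."""
    return Counter(run_once() for _ in range(n_trials))


def report(distribution):
    """Convert raw outcome counts into fractions of all trials."""
    total = sum(distribution.values())
    return {outcome: count / total for outcome, count in distribution.items()}


# Stand-in for a real model call: a seeded random choice so the sketch is
# repeatable. Replace with the actual invocation being investigated.
rng = random.Random(0)


def fake_model_call():
    return "make_call" if rng.random() < 0.7 else "send_email"


distribution = run_trials(fake_model_call, n_trials=50)
rates = report(distribution)
print(rates)
```

Reporting the full distribution rather than a single pass/fail result keeps an intermittent failure visible even when most trials succeed.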
npx claudepluginhub bordenet/superpowers-plus --plugin superpowers-plus