From thinking-skills
Applies Kepner-Tregoe IS/IS-NOT boundary analysis to debug selective defects where some cases are affected but not others, revealing root cause from the contrast.
How this skill is triggered — by the user, by Claude, or both
Slash command
/thinking-skills:thinking-kepner-tregoeThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Kepner-Tregoe (KT) is a structured root-cause method. **This skill focuses on Problem Analysis (PA) — the IS/IS-NOT boundary contrast — which is the high-value KT process for debugging.** When a defect is *selective* (some cases affected, others not), the boundary between IS and IS-NOT reveals the distinction that points at the root cause.
Kepner-Tregoe (KT) is a structured root-cause method. This skill focuses on Problem Analysis (PA) — the IS/IS-NOT boundary contrast — which is the high-value KT process for debugging. When a defect is selective (some cases affected, others not), the boundary between IS and IS-NOT reveals the distinction that points at the root cause.
Decision Analysis (DA) and Potential Problem Analysis (PPA) are de-emphasized here. For pure decision-making among alternatives, use thinking-opportunity-cost. For risk anticipation before a change, use thinking-pre-mortem. Those skills are purpose-built for those tasks; KT's DA/PPA add overhead without unique mechanism.
Situation Analysis (SA) is retained as a lightweight triage step when facing multiple concerns, but it is not a required preamble — jump directly to PA when the problem is already clear.
Core Principle: The boundary between what IS affected and what IS NOT affected encodes the root cause. Find the distinction, find the cause.
Decision flow:
Defect is selective (not 100%)? → No → IS/IS-NOT has no signal; use direct debugging or thinking-systems
→ Yes → Cause obvious from stack trace/recent change? → Yes → Just fix it
→ No → APPLY KT PROBLEM ANALYSIS
thinking-systems or direct debugging.thinking-occams-razor) before building a full specification matrix.thinking-opportunity-cost, not KT's Decision Analysis.thinking-pre-mortem, not KT's Potential Problem Analysis.When a defect is selective (some cases affected, others not) and the cause is unclear:
Skip if the failure is uniform (100%) — there's no boundary to contrast; use direct debugging. If the cause is obvious from a stack trace or recent change, just fix it. For a single cheaply-testable hypothesis, test it first.
Only when facing several problems at once. List all concerns, separate them if compound, and prioritize by Timing/Impact/Trend:
| Concern | Timing | Impact | Trend | Priority |
|---|---|---|---|---|
| API latency spike | Urgent | High | Worsening | P0 |
| Checkout errors | Soon | High | Stable | P1 |
For each concern, decide: Problem Analysis (PA), or delegate to another skill.
Describe the deviation from expected behavior with specificity:
"API response time increased from 200ms to 800ms for /checkout endpoint,
US-East only, starting Monday 9 AM, affecting ~30% of requests."
Specify the problem across four dimensions. The power is in the distinction column — what's unique about the IS side?
| Dimension | IS (affected) | IS NOT (not affected) | Distinction |
|---|---|---|---|
| WHAT — object | /checkout endpoint | /cart, /product, /user | Payment processing |
| WHAT — defect | 4x latency increase | Errors, timeouts, data corruption | Performance only |
| WHERE — location | Production US-East | EU, US-West, staging | Single region |
| WHERE — on object | Database query phase | Auth, validation, serialization | DB layer |
| WHEN — first seen | Monday 9:00 AM | Before Monday, after 6 PM | Business hours |
| WHEN — pattern | During checkout submit | During browsing, cart add | Write operations |
| EXTENT — how many | ~30% of requests | 100% of requests | Intermittent |
| EXTENT — trend | Stable since Tuesday | Getting worse | Plateaued |
For each row, ask: "What's unique or distinctive about the IS side compared to the IS-NOT side?"
Distinctions:
- Only /checkout (payment processing) — not other endpoints
- Only US-East (specific DB replica) — not other regions
- Only during business hours (load-related?) — not off-peak
- Only ~30% of requests (specific query pattern?) — not all
- Started Monday 9 AM — what changed?
What changed in, on, around, or about the distinctions near the first observation time?
Changes near Monday 9 AM:
- Payment provider SDK updated (Sunday night deploy)
- Database index rebuild scheduled (Sunday maintenance)
- New fraud detection rules enabled (Monday 8:45 AM)
Each candidate cause must explain BOTH the IS and the IS-NOT:
| Possible Cause | Explains IS? | Explains IS-NOT? | Verdict |
|---|---|---|---|
| Fraud rules adding DB queries | ✓ Only checkout, only write ops | ✓ Not other endpoints | Pursue |
| Payment SDK change | ✓ Only checkout | ✗ Would affect all regions | Ruled out |
| Index rebuild | ✓ DB layer | ✗ Would affect all queries | Ruled out |
Design a test to confirm or rule out the leading candidate:
Verification for "Fraud detection rules":
1. Check: Rules enabled 8:45 AM (matches timeline)
2. Check: Rules only on checkout (matches scope)
3. Test: Disable rules in canary, measure latency
4. Examine: Query logs for fraud check queries
A completed KT Problem Analysis produces:
| Anti-Pattern | Symptom | Correction |
|---|---|---|
| KT on uniform failure | Running PA when 100% of requests fail | No boundary to contrast; use direct debugging or thinking-systems |
| Over-specifying the matrix | Filling every IS/IS-NOT cell for a simple bug | Stop when the distinction is clear; don't ritualize |
| DA/PPA sprawl | Running full Decision Analysis or Potential Problem Analysis for routine tasks | Redirect to thinking-opportunity-cost (decisions) or thinking-pre-mortem (risks) |
| Skipping cause testing | Pursuing the first plausible cause without testing against IS-NOT | Every cause must explain BOTH IS and IS-NOT |
| SA as mandatory preamble | Running full Situation Analysis before every PA | Jump directly to PA when the problem is already clear |
| Ignoring the distinction | Building the matrix but not extracting what's unique about IS | The distinction IS the signal; without it, the matrix is just a table |
npx claudepluginhub tjboudreaux/cc-thinking-skills --plugin thinking-skillsInvestigates root causes of defects or incidents by iteratively asking 'why' to trace failures from symptoms to systemic causes. Useful for postmortems and recurring failures.
Five Whys, fishbone diagrams, identifying systemic causes not just symptoms.
Chains 'why' questions with evidence to find systemic root causes after a fault is localized and proximate cause is known. Includes counterfactual test and stop condition.