Enumerates competing falsifiable hypotheses for bug symptoms with multiple plausible causes, then picks the cheapest observation that discriminates between them.
Slash command: `/thinking-skills:thinking-scientific-method`
The scientific method's payoff for an agent is not narrating "observe -> question." It is the **differential**: when a symptom could come from several places, enumerate competing falsifiable hypotheses and spend your cheapest observation on the one that best discriminates between them.
This skill replaces the older, broader scientific-method skill. On SWE-bench fault localization, the original produced no measurable lift; this agent-native version produced the strongest measured debugging lift in the current eval set.
Core Principle: Don't guess-and-patch. Enumerate competing causes, then make the cheapest observation that would falsify the most likely one.
Symptom has several plausible causes?
  -> no -> test or fix the obvious cause directly
  -> yes -> can you make cheap observations now?
      -> no -> gather access/evidence first
      -> yes -> apply hypothesis-differential debugging
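The decision tree above can be read as a small triage function. This is a minimal sketch: the function name, its two parameters, and the returned strings are illustrative assumptions, not part of the skill itself.

```python
def choose_approach(plausible_causes: int, cheap_observation_available: bool) -> str:
    """Return the next step suggested by the decision tree.

    Both arguments are hypothetical inputs that the caller estimates by hand.
    """
    if plausible_causes < 2:
        # A single obvious cause: no differential is needed.
        return "test or fix the obvious cause directly"
    if not cheap_observation_available:
        # Several candidate causes, but nothing cheap to look at yet.
        return "gather access/evidence first"
    return "apply hypothesis-differential debugging"


print(choose_approach(plausible_causes=3, cheap_observation_available=True))
```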
List 3-5 specific, falsifiable hypotheses. Name the likely file, function, subsystem, input condition, or invariant. Avoid vague buckets like "backend issue" or "race condition somewhere."
| # | Hypothesis | Why plausible? |
|---|------------|----------------|
| H1 | `auth/session.py:refresh` drops rotated tokens | failures start after token rotation |
| H2 | cache TTL mismatch in `session_cache` | stale sessions persist across deploys |
| H3 | frontend retries reuse expired cookie | only browser flow is affected |
If you can only think of one hypothesis, you are guessing. Force alternatives before inspecting deeper.
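One way to keep the hypothesis list honest is to record each row as data and check it against the rules above. The sketch below is an assumption-laden illustration: the `Hypothesis` record, the `review` helper, and the `VAGUE_BUCKETS` tuple are hypothetical names, and the "frontend retry path" location is a paraphrase of the H3 row.

```python
from dataclasses import dataclass

# Vague buckets quoted in the text above; extend as needed.
VAGUE_BUCKETS = ("backend issue", "race condition somewhere")


@dataclass(frozen=True)
class Hypothesis:
    label: str          # e.g. "H1"
    claim: str          # what is believed to be broken
    location: str       # file, function, subsystem, input condition, or invariant
    why_plausible: str  # the evidence that makes it a candidate


def review(hypotheses: list[Hypothesis]) -> list[str]:
    """Return a list of problems; an empty list means the set looks usable."""
    problems: list[str] = []
    if len(hypotheses) < 2:
        problems.append("fewer than two hypotheses is guessing: force alternatives")
    elif not 3 <= len(hypotheses) <= 5:
        problems.append(f"expected 3-5 hypotheses, got {len(hypotheses)}")
    for h in hypotheses:
        if not h.location.strip():
            problems.append(f"{h.label}: no file, function, subsystem, input, or invariant named")
        if h.claim.strip().lower() in VAGUE_BUCKETS:
            problems.append(f"{h.label}: vague bucket {h.claim!r}")
    return problems


table = [
    Hypothesis("H1", "refresh drops rotated tokens", "auth/session.py:refresh",
               "failures start after token rotation"),
    Hypothesis("H2", "cache TTL mismatch", "session_cache",
               "stale sessions persist across deploys"),
    Hypothesis("H3", "frontend retries reuse expired cookie", "frontend retry path",
               "only browser flow is affected"),
]
print(review(table))  # []
print(review([Hypothesis("H1", "backend issue", "", "it is slow")]))
```

The second call reports three problems at once, which is the point: a lone, unlocated, vague hypothesis fails every rule in this step.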
For each hypothesis, name one observation you can make now that would separate it from the others.
Good observations:
Bad observations:
For each hypothesis, write what result would make you drop it. This prevents confirmation search.
| Hypothesis | Falsified if... |
|------------|-----------------|
| H1 token refresh | refresh path never reads rotated token state |
| H2 cache TTL | cache entry expires before the observed stale window |
| H3 frontend retry | same failure occurs in API-only reproduction |
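The drop conditions in the table can be written down as predicates before any evidence is gathered, which makes it harder to reinterpret a result after the fact. The following is a sketch only: the evidence keys (`refresh_reads_rotated_state`, `cache_ttl_s`, `stale_window_s`, `fails_in_api_only_repro`) are hypothetical names for facts you would record by hand, and a missing key is treated as "not yet observed".

```python
from typing import Callable, Mapping

Evidence = Mapping[str, object]


def h1_falsified(e: Evidence) -> bool:
    # Dropped if the refresh path never reads rotated token state.
    return e.get("refresh_reads_rotated_state") is False


def h2_falsified(e: Evidence) -> bool:
    # Dropped if the cache entry expires before the observed stale window ends.
    ttl, stale = e.get("cache_ttl_s"), e.get("stale_window_s")
    if not isinstance(ttl, (int, float)) or not isinstance(stale, (int, float)):
        return False  # not measured yet, so not falsified
    return ttl < stale


def h3_falsified(e: Evidence) -> bool:
    # Dropped if the same failure occurs in an API-only reproduction.
    return e.get("fails_in_api_only_repro") is True


FALSIFIERS: dict[str, Callable[[Evidence], bool]] = {
    "H1": h1_falsified,
    "H2": h2_falsified,
    "H3": h3_falsified,
}


def surviving(evidence: Evidence) -> list[str]:
    """Return the hypotheses that the evidence so far has not ruled out."""
    return [label for label, falsified in FALSIFIERS.items() if not falsified(evidence)]


print(surviving({}))                                 # ['H1', 'H2', 'H3']
print(surviving({"fails_in_api_only_repro": True}))  # ['H1', 'H2']
```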
Make the observation that yields the most expected information per unit of effort. Start with the cheapest observation that separates your top hypotheses, not the most elaborate investigation.
For each hypothesis → name falsifier → rank by likelihood x cheapness → observe → update/drop → localize fault
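The "rank by likelihood x cheapness" step can be sketched as a sort. This assumes cheapness is modeled as the reciprocal of a rough cost in minutes, and that likelihoods are subjective estimates; both the model and the `Observation` and `rank` names are illustrative assumptions. It also ignores how well an observation separates hypotheses from each other, which a fuller version would weigh.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    hypothesis: str
    likelihood: float    # subjective estimate in [0, 1]
    cost_minutes: float  # rough effort to make the observation; must be > 0


def score(o: Observation) -> float:
    """Likelihood times cheapness, with cheapness taken as 1 / cost."""
    if o.cost_minutes <= 0:
        raise ValueError("cost_minutes must be positive")
    return o.likelihood / o.cost_minutes


def rank(observations: list[Observation]) -> list[Observation]:
    """Order observations from best to worst score."""
    return sorted(observations, key=score, reverse=True)


candidates = [
    Observation("H3", likelihood=0.2, cost_minutes=30),
    Observation("H1", likelihood=0.5, cost_minutes=5),
    Observation("H2", likelihood=0.3, cost_minutes=10),
]
print([o.hypothesis for o in rank(candidates)])  # ['H1', 'H2', 'H3']
```

Under this scoring a cheap check on a less likely hypothesis can outrank an expensive check on the favorite, which matches the advice to start with the cheapest discriminating observation.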
Stop when one hypothesis is supported by direct evidence and the key alternatives are ruled out. Name the file/function/config to change and the evidence that localizes it.
## Symptom
[Specific failing behavior, scope, timing, and known constraints]
## Hypotheses
| # | Hypothesis | Why plausible? | Cheapest observation | Falsified if... |
|---|------------|----------------|----------------------|-----------------|
| H1 | [specific file/function/config cause] | [evidence] | [read/grep/diff/check] | [drop condition] |
| H2 | [specific alternate cause] | [evidence] | [read/grep/diff/check] | [drop condition] |
| H3 | [specific alternate cause] | [evidence] | [read/grep/diff/check] | [drop condition] |
## Test Order
1. [Cheapest discriminating observation]
2. [Next observation if H1 is falsified]
3. [Deferred only if cheap observations do not localize]
## Localization
[Supported hypothesis, ruled-out alternatives, and the file/function/config to change]
Symptom: intermittent 500s on /export, only eu-west, started three days ago.
Hypotheses:
1. recent diff to export serializer
   Observation: inspect commits touching `export_serializer`
   Falsified if: no diff touches the failing codepath
2. eu-west Redis rotation broke a cache key
   Observation: read cache key construction + region config
   Falsified if: key and TTL match healthy regions
3. upstream timeout under load
   Observation: compare timeout logs during failure window
   Falsified if: no upstream latency spike
Test order: H1, H2, H3.
Result: H1 diff changed nested export handling and matches stack trace.
Localized fault: `app/export/serializer.py`.
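The first observation in this example, inspecting commits that touch the serializer, is cheap because it is a single `git log` call. A minimal sketch, assuming the code lives in a git repository and that the three-day window from the symptom is the right range to search; the helper names are hypothetical.

```python
import subprocess


def commits_touching_cmd(path: str, since: str = "3 days ago") -> list[str]:
    """Build the git command that lists recent commits touching `path`."""
    return ["git", "log", f"--since={since}", "--oneline", "--", path]


def commits_touching(path: str, since: str = "3 days ago", repo: str = ".") -> list[str]:
    """Run the command inside `repo` and return one line per matching commit."""
    result = subprocess.run(
        commits_touching_cmd(path, since),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. when `repo` is not a repository
    )
    return result.stdout.splitlines()


print(commits_touching_cmd("app/export/serializer.py"))
```

An empty result from `commits_touching` would count against hypothesis 1 under its stated drop condition, at least for that file and window; a non-empty result only says which diffs to read next.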
"The first principle is that you must not fool yourself - and you are the easiest person to fool." (Richard Feynman) Your intuition generates hypotheses; the differential tests them.
npx claudepluginhub tjboudreaux/cc-thinking-skills --plugin thinking-skills