reproduction-experiment-investigator | superpowers-plus

Stats

Actions

Tags

reproduction-experiment-investigator | superpowers-plus

Reproduction & Experiment Investigator

Role: Test debugging hypotheses through controlled reproduction attempts. Confirm or reject with evidence. Dispatched by: debug-conductor — never invoked directly by user. Evidence type: ExperimentEvidence (see skills/_shared/evidence-schema.md)

When to Use

Dispatched by debug-conductor when a hypothesis needs testing — controlled reproduction, environment comparison, or A/B verification of a suspected root cause.

Investigation Protocol

Step 1: Receive and Refine Hypothesis

From the conductor, receive:

Hypothesis: "The failure is caused by [X] when [condition Y] is present"
Predicted outcome: "If hypothesis is correct, we expect [Z] when we [action]"
Environment: Where to reproduce (staging, local, isolated sandbox)

Refine into a testable experiment:

Define exact steps to reproduce
Define exact success/failure criteria (not subjective)
Identify minimum reproduction conditions (strip unnecessary variables)

Step 2: Environment Assessment

Compare reproduction environment to production:
- Same versions? Same config? Same data? Same load?
- Document ALL differences — any could explain failure to reproduce
If significant differences exist:
- Can we close the gap? (deploy same version, copy config)
- If not, document the gap as a limitation

Step 3: Execute Controlled Reproduction (3+ attempts)

For each attempt:

Reset environment to clean state
Apply the hypothesized condition
Execute the triggering action
Record: did the expected failure occur?
Record: any unexpected behavior?

Minimum 3 attempts — intermittent bugs need statistical confidence.

Attempts	Reproductions	Confidence
3/3	3	High (>0.8) — hypothesis strongly supported
2/3	2	Medium (0.5–0.8) — likely correct but intermittent
1/3	1	Low (0.3–0.5) — possible but unreliable
0/3	0	Very Low (<0.3) — hypothesis likely wrong OR environment mismatch

Step 4: Control Experiment

If reproduction succeeded:

Remove the hypothesized condition
Re-run the same triggering action
If failure disappears → strong confirmation
If failure persists → hypothesis may be wrong or incomplete

Step 5: Produce Evidence

{
  "hypothesis": "Event ordering bug in async pipeline under load",
  "steps": [
    { "action": "Set event processing to async mode", "result": "Config applied", "success": true },
    { "action": "Send 50 concurrent call events", "result": "Events arrived out of order in 12/50 cases", "success": true },
    { "action": "Verify call state machine diverged", "result": "3 calls in disconnected state prematurely", "success": true }
  ],
  "outcome": "reproduced",
  "reproduced": true,
  "attempts": 3,
  "successRate": 1.0,
  "environmentDiff": "Staging uses lower load (50 concurrent vs 500 in prod); reproduction rate may differ"
}

Stop Conditions

Hypothesis confirmed or rejected with ≥3 attempts
Reproduction rate established
Environment assessment shows unbridgeable gap (cannot reproduce here)
Token budget exhausted
Wall-clock limit (5 minutes)

Escalation Conditions

0/3 reproduction despite matching environment → hypothesis may be wrong; tell conductor
Intermittent reproduction (<50%) → flag as timing-dependent; may need more attempts or different conditions
Environment mismatch prevents reproduction → flag as "cannot test here"; need production access or closer replica

Common Patterns This Investigator Detects

Pattern	Evidence Shape
Deterministic bug	3/3 reproduction, 0/3 without condition → confirmed
Load-dependent bug	Reproduces only above certain concurrency threshold
Environment-specific	Reproduces in prod-like environment but not staging → config/infra difference
Intermittent / race condition	1–2/3 reproduction → timing-dependent
Hypothesis disproven	0/3 reproduction even with condition → reject hypothesis

Failure Modes

Mode	Symptom	Recovery
Incomplete isolation	Test affected by shared state	Reset environment between experiments
False confirmation	Coincidental success in reproduction	Run multiple trials
Wrong variable	Testing irrelevant hypothesis	Verify hypothesis matches symptoms