From superpowers-plus
Proactively hunts for the worst latent bugs using a five-gate funnel: anti-hallucination re-read, sibling-divergence, test-coverage adjudication, reachability evidence, and confidence scoring. Outputs ranked confirmed bugs plus two risk lists.
Slash command: `/superpowers-plus:sp-bughunt`
> **Wrong skill?** Known failure -> `sp-debug`. Secrets/vulns -> `sp-scan`. PR diff -> `code-review-battery`. PR inline -> `sp-review`. Adversarial review of plans/docs -> `sp-phr` (canonical name `progressive-harsh-review`).
> **Companion:** `reference.md` (parser examples, language-aware sibling/test patterns, full Failure Modes catalogue, full report templates).
Proactively find the highest-severity latent bugs -- silent failures, data corruption, incorrect behavior, security issues -- before they reach production.
Announce at start: "I'm using the sp-bughunt skill; resolving scope now."
Built-in: sibling comparison (high-yield signal), language-aware test-coverage adjudication, anti-hallucination invariants applied at every gate.
NOT for: debugging a known failure (sp-debug), reviewing a PR diff (code-review-battery), credential scanning (sp-scan), adversarial review of non-code deliverables (sp-phr).
| Parameter | Default | Description |
|---|---|---|
| `N` | 2 | Number of worst Confirmed bugs to return |
| `scope` | current repo | Directory or file glob; resolved in Phase 1 to a concrete file list within the repo root |
| `focus` | `all` | Enum: `logic`, `security`, `data-loss`, `performance`, `all`. Validated; any other value rejected. |
| `confidence-mode` | `release-prep` | Enum: `release-gate` (T=9.0), `release-prep` (T=8.0), `hygiene` (T=7.0). See parser rule below. |
Parser rule: any numeric X token following above, >=, at least, or threshold resolves to the smallest mode >= X, clamped to the enum range. So above 6.5 -> hygiene (7.0), above 7.5 -> release-prep (8.0), above 9.5 -> release-gate (9.0, clamped). A bare integer sets N. The audit-trail header echoes the resolved mode and the user input it was resolved from.
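The parser rule can be sketched as follows. This is a minimal illustration, not the skill's actual parser: the regular expression, the function name `resolve_mode`, and the fallback to the default mode are assumptions, and bare-integer handling for `N` is left out.

```python
import re

# Thresholds from the parameter table, ordered smallest first.
MODES = [("hygiene", 7.0), ("release-prep", 8.0), ("release-gate", 9.0)]

# Hypothetical pattern: one of the trigger keywords followed by a number.
_THRESHOLD = re.compile(
    r"(?:above|>=|at least|threshold)\s*(\d+(?:\.\d+)?)", re.IGNORECASE
)


def resolve_mode(user_input: str, default: str = "release-prep") -> str:
    """Snap a numeric threshold in user_input to the smallest mode >= X."""
    match = _THRESHOLD.search(user_input)
    if match is None:
        return default  # no threshold token: keep the default mode
    x = float(match.group(1))
    for name, threshold in MODES:
        if threshold >= x:
            return name
    return MODES[-1][0]  # X exceeds every threshold: clamp to release-gate


print(resolve_mode("above 6.5"))  # hygiene
print(resolve_mode("above 7.5"))  # release-prep
print(resolve_mode("above 9.5"))  # release-gate
```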
These three re-reads bind every gate that consumes sub-agent output. They are constraints, not phases.
| Invariant | When it binds | Action |
|---|---|---|
| I1. Full-Function Re-read | Before scoring or labeling any candidate | Load the entire function from disk -- never trust the sub-agent's excerpt. A misquoted slice(0, i-1) reads as off-by-one when the source is actually slice(0, i). |
| I2. Sibling Source Re-read | Before claiming sibling divergence | Open both sibling files from disk and diff the analogous functions. Never accept the sub-agent's paraphrase. |
| I3. Live-Disk Final Re-read | Before emitting any candidate in the final report | Re-fetch the cited lines. If they no longer match the report's evidence quote, halt and re-run the affected gates. |
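As one way to picture I3, the check below re-reads the cited lines from disk and compares them with the evidence quote. The function name `quote_matches_disk` and the choice to ignore leading and trailing whitespace are assumptions for illustration; the skill text does not prescribe a comparison method.

```python
from pathlib import Path


def quote_matches_disk(path: str, start_line: int, quote: str) -> bool:
    """Return True if `quote` still appears at `start_line` (1-indexed) on disk."""
    disk_lines = Path(path).read_text(encoding="utf-8").splitlines()
    quoted = quote.splitlines()
    if start_line < 1 or not quoted:
        return False
    window = disk_lines[start_line - 1 : start_line - 1 + len(quoted)]
    # Assumption: indentation differences do not count as drift.
    return [line.strip() for line in window] == [line.strip() for line in quoted]
```

A `False` result corresponds to the "halt and re-run the affected gates" action in the table.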
Let T be the threshold from confidence-mode. Bands partition the score line with no gap and no overlap:
| Band | Routing |
|---|---|
| score >= T | Confirmed bug list |
| T - 2.0 <= score < T | Low-Confidence Risks list |
| score < T - 2.0 | Discarded (logged) |
Raising T to 9.0 shifts the latent band to [7.0, 9.0) -- no gap at any T.
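The band partition reduces to two comparisons. A minimal sketch (the function name `band` is hypothetical):

```python
def band(score: float, t: float) -> str:
    """Map a score to its band for threshold t; bands have no gap or overlap."""
    if score >= t:
        return "Confirmed"
    if score >= t - 2.0:
        return "Low-Confidence Risks"
    return "Discarded"


print(band(8.0, 8.0))  # Confirmed
print(band(6.0, 8.0))  # Low-Confidence Risks
print(band(5.9, 8.0))  # Discarded
```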
The orchestrator (you) runs all phases. The sub-agent runs Phase 2 only. Gates A-E are orchestrator-side per-candidate checks inside Phase 3. The audit trail records gate firings, gate fail-opens, and outcomes.
1. Run `git rev-parse --show-toplevel`; if it fails, abort with `not-in-git-repo`.
2. Resolve `scope` to an explicit list of file paths. Step order (cheap rejections first): (a) reject paths containing control characters (NUL, `\n`, etc.); (b) canonicalize each path with `realpath` (GNU `realpath -e` semantics: resolve all symlinks, path must exist, abort with `path-resolution-failure` otherwise; on BSD/macOS, run `realpath` followed by `test -e "$resolved"` to enforce existence); (c) reject any canonicalized path outside `git rev-parse --show-toplevel`; (d) reject if the resolved list exceeds 500 files (require narrower scope; chunking by top-level subdirectory is the recommended strategy for legacy modules that exceed the cap).
3. Validate `focus` against the enum; reject any other value.
4. Resolve `confidence-mode` per the parser rule; record the resolved mode and source input in the audit-trail header.
5. Consult `reference.md` (sections 2 and 3).

Substitute resolved values before dispatching. Do not pass through angle-bracket placeholders. Cap candidates at `min(N*3, 30)`. Read budget: 80 files. Format the file list as a quoted one-per-line block to defend against control-character paths.
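Scope-resolution steps (a) through (d) could look roughly like this in Python. It is a sketch under stated assumptions: `resolve_scope`, `ScopeError`, and the reason strings other than `path-resolution-failure` are hypothetical names, "reject" is read as "drop the path and record it" for (a) and (c), and the repository root is passed in rather than obtained from `git rev-parse --show-toplevel`.

```python
import os

MAX_FILES = 500  # cap from step (d)


class ScopeError(ValueError):
    """Raised when scope resolution must abort."""


def resolve_scope(paths, repo_root):
    """Return (resolved_paths, rejected) where rejected is [(path, reason)]."""
    root = os.path.realpath(repo_root)
    resolved, rejected = [], []
    for raw in paths:
        # (a) cheap rejection: control characters such as NUL or newline
        if any(ord(ch) < 32 or ord(ch) == 127 for ch in raw):
            rejected.append((raw, "control-character"))
            continue
        # (b) canonicalize; the path must exist (GNU `realpath -e` semantics)
        real = os.path.realpath(raw)
        if not os.path.exists(real):
            raise ScopeError("path-resolution-failure")
        # (c) the canonical path must stay inside the repository root
        if os.path.commonpath([root, real]) != root:
            rejected.append((raw, "outside-repo-root"))
            continue
        resolved.append(real)
    # (d) require a narrower scope when the list is too large
    if len(resolved) > MAX_FILES:
        raise ScopeError("scope-too-large")
    return resolved, rejected
```

Checking control characters before calling `os.path.realpath` matters in Python as well, since an embedded NUL makes the path functions raise.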
Sub-agent prompt template (text shown; substitute literals before sending):
Adversarial bug audit of the file list below.
Find the worst <MIN(N*3, 30)> bug candidates: silent failures, data
corruption, incorrect behavior, or security issues.
[Include only if focus != all] Focus on <FOCUS>.
Prioritize:
(a) ordering errors where step A invalidates step B's precondition
(b) sibling inconsistencies where two parallel implementations diverge
For each candidate:
- Read the code carefully
- Quote exact lines with file:line
- Explain WHY it is a bug
- Describe the failure mode
- Name a concrete production caller with file:line
- List sibling files implementing similar logic, if any
- Rate severity: CRITICAL / HIGH / MEDIUM / LOW
Read at most 80 files. If you hit the cap, return what you have plus a
list of files you did not reach. Return all candidates -- the orchestrator
gates them. Do not pre-filter.
Files:
<RESOLVED FILE LIST, one per line, double-quoted>
Re-dispatch trigger: if the sub-agent reports >= 20% of the resolved file list unreached, re-dispatch with a narrower scope. Cap re-dispatch at 2. If still >= 20% unreached after the cap, proceed with partial coverage and record re-dispatch-exhausted=true in the audit trail.
Minimum-yield gate: if the sub-agent returns 0 candidates AND files-reached >= 50% of the resolved list, the orchestrator must emit a low-yield-justification block in the report header citing scope size, files actually read, and why fewer candidates surfaced. A zero-bug report with no justification block is treated as a failed run, not a passing run.
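The two thresholds above (20% unreached for re-dispatch, 50% reached for the minimum-yield gate) can be expressed as small predicates. The function names and return strings are hypothetical; integer arithmetic is used so the boundary cases are exact.

```python
REDISPATCH_CAP = 2  # maximum number of re-dispatches


def redispatch_action(total_files: int, unreached: int, redispatches_so_far: int) -> str:
    """Return 'ok', 'redispatch', or 'exhausted' for the re-dispatch trigger."""
    if total_files == 0 or unreached * 5 < total_files:  # under 20% unreached
        return "ok"
    if redispatches_so_far < REDISPATCH_CAP:
        return "redispatch"
    return "exhausted"  # proceed with partial coverage, re-dispatch-exhausted=true


def needs_low_yield_justification(candidates: int, total_files: int, reached: int) -> bool:
    """True when 0 candidates came back from a >= 50%-reached file list."""
    return candidates == 0 and total_files > 0 and reached * 2 >= total_files
```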
Invariants I1, I2, I3 bind throughout. Gates run in order A, B, D, C, E and each gate records its output without making a terminal routing decision (except two short-circuits noted below). After all gates complete, a single deferred Routing Decision examines the accumulated outputs and assigns the candidate to one of {Confirmed, Unreachable Risks, Low-Confidence Risks, Rejections}. This eliminates the look-ahead problem where a later gate's outcome was needed to route an earlier gate's label.
Definitions used by the gates:
reference.md §2) and successfully diffed analogous functions per I2.

Gates (recording only, except short-circuits):
| Gate | Records | Short-circuit |
|---|---|---|
| A. Anti-Hallucination Re-read | Apply I1; record match or misquote. | misquote -> immediately halt this candidate; route to Rejections (reason: sub-agent misquote). |
| B. Sibling Divergence | Apply I2; record sibling-status in {divergence-cleaner, divergence-same-bug, same-impl, no-sibling} plus patterns tried. | None. |
| D. Test-Coverage Adjudication | Apply I1 on the test file via discovery algorithm (reference.md §3). Record test-label from the table below. | covered-passing-intentional or covered-passing-misread -> halt; route to Rejections (reason: test confirms intentional or sub-agent misread). |
| C. Reachability Evidence | Named caller + call site + verified calling convention. Record reachability ∈ {pass, fail} and (if pass) hops-to-boundary ∈ {1, 2, 3} (3-hop trace cap). | If fail, Gate E is skipped (no scores produced). The candidate does not halt here; it proceeds to Routing Decision. |
| E. Confidence Scoring | Score 3 orthogonal axes (table below). Record Correctness, Testability, Severity, average, and floor pass/fail per axis. | Skipped only when Gate C recorded fail. |
Gate D label values (semantics in reference.md §4): `uncovered`, `covered-passing-intentional`, `covered-passing-misread`, `covered-passing-test-buggy`, `covered-skipped`, `no-test-infrastructure`, `unsure`. Only `covered-passing-intentional` and `covered-passing-misread` short-circuit; the rest record and continue.
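The gate order and its short-circuits can be traced with a small function. This is an illustrative sketch only; `gates_executed` and its argument names are hypothetical.

```python
D_SHORT_CIRCUITS = {"covered-passing-intentional", "covered-passing-misread"}


def gates_executed(a_result: str, d_label: str = "uncovered", c_result: str = "pass") -> list:
    """Return the gates that run, in order A, B, D, C, E, for the given outcomes."""
    ran = ["A"]
    if a_result == "misquote":        # Gate A short-circuit: candidate halts
        return ran
    ran += ["B", "D"]
    if d_label in D_SHORT_CIRCUITS:   # Gate D short-circuit: candidate halts
        return ran
    ran.append("C")
    if c_result == "fail":            # Gate E is skipped, routing still happens
        return ran
    ran.append("E")
    return ran


print(gates_executed("match"))                            # ['A', 'B', 'D', 'C', 'E']
print(gates_executed("match", "covered-passing-misread"))  # ['A', 'B', 'D']
```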
Gate E axes (1-10 each):
| Axis | Anchors |
|---|---|
| Correctness | 10 = contradicts a documented invariant AND Gate B shows clean sibling divergence; 7 = clear logic error confirmed by I1; 4 = arguable; 1 = defensible behavior |
| Testability | 10 = failing test in <=5 lines; 7 = straightforward fixture; 4 = nontrivial setup; 1 = non-deterministic |
| Severity | 10 = permanent unrecoverable data loss via a specific user action (strict; duplicate writes with new timestamps are not data loss) OR authentication/authorization bypass with a clear exploit path OR cross-tenant data exposure in multi-tenant SaaS; 7 = silent corruption recoverable from logs; 4 = degraded UX with workaround; 1 = cosmetic |
Hard floors: Correctness >= 5, Severity >= 4 (Testability has no floor; non-deterministic bugs are real).
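A sketch of the Gate E record, assuming the recorded average is the unweighted mean of the three axes (the text does not state a weighting). The class name `GateEScores` is hypothetical; the violation strings follow the `<axis>=<X>, floor=<F>` shape used in the routing table.

```python
from dataclasses import dataclass

# Hard floors; Testability deliberately has none.
FLOORS = {"correctness": 5, "severity": 4}


@dataclass(frozen=True)
class GateEScores:
    correctness: int
    testability: int
    severity: int

    @property
    def average(self) -> float:
        # Assumption: plain arithmetic mean of the three axes.
        return (self.correctness + self.testability + self.severity) / 3

    def floor_violations(self) -> list:
        """List each axis that falls below its hard floor."""
        return [
            f"{axis}={getattr(self, axis)}, floor={floor}"
            for axis, floor in FLOORS.items()
            if getattr(self, axis) < floor
        ]


scores = GateEScores(correctness=4, testability=10, severity=10)
print(scores.average)             # 8.0
print(scores.floor_violations())  # ['correctness=4, floor=5']
```

The example shows why floors are checked separately: an average of 8.0 would clear `release-prep`, yet the candidate still fails the Correctness floor.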
Evaluate rules in priority order. The first rule that matches wins; later rules do not apply to that candidate.
| # | Condition | Route |
|---|---|---|
| 1 | A-misquote (short-circuit fired) | Rejections (sub-agent misquote) -- no scores |
| 2 | D label is covered-passing-intentional | Rejections (test confirms intentional) -- no scores |
| 3 | D label is covered-passing-misread | Rejections (sub-agent misread) -- no scores |
| 4 | Gate E ran AND any floor violated | Rejections (floor violation: <axis>=<X>, floor=<F>) -- scores shown |
| 5 | Gate C fail | Unreachable Risks (no Gate E scores; Gate D label and Gate B status are recorded) |
| 6 | D label is unsure | Low-Confidence Risks (Gate E scores shown) |
| 7 | D label is no-test-infrastructure AND NOT (sibling-status in {divergence-cleaner, divergence-same-bug} OR (C-pass AND hops <= 1)) | Low-Confidence Risks (counts as D-no-test-infra-demoted; scores shown) |
| 8 | E-avg < T - 2.0 | Rejections (below latent band; scores shown) |
| 9 | E-avg in [T - 2.0, T) | Low-Confidence Risks (scores shown) |
| 10 | E-avg >= T AND sibling-status == no-sibling AND NOT (C-pass AND hops <= 2) AND Gate E Correctness < 8 | Low-Confidence Risks (B-no-sibling compensating-rule demotion; scores shown) |
| 11 | Otherwise | Confirmed |
Routing-rule counters. The audit-trail header carries one counter per rule: routing-rule-fired: r1=<n>, r2=<n>, r3=<n>, r4=<n>, r5=<n>, r6=<n>, r7=<n>, r8=<n>, r9=<n>, r10=<n>, r11=<n>. For Gate-D-label fail-opens (D-no-test-infra-compensated, D-unsure, D-covered-skipped) the counter is separately tracked because those labels are recorded at Gate D regardless of which routing rule eventually fires (e.g. D-no-test-infra-compensated increments at Gate D when the compensating signal is present and rule 7 therefore does not fire). The Phase 5 header section below enumerates both counter families.
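For illustration, the `routing-rule-fired` line could be produced from a list of fired rule numbers like this (the function name `routing_header` is hypothetical; the output shape follows the counter format quoted above):

```python
from collections import Counter


def routing_header(fired_rules) -> str:
    """Format one counter per routing rule, r1 through r11, including zeros."""
    counts = Counter(fired_rules)
    parts = ", ".join(f"r{n}={counts[n]}" for n in range(1, 12))
    return f"routing-rule-fired: {parts}"


print(routing_header([11, 9, 9, 1]))
# routing-rule-fired: r1=1, r2=0, r3=0, r4=0, r5=0, r6=0, r7=0, r8=0, r9=2, r10=0, r11=1
```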
Floor violation (rule 4) takes precedence over every demotion below it -- a floor-violating candidate is always Rejected, never Low-Confidence. D=unsure (rule 6) takes precedence over score-band rules 8 and 9 by design: when the orchestrator cannot determine coverage, fail-closed escalation to human review (Low-Confidence Risks) is the intended behavior regardless of the score.
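The first-match-wins table can be read as a chain of early returns. The sketch below is one possible encoding, not the orchestrator's implementation: `Candidate`, its field names, and `route` are hypothetical, and `e_avg=None` stands for "Gate E was skipped".

```python
from dataclasses import dataclass
from typing import Optional, Tuple

DIVERGENT = {"divergence-cleaner", "divergence-same-bug"}
LOW = "Low-Confidence Risks"


@dataclass
class Candidate:
    a_misquote: bool = False
    sibling_status: str = "no-sibling"
    d_label: str = "uncovered"
    c_pass: bool = True
    hops: Optional[int] = 1        # hops-to-boundary; None when Gate C failed
    correctness: int = 0
    severity: int = 0
    e_avg: Optional[float] = None  # None when Gate E was skipped


def route(c: Candidate, t: float) -> Tuple[int, str]:
    """Return (rule number, route) for threshold t; the first matching rule wins."""
    if c.a_misquote:
        return 1, "Rejections"
    if c.d_label == "covered-passing-intentional":
        return 2, "Rejections"
    if c.d_label == "covered-passing-misread":
        return 3, "Rejections"
    if c.e_avg is not None and (c.correctness < 5 or c.severity < 4):
        return 4, "Rejections"
    if not c.c_pass:
        return 5, "Unreachable Risks"
    if c.e_avg is None:
        raise ValueError("Gate C passed but Gate E scores are missing")
    if c.d_label == "unsure":
        return 6, LOW
    hops = c.hops if c.hops is not None else 99
    compensated = c.sibling_status in DIVERGENT or hops <= 1
    if c.d_label == "no-test-infrastructure" and not compensated:
        return 7, LOW
    if c.e_avg < t - 2.0:
        return 8, "Rejections"
    if c.e_avg < t:
        return 9, LOW
    if c.sibling_status == "no-sibling" and not hops <= 2 and c.correctness < 8:
        return 10, LOW
    return 11, "Confirmed"
```

For example, a candidate with `d_label="unsure"` and an average of 9.5 still lands in Low-Confidence Risks via rule 6, because that rule sits above the score-band rules.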
1. … `disk-drift`. Increment `Phase4-I3-rerun` on the first mismatch.
2. … `checked: <surface>, present: yes|no, evidence: <file:line or null>` per surface.
3. … `Phase4-mitigation-downgrade` whenever the subtraction moves the candidate out of Confirmed. Explicit list movement: if the candidate now lands in Low-Confidence Risks per rule 9 or rule 10, physically move the entry from the Bugs section to the Low-Confidence Risks section; if it lands in Rejections per rule 4 or rule 8, move it to Rejections. Record `original Severity X, post-mitigation Y` in the candidate's audit data.
4. … `confirmed`, `partial` (real, narrower scope), or `false-positive` (move out with reason).
5. … `Phase4-missed-finding` for each new finding that survives routing.

The report header is the execution-evidence sentinel: a reviewer can verify the run by reading the header alone -- evaluations, pass-throughs, fail-opens, phase counters, outcomes, and (if triggered) the `low-yield-justification` block together prove the orchestrator executed each phase. Full report templates: reference.md §6.
Apply I3 before emitting every Confirmed bug, every Unreachable Risk, and every Low-Confidence Risk -- not just Confirmed survivors. Disk may have drifted between Phase 3 and Phase 5; the invariant binds at the moment of emission. On a Phase 5 I3 mismatch, apply the same escape as Phase 4 step 1: first mismatch returns the candidate to Phase 3 Gate A (full re-traversal of A-B-D-C-E and the Routing Decision); second mismatch routes to Rejections with reason disk-drift-at-emission. The same Phase4-I3-rerun counter is incremented (the counter name covers both Phase 4 and Phase 5 I3 reruns; rename mentally as "I3-rerun" if clearer).
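The emission-time escape can be sketched as below. All names are hypothetical: `matches_disk` stands for the I3 re-read, and `rerun_gates` stands for the full A-B-D-C-E traversal plus Routing Decision, returning the candidate's new route.

```python
def emit_with_i3(current_route, matches_disk, rerun_gates, counters):
    """Apply I3 at emission: rerun once on drift, reject on a second mismatch."""
    if matches_disk():
        return current_route
    # First mismatch: count it and send the candidate back through Gate A.
    counters["Phase4-I3-rerun"] = counters.get("Phase4-I3-rerun", 0) + 1
    new_route = rerun_gates()
    if matches_disk():
        return new_route
    # Second mismatch: stop retrying.
    return "Rejections (disk-drift-at-emission)"


counters = {}
answers = iter([False, True])  # drift on the first read, stable on the second
print(emit_with_i3("Confirmed", lambda: next(answers), lambda: "Confirmed", counters))
print(counters)  # {'Phase4-I3-rerun': 1}
```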
The header must include:
- … `confidence-mode` was snapped from; `source=default` if no user input)
- `r1=<n>` through `r11=<n>` (one per rule in the Phase 3 Routing Decision table)
- `D-no-test-infra-compensated`, `D-no-test-infra-demoted`, `D-unsure`, `D-covered-skipped`
- `B-no-sibling` (the no-sibling-status that, combined with low Correctness and weak reachability, triggers rule 10)
- `Phase4-mitigation-downgrade`, `Phase4-I3-rerun` (covers Phase 4 step 1 and Phase 5 emission mismatches), `Phase4-missed-finding` (new findings added in Phase 4 step 5 that survived routing)
- `Phase1-sanitization-rejections`, `Phase2-re-dispatches`, `re-dispatch-exhausted` ∈ {true, false}
- `confirmed`, `unreachable-risk`, `low-confidence-risk`, `rejected`, `sub-agent-candidates-returned`, `files-unreached`
- `low-yield-justification` block when triggered
- `WARNING: coverage incomplete (<X>% of resolved scope unread)` banner above the parameters block when `re-dispatch-exhausted=true`

Output sections (fixed order): Bugs -> Unreachable Risks -> Low-Confidence Risks -> Rejections. Confirmed bugs and both Risk lists include Sibling check and Test coverage fields (Gate D label is recorded for every candidate that reached Gate D, so the field is present for Unreachable Risks and Low-Confidence Risks). Rejections from short-circuits (rules 1, 2, 3) emit `scores: skipped` and `sibling: <recorded-status-or-skipped>`. Unreachable Risks omit Gate E scores (E was skipped by Gate C demotion); other lists show full scores.
If no candidate clears the threshold, that is a valid output provided the audit-trail counts show meaningful evaluations and (when triggered) a low-yield-justification block.
Backwards compatibility: report format diverges from v1; downstream parsers detect the format by the presence of the `Audit trail -- gate evaluations:` header line. See `reference.md` §8 for the migration note.
Legacy expectation: on brownfield repos with no test infrastructure and no obvious sibling family (e.g. legacy ColdFusion modules), expect most candidates to route to Low-Confidence Risks. That is the fail-closed design, not a defect. The audit-trail header lets you verify the gates actually fired.
- If A short-circuits (`misquote`), B/D/C/E are skipped. Otherwise B runs. After B, D runs. If D short-circuits (`covered-passing-intentional` or `covered-passing-misread`), C and E are skipped. Otherwise C runs. If C records `fail`, E is skipped (Unreachable Risks demotion happens at Routing Decision). Otherwise E runs. Any deviation from this sequence is a red flag.
- … `unsure` as the fail-closed sentinel.
- … `pass`).
- … `scores: skipped`.
- … `low-yield-justification` block when the sub-agent returns 0 candidates from a >=50%-reached file list. The audit-trail counts plus the justification block make a zero-bug result auditable.
- `covered-passing-intentional` requires affirmative test assertion. A test that incidentally exercises the code path without asserting the flagged behavior does not qualify. The test must explicitly assert the specific behavior the sub-agent flagged -- not merely exercise the affected code path. If the test reaches the code and passes without asserting it, label `covered-passing-test-buggy` when a correct assertion in the test would catch the defect, or `uncovered` when it would not.

Full catalogue in `reference.md` §7. The six highest-signal modes:
| Mode | Symptom | Recovery |
|---|---|---|
| Off-by-one misread of intentional design | Sub-agent flagged slice(0, currentIndex) as off-by-one; a passing test explicitly validated the behavior as "replace current and forward". | Gate A (I1) + Gate D covered-passing-intentional -> Rejections unless agent can quote a specific incorrect assertion (covered-passing-test-buggy). |
| Fabricated caller evidence | Sub-agent claimed an IndexedDB race reachable from ensureState; ensureState uses sequential await. | Gate C concurrency principle (reference.md §5); 3-hop cap; demote to Unreachable Risks if no concurrent caller exists. |
| Missing sibling cross-check | Real bug (dedup after truncation) was visible only by diffing validator-storage.js vs validator-project-storage.js. | Gate B with language-aware patterns + I2 (re-read both siblings from disk). |
| No-test-infrastructure inverts the gate | Brownfield repo with zero tests; every candidate labeled "uncovered" inflates bug claims. | Gate D distinguishes no-test-infrastructure from uncovered; the former requires a compensating signal (Gate B divergence OR Gate C passed with hops <= 1) to continue. |
| Longstanding code treated as intentional | Real bug dismissed because the code has been stable for years or is widely depended on. | Age is not affirmative intent. Apply Gate D covered-passing-intentional only when a test explicitly asserts the flagged behavior as correct. Stability is a reason to be careful with the fix, not a reason to suppress the finding. |
| Behavioral change misclassified as bug | Fix rewrites intentional behavior and breaks callers; or a confirmed bug is downgraded to Low-Confidence because callers depend on the broken behavior. | A confirmed failure path (Gate C pass, all floors met) routes to Confirmed regardless of caller dependency. Caller migration risk belongs in the Confirmed bug's fix sketch as a "Callers must adjust:" note -- it does not change the routing. Do not demote to Low-Confidence Risks solely because callers depend on the current (broken) behavior. |
npx claudepluginhub bordenet/superpowers-plus --plugin superpowers-plus