From Bughunt Suite
Adversarial, hotspot-driven bug-hunting workflow for bugs, pain points, and inefficiencies across project types (iOS, macOS, web, services, terminal tools). A zero-dependency toolkit ranks hotspots, agents inspect risk areas through 13 analysis lenses, a mandatory skeptic pass refutes false positives, and strict merge gates can enforce verification and coverage in CI. Findings are fingerprinted, deduped, baseline-diffed, and rendered to markdown/HTML/SARIF with CI exit codes. Report-only by default. Use when the user invokes /bughunt, asks to find hidden bugs, audit code for defects, hunt pain points or inefficiencies, do a deep code review, gate CI on findings, or hunt for what tests miss.
Slash command: /bughunt-suite:bughunt
An offensive, hotspot-driven bug-hunting workflow. Where the autoreview skill (/review)
is a defensive gate on your own diff, bughunt assumes the code is guilty and goes looking
for hidden defects in the riskiest parts of the target with explicit coverage reporting.
Read this hub first, run recon, then open only the spokes your hunt plan selects.
Find the bugs that tests, linters, and a tired reviewer miss — and make each claim carry
evidence. The output is a ranked, evidence-backed report, not a vibe. Default behavior is
report-only: hunt, document, and hand fixing to a human or the autoreview skill (/review). Never edit
product code unless the user asks.
- A zero-dependency toolkit (bughunt.py) ranks hotspots
  by churn × complexity × boundary × test-gap, surfaces pain signals (aged TODOs, config
  drift, risky deps), and owns structured findings: fingerprint, dedupe, suppress,
  baseline-diff, and render to markdown/HTML/SARIF with CI exit codes. See
  tooling.md. Degrades to a pure-markdown pipeline when python3 is absent.
- Hunters fan out through a capability ladder: the Workflow tool (or
  cursor-agent, the fast variant), parallel Tasks, or a sequential walk. See
  orchestration.md.

A short list of real, reproducible bugs beats a long list of maybes. Every reported
finding names its evidence (file:line + trace + trigger + impact) and its
confidence. Speculative items are quarantined in their own section — never mixed in.
Before reporting anything, try to kill it (see the false-positive filter in
triage). If it survives, report it.
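The evidence rule above can be sketched as a small gate. This is a minimal illustration only: the `Candidate` record, its field names, and the section names are assumptions made for the example, not the schema bughunt.py actually uses.

```python
from dataclasses import dataclass

# Hypothetical record shape (assumption); the real bughunt.py schema may differ.
REQUIRED_EVIDENCE = ("location", "trace", "trigger", "impact")


@dataclass
class Candidate:
    title: str
    location: str = ""  # file:line
    trace: str = ""     # how execution reaches the defect
    trigger: str = ""   # input or condition that sets it off
    impact: str = ""    # what goes wrong for the user or system
    confidence: str = "Speculative"  # Confirmed | Probable | Speculative


def missing_evidence(candidate):
    """Return the names of the evidence fields that are empty."""
    return [name for name in REQUIRED_EVIDENCE
            if not getattr(candidate, name).strip()]


def classify(candidate):
    """Return 'question', 'speculative', or 'finding' for one candidate."""
    if missing_evidence(candidate):
        return "question"      # incomplete evidence: not a finding
    if candidate.confidence == "Speculative":
        return "speculative"   # quarantined in its own report section
    return "finding"


def split_report(candidates):
    """Group candidates into report sections so they are never mixed."""
    sections = {"finding": [], "speculative": [], "question": []}
    for candidate in candidates:
        sections[classify(candidate)].append(candidate)
    return sections
```

Keeping the check mechanical means a hunter cannot promote a hunch to a finding just by wording it confidently; it has to supply all four evidence fields first.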
- Without file:line + trace + trigger + impact → it's a question, not a finding.
- Hand fixing to a human or the autoreview skill (/review). Only edit code if the user asks.
- CI runs must merge with
  --require-verified --require-coverage --strict so skipped skeptic verdicts or missing
  coverage metadata fail instead of becoming trusted findings.

| Mode | When | How |
|---|---|---|
| Quick scan | Small target, a single file/module, or a diff; minutes | Single-agent sweep; pick 2–3 lenses + the platform catalog |
| Deep hunt | Whole codebase; the flagship mode | Full recon → fan-out across the (lens × hotspot) grid |
| Targeted hunt | User names a module/feature/file | Recon scoped to it → fan-out within scope |
Default to the mode the request implies; ask only if genuinely ambiguous.
Execute in order. Spokes carry the detail.
| Step | Do | Spoke |
|---|---|---|
| 1 | Recon & pre-pass — detect platform, run the deterministic pre-pass (census/hotspots/signals/deps), map trust boundaries, build the hunt plan | recon-and-scoping.md, tooling.md |
| 2 | Fan-out — build the (lens × hotspot) grid, dispatch hunters via the capability ladder (Workflow / parallel Tasks / sequential) | orchestration.md |
| 3 | Hunt — each hunter runs one lens over one area using the relevant platform catalog; returns evidence as JSON | lens + platform spokes below |
| 4 | Verify (mandatory) — a skeptic pass tries to refute every candidate before it counts as a finding | verification.md |
| 5 | Merge & cross-validate — bughunt.py merge: fingerprint, dedupe, cross-validate, suppress, baseline-diff | orchestration.md, tooling.md |
| 6 | Triage — severity × confidence (+ impact rubric), filter false positives, build minimal repros | triage |
| 7 | Confirm (optional) — prove high-value findings dynamically or at runtime; in parallel isolated sandboxes via the E2B rung | confirm.md, fuzz, verify |
| 8 | Report — bughunt.py render the ranked markdown/HTML/SARIF; hand fixing off | tooling.md, triage report layout |
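To make step 5 concrete, here is a rough sketch of what fingerprinting, deduping, baseline-diffing, and the verification/coverage gates could look like. The dictionary keys (`lens`, `file`, `title`, `verdict`), the function names, and the fingerprint recipe are all assumptions for illustration; see tooling.md for how bughunt.py merge actually behaves.

```python
import hashlib


def fingerprint(finding):
    """Stable ID from file + normalized title (hypothetical recipe).

    Line numbers are left out so the ID does not change when code shifts.
    """
    normalized_title = " ".join(finding["title"].lower().split())
    key = f'{finding["file"]}|{normalized_title}'
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def merge(findings):
    """Dedupe by fingerprint and record which lenses reported each finding."""
    merged = {}
    for finding in findings:
        fp = fingerprint(finding)
        if fp not in merged:
            merged[fp] = {**finding, "fingerprint": fp,
                          "lenses": [finding["lens"]]}
        elif finding["lens"] not in merged[fp]["lenses"]:
            # Same defect seen through another lens: cross-validation signal.
            merged[fp]["lenses"].append(finding["lens"])
    return list(merged.values())


def new_since_baseline(findings, baseline_fingerprints):
    """Keep only findings whose fingerprint is absent from the baseline."""
    return [f for f in findings
            if f["fingerprint"] not in baseline_fingerprints]


def gate_errors(findings, coverage, require_verified=False,
                require_coverage=False):
    """Return reasons a gated merge should fail (empty list = pass)."""
    errors = []
    if require_verified:
        for f in findings:
            if not f.get("verdict"):  # skeptic pass skipped or missing
                errors.append(f"unverified finding {f['fingerprint']}")
    if require_coverage and not coverage:
        errors.append("missing coverage metadata")
    return errors
```

A CI wrapper would typically exit non-zero when `gate_errors` returns anything, which is the behavior the strict merge flags are described as enforcing.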
The hunt has a deterministic spine and a portable execution model, so it works the same on Claude Code, Cursor, or Codex.
bughunt.py (zero-dependency, stdlib python3) under scripts/
does the non-judgment work: rank hotspots, mine pain signals, audit deps, and
fingerprint/dedupe/suppress/baseline-diff/render the findings. The automated dependency
support is strongest for npm/Python-style manifests; other ecosystems rely more on platform
catalogs and agent inspection. Always probe python3 --version
first. If it's missing, run the markdown fallback — rank by the recon heuristics, keep
findings in markdown, skip SARIF; nothing in the toolkit is required for the hunt to work.

Hunters are dispatched via the capability ladder: (A) the shipped
hunt-workflow.js when the Workflow tool is available; (A-CLI) the shipped
hunt-cursor.mjs to fan out Composer 2.5 / Grok hunters via cursor-agent — the fast
variant, and the highest rung available inside Cursor; (B) parallel Tasks on standard
Claude Code; (C) a sequential walk on single-agent tools. Verify is mandatory on every
rung, and always on a strong reasoner even when hunters run on a fast model.

A small Node API repo, deep hunt:

- Recon hotspots: routes/invoices.js (recently changed, near auth + SQL, no tests),
  lib/pricing.js (money math), db/query.js.
- Grid: auth-access × routes/invoices.js,
  taint × routes/invoices.js, logic-correctness × lib/pricing.js,
  resource-performance × db/query.js. Platform catalog: web.
- Findings:
  - auth-access: GET /invoices/:id loads by id with no ownership check → IDOR.
  - logic-correctness: discount applied after tax in applyDiscount() → wrong total.
  - resource-performance: invoice list issues one query per line item → N+1.
- Triage: IDOR = High-Confirmed (wrote a failing test: user A reads user B's invoice);
  discount = High-Probable (traced values: $100 + 10% tax then −10% = $99, expected $90);
  N+1 = Medium-Probable.
- Report: ranked findings, with fixing handed off to /review.
  No product-code edits made; .bughunt/ report/state files may be written.

A clean result is a real outcome: report it honestly, never invent findings. Emit a short report stating: what was examined (which lenses × hotspots), what was deliberately out of scope, your confidence level, and any Speculative items worth a human glance. "I hunted X with lenses Y and found no confirmed defects; here's the coverage" is a valid deliverable.
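The recon and grid steps of the worked example can be sketched as follows. The source only states the ranking factors (churn × complexity × boundary × test-gap); the `FileStats` fields, their scales, the sample numbers, and both function names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    churn: int         # e.g. recent commits touching the file (assumed unit)
    complexity: int    # any complexity proxy (assumed unit)
    boundary: float    # 1.0 = far from trust boundaries, higher = closer
    test_gap: float    # 1.0 = well tested, higher = fewer tests


def hotspot_score(stats):
    """Multiply the four factors; a file must be risky on several at once."""
    return stats.churn * stats.complexity * stats.boundary * stats.test_gap


def rank_hotspots(all_stats, top=3):
    """Return the highest-scoring files first."""
    return sorted(all_stats, key=hotspot_score, reverse=True)[:top]


def build_grid(plan):
    """Expand a hunt plan {hotspot path: [lenses]} into (lens, path) cells."""
    return [(lens, path) for path, lenses in plan.items() for lens in lenses]


# Made-up numbers for the example repo; only the ordering matters here.
stats = [
    FileStats("lib/pricing.js", churn=5, complexity=10, boundary=1.0, test_gap=1.5),
    FileStats("db/query.js", churn=3, complexity=6, boundary=1.5, test_gap=1.0),
    FileStats("routes/invoices.js", churn=12, complexity=8, boundary=2.0, test_gap=2.0),
    FileStats("lib/format.js", churn=1, complexity=2, boundary=1.0, test_gap=1.0),
]
hotspots = [s.path for s in rank_hotspots(stats)]

# The plan assigns lenses per hotspot rather than taking the full product.
grid = build_grid({
    "routes/invoices.js": ["auth-access", "dataflow-taint"],
    "lib/pricing.js": ["logic-correctness"],
    "db/query.js": ["resource-performance"],
})
```

Because the score is a product, a factor of zero removes a file from contention entirely, so a real implementation would likely floor each factor at some positive value.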
Read on demand — only the lenses the hunt plan selects.
| Lens | Hunts for |
|---|---|
| lens-dataflow-taint.md | Untrusted input reaching dangerous sinks: injection, deserialization, path traversal, SSRF, secret leakage |
| lens-state-lifecycle.md | Illegal state transitions, resource/handle leaks, init/teardown order, idempotency, cache invalidation |
| lens-concurrency.md | Data races, TOCTOU, deadlock, reentrancy, async ordering & cancellation, shared mutable state |
| lens-boundaries-numeric.md | Off-by-one, overflow/truncation, precision, null/optional, empty/limit cases, encoding, time/DST |
| lens-error-failure.md | Swallowed errors, fail-open, partial writes, missing rollback, retry/timeout/cancel correctness |
| lens-contract-spec.md | Code vs docs/tests/types/comments, violated invariants, dead/contradictory logic, copy-paste divergence |
| lens-auth-access.md | Broken authn/authz, IDOR, privilege escalation, tenant isolation, session/token/crypto/secret misuse |
| lens-logic-correctness.md | Internally wrong logic: inverted conditions, wrong operators/formulas, branch/case errors, wrong variable used |
| lens-resource-performance.md | O(n²)+ complexity, N+1 queries, unbounded growth, memory blowups, DoS amplification at scale |
| lens-dx-pain.md | Developer-experience friction: aged TODO/FIXME debt, flaky-test patterns, slow/serial scripts, config drift, unhelpful errors |
| lens-product-ux.md | User-facing pain: missing loading/empty/error states, swallowed feedback, dead feature flags, friction & dead ends |
| lens-dependency-supply.md | Supply-chain risk: vulnerable/unpinned/abandoned deps, lockfile drift, typosquats, unsafe install/CI |
| lens-data-migration.md | Data safety: destructive/irreversible migrations, unsafe backfills, schema/code skew, serialization drift |
Pick the one(s) recon identifies.
| Platform | Catalog |
|---|---|
| Apple — Swift/ObjC (iOS, macOS) | platform-apple.md |
| Web — JS/TS, browser, Node | platform-web.md |
| Systems — C/C++/Rust/Go | platform-systems.md |
| Backend + CLI — Python/Ruby/Java + terminal tools | platform-backend-cli.md |
| Other (Android/Kotlin, .NET/C#, PHP, Flutter/RN, SQL, IaC) + generic fallback for any unlisted language | platform-other.md |
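As a rough illustration of how recon might map a file listing to the catalogs in the table above, the sketch below counts source files by extension. The catalog file names come from the table; the extension lists, the `min_share` threshold, and the function name are assumptions, and real platform detection may rely on manifests or other signals instead.

```python
from collections import Counter
from pathlib import PurePosixPath

# Extension lists are illustrative assumptions, not the skill's actual rules.
CATALOG_BY_EXT = {
    ".swift": "platform-apple.md", ".m": "platform-apple.md",
    ".js": "platform-web.md", ".jsx": "platform-web.md",
    ".ts": "platform-web.md", ".tsx": "platform-web.md",
    ".c": "platform-systems.md", ".cpp": "platform-systems.md",
    ".rs": "platform-systems.md", ".go": "platform-systems.md",
    ".py": "platform-backend-cli.md", ".rb": "platform-backend-cli.md",
    ".java": "platform-backend-cli.md",
    ".kt": "platform-other.md", ".cs": "platform-other.md",
    ".php": "platform-other.md", ".dart": "platform-other.md",
    ".sql": "platform-other.md", ".tf": "platform-other.md",
}
FALLBACK = "platform-other.md"  # generic fallback for unlisted languages


def pick_catalogs(paths, min_share=0.2):
    """Return catalogs covering at least min_share of recognized files."""
    counts = Counter()
    for path in paths:
        ext = PurePosixPath(path).suffix.lower()
        if ext in CATALOG_BY_EXT:
            counts[CATALOG_BY_EXT[ext]] += 1
    total = sum(counts.values())
    if total == 0:
        return [FALLBACK]
    # most_common() orders by count, so the dominant platform comes first.
    return [catalog for catalog, n in counts.most_common()
            if n / total >= min_share]
```

For the example repo earlier in this document, `pick_catalogs(["routes/invoices.js", "lib/pricing.js", "db/query.js"])` selects only the web catalog.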

| File | Contents |
|---|---|
| recon-and-scoping.md | Deterministic pre-pass, platform detection, trust boundaries, hotspot ranking, hunt plan |
| orchestration.md | (Lens × hotspot) grid, hunter prompt, capability ladder (Workflow/Tasks/sequential), merge/cross-validate |
| verification.md | The mandatory adversarial skeptic pass — four refutation questions, verdict contract |
| confirm.md | Optional E2B-backed Confirm rung — parallel isolated repros that earn Confirmed verdicts (confirm-e2b.py) |
| tooling.md | bughunt.py reference — census/hotspots/signals/deps/merge/render/diff, state dir, CI mode |
| lens-dataflow-taint.md | Source→sink tracing |
| lens-state-lifecycle.md | State machines & resource lifecycle |
| lens-concurrency.md | Races, ordering, deadlock |
| lens-boundaries-numeric.md | Numeric & boundary conditions |
| lens-error-failure.md | Error & failure paths |
| lens-contract-spec.md | Contract vs implementation |
| lens-auth-access.md | Authorization & access control |
| lens-logic-correctness.md | Business-logic correctness |
| lens-resource-performance.md | Resource & performance at scale |
| lens-dx-pain.md | Developer-experience pain |
| lens-product-ux.md | Product & UX pain |
| lens-dependency-supply.md | Dependency & supply chain |
| lens-data-migration.md | Data & migration safety |
| platform-apple.md | Swift/ObjC catalog |
| platform-web.md | JS/TS/Node catalog |
| platform-systems.md | C/C++/Rust/Go catalog |
| platform-backend-cli.md | Backend + CLI catalog |
| platform-other.md | Android, .NET, PHP, Flutter/RN, SQL, IaC + generic fallback |
npx claudepluginhub robzilla1738/roberts-skills --plugin bughunt-suite