From workflow-demos
Adversarial bug hunt over a branch diff (or a specified path/diff-range). Runs a self-respawning finder fleet (rapid surface scanners + deep analysts) that dispatches bug candidates the moment they're found, verifies each candidate with a 5-vote pigeonhole adversarial jury (≥2 refutations kill it), and synthesizes a semantically-deduped, severity-ranked report. Terminates on a deep-finder dry streak. Use for "hunt for bugs", "find bugs in my branch", "bug hunt", "adversarial bug review", or the /bughunt-demo slash command. Faithful recreation of the Claude Code built-in `bughunt` workflow using subagents — works WITHOUT the gated Workflow tool (tengu_workflows_enabled is OFF fleet-wide).
How this skill is triggered — by the user, by Claude, or both
Slash command
/workflow-demos:bughuntThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Hunt for **real, reachable, new** bugs in a set of changes with a very low false-positive rate. This is a recreation of Claude Code's built-in `bughunt` workflow, rebuilt with the **Agent/Task subagent tools** so it works today even though the native `Workflow` tool is gated off (`tengu_workflows_enabled` is OFF for all our Anthropic orgs — see `wiki/control/runs/2026-05-24-workflows-activation...
Hunt for real, reachable, new bugs in a set of changes with a very low false-positive rate. This is a recreation of Claude Code's built-in bughunt workflow, rebuilt with the Agent/Task subagent tools so it works today even though the native Workflow tool is gated off (tengu_workflows_enabled is OFF for all our Anthropic orgs — see wiki/control/runs/2026-05-24-workflows-activation/account-status.md).
Unlike review-branch (a fixed 6-dimension review), bughunt is dynamic: a fleet of finders self-respawns, biasing breadth early (rapid scanners) and depth later (deep analysts), and stops itself once deep passes go dry. Every surviving candidate then faces a 5-vote adversarial jury whose job is to refute.
Source of truth for the recipe: wiki/concepts/cache/claude-code-workflows-builtin/bughunt.js.
The Workflow DSL maps directly onto subagent dispatch:
| Built-in DSL | This skill uses |
|---|---|
agent(prompt, {schema}) | Agent tool — instruct the subagent to return strict JSON matching the schema |
parallel([...]) | multiple Agent calls in ONE message |
pipeline(stageA, stageB) | manual staging — finder result → harvest → fire verifiers |
self-respawning slot() | after a finder returns, decide the next role and dispatch again until bugFindingDone |
phase() / log() | progress narration to the user |
shared seen/verifySlots/dryStreak state | you hold this state in-context between dispatches (single-threaded — every mutation is atomic for you) |
Tradeoff vs native: we lose replay-safe determinism, the FLEET_SIZE-way true concurrency (you serialize batches), and the /workflows run-history browser. We keep 100% of the hunt logic — role assignment, budget, dedup, dry-streak, pigeonhole voting, semantic dedup.
src/foo.ts), a git diff range (HEAD~3...HEAD), or empty.
origin/main...HEAD (fall back to main...HEAD if no remote).FLEET_SIZE = 5 # concurrent finder slots (serialize into batches if needed)
VOTES_PER_BUG = 5 # adversarial jurors per candidate
REFUTATIONS_REQUIRED = 2 # ≥2 refutations kill a candidate
MAX_VERIFY = 20 # verify-budget slots (critical/high bypass the cap)
DRY_STREAK_LIMIT = 3 # 3 consecutive dry deep passes → stop finding
Discover what to hunt and build a shared context header. Schema SCOPE_SCHEMA: { diffBase: string, files: string[], summary: string, conventions?: string } (required: diffBase, files, summary).
The scope agent (prompt verbatim intent):
git rev-parse origin/main, fallback to main. Return whichever exists. (Skip if scope is an explicit path.)git diff --name-only <diffBase>...HEAD.CLAUDE.md files (root + parent dirs of changed files); extract relevant conventions.Early exits:
{ summary: "Scope skipped.", bugs: [], stats: {} }.files.length === 0 → { summary: "No changes on branch vs <diffBase>.", bugs: [], stats: {} }.Then log("<N> files changed vs <diffBase>") and build the small shared header (each finder/verifier runs its own git diff):
## Branch scope (<diffBase>...HEAD)
Changed files (N):
- file1
- file2
## What changed
<summary>
## Conventions (CLAUDE.md)
<conventions, or "(none)">
Hold this shared state across the whole find phase:
seen — Map of dedupKey → {finder, title}.naiveDupes, budgetDropped — dropped-candidate lists.verifyJobs — verifications fired so far (do NOT await them here).verifySlots = MAX_VERIFY (20), rapidSpawned = 0, deepSpawned = 0, dryStreak = 0, bugFindingDone = false.dedupKey(b) = b.file + ":" + (b.line != null ? Math.round(b.line/5)*5 : "x") (line bucketed to the nearest 5).
sevRank = {critical:0, high:1, medium:2, low:3, nit:4}.
Run FLEET_SIZE (5) slots. Each slot repeats: pick the next role → dispatch a finder → harvest → fire verifications → respawn, until decideNextRole() returns null. You may dispatch slots in parallel batches (multiple Agent calls in one message); just keep the shared-state mutations ordered.
Role assignment (decideNextRole, the Python decide_agent_type):
bugFindingDone → null (slot ends).rapidSpawned < 3 → {type:"rapid", idx: rapidSpawned++} (label rapid-<idx>).{type:"deep", idx: deepSpawned++} (label deep-<idx>). Deep finders run until the dry streak hits.Each finder returns BUGS_SCHEMA: { bugs: [ {file, line?, title, description, severity, category?}, ... ] } (required per bug: file, title, description, severity). severity ∈ {critical, high, medium, low, nit}. category ∈ {logic, security, performance, convention, correctness, resource-leak, race, other}.
Pass each finder the current skipKeys = Array.from(seen.keys()) so it avoids re-finding known locations.
Rapid Surface Scanner prompt (rapidPrompt(idx, skipKeys)) = CONTEXT_HEADER +
## Role: Rapid Surface Scanner (rapid-<idx>)
Quickly scan the changes. Report obvious issues. Do NOT deep-dive.
## Look for
**P1** CLAUDE.md violations · **P2** Logic errors (copy-paste, wrong conditions, null derefs) · **P3** Resource issues (unbounded growth, missing await)
## Instructions
1. Run 'git diff <diffBase>...HEAD' to see the changes.
2. Read changed files as needed for surrounding context.
3. Report 5-12 bugs. Breadth > depth. OK to be wrong.
4. Bias toward the <first|middle|last> third of the file list. # = ["first third","middle third","last third"][idx % 3]
5. SKIP these locations (already found): <skipKeys> # only if skipKeys non-empty
Structured output only.
Deep Analyst prompt (deepPrompt(idx, skipKeys)) = CONTEXT_HEADER +
## Role: Deep Analyst (deep-<idx>)
Find subtle bugs requiring deep analysis.
## Process
Run 'git diff <diffBase>...HEAD' · Read full files · Grep callers of modified functions · Trace callees · Trace data flow
## Look for
Invariant violations · Races · State mutation · Edge cases (empty/null/concurrent)
## Instructions
Pick <the most significant change | a DIFFERENT subsystem from prior deep passes>. Go DEEP. Return 1-3 high-confidence findings.
SKIP these locations (already found): <skipKeys> # only if skipKeys non-empty
Structured output only.
(idx === 0 → "the most significant change"; else → "a DIFFERENT subsystem from prior deep passes".)
Harvest each finder result (harvest(result, role)):
deep, dryStreak++; if dryStreak >= 3 set bugFindingDone = true. Return [] (no novel bugs).result.bugs by sevRank (severity-first so high-priority bugs claim budget slots).dedupKey:
seen.has(key) → push to naiveDupes (record finder, dupOf), skip.verifySlots <= 0 && sevRank[severity] >= 2 (medium/low/nit only — critical/high always pass) → push to budgetDropped, skip.seen.set(key, {finder, title}), verifySlots--, add to novel.deep: dryStreak = novel.length > 0 ? 0 : dryStreak + 1; if dryStreak >= 3 → bugFindingDone = true.log("<label>: <raw> raw → <novel> novel" + (deep ? " (dryStreak=<n>)" : "")).Fire verification immediately for each novel bug (push verifyBug(bug) into verifyJobs) — do NOT await — then respawn the slot. Find and verify overlap; there is no barrier until synthesis.
Each verifyBug(bug) runs up to VOTES_PER_BUG (5) adversarial verifiers, each returning VERDICT_SCHEMA: { refuted: boolean, evidence: string, confidence: "high"|"medium"|"low", severity? } (required: refuted, evidence, confidence). Evidence MUST cite file:line.
Pigeonhole optimization — vote in two waves:
>= REFUTATIONS_REQUIRED = 2) → early kill: log('<short> "<title>": 0-2 ✗ (early kill)'), return {bug, verdicts, refutedVotes, survives:false}. Skip the other 3.r = refuted count; survives = r < 2. log('<short> "<title>": <r̄>-<r> <✓|✗>') where r̄ = total - r.Verifier prompt (verifyPrompt(bug, v)) = CONTEXT_HEADER +
## Role: Adversarial Verifier (voter <v+1>/5)
Be SKEPTICAL. Try to REFUTE. Find ANY reason this is not a real bug. ≥2 refutations of 5 kill it.
## Candidate
File: <bug.file>[:<bug.line>]
Title: <bug.title>
Severity: <bug.severity>
Description: <bug.description>
## Checklist
1. Run 'git diff <diffBase>...HEAD -- <bug.file>' and read the file — does the issue exist?
2. Check callers — reachable? Preconditions guaranteed?
3. Check handling — validation/error handling elsewhere?
4. Conventions — intentional per CLAUDE.md (above)?
5. Git history — pre-existing ≠ new bug. Already fixed/reverted?
**refuted=true** if: not reachable / handled elsewhere / intentional / pre-existing / wrong.
**refuted=false** ONLY if: real, reachable, new, material.
Default to refuted=true if uncertain.
Structured output only. Evidence MUST cite file:line.
Drain: once finding stops (dry streak), log("Dry-streak hit. <seen.size> unique bugs found. Draining <verifyJobs.length> verifications..."), then await all verifyJobs. Split into confirmed (survives) and killed (!survives). log("Voting done: <V> voted → <C> confirmed, <K> killed · <naiveDupes> naive-dupes · <budgetDropped> budget-dropped").
Early return if nothing confirmed:
summary: "Clean. <V> voted, all killed by 5-vote adversarial. <deepSpawned> deep finders ran before dry-streak."
bugs: []
killed: [{file, title, vote: "<r̄>-<r>"}, ...]
stats: {rapidSpawned, deepSpawned, voted, confirmed:0, killed, naiveDupes, budgetDropped}
bestEvidence(r) = among the non-refuted verdicts, the one with the best confidence (confRank {high:0,medium:1,low:2}); fallback {evidence:"(no confirming verdict)", confidence:"low"}.
Build a block of all confirmed candidates:
### [<i>] <title> (<severity>, <finder>)
Vote: <r̄>-<r> · File: <file>[:<line>]
<description>
Evidence (<confidence>): <evidence>
Synthesis agent returns REPORT_SCHEMA: { summary: string, bugs: [ {file, line?, title, description, severity, vote, evidence}, ... ] } (required per bug: file, title, description, severity, vote, evidence).
Synthesis prompt:
## Synthesis: semantic dedup + final report
<C> bugs survived adversarial verification. Semantic duplicates are likely (naive dedup only caught file:line matches).
<block>
## Instructions
1. Identify semantic duplicates (same root cause, different location/wording). Merge into one entry.
2. Order by severity: critical → high → medium → low → nit.
3. Tighten titles/descriptions. Pick the best evidence per bug.
4. Write a 2-3 sentence summary.
Structured output only.
Final return:
summary: <reportResult.summary>
bugs: <reportResult.bugs>
killed: [{file, title, vote}, ...]
stats: {
rapidSpawned, deepSpawned, voted,
confirmed, killed, afterSemanticDedup,
naiveDupes, budgetDropped,
agentCalls: 1 + (rapidSpawned + deepSpawned) + Σ(verdicts per voted bug) + 1,
}
# Bug Hunt — <scope>
<2-3 sentence summary>
## Critical (N)
- `file:line` — <title> (vote A-B, finder: deep-1)
<description>. Evidence (<confidence>): <evidence>
## High (M)
...
## Medium / Low / Nit
...
## Killed by adversarial jury (optional, for transparency)
- `file:title` — vote A-B
**Stats: <rapidSpawned> rapid + <deepSpawned> deep finders · <voted> verified → <confirmed> confirmed, <killed> killed · <afterSemanticDedup> after semantic dedup · <naiveDupes> naive-dupes, <budgetDropped> budget-dropped · ~<agentCalls> agent calls.**
refuted=true when uncertain, and 2 of 5 refutations kill a candidate. This is what keeps false positives down. Do not skip or soften it.MAX_VERIFY=20) protects you from a flood of low-severity candidates; critical/high always bypass it. Sort by severity before spending slots.subagent_type: Explore is cheaper for finders/verifiers; use general-purpose where git blame/history is needed.wiki/concepts/cache/claude-code-workflows-builtin/bughunt.js, concept page [[claude-code-workflows]], cheatsheet [[workflows-and-goals-cheatsheet]].npx claudepluginhub adaptationio/claude-workflow-demos --plugin workflow-demosGuides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.