From Flagrare
Evidence-first debugging loop for intermittent, hard-to-reproduce, or performance-related bugs. Runs Hypothesis → Instrument → Reproduce → Analyze → Fix until goal is met.
How this skill is triggered — by the user, by Claude, or both
Slash command
/flagrare:debug-huntThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Evidence-first debugging. The core principle: **never commit to a fix until runtime data proves the root cause**.
Evidence-first debugging. The core principle: never commit to a fix until runtime data proves the root cause.
Static analysis tells you what should happen. Logs, spans, and runtime state tell you what actually happens. This skill closes the gap by treating debugging as a scientific loop — hypothesis, instrumentation, reproduction, analysis — not a guessing game.
Invoke /goal with this statement (fill in the bug from the user's description):
Goal:
[bug description]no longer reproduces. The root cause is confirmed by runtime evidence, not assumption. If the codebase has tests, a failing test captured the bug before the fix and passes after. All instrumentation is removed. The codebase is clean.
Surface the goal back to the user in one sentence so they can adjust scope before the loop starts. This is the exit condition — the entire session runs until these conditions hold.
Gather facts before reading any code.
Collect:
Exploration option:
If the bug involves observable behaviour in a running instance — a UI glitch, a wrong API response, a timing issue, degraded performance — offer to invoke /flagrare:smoke-test to surface evidence against the live system. Use AskUserQuestion:
After gathering facts:
Read the relevant code to understand the theoretical execution path. Then generate 2–4 plausible hypotheses. Be specific: "variable userId is null before the null-check on line 47" is a hypothesis; "something is null" is not. Each hypothesis needs a falsifiable condition — what would you see in the logs if this were true?
Hold all of them. Do not commit to one.
Before writing a single log line, check whether the codebase already contains a working implementation of the same pattern. A large class of bugs — wrong argument order, missing option, skipped step — is visible in a side-by-side comparison without any runtime evidence.
Search for:
Compare working vs. broken:
If the diff reveals the root cause directly: skip to Phase 5 — Resolution. No instrumentation needed.
If no working examples exist, or the comparison is inconclusive: carry any hypotheses the comparison generated into Phase 3.
Design a logging strategy that proves or disproves each hypothesis. The goal is surgical: high-signal, low-noise, temporary.
For each hypothesis, decide:
Apply the instrumentation. Route logs to wherever the user can see them: console, file, stderr, a debug flag. Tag every inserted log line with [DEBUG-HUNT] — this prefix exists solely so cleanup in Phase 6 is fast and complete.
Instrumentation principles:
[DEBUG-HUNT] userId=null checkPassed=false) is faster to parse than proseTell the user exactly how to trigger the bug with the new instrumentation in place. Be specific: what to do, in what order, and — for intermittent bugs — how many attempts to make.
Wait. Do not proceed until the user confirms they have triggered the bug and has log output to share.
Analyze the logs against the hypothesis list:
If the logs are inconclusive: do not guess. Re-enter Phase 3 with refined instrumentation. Before adding more logs, state explicitly what the previous round failed to reveal and why — this keeps the instrumentation from growing into noise.
If the bug does not reproduce even with instrumentation: document the exact conditions under which it failed to fire. That is evidence too — adjust the hypothesis list and try again under different conditions or with instrumentation placed earlier in the path.
Root cause is now confirmed by evidence. Fix it.
Check for an existing test suite: package.json test scripts, pytest.ini / pyproject.toml, go.mod with a test target, Makefile test rules, or tests/ / spec/ / __tests__/ directories.
If a test suite exists, invoke /flagrare:atdd-plan to produce an ATDD-first fix plan. The plan must include an acceptance test that:
This matters because a fix without a test is a fix that can silently regress. The acceptance test is the permanent proof that the bug is gone and stays gone.
Wait for the ATDD plan to be approved before writing any implementation code.
Apply the smallest possible fix targeted at the confirmed root cause. Do not refactor surrounding code. Do not clean up unrelated things. A debugging session is not a cleanup pass — one root cause, one fix.
Verify first — before removing any instrumentation.
Ask the user to trigger the same scenario again with the fix applied. The [DEBUG-HUNT] logs should still be present so you can see what the runtime state looks like after the fix.
If the bug still reproduces: do not remove instrumentation. The hypothesis was incomplete. Re-enter Phase 1 with the new evidence from the failed fix attempt as additional input. If Phase 2 pattern analysis was skipped, run it now — the fix attempt may have surfaced a comparison worth making.
If the bug no longer reproduces: proceed to cleanup.
Cleanup:
Remove every line tagged [DEBUG-HUNT]. Remove any temporary logging, debug flags, or test utilities added during this session. Leave the codebase exactly as it would have been if the bug had never existed — except with the fix, and (if applicable) the ATDD acceptance test.
If a test suite exists, run it. Confirm it passes clean before declaring done.
The goal set in Step 0 is met when all of the following hold:
[DEBUG-HUNT] instrumentation is removedThe goal is not met until every condition holds. Partial credit does not exist.
[DEBUG-HUNT] tag exists for one reason: so you can find and remove everything cleanly.npx claudepluginhub flagrare/agent-skills --plugin flagrareProvides behavioral guidelines to reduce common LLM coding mistakes, focusing on simplicity, surgical changes, assumption surfacing, and verifiable success criteria.
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.