From Darkroom Engineering
Runs adversarial verification with three competing agents (issue-finder, disprover, judge) to surface bugs in security-sensitive code, data integrity logic, financial calculations, and breaking changes.
Slash command: `/darkroom:verify`
Three agents with competing incentives: one finds issues, one disproves them, one judges.
Before starting work, create a marker: `mkdir -p ~/.claude/tmp && echo "verify" > ~/.claude/tmp/heavy-skill-active && date -u +"%Y-%m-%dT%H:%M:%SZ" >> ~/.claude/tmp/heavy-skill-active`
Agent(reviewer, "You are a bug finder. Analyze the following code/changes thoroughly.
Score yourself: +1 for low-impact issues, +5 for medium-impact, +10 for critical.
Report every potential issue you find — edge cases, race conditions, missing validation,
security holes, logic errors, performance problems.
Report your total score at the end.
Target: [describe what to verify]
Files: [list files]")
Finder over-reports by design — this is the superset of all possible issues.
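The finder's self-scoring rule can be sketched as a small tally. This is an illustrative sketch only: the `FINDER_POINTS` table mirrors the +1/+5/+10 weights in the prompt above, while the function name and the impact labels (`low`, `medium`, `critical`) are assumptions made for the example, not part of the skill.

```python
# Point values taken from the finder prompt: +1 low, +5 medium, +10 critical.
FINDER_POINTS = {"low": 1, "medium": 5, "critical": 10}


def finder_score(impacts):
    """Sum the finder's self-reported score for a list of impact labels.

    `impacts` is a hypothetical list such as ["low", "critical"]; the label
    names are an assumption made for this sketch.
    """
    return sum(FINDER_POINTS[impact] for impact in impacts)


if __name__ == "__main__":
    # Two low-impact issues, one medium, one critical: 1 + 1 + 5 + 10 = 17
    print(finder_score(["low", "low", "medium", "critical"]))
```

Because every reported issue adds points and nothing subtracts them, the finder has no incentive to hold back, which is why its output should be read as a superset.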
The adversary takes the finder's output and tries to disprove each issue.
Agent(reviewer, "You are an adversarial reviewer. For each issue below, try to DISPROVE it.
Score yourself: for each issue you successfully disprove, gain that issue's point value,
but lose 2x its point value if you wrongly disprove a real issue.
Issues to challenge:
[paste finder output]
For each issue, state:
- DISPROVED: [reason it's not actually an issue]
- CONFIRMED: [reason it is a real issue]
- UNCERTAIN: [what would need to be checked]")
Adversary filters aggressively but cautiously — this is the subset of likely-real issues.
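To see why this scoring makes the adversary cautious, here is a minimal sketch of the incentive. It assumes the adversary holds some probability `p_false_positive` that an issue is not real; that quantity, the tuple layout, and both function names are hypothetical and not something the skill computes. The sketch also assumes that CONFIRMED and UNCERTAIN answers leave the score unchanged, which the prompt does not state either way.

```python
def adversary_score(outcomes):
    """Tally the adversary's score under the +points / -2x rule.

    `outcomes` is a hypothetical list of (points, disproved, actually_real)
    tuples; only issues the adversary marked DISPROVED affect the score.
    """
    total = 0
    for points, disproved, actually_real in outcomes:
        if not disproved:
            continue  # assumption: CONFIRMED / UNCERTAIN do not change the score
        total += -2 * points if actually_real else points
    return total


def expected_gain(points, p_false_positive):
    """Expected score change from disproving one issue.

    Wins `points` with probability p and loses 2 * points otherwise,
    which simplifies to points * (3p - 2).
    """
    return p_false_positive * points - (1 - p_false_positive) * 2 * points
```

Under these assumptions the break-even point is `p_false_positive = 2/3`: disproving an issue only pays off when the adversary is more than two-thirds sure it is a false positive.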
The referee takes both reports and produces the final verdict.
Agent(explore, "You are a neutral referee scoring two reviewers.
You will get +1 for each correct judgment and -1 for each incorrect one.
The ground truth exists and will be checked against your answers.
For each issue, produce a final verdict:
REAL BUG — with severity (Critical/Warning/Minor)
FALSE POSITIVE — explain why
NEEDS HUMAN CHECK — genuinely ambiguous
Finder report:
[paste finder output]
Adversary report:
[paste adversary output]")
Sequential — each agent depends on the previous output.
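The sequential dependency can be sketched as a three-step pipeline. This is a sketch under assumptions: `run_agent` is a hypothetical callable standing in for however the `Agent(...)` calls above are actually dispatched, `verify` is an illustrative name, and the prompt strings are abbreviated placeholders rather than the full prompts.

```python
from typing import Callable

# Hypothetical signature: (agent_type, prompt) -> the agent's text output.
RunAgent = Callable[[str, str], str]


def verify(target: str, files: list[str], run_agent: RunAgent) -> str:
    """Run finder, adversary, and referee in order, feeding each the prior output."""
    finder_report = run_agent(
        "reviewer",
        f"You are a bug finder. Target: {target}\nFiles: {', '.join(files)}",
    )
    adversary_report = run_agent(
        "reviewer",
        f"You are an adversarial reviewer. Issues to challenge:\n{finder_report}",
    )
    # The referee sees both reports, so it cannot start until both exist.
    return run_agent(
        "explore",
        "You are a neutral referee scoring two reviewers.\n"
        f"Finder report:\n{finder_report}\n"
        f"Adversary report:\n{adversary_report}",
    )
```

Passing `run_agent` in as a parameter keeps the sketch independent of any particular agent runtime and makes the ordering easy to check with a stub.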
## Adversarial Verification Report
### Scope
[What was verified]
### Verdict: [PASS / FAIL / NEEDS REVIEW]
### Confirmed Issues
| # | Severity | Issue | File:Line | Action Required |
|---|----------|-------|-----------|----------------|
| 1 | Critical | [description] | [location] | [what to fix] |
### Disproved (False Positives)
| # | Claimed Issue | Why Not Real |
|---|---------------|--------------|
| 1 | [description] | [reason] |
### Needs Human Check
| # | Issue | Why Ambiguous |
|---|-------|---------------|
| 1 | [description] | [what to check] |
### Confidence
Finder: N issues. Adversary disproved: M. Referee confirmed: K.
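As a rough illustration of how the Confidence line could be filled in, the sketch below counts the referee's per-issue verdicts. The verdict strings match the referee prompt, but both function names are hypothetical. The source does not define how the overall verdict is chosen; the rule in `overall_verdict` (any real bug means FAIL, otherwise any human check means NEEDS REVIEW, otherwise PASS) is an assumption made purely for this example.

```python
from collections import Counter


def confidence_line(finder_count, adversary_disproved, referee_verdicts):
    """Render the report's Confidence line from per-issue referee verdicts."""
    confirmed = Counter(referee_verdicts)["REAL BUG"]
    return (
        f"Finder: {finder_count} issues. "
        f"Adversary disproved: {adversary_disproved}. "
        f"Referee confirmed: {confirmed}."
    )


def overall_verdict(referee_verdicts):
    """ASSUMED rule, not taken from the skill: FAIL > NEEDS REVIEW > PASS."""
    if "REAL BUG" in referee_verdicts:
        return "FAIL"
    if "NEEDS HUMAN CHECK" in referee_verdicts:
        return "NEEDS REVIEW"
    return "PASS"
```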
For smaller changes, skip the referee:
Agent(reviewer, "Find all issues in [target]. Be thorough.")
Agent(reviewer, "Challenge each issue: [paste output]. Disprove what you can.")
Review surviving issues yourself.
If you catch yourself thinking any of the following, STOP — you are rationalizing skipping verification:
| Rationalization | Why It's Wrong |
|---|---|
| "The change is too simple to need three agents" | Simple changes to auth/payments have caused the worst production incidents |
| "I already reviewed it myself" | Self-review has a known blind spot for logic errors you just wrote |
| "It's just a refactor, behavior doesn't change" | Refactors that "don't change behavior" are the #1 source of subtle regressions |
| "Tests are passing, that's enough" | Tests verify expected behavior; adversarial review finds unexpected behavior |
| "This would take too long" | A 5-minute verification is cheaper than a production incident |
| "The reviewer agent already checked it" | The reviewer checks quality; verification checks correctness under adversarial pressure |
If any agent (including yourself) uses these phrases, verification is NOT complete — restart the verification step:
Each of these must be replaced with evidence: a specific test, a concrete trace through the code, or a cited invariant.
npx claudepluginhub darkroomengineering/cc-settings --plugin darkroom